Find All Links on a Page

Here's the basic principle behind spiders: fetch a page, parse its HTML, and collect every link on it.

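$html = file_get_contents('http://www.example.com');
if ($html === false) {
       exit('Could not fetch the page.');
}

$dom = new DOMDocument();
// the @ suppresses warnings caused by malformed real-world HTML
@$dom->loadHTML($html);

// grab all the links on the page
$xpath = new DOMXPath($dom);
$hrefs = $xpath->evaluate("/html/body//a");

// print the href attribute of each link, escaped for safe output
foreach ($hrefs as $href) {
       $url = $href->getAttribute('href');
       echo htmlspecialchars($url).'<br />';
}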
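
To go from listing links to an actual spider, the same extraction step can be repeated on each page it discovers. The following is a minimal sketch, not a production crawler. The function names `extractLinks` and `crawl`, the `$fetch` callable, and the `$maxPages` limit are all hypothetical names introduced for illustration. It assumes that only absolute links on the starting host should be followed, so relative links are ignored rather than resolved.

```php
<?php
// Hypothetical helper: return every href found in an HTML string.
function extractLinks(string $html): array
{
    if ($html === '') {
        return []; // loadHTML() rejects empty input
    }
    $previous = libxml_use_internal_errors(true); // collect parse warnings quietly
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $links = [];
    foreach ($xpath->query('//a[@href]') as $anchor) {
        $links[] = $anchor->getAttribute('href');
    }
    return $links;
}

// Hypothetical breadth-first spider. $fetch is any callable that takes a URL
// and returns the page HTML as a string, or false on failure.
function crawl(string $startUrl, callable $fetch, int $maxPages = 20): array
{
    $host = parse_url($startUrl, PHP_URL_HOST);
    if (!is_string($host)) {
        return []; // the start URL has no usable host
    }
    $queue = [$startUrl];
    $visited = [];

    while ($queue !== [] && count($visited) < $maxPages) {
        $url = array_shift($queue);
        if (isset($visited[$url])) {
            continue; // already crawled
        }
        $visited[$url] = true;

        $html = $fetch($url);
        if (!is_string($html)) {
            continue; // fetch failed, skip this page
        }
        foreach (extractLinks($html) as $link) {
            // Assumption: only absolute links on the starting host are followed.
            if (parse_url($link, PHP_URL_HOST) === $host && !isset($visited[$link])) {
                $queue[] = $link;
            }
        }
    }
    return array_keys($visited);
}
```

Something like `crawl('http://www.example.com', 'file_get_contents')` would run it against a live site. Passing the fetcher in as a callable keeps the traversal logic separate from the network code, and the page limit keeps the loop from running away on a large site.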

Comments

  1. Oleg

    Exactly what I needed. Thanks.

  2. Jens Törnell

    Perfect for affiliate sites!

  3. RONIT

    WAAAAO IT'S GREAT... I WAS SEARCHING FOR EXACTLY THIS

  4. daniel

    I didn't quite understand how to use this. Where do I type that? I'm kinda confused... I need more explanation.

    Thanks

  5. lande

    This post is just too good. thumbs up!!
    keep up the good work ;)

  6. Dario

    Thank you very much for the help!

  7. Zbigniew

    Works perfect! thx!

  8. juan

    Can someone please show me, step by step, how to use this? Thank you in advance.

  9. kazi tanvir ahsan

    Perfect. I was using php simple DOM, but it wasn't as good as this!

  10. shail.dw

    The unique power of PHP and DOM unleashed. cURL and regex-based techniques can never match this, though they have their own uses, of course. Many thanks.

  11. Milan

    How can I follow all the other child pages?

  12. Zen

    Thanks.

    What about the performance of XPath?

  13. obliviga

    This is amazing. Thank you so much.

  14. Lorenzo

    Thanks, very simple. Great!

  15. Sif Eddine

    Hi, thanks, it's very helpful, but I have a question:
    what if I have to get a link with a specific class?
    Will this do it: (html/body//a.class)?

  16. alexander

    Hi,
    Is there a cURL version of this?
    I'd appreciate it if anyone could write it with cURL.
    Thanks

  17. JoshuaFrancis

    Thanks Man. This is exactly what I need.
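
Two of the questions above, a cURL version and links with a specific class, can be sketched as follows. This is a hedged illustration rather than part of the original snippet: `fetchWithCurl` and `linksWithClass` are hypothetical names, the 10-second timeout is an arbitrary choice, and the class name is assumed to contain no quote characters. XPath 1.0 has no `.class` shorthand (that is CSS selector syntax), so the class is matched by testing the space-padded `class` attribute instead.

```php
<?php
// Hypothetical helper: fetch a page with cURL instead of file_get_contents().
// Returns the response body as a string, or false on failure.
function fetchWithCurl(string $url)
{
    $ch = curl_init($url);
    if ($ch === false) {
        return false;
    }
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);          // arbitrary 10-second limit
    return curl_exec($ch);
}

// Hypothetical helper: hrefs of <a> elements that carry the given class.
function linksWithClass(string $html, string $class): array
{
    if ($html === '') {
        return []; // loadHTML() rejects empty input
    }
    $previous = libxml_use_internal_errors(true); // collect parse warnings quietly
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Pad the class attribute with spaces so "nav" does not match "navigation".
    // Assumption: $class contains no quote characters.
    $query = sprintf(
        "//a[@href][contains(concat(' ', normalize-space(@class), ' '), ' %s ')]",
        $class
    );

    $xpath = new DOMXPath($dom);
    $urls = [];
    foreach ($xpath->query($query) as $anchor) {
        $urls[] = $anchor->getAttribute('href');
    }
    return $urls;
}
```

The two pieces combine as `linksWithClass(fetchWithCurl('http://www.example.com'), 'external')`, after checking that the fetch did not return false.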

