Find All Links on a Page

Here's the basic principle behind spiders.

$html = file_get_contents('http://www.example.com');
if ($html === false) {
       exit('Could not fetch the page.');
}

$dom = new DOMDocument();
// @ suppresses the warnings libxml raises on malformed real-world HTML
@$dom->loadHTML($html);

// grab all the <a> elements on the page
$xpath = new DOMXPath($dom);
$hrefs = $xpath->evaluate("/html/body//a");

// print the href of each link, escaped so it is safe to output as HTML
for ($i = 0; $i < $hrefs->length; $i++) {
       $href = $hrefs->item($i);
       $url = $href->getAttribute('href');
       echo htmlspecialchars($url).'<br />';
}
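
The same idea can be wrapped in a reusable function. What follows is only a minimal sketch, not part of the original snippet: the function name `extractLinks` and the sample markup are made up for illustration. It assumes you already have the HTML as a string, uses `libxml_use_internal_errors()` instead of `@` to keep parser warnings quiet, selects only anchors that actually have an `href` attribute, and drops duplicate URLs.

```php
<?php
// Hypothetical helper (not from the original snippet): returns the unique
// href values of all <a> elements found in an HTML string.
function extractLinks(string $html): array
{
    // loadHTML() rejects an empty string, so bail out early.
    if (trim($html) === '') {
        return [];
    }

    $dom = new DOMDocument();

    // Collect libxml parse warnings internally instead of silencing them with @.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $links = [];

    // //a[@href] matches only anchors that carry an href attribute.
    foreach ($xpath->query('//a[@href]') as $anchor) {
        $url = trim($anchor->getAttribute('href'));
        if ($url !== '') {
            $links[] = $url;
        }
    }

    // Remove repeated URLs and reindex the array from 0.
    return array_values(array_unique($links));
}

// Sample markup, invented for this illustration.
$sample = '<html><body><a href="/about">About</a><a name="top">Top</a>'
        . '<p><a href="/about">About again</a> '
        . '<a href="http://www.example.com/">Home</a></p></body></html>';

foreach (extractLinks($sample) as $url) {
    echo $url, PHP_EOL;
}
```

To run it against a live page, pass the result of `file_get_contents()` to `extractLinks()` after checking that the fetch did not return `false`. Relative URLs such as `/about` are returned as-is; resolving them against the page's address is left out of this sketch.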

Comments

  1. Oleg

    Exactly what I needed. Thanks.

  2. Jens Törnell

    Perfect for affiliate sites!

  3. RONIT

    Wow, it's great. I was searching for exactly this.

  4. daniel

    I didn't quite understand how to use this. Where do I type that? I'm kind of confused. I need more explanation.

    Thanks

  5. lande

    This post is just too good. Thumbs up!!
    Keep up the good work ;)

  6. Dario

    Thank you very much for the help!

  7. Zbigniew

    Works perfectly! Thanks!

  8. juan

    Can someone please show me, step by step, how to use this? Thank you in advance.

  9. kazi tanvir ahsan

    Perfect. I was using PHP Simple DOM, but it was not as good as this!

  10. shail.dw

    The unique power of PHP and DOM unleashed. cURL and regex-based techniques can never match this, though they have their own uses, of course. Many thanks.

  11. Milan

    How do I follow all the other child pages?

  12. Zen

    Thanks.

    What about the performance of XPath?

  13. obliviga

    This is amazing. Thank you so much.

  14. Lorenzo

    Thanks, very simple. Great!
