Forums

The forums ran from 2008 to 2020 and are now closed and viewable here as an archive.

How to extract some content from a site?

Viewing 15 posts - 1 through 15 (of 17 total)
  • #190063
    AlirezaM
    Participant

    Hi dear friends!

    I have a WP site of my own, and there is a target site whose admin panel I don’t have access to.

    The target site has a div with a specific class name that contains several hyperlinks.

    I want to extract those hyperlinks and add them to my WP site.

    Note that they add new links every 5 minutes, and I want to extract them and add them to my site automatically.

    Is this possible?

    Are there any plugins for it?

    Please help me with this.

    Thanks in advance!

    Alireza

    #190065
    AlirezaM
    Participant

    @chrisburton

    How would I know what sort of links they are?

    I only know they’re <a> tags.

    #190067
    AlirezaM
    Participant

    They’re external links to other sites (news headlines), not download links.

    The target site isn’t sophisticated enough to offer an API; it uses old-style HTML such as table layouts. Here is the site, and the screenshot points to that div:

    http://trakhtorlink.com/

    http://prntscr.com/5dswv8

    BTW, the site is in a right-to-left (RTL) language.

    #190072
    AlirezaM
    Participant

    We just want to get these links and create a mobile app from the WordPress RSS feed.

    #190076
    __
    Participant

    “We just want to get these links and create a mobile app from the WordPress RSS feed.”

    That’s nice.

    Step #1 is to contact the owner/admin of this other site, and get permission to use their content. You might find there is a simpler way to achieve your goals. At the very least, you will avoid possible legal complications.

    #190083
    AlirezaM
    Participant

    Who said I don’t have permission?

    I said the site isn’t sophisticated enough to provide an API.

    #190087
    __
    Participant

    I am not making any accusations against you. In your original post, you said “…I don’t have access”. When chrisburton mentioned the ethical considerations, you did not address those concerns. That sounds like you haven’t talked to the other site. Again, I am not assuming anything or accusing you of doing anything wrong. It is simply best if this issue is clearly understood and addressed early.

    The best way to parse HTML is with a tool designed for it (such as PHP’s DOMDocument). From there, you can use XPath (PHP’s DOMXPath class) to find the specific content you’re looking for.
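DOMDocument and DOMXPath are PHP classes. To illustrate the same idea (parse the HTML with a real parser, then pick out only the links inside the target div) without assuming PHP, here is a minimal sketch using Python’s standard-library `html.parser`. It has no XPath support, so it tracks “am I inside the div?” by hand, counting nested divs. The class name `news-links` and the sample markup are hypothetical placeholders; substitute the real class from the target page.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect href values of <a> tags inside any <div> whose class list
    contains target_class (a hypothetical class name)."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.depth = 0  # > 0 while inside the target div; counts nested divs
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            if self.depth > 0:
                self.depth += 1  # a div nested inside the target div
            elif self.target_class in (attrs.get("class") or "").split():
                self.depth = 1  # just entered the target div
        elif tag == "a" and self.depth > 0 and attrs.get("href"):
            self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1


def extract_links(html, target_class):
    parser = LinkExtractor(target_class)
    parser.feed(html)
    parser.close()
    return parser.links


sample = """
<table><tr><td>
  <div class="box news-links">
    <a href="http://example.com/story-1">Story 1</a>
    <div><a href="http://example.com/story-2">Story 2</a></div>
  </div>
  <a href="http://example.com/outside">Not wanted</a>
</td></tr></table>
"""

print(extract_links(sample, "news-links"))
# ['http://example.com/story-1', 'http://example.com/story-2']
```

To run it against the live page you would first download the HTML (for example with `urllib.request.urlopen(url).read()`) and decode it with the page’s actual charset, which for an RTL site may not be UTF-8. `html.parser` does not repair badly unbalanced markup, so very sloppy table layouts may need a more forgiving parser; if a third-party library is acceptable, `lxml.html` supports XPath queries such as `//div[contains(@class, 'news-links')]//a/@href` directly.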

    #190088
    AlirezaM
    Participant

    Right, I didn’t realize I should mention that we’ve agreed with the site’s owner to create an app for these links. The reason we’re doing it this way is that we couldn’t add RSS to this old script, so I suggested WordPress. Now we need to do this because WordPress itself doesn’t have a plugin that does the same things this script does; each option we looked at has its own problems.

    #190090
    __
    Participant

    “Right, I didn’t realize I should mention that we’ve agreed with the site’s owner to create an app for these links.”

    Well, there’s no need to mention it at the beginning. But if someone asks about it, it makes sense to say “yes, I’ve already done that.”

    If you’re looking for a WP plugin, I’m afraid I will be even less helpful. Honestly, I doubt a good WP plugin exists for this.

    #190156
    AlirezaM
    Participant

    “If you’re looking for a WP plugin, I’m afraid I will be even less helpful. Honestly, I doubt a good WP plugin exists for this.”

    It doesn’t have to be a plugin; a simple script is enough. I just need to put those links on the other WP site.

    Do you have a ready-made script for this, since you said you’ve done this before?

    #190169
    Senff
    Participant

    There is a BIG difference between taking content from a web page to reuse it on another site, and using/parsing a site’s RSS feed.

    You’d need permission for the first one. And, to be honest, I don’t think it’s really possible, or at least not easy.

    If it’s an RSS feed, I don’t think you need direct permission, because the main purpose of an RSS feed is to serve content (in an easy-to-use format) to do something with. It’s basically saying “here’s my content, go ahead and do something with it!” (obviously within certain boundaries).

    Anyhoo, the question is: do you want to scrape the content from the site? Or do you have an RSS feed in a format (XML?) that’s easy to parse?
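If Site A did publish an RSS 2.0 feed, consuming it needs no HTML scraping at all. A minimal sketch in Python’s standard library, assuming a plain RSS 2.0 feed whose `<item>` elements are not namespaced (Atom or RSS 1.0 feeds would need namespace-qualified tag names); the sample feed is made up.

```python
import xml.etree.ElementTree as ET


def read_rss_links(xml_bytes):
    """Return (title, link) pairs for every <item> in an RSS 2.0 document.

    Pass the raw bytes of the feed so the parser honors the encoding
    declared in its XML header.
    """
    root = ET.fromstring(xml_bytes)
    pairs = []
    for item in root.iter("item"):
        title = item.findtext("title", default="").strip()
        link = item.findtext("link", default="").strip()
        if link:  # skip items without a usable link
            pairs.append((title, link))
    return pairs


sample_feed = b"""<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"><channel>
  <title>Example feed</title>
  <link>http://example.com/</link>
  <description>Sample feed</description>
  <item><title>First story</title><link>http://example.com/1</link></item>
  <item><title>Second story</title><link>http://example.com/2</link></item>
</channel></rss>"""

print(read_rss_links(sample_feed))
# [('First story', 'http://example.com/1'), ('Second story', 'http://example.com/2')]
```

Compared with scraping, this depends only on the feed format, not on the page layout, so a redesign of Site A would not break it.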

    #190170
    AlirezaM
    Participant

    The problem is that the site runs on an ASP script, and we don’t know how to add an RSS feed to it.

    If we could provide RSS from it, the second site wouldn’t be needed; as I described, the second site’s only purpose is to provide an RSS feed for our app.

    I think we need a script that extracts these links and adds them to our WP site.
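The original requirement was that new links appear every 5 minutes and should be added automatically. Whatever does the extraction, a script run on a schedule has to remember which links it already handled so it doesn’t post them twice. A minimal sketch of that bookkeeping in Python, assuming a local JSON file as the store (the file name `seen_links.json` is hypothetical); posting to WordPress itself is left out.

```python
import json
import os
import tempfile
from pathlib import Path


def filter_new_links(found, state_file):
    """Return the links in `found` that no previous run has seen, in page
    order, and record them in `state_file` (a JSON list of known links)."""
    path = Path(state_file)
    seen = set(json.loads(path.read_text(encoding="utf-8"))) if path.exists() else set()
    fresh = []
    for link in found:
        if link not in seen:
            seen.add(link)  # also skips duplicates within this same run
            fresh.append(link)
    path.write_text(json.dumps(sorted(seen)), encoding="utf-8")
    return fresh


# Example: two consecutive runs sharing one state file in a temp directory.
state = os.path.join(tempfile.mkdtemp(), "seen_links.json")
print(filter_new_links(["http://example.com/a", "http://example.com/b"], state))
# ['http://example.com/a', 'http://example.com/b']
print(filter_new_links(["http://example.com/b", "http://example.com/c"], state))
# ['http://example.com/c']
```

A cron entry with the schedule `*/5 * * * *` could run such a script every five minutes. In a real job you would probably update the state file only after the new links were successfully posted (for example through the WordPress REST API), so a failed run doesn’t silently drop them; the sketch records them immediately for brevity.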

    #190171
    Senff
    Participant

    Let’s call the site with the links Site A, and the site where you want to add these links Site B.

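    Are they both on the same domain? If so, you can use AJAX to pull in data from Site A and place it in Site B. (Across different domains, the browser’s same-origin policy normally blocks such requests unless Site A explicitly allows them.)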

    If not, do you have any control over Site A? If it’s made in ASP and you do have control, it’s possible to create another script (in ASP) that serves those links exactly the way you want (in RSS/XML format, for example).

    If you don’t have control over Site A, but only Site B, then I would say it’ll be very difficult (near impossible) to scrape content from Site A and do something with that data. You’ll really need to have some level of “cooperation” from Site A.

    #190172
    AlirezaM
    Participant

    Yes, we have access to the ASP site (Site A); we just don’t know ASP well. Do you mean we can serve those links as RSS from Site A?

    #190173
    Senff
    Participant

    With ASP, you can take data (I don’t know where it comes from) and then add some tags around it. For example, you can compile it into something like this:

    <html>
        <body>
    
        ...CONTENT GOES HERE...
    
        </body>
    </html>
    

    Or whatever structure you want. So, if you have those links, you can create an ASP file that outputs the following:

    <html>
        <body>
        <h1>HERE ARE MY LINKS!</h1>
    
        [this is where you put the links]
    
        </body>
    </html>
    

    Since you have full control over how it’s being output (outputted? ha!), you can serve it with XML tags, something like this:

    <?xml version="1.0" encoding="UTF-8"?>
    <rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
    <channel>
    <!-- title, link and description are required on the channel -->
    <title>Here are my links</title>
    <link>http://www.siteA.com/</link>
    <description>Links collected on Site A</description>
    <!-- rel="self" should point at the feed's own URL -->
    <atom:link href="http://www.siteA.com/" rel="self" type="application/rss+xml" />
    <!-- each link goes in its own item -->
    <item>
    <title>Link 1</title>
    <link>http://www.siteA.com/link-1</link>
    </item>
    <item>
    <title>Link 2</title>
    <link>http://www.siteA.com/link-2</link>
    </item>
    <item>
    <title>Link 3</title>
    <link>http://www.siteA.com/link-3</link>
    </item>
    </channel>
    </rss>
    

    And so on. You have the content, you have control over how you want to format it, so you can format it as an RSS feed. How you get the data, that’s all up to you. And then you can add tags and dates and whatnot using the Response.Write() function (similar to PHP’s echo).

    You’d have to read up on how to structure valid RSS feed formatting though.
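Writing the feed by hand with Response.Write() works, but every title must be escaped (an unescaped `&` in a headline makes the whole feed invalid XML); in Classic ASP that would mean something like Server.HTMLEncode. As a language-neutral illustration of the structure, not ASP code, here is a minimal sketch in Python that builds the same RSS 2.0 shape with `xml.etree.ElementTree`, which handles the escaping itself. It assumes Python 3.8+ (for the `xml_declaration` argument); the `guid` element is an optional extra, and `pubDate` is omitted for brevity.

```python
import xml.etree.ElementTree as ET


def build_rss(channel_title, channel_link, description, links):
    """Build a minimal RSS 2.0 document from (title, url) pairs.

    Returns UTF-8 bytes with an XML declaration. ElementTree escapes
    characters such as & and < in element text automatically.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # <title>, <link> and <description> are the required channel elements.
    ET.SubElement(channel, "title").text = channel_title
    ET.SubElement(channel, "link").text = channel_link
    ET.SubElement(channel, "description").text = description
    for title, url in links:
        # Each link becomes its own <item>.
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "link").text = url
        ET.SubElement(item, "guid").text = url  # stable ID so readers can spot new items
    return ET.tostring(rss, encoding="utf-8", xml_declaration=True)


feed = build_rss(
    "Here are my links",
    "http://www.siteA.com/",
    "Links collected on Site A",
    [("Link 1", "http://www.siteA.com/link-1"), ("News & views", "http://www.siteA.com/link-2")],
)
print(feed.decode("utf-8"))
```

The second title comes out as `News &amp; views` in the output, which is exactly the escaping a hand-written Response.Write version would have to reproduce.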

  • The forum ‘Back End’ is closed to new topics and replies.