Forums

The forums ran from 2008-2020 and are now closed and viewable here as an archive.

Using SimpleHtmlDom to extract Html, loop through local directory, save files

  • #42209
    Crssp
    Participant

    Hi all, I’m new here, so hello to everyone in the forums! :)

    Does anyone have any experience with the Simple HTML DOM library on SourceForge?
    http://simplehtmldom.sourceforge.net/
    I’ve downloaded the package and looked at many of the examples.

    What I want to do is clean up hundreds or even thousands of files in a directory and its sub-directories.
    The files have the .asp extension.
    I would like to extract just certain pieces of text, along with their line breaks, and strip out all the other ASP-coded bits and junk.

    I can provide more specifics; I’m just trying to get my head around saving the files and how the paths work for that. The examples don’t entirely make sense to me.

    The code structure is pretty simple: everything I need is contained in two span tags, and everything below that just needs to be stripped or left behind.

    <span class="headline">Dude you rock!

    <div class="adsBox"></div>

    <span class="bodytype">HOMETOWN — Local man rocks the DOM.

    New line or paragraph goes here

    Yet Another.

    End of story, sometimes has an author and maybe an email address

    </span>

    Everything else goes away, so I’m after two innertext calls for the spans, methinks…

    Also, where’s the forum search? I must be missing the search feature for just the forums.

    #122156
    Crssp
    Participant

    Oops, I tried to paste a code block and got a FAIL.

    OK, I tried pasting the code above, but I’m bad at pasting code in the forums. Oops, I forgot I could use Markdown.

    [FIXED BY MOD]
    Thanks bro, appreciated ;)

    #122290
    Crssp
    Participant

    Has anyone ever used SimpleHTMLDom, then? Any suggestions would be great.
    There are a few tutorials online, but none quite apply to what I want to do.
    Another thought would be to just use a good text editor in my local web folders and clean up the code that way.
    The thought is to clean up the pages and import all 176,000 articles into a database, so a good database importer will be the next item on the list, to get the stories into WordPress for consumption.
    Does the DigWP book cover anything like that?

    #122367
    __
    Participant

    A DOM tool is not what you need. If I understand you correctly, the files in question are not proper HTML (or XML): they include ASP code and possibly random bits of “other stuff” as well. That would prevent any DOM parser from parsing them.
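
    A text-based approach, rather than a DOM parser, might look roughly like the sketch below. It is written in Python rather than PHP, and it rests on several assumptions: the markup matches the sample quoted in the first post (a headline span with no closing tag, followed by a bodytype span), ASP code sits inside <% ... %> delimiters, and the files can be read as UTF-8. The names extract_story and clean_tree and the .txt output naming are hypothetical, chosen only for illustration.

```python
import re
from pathlib import Path
from typing import Optional, Tuple

# Assumed patterns, based only on the markup quoted in the first post.
ASP_BLOCK_RE = re.compile(r"<%.*?%>", re.S)            # server-side ASP code
HEADLINE_RE = re.compile(r'<span\s+class="headline"\s*>([^<]*)', re.I)
BODY_RE = re.compile(r'<span\s+class="bodytype"\s*>(.*?)</span>', re.I | re.S)
BR_RE = re.compile(r"<br\s*/?>", re.I)                 # keep line breaks as newlines
TAG_RE = re.compile(r"<[^>]+>")                        # any leftover tag


def extract_story(raw: str) -> Optional[Tuple[str, str]]:
    """Return (headline, body) as plain text, or None if either span is missing."""
    cleaned = ASP_BLOCK_RE.sub("", raw)
    headline = HEADLINE_RE.search(cleaned)
    body = BODY_RE.search(cleaned)
    if headline is None or body is None:
        return None
    body_text = BR_RE.sub("\n", body.group(1))
    body_text = TAG_RE.sub("", body_text)
    return headline.group(1).strip(), body_text.strip()


def clean_tree(src_dir, dst_dir) -> int:
    """Walk src_dir recursively and write one .txt per .asp file into dst_dir.

    The sub-directory layout is mirrored under dst_dir. Files without both
    spans are skipped. Returns the number of files written.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    written = 0
    for path in src.rglob("*.asp"):
        # errors="replace" is a guess; the real files may need another encoding.
        raw = path.read_text(encoding="utf-8", errors="replace")
        story = extract_story(raw)
        if story is None:
            continue
        headline, body = story
        out_path = dst / path.relative_to(src).with_suffix(".txt")
        out_path.parent.mkdir(parents=True, exist_ok=True)
        out_path.write_text(headline + "\n\n" + body + "\n", encoding="utf-8")
        written += 1
    return written
```

    Calling clean_tree("site", "cleaned") would leave the originals untouched and write the extracted text to a parallel folder, which makes it easy to spot-check a few results before importing anything into a database. A body that contains a nested span would be cut short at the first closing tag, so real files may need a more careful pattern.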
