Forums

The forums ran from 2008-2020 and are now closed and viewable here as an archive.

Searching the text within a PDF?

Viewing 5 posts - 1 through 5 (of 5 total)
  • #44288
    JoshWhite
    Member

    Does anyone know of a way to let a user search from a web form and get results based on what is IN a PDF document? I can’t think of any way to show an excerpt, but it would be enough if it could just indicate that the file it returns contains the information the user was looking for.

    #132673
    chrisburton
    Participant

    Your question is a bit vague, so it’s hard to offer a solution. Are there structural guidelines for the content of these PDFs? Where are the PDFs coming from?

    This is a bit over my head, but have you researched how Google does it with the preview in its search results? When you search for a keyword and hover over the arrow, there is a red border around what I believe to be a summary or excerpt. Unfortunately, the excerpt or summary does not always contain the keyword, so I’m thinking this must be fairly complex and require some sort of algorithm. Then again, this isn’t my area.

    Either way, with PHP you can extract content and/or post an excerpt from a PDF.
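As a rough illustration of the excerpt idea, here is a minimal Python sketch (not PHP, and not a specific library's API). It assumes the text has already been extracted from each PDF by some other tool and stored per file; the function name `find_excerpts`, the `context` parameter, and the sample file names are all hypothetical.

```python
import re

def find_excerpts(documents, query, context=40):
    """Return (filename, excerpt) pairs for documents whose text contains query.

    `documents` maps a file name to text that was ALREADY extracted from the
    PDF (the extraction step itself is assumed, not shown here).
    """
    pattern = re.compile(re.escape(query), re.IGNORECASE)
    results = []
    for filename, text in documents.items():
        # Collapse whitespace so line breaks left over from extraction
        # do not interrupt the excerpt.
        flat = " ".join(text.split())
        match = pattern.search(flat)
        if match is None:
            continue
        # Take `context` characters on either side of the first match.
        start = max(match.start() - context, 0)
        end = min(match.end() + context, len(flat))
        excerpt = flat[start:end]
        if start > 0:
            excerpt = "..." + excerpt
        if end < len(flat):
            excerpt = excerpt + "..."
        results.append((filename, excerpt))
    return results

# Hypothetical sample data standing in for text pulled out of the PDFs.
documents = {
    "minutes-2011.pdf": "The board approved the annual budget\nfor the library renovation.",
    "newsletter.pdf": "Summer events are listed on the back page.",
}
results = find_excerpts(documents, "budget", context=20)
for filename, excerpt in results:
    print(f"{filename}: {excerpt}")
```

This only does a plain case-insensitive substring match on the first hit per file; ranking, stemming, or multi-word queries would need something more involved, such as a full-text index.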

    #132797
    chrisburton
    Participant

    This may not be ideal, but 75 PDFs doesn’t sound like that many. Of course, I don’t know the extent of the content, but why don’t they just copy and paste the text to create a digital web archive?

    #132814
    chrisburton
    Participant

    There might be a simpler way to extract that content instead of doing it by hand. I’d suggest asking on Stack Overflow.

    #132777
    TheDoc
    Member

    I *think* you can do this with Google’s Custom Search: https://developers.google.com/custom-search/v1/overview
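For reference, a request to the Custom Search JSON API might be built roughly like this in Python. This is only a sketch: it assumes the PDFs are publicly reachable and already indexed by Google, that you have an API key and a search engine ID (the placeholder values below are not real), and the helper name `build_pdf_search_url` is made up for illustration. It builds the URL without sending it.

```python
from urllib.parse import urlencode

# Endpoint of the Custom Search JSON API.
API_ENDPOINT = "https://www.googleapis.com/customsearch/v1"

def build_pdf_search_url(query, api_key, engine_id):
    """Build a request URL that restricts results to PDF files."""
    params = {
        "key": api_key,     # your API key (placeholder below)
        "cx": engine_id,    # your search engine ID (placeholder below)
        "q": query,         # the user's search terms
        "fileType": "pdf",  # only return results with the .pdf extension
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"

url = build_pdf_search_url("annual budget", "YOUR_API_KEY", "YOUR_ENGINE_ID")
print(url)
```

Fetching that URL returns JSON; as far as I know each result item includes a short `snippet` field, which could serve as the excerpt the original question asks about, though it is not guaranteed to contain the keyword.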

  • The forum ‘Other’ is closed to new topics and replies.