Does anyone know of a way to let a user search from a web form and get results based on what is IN a PDF document? I can’t think of any way to make it show an excerpt, but it would be enough if it could just indicate that the file it returns contains the information the user was looking for.
It’s hard to offer a solution because the question is a bit vague. Does the content of these PDFs follow any structural guidelines? Where are the PDFs coming from?
This is a bit over my head, but have you looked into how Google does it with the preview in its search results? When you search for a keyword and hover over the arrow, it puts a red border around what I believe is a summary or excerpt. Unfortunately, the excerpt or summary does not always contain the keyword. So I’m thinking this must be fairly complex and require some sort of algorithm. Then again, this isn’t my area.
Either way, with PHP you can extract the text content of a PDF and/or display an excerpt from it.
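To give a rough idea of the excerpt part, here is a minimal sketch in Python rather than PHP. It assumes the text has already been pulled out of the PDF by some extraction tool, and the function name `make_excerpt` and the `radius` parameter are made up for illustration. It just finds the first match of the keyword and returns the text around it.

```python
import re


def make_excerpt(text, keyword, radius=40):
    """Return a short snippet of `text` around the first match of `keyword`.

    Returns None when the keyword does not occur. `text` is assumed to be
    plain text that was already extracted from a PDF (hypothetical input).
    """
    # Collapse the ragged whitespace that PDF text extraction tends to produce.
    flat = " ".join(text.split())

    # Case-insensitive literal match; re.escape keeps user input from being
    # interpreted as a regular expression.
    match = re.search(re.escape(keyword), flat, flags=re.IGNORECASE)
    if match is None:
        return None

    # Take `radius` characters on each side of the match, clamped to the text.
    start = max(match.start() - radius, 0)
    end = min(match.end() + radius, len(flat))

    # Mark the snippet as truncated where it does not reach the text's edge.
    prefix = "..." if start > 0 else ""
    suffix = "..." if end < len(flat) else ""
    return prefix + flat[start:end] + suffix
```

Calling `make_excerpt(page_text, "budget")` would give back a snippet with the word in the middle, or `None` if the word isn't there. It cuts on characters, not word boundaries, so a real version would probably want to tidy up the edges.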
Basically, the situation is that these guys have a library of about 75 archived “issues” as PDFs, all in one massive list. Most of them are just text with a couple of images. They were hoping to make the whole thing searchable within a CMS, like a normal site search where each result gives the user an idea of whether it is what they wanted.
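For a library that small, one possible approach is a simple word index built once from the extracted text of each issue. The sketch below is in Python and assumes the text of every PDF has already been extracted into a dictionary keyed by file name; the function names and the file names in the usage note are hypothetical, and this is plain keyword matching, not what any particular CMS does.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(issues):
    """Map each word to the set of issue names whose text contains it.

    `issues` is assumed to be {file_name: extracted_text}.
    """
    index = defaultdict(set)
    for name, text in issues.items():
        for word in tokenize(text):
            index[word].add(name)
    return index


def search(index, query):
    """Return the sorted names of issues containing every word in `query`."""
    words = tokenize(query)
    if not words:
        return []
    # Start from the issues matching the first word, then narrow down.
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return sorted(hits)
```

With something like `index = build_index({"issue-01.pdf": "...", "issue-02.pdf": "..."})`, a call to `search(index, "budget meeting")` returns the issues that contain both words. With only 75 or so issues, rebuilding the index whenever a new one is added should be cheap.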
I wasn’t sure that was possible, but it sounds like it’s somewhat possible. I’m not a PHP developer, so that’s WAY over my head :)
My initial recommendation was to keep all the old issues as an “archive” and, going forward, to publish new content as regular articles so it is fully searchable, with the PDF version still available to view and download.
That’s what I eventually ended up recommending. It’s probably roughly 900 to 1,100 pages of text to copy, but I think it would be worth it in the end to have that as real content instead of only PDFs.