I think everyone just went overboard here, and a little sideways with ASP… In most parts of the world you can legally scrape a site you don't own. While it is nice to have permission, it is really up to the owner to say what you can and cannot do with their site, via a license, terms of use or direct communication. It was mentioned that we don't know whether any of those exist here, so we should not assume!
If it is put on the internet, public access is something the site owner has to expect, and because most bots only read the HTML, they have a smaller bandwidth footprint than a user loading the page.
I have come across numerous previous incumbents looking to rip off the clients I start working with, by building services the client cannot scrape even though they entered the data themselves, services that do not work, that cannot be modified, or that are hard to read… The crux of the issue is that sometimes it's okay to go against someone's wishes, and we certainly should not be going super-lawyer on this guy.
There are numerous ways to read the content: some using other languages, some using PHP, some using the Linux command line. For PHP, it is probably best to use something like phpQuery with cURL, or to set up a proxy on your server on a different port or URL structure. I have even undertaken projects where we had to log in to a service. Let me assure you: if the browser can do it, and you are being paid to (or want to get something badly enough), then it is next to impossible to stop someone who is determined (cURL, phpQuery, proxying + jQuery, etc.).
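To illustrate the basic "fetch then parse" idea, here is a minimal sketch in Python rather than PHP, using only the standard library: `urllib.request` plays the role cURL plays in the PHP approach (an HTTP GET), and `html.parser` stands in for phpQuery (pulling content out of the HTML). The class and function names (`TagTextExtractor`, `extract`, `fetch`), the choice to extract by tag name, and the User-Agent string are all hypothetical choices for illustration. It does not handle logins, cookies or JavaScript-rendered pages.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class TagTextExtractor(HTMLParser):
    """Hypothetical helper: collect the text inside every occurrence of one tag."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag.lower()  # html.parser reports tag names in lowercase
        self.depth = 0          # > 0 while we are inside the target tag
        self.current = []       # text fragments of the element being read
        self.results = []       # finished strings, one per top-level element

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                # Closed the outermost matching element: store its text.
                self.results.append("".join(self.current).strip())
                self.current = []

    def handle_data(self, data):
        if self.depth > 0:
            self.current.append(data)


def extract(html, tag):
    """Return the text of every <tag> element in an HTML string."""
    parser = TagTextExtractor(tag)
    parser.feed(html)
    parser.close()
    return parser.results


def fetch(url, timeout=10):
    """Plain GET, roughly what a basic cURL request does.

    The User-Agent value is a placeholder assumption, not a requirement.
    """
    req = Request(url, headers={"User-Agent": "example-scraper/0.1"})
    with urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

Usage would look like `extract(fetch("https://example.com/"), "h1")`. Keeping the parsing separate from the fetching means you can test the extraction against saved HTML without hitting the site repeatedly.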
I am unfortunately not going to go beyond giving you this information, because I believe in self-learning and personal development, especially when I do not know your full intent or use case. If you really want it, you should be able to work it out in 1-2 weeks given the information provided.