Sunday 9 February 2020

Is web scraping legal?

Web scraping and crawling aren't illegal by themselves, but they sit in a legal grey area.

After all, you're using somebody else's bandwidth, and you're freely retrieving and using their data.

General advice for your scraping or crawling projects


  1. Use an API if one is provided, instead of scraping data.
  2. Respect the Terms of Service (ToS).
  3. Respect the rules of robots.txt.
  4. Use a reasonable crawl rate, i.e. don't bombard the site with requests. Respect the crawl-delay setting provided in robots.txt; if there's none, use a conservative crawl rate (e.g. 1 request per 10-15 seconds).
  5. Identify your web scraper or crawler with a legitimate user agent string. Create a page that explains what you're doing and why, and link back to the page in your user agent string (e.g. 'MY-BOT (+https://yoursite.com/mybot.html)').
  6. If the ToS or robots.txt prevent you from crawling or scraping, ask the site owner for written permission before doing anything else.
  7. Don't republish your crawled or scraped data, or any derivative dataset, without first verifying the license of the data or obtaining written permission from the copyright holder.
  8. If you have doubts about the legality of what you're doing, don't do it, or seek the advice of a lawyer.
  9. Don't base your whole business on data scraping. The website(s) you scrape may eventually block you, as happened in Craigslist Inc. v. 3Taps Inc.
  10. Finally, you should be suspicious of any advice that you find on the internet (including mine), so please consult a lawyer.
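Items 3 to 5 are the technical part of this advice, and they can be combined in a small "polite fetcher". The sketch below is a minimal illustration using only Python's standard library (`urllib.robotparser` and `urllib.request`), assuming you have already downloaded the text of the site's robots.txt. It checks each URL against robots.txt, honors `Crawl-delay` when one is set, falls back to an assumed 10 seconds otherwise (the low end of the 1 request per 10-15 seconds suggested above), and sends an identifying user agent. The names `PoliteFetcher`, `load_robots`, `crawl_delay` and the `example.com` URLs are hypothetical, and the user agent is the placeholder from item 5; replace it with a link to your own bot page.

```python
import time
import urllib.request
import urllib.robotparser

# Placeholder from item 5: point this at a page that explains your bot.
USER_AGENT = "MY-BOT (+https://yoursite.com/mybot.html)"
# Assumed fallback: 10 s between requests when robots.txt sets no Crawl-delay.
DEFAULT_DELAY = 10.0


def load_robots(robots_txt: str) -> urllib.robotparser.RobotFileParser:
    """Parse robots.txt content that has already been downloaded."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def crawl_delay(parser: urllib.robotparser.RobotFileParser,
                user_agent: str = USER_AGENT,
                default: float = DEFAULT_DELAY) -> float:
    """Return the Crawl-delay that applies to our user agent, or a conservative default."""
    delay = parser.crawl_delay(user_agent)
    return float(delay) if delay is not None else default


class PoliteFetcher:
    """Hypothetical helper: obeys robots.txt, rate-limits itself, identifies itself."""

    def __init__(self, parser, user_agent=USER_AGENT,
                 clock=time.monotonic, sleep=time.sleep):
        self.parser = parser
        self.user_agent = user_agent
        self.delay = crawl_delay(parser, user_agent)
        self.clock = clock    # injectable so the timing can be tested
        self.sleep = sleep    # injectable so the timing can be tested
        self._last_request = None

    def allowed(self, url: str) -> bool:
        # Item 3: respect the rules of robots.txt.
        return self.parser.can_fetch(self.user_agent, url)

    def wait_turn(self) -> None:
        # Item 4: keep at least `self.delay` seconds between two requests.
        now = self.clock()
        if self._last_request is not None:
            remaining = self.delay - (now - self._last_request)
            if remaining > 0:
                self.sleep(remaining)
        self._last_request = self.clock()

    def fetch(self, url: str) -> bytes:
        if not self.allowed(url):
            raise PermissionError(f"robots.txt disallows {url}")
        self.wait_turn()
        # Item 5: send an identifying User-Agent header with every request.
        request = urllib.request.Request(url, headers={"User-Agent": self.user_agent})
        with urllib.request.urlopen(request, timeout=30) as response:
            return response.read()


if __name__ == "__main__":
    robots = load_robots("User-agent: *\nCrawl-delay: 5\nDisallow: /private/\n")
    fetcher = PoliteFetcher(robots)
    print(fetcher.delay)                                        # 5.0
    print(fetcher.allowed("https://example.com/private/page"))  # False
```

In practice you would obtain the robots.txt text first (`RobotFileParser` also offers `set_url()` and `read()` for that) and keep one fetcher per site, since the delay is meant to protect each host separately. This only covers items 3 to 5; it doesn't answer the ToS, licensing, or legal questions raised by the other items.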


In the end, the relevant question isn't "Is this legal?" Instead, you should ask yourself: "Am I doing something that might upset someone? And am I willing to take the (financial) risk of their response?"

Reference:
https://benbernardblog.com/web-scraping-and-crawling-are-perfectly-legal-right/
