Advanced Web Scraping Tutorial Project
This repository is a companion to the article Advanced Web Scraping: Bypassing captcha, "403 Forbidden," and more. Please refer to the article for further details.
This is a scrapy web scraper for the fictional Zipru torrent site. It is designed to bypass four distinct anti-scraping mechanisms:
- User agent filtering.
- Obfuscated javascript redirects.
- Captchas.
- Header consistency checks.
The scraper is not actually functional because Zipru is not a real site. The code, however, is otherwise complete and can easily be adapted to work on other sites.