Scrapy-Idealista

Description

Scrapes data from the real estate site www.idealista.com.

Install the required pip packages:

pip install -r requirements.txt

Add the URL to be crawled in the variable start_urls in ./idealista/spiders/idealista_spider.py. Example:

start_urls = ["http://www.idealista.com/venta-viviendas/madrid/retiro/"]

Note: crawl only one URL per run, since all records are exported to the same CSV file.
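
If you crawl several areas in separate runs, giving each run its own output file keeps the records apart. The following is a minimal sketch, not part of this project: csv_name_for is a hypothetical helper that derives a file name from the path of the listing URL.

```python
from urllib.parse import urlparse


def csv_name_for(url):
    """Build a CSV file name from the path of a listing URL (hypothetical helper)."""
    # Keep only the non-empty path segments,
    # e.g. ["venta-viviendas", "madrid", "retiro"].
    segments = [part for part in urlparse(url).path.split("/") if part]
    # Fall back to a generic name when the URL has no path.
    return ("_".join(segments) or "idealista") + ".csv"


if __name__ == "__main__":
    print(csv_name_for("http://www.idealista.com/venta-viviendas/madrid/retiro/"))
    # prints: venta-viviendas_madrid_retiro.csv
```

The resulting name can then be passed to the -o option of the crawl command.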

From the project root, run the following command to crawl the URL saved in start_urls:

scrapy crawl idealista -t csv -o <flats_file>.csv

Note: <flats_file> is the name of the CSV file where all records are saved.
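
Once a crawl has finished, a quick sanity check is to list the columns and count the rows of the exported file. This sketch uses only the Python standard library and makes no assumption about which columns the spider exports; summarize_csv is a hypothetical helper name.

```python
import csv


def summarize_csv(path):
    """Return the header names and the number of data rows in a CSV file."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        # Count data rows; the header line is consumed by DictReader itself.
        row_count = sum(1 for _ in reader)
        # fieldnames holds the header; it is None for an empty file.
        columns = list(reader.fieldnames or [])
    return columns, row_count
```

For example, summarize_csv("flats.csv") returns the column names and the record count for a file named flats.csv, assuming that is the name you passed to -o.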

To run the crawler with Docker instead, clone the repository, build the image, and start a container:

git clone git@github.com:David-Carrasco/Scrapy-Idealista.git

docker build -t <image>:<tag> .

docker run --name <name_container> --rm -it -v <your-dir>:/data <image>:<tag> <flats_file>

Your data will appear in the <your-dir> directory, as idealista.csv by default or under the <flats_file> name if you pass one. Example:

docker run --name scrapy_idealista --rm -it -v /home/myuser/idealista:/data scrapy_idealista:latest
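
To make the mapping between the placeholders and the final command explicit, here is a small sketch that assembles the same docker run invocation from its parts. docker_run_args is a hypothetical helper, and the "latest" default tag is an assumption; only flags that appear in the commands above are used.

```python
import shlex


def docker_run_args(name, host_dir, image, tag="latest", flats_file=None):
    """Assemble the docker run command as an argument list (hypothetical helper)."""
    args = [
        "docker", "run",
        "--name", name,            # <name_container>
        "--rm", "-it",
        "-v", f"{host_dir}:/data",  # <your-dir> mounted at /data
        f"{image}:{tag}",          # <image>:<tag>
    ]
    # <flats_file> is optional; it is appended only when given.
    if flats_file:
        args.append(flats_file)
    return args


if __name__ == "__main__":
    print(shlex.join(docker_run_args(
        "scrapy_idealista", "/home/myuser/idealista", "scrapy_idealista")))
```

The list form can be handed to subprocess.run directly, which avoids shell quoting problems when <your-dir> contains spaces.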