
Scrapy-Idealista

Description

Scraping data from the real estate site www.idealista.com

Requirements

  • Python >= 3.6
  • pip

Execute the following to install the required pip packages:

pip install -r requirements.txt
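
As a quick check that the installation succeeded (assuming Scrapy is among the packages listed in requirements.txt, which the crawl commands below require), you can print the installed version:

scrapy version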

How to crawl real estate info from a URL on the idealista website

  1. Add the URL to be crawled to the variable start_urls in ./idealista/spiders/idealista_spider.py (a minimal sketch of such a spider follows these steps). Example:
start_urls = ["http://www.idealista.com/venta-viviendas/madrid/retiro/"]

Note: crawl only one URL, since all records will be exported to the same CSV file.

  2. From the root of the project, execute the following command to crawl the URL saved in start_urls:
scrapy crawl idealista -t csv -o <flats_file>.csv

Note: <flats_file> is the CSV file where all the records will be saved.
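
The spider file itself is not reproduced in this README, so the following is only a minimal sketch of what a Scrapy spider wired up this way typically looks like; the class name, CSS selectors, and field names are illustrative assumptions, not the repository's actual code.

# Minimal illustrative sketch of ./idealista/spiders/idealista_spider.py
import scrapy

class IdealistaSpider(scrapy.Spider):
    # "name" is what "scrapy crawl idealista" refers to
    name = "idealista"

    # Step 1: the single URL to crawl goes here
    start_urls = ["http://www.idealista.com/venta-viviendas/madrid/retiro/"]

    def parse(self, response):
        # Illustrative extraction only; the real selectors live in the repository
        for listing in response.css("article.item"):
            yield {
                "title": listing.css("a.item-link::text").get(),
                "price": listing.css("span.item-price::text").get(),
            }

Each dictionary yielded from parse becomes one row in the CSV written by the scrapy crawl command above, e.g. scrapy crawl idealista -t csv -o flats.csv.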

Use with Docker

  1. Clone the repository:
git clone git@github.com:David-Carrasco/Scrapy-Idealista.git
  2. Build the image (note: remember to modify the URL to crawl before building):
docker build -t <image>:<tag> .
  3. Run the container:
docker run --name <name_container> --rm -it -v <your-dir>:/data <image>:<tag> <flats_file>

Your data will appear in the <your-dir> directory, as idealista.csv by default or under your <flats_file> filename. Example:

docker run --name scrapy_idealista --rm -it -v /home/myuser/idealista:/data scrapy_idealista:latest
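
As a concrete end-to-end example (the image tag and host directory match the run command above; flats.csv is an illustrative filename), building and then running could look like:

docker build -t scrapy_idealista:latest .
docker run --name scrapy_idealista --rm -it -v /home/myuser/idealista:/data scrapy_idealista:latest flats.csv

With the explicit flats.csv argument, the results land in /home/myuser/idealista/flats.csv on the host instead of the default idealista.csv.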