• Stars
    162
  • Rank 232,284 (Top 5 %)
  • Language
    Python
  • Created about 9 years ago
  • Updated almost 2 years ago

Repository Details

A collection of Scrapy demo spiders

Scrapy_demo

This project collects spiders for the websites I used to crawl most often. If it helped you, please give it a star, thanks :)

Spider list

  • douban
  • douban_oss
  • googleplay
  • cnbeta
  • ka
  • cnblogs

Project Features

  • googleplay uses a CrawlSpider and pymongo.
  • douban uses the images pipeline to download images (with custom request headers to avoid being banned). When the crawl finishes, it writes the item information to a txt file.
  • cnbeta uses SQLAlchemy to save items to a MySQL database (or any other database that SQLAlchemy supports).
  • ka uses Kafka. It is a demo of how to use Scrapy and Kafka together: the spider does not close, and when you push a message to Kafka, it starts crawling the URL you just sent.
  • cnblogs uses a signal handler.
  • douban_oss uses the Aliyun OSS SDK to upload the images downloaded by the images pipeline to OSS storage.
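
The douban spider's "write a txt file when the crawl finishes" step could be sketched as a Scrapy item pipeline like the one below. This is only an illustration, not the repository's code: the class name ItemTextExportPipeline, the default items.txt path, and the one-JSON-object-per-line format are all assumptions. Scrapy pipelines are plain classes, so the sketch needs nothing beyond the standard library.

```python
import json


class ItemTextExportPipeline:
    """Hypothetical pipeline: buffer items, then write them to a text file."""

    def __init__(self, path="items.txt"):
        # Output path is an assumption; the real project may use another name.
        self.path = path
        self.lines = []

    def process_item(self, item, spider):
        # Scrapy calls this once per scraped item. dict(item) accepts both
        # plain dicts and scrapy.Item objects.
        record = json.dumps(dict(item), ensure_ascii=False, sort_keys=True)
        self.lines.append(record)
        # Return the item so that later pipelines still receive it.
        return item

    def close_spider(self, spider):
        # Scrapy calls this once when the spider finishes.
        with open(self.path, "w", encoding="utf-8") as fh:
            for line in self.lines:
                fh.write(line + "\n")
```

To try something like this in a project, the class would be registered under ITEM_PIPELINES in settings.py, using whatever module path the project actually has.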

How to use

Each project has a run_spider.py script. Just run it and enjoy :)

python run_spider.py
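
The contents of run_spider.py are not shown here. A launcher of this kind is often just a thin wrapper around the "scrapy crawl" command, so a minimal sketch might look like the following. The helper name build_command and the command-line argument are assumptions made for illustration; the repository's scripts take no argument and presumably hard-code the spider name.

```python
import sys


def build_command(spider_name):
    """Return the argv list equivalent to `scrapy crawl <spider_name>`."""
    return ["scrapy", "crawl", spider_name]


def main(spider_name):
    # Imported lazily so build_command can be used without Scrapy installed.
    from scrapy import cmdline

    # Runs the crawl in this process; Scrapy exits the process when done.
    cmdline.execute(build_command(spider_name))


if __name__ == "__main__":
    if len(sys.argv) == 2:
        main(sys.argv[1])
    else:
        print("usage: python run_spider.py <spider_name>")
```

The script has to be run from inside the Scrapy project directory (the one containing scrapy.cfg) so that Scrapy can find the project's spiders.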