estela
estela is an elastic web scraping cluster running on Kubernetes. It provides mechanisms to deploy, run and scale web scraping spiders via a REST API and a web interface.
Project Structure
The project consists of three main modules:
- REST API: built with the Django REST framework toolkit, it exposes several endpoints to manage projects, spiders, and jobs. It uses Celery for task processing and takes care of deploying your Scrapy projects, among other things.
- Queueing: estela needs a high-throughput, low-latency platform that handles real-time data feeds in a producer-consumer architecture. In this module, you will find a consumer that collects the data emitted by the spider jobs and transports it into a database.
- Web: a web interface implemented with React and TypeScript that lets you manage projects and spiders.
Each of these modules works independently of the rest and can be changed. Each module has a more detailed description in its corresponding directory.
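To illustrate the producer-consumer pattern that the Queueing module relies on, here is a minimal, self-contained sketch. It is not estela's implementation: the in-process `queue.Queue` stands in for the real queueing platform, SQLite stands in for the database, and every name in it (`produce`, `consume`, `run_demo`, the `items` table) is hypothetical.

```python
import json
import queue
import sqlite3
import threading

SENTINEL = None  # hypothetical end-of-stream marker


def produce(job_id, n_items, q):
    """Stand-in for a spider job: push n_items scraped records onto the queue."""
    for i in range(n_items):
        q.put({"job_id": job_id, "payload": {"index": i}})


def close_when_done(producers, q):
    """Wait for all producers to finish, then signal the consumer to stop."""
    for t in producers:
        t.join()
    q.put(SENTINEL)


def flush(conn, batch):
    """Write one batch of (job_id, payload) rows and commit."""
    conn.executemany("INSERT INTO items (job_id, payload) VALUES (?, ?)", batch)
    conn.commit()
    return len(batch)


def consume(q, conn, batch_size=10):
    """Drain the queue and store records in the database in batches."""
    batch, stored = [], 0
    while True:
        item = q.get()
        if item is SENTINEL:
            break
        batch.append((item["job_id"], json.dumps(item["payload"])))
        if len(batch) >= batch_size:
            stored += flush(conn, batch)
            batch = []
    if batch:  # write whatever is left after the stream ends
        stored += flush(conn, batch)
    return stored


def run_demo(n_jobs=3, items_per_job=25):
    q = queue.Queue()
    conn = sqlite3.connect(":memory:")  # assumption: SQLite as a stand-in database
    conn.execute("CREATE TABLE items (job_id TEXT, payload TEXT)")
    producers = [
        threading.Thread(target=produce, args=(f"job-{j}", items_per_job, q))
        for j in range(n_jobs)
    ]
    for t in producers:
        t.start()
    threading.Thread(target=close_when_done, args=(producers, q)).start()
    stored = consume(q, conn)  # the consumer runs in the main thread
    rows = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
    conn.close()
    return stored, rows


if __name__ == "__main__":
    print(run_demo())
```

Batching the inserts keeps the number of database commits low, which is one common way a consumer keeps up with many concurrent jobs. A real deployment would replace the in-process queue with a dedicated messaging platform.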
estela-cli
estela-cli is a command-line interface for estela.
How to Contribute
Please read CONTRIBUTING.md and follow the steps. Remember to abide by our Code of Conduct too.