Practical Data Engineering Project
This is a practical example of a data engineering project built around real estate data. You can find the accompanying blog post, Building a Data Engineering Project in 20 Minutes, on my website. Topics covered:
- Getting the Data: Scraping with BeautifulSoup
- Storing on S3-MinIO
- Custom Change Data Capture (CDC)
- Adding Database Features to S3: Delta Lake & Spark
- Machine Learning: Jupyter Notebook
- Ingesting into a Data Warehouse for Low Latency: Apache Druid
- The UI with Dashboards and More: Apache Superset
- Orchestrating Everything Together: Dagster
- DevOps Engine: Kubernetes
You can find the status of the project here.
Starting Dagster
To get MinIO, Spark, Kubernetes, etc. ready, check the respective folder in this repository. Before starting Dagster, make sure you have:
- MinIO started
- Kubernetes ready
- Spark image, role, and namespaces ready
- Then cd src/pipelines/real-estate and start Dagit with dagit
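The checklist above can be scripted as a quick preflight check before starting Dagit. What follows is a minimal sketch in Python, assuming that MinIO's S3 API is reachable on localhost:9000 (MinIO's default port, for example through a kubectl port-forward) and that you run it from the repository root. The constants MINIO_LIVENESS_URL and PIPELINE_DIR and all helper names are hypothetical. /minio/health/live is MinIO's liveness endpoint and kubectl cluster-info is a standard kubectl command. The Spark image, role, and namespaces are not checked, because their names are not listed here.

```python
"""Preflight check before starting Dagit (a sketch; endpoints are assumptions)."""
import http.client
import shutil
import subprocess
import urllib.request
from pathlib import Path

# Assumption: MinIO's S3 API listens on localhost:9000 (MinIO's default port),
# e.g. via a kubectl port-forward. Adjust to your own setup.
MINIO_LIVENESS_URL = "http://localhost:9000/minio/health/live"
# Pipeline folder from this README, relative to the repository root.
PIPELINE_DIR = Path("src/pipelines/real-estate")


def has_command(name: str) -> bool:
    """Return True if an executable called `name` is on PATH."""
    return shutil.which(name) is not None


def kubernetes_ready(timeout: float = 15.0) -> bool:
    """Return True if kubectl can reach the cluster of the current context."""
    if not has_command("kubectl"):
        return False
    try:
        result = subprocess.run(
            ["kubectl", "cluster-info"], capture_output=True, timeout=timeout
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0


def minio_live(url: str = MINIO_LIVENESS_URL, timeout: float = 5.0) -> bool:
    """Return True if the MinIO liveness endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    # OSError covers URLError, HTTPError, refused connections and timeouts;
    # HTTPException covers a non-HTTP service answering on that port.
    except (OSError, http.client.HTTPException):
        return False


def preflight() -> dict[str, bool]:
    """Run all checks (Spark image/role/namespaces are not checked here)."""
    return {
        "MinIO started": minio_live(),
        "Kubernetes ready": kubernetes_ready(),
        "dagit on PATH": has_command("dagit"),
        f"{PIPELINE_DIR} exists": PIPELINE_DIR.is_dir(),
    }


if __name__ == "__main__":
    checks = preflight()
    for label, ok in checks.items():
        print(f"[{'ok' if ok else 'missing'}] {label}")
    if all(checks.values()):
        print(f"All checks passed. Next: cd {PIPELINE_DIR} && dagit")
```

Save it under any name (for example preflight.py, a hypothetical file name) and run it from the repository root. It only reports the status and prints the next command instead of launching Dagit itself, so you stay in control of when the UI starts.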