Sparkify-Data-Pipelines-with-Airflow-S3-and-Redshift
This project delivers a data warehouse solution built on high-grade data pipelines that are dynamic, composed of reusable tasks, monitored, and easy to backfill. Sparkify has also noted that data quality plays a big part when analyses are executed on top of the data warehouse, and wants to run tests against its datasets after the ETL steps have been executed to catch any discrepancies.
Sparkify-Data-Lake-with-Apache-Spark
This project delivers a data lake solution. It builds an ETL pipeline that extracts Sparkify's data from S3, processes it using Spark, and loads it back into S3 as a set of dimensional tables. This will allow the analytics team to continue finding insights into what songs their users are listening to.
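To make the "set of dimensional tables" idea concrete, here is a minimal sketch of the transform step. It uses plain Python in place of Spark so that it runs anywhere, and it is not the project's actual code: the function name `build_dimensional_tables`, the sample records, and every field name (`user_id`, `song_id`, `ts`, and so on) are assumptions made for illustration.

```python
from typing import Any, Dict, List

# Hypothetical listening events; the real project reads its data from S3,
# and these field names are illustrative assumptions only.
SAMPLE_EVENTS: List[Dict[str, Any]] = [
    {"ts": 1, "user_id": "u1", "user_name": "Ana", "song_id": "s1", "song_title": "Blue"},
    {"ts": 2, "user_id": "u2", "user_name": "Ben", "song_id": "s1", "song_title": "Blue"},
    {"ts": 3, "user_id": "u1", "user_name": "Ana", "song_id": "s2", "song_title": "Red"},
]


def build_dimensional_tables(events: List[Dict[str, Any]]) -> Dict[str, List[Dict[str, Any]]]:
    """Split flat event records into two dimension tables and one fact table."""
    users: Dict[str, Dict[str, Any]] = {}
    songs: Dict[str, Dict[str, Any]] = {}
    songplays: List[Dict[str, Any]] = []

    for event in events:
        # Dimension tables: one row per distinct key (the last value seen wins).
        users[event["user_id"]] = {
            "user_id": event["user_id"],
            "name": event["user_name"],
        }
        songs[event["song_id"]] = {
            "song_id": event["song_id"],
            "title": event["song_title"],
        }
        # Fact table: one row per event, holding only keys and the timestamp.
        songplays.append(
            {"ts": event["ts"], "user_id": event["user_id"], "song_id": event["song_id"]}
        )

    return {
        "users": list(users.values()),
        "songs": list(songs.values()),
        "songplays": songplays,
    }


tables = build_dimensional_tables(SAMPLE_EVENTS)
for name, rows in tables.items():
    print(name, len(rows))
```

In a Spark version of this step, each table would typically be a DataFrame selected from the raw data, deduplicated, and written back to S3; the split into dimensions and facts stays the same.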