  • Stars: 132
  • Rank 274,205 (Top 6 %)
  • Language: Python
  • Created over 2 years ago
  • Updated 12 months ago

Repository Details

Data Engineering YouTube Analysis Project by Darshil Parmar

Overview

This project aims to securely manage, streamline, and analyze structured and semi-structured YouTube video data, organized by video category and trending metrics.

Project Goals

  1. Data Ingestion: Build a mechanism to ingest data from different sources.
  2. ETL System: The data arrives in raw format and must be transformed into the proper format.
  3. Data Lake: The data comes from multiple sources, so we need a centralized repository to store it.
  4. Scalability: As the size of our data increases, the system must scale with it.
  5. Cloud: We can't process vast amounts of data on a local computer, so we need to use the cloud; in this case, AWS.
  6. Reporting: Build a dashboard to answer the questions we asked earlier.
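As a rough illustration of the ingestion and data lake goals, the sketch below maps a per-region CSV file to a partitioned object key. The file naming pattern (`CAvideos.csv`), the `youtube/raw_statistics` prefix, the `region=` partition folder, and the helper name `raw_key_for` are all assumptions made for this example, not details confirmed by the project description.

```python
from pathlib import PurePosixPath

SUFFIX = "videos.csv"  # assumed naming pattern, e.g. "CAvideos.csv"


def raw_key_for(filename: str, prefix: str = "youtube/raw_statistics") -> str:
    """Map a per-region CSV file to a partitioned object key (hypothetical layout)."""
    name = PurePosixPath(filename).name
    # Reject names that do not look like "<REGION>videos.csv".
    if not name.endswith(SUFFIX) or len(name) <= len(SUFFIX):
        raise ValueError(f"unexpected file name: {filename!r}")
    region = name[: -len(SUFFIX)].lower()
    # A "region=<code>" folder lets query engines treat region as a partition.
    return f"{prefix}/region={region}/{name}"


keys = [raw_key_for(f) for f in ["CAvideos.csv", "data/USvideos.csv"]]
for key in keys:
    print(key)
```

Keeping the region in the key rather than only in the file name means a query that filters on one region can skip the files of every other region.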

Services we will be using

  1. Amazon S3: An object storage service that provides industry-leading scalability, data availability, security, and performance.
  2. AWS IAM: Identity and Access Management, which lets us manage access to AWS services and resources securely.
  3. QuickSight: Amazon QuickSight is a scalable, serverless, embeddable, machine learning-powered business intelligence (BI) service built for the cloud.
  4. AWS Glue: A serverless data integration service that makes it easy to discover, prepare, and combine data for analytics, machine learning, and application development.
  5. AWS Lambda: A compute service that lets programmers run code without creating or managing servers.
  6. AWS Athena: An interactive query service for S3. There is no need to load the data anywhere else; it stays in S3.
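To show how S3 and Lambda might fit together, here is a minimal sketch of a Lambda handler that reads the bucket and object key from an S3 event notification. The bucket name, the object key, and the helper `extract_s3_objects` are made up for illustration. The handler only reports what it would process; the actual cleaning step is left out because the description above does not specify it.

```python
from urllib.parse import unquote_plus


def extract_s3_objects(event: dict) -> list:
    """Return (bucket, key) pairs from an S3 event notification payload."""
    objects = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        bucket = s3["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 events, so decode them first.
        key = unquote_plus(s3["object"]["key"])
        objects.append((bucket, key))
    return objects


def lambda_handler(event, context):
    objects = extract_s3_objects(event)
    # A real function would read each object here and write a cleaned copy.
    return {"processed": [f"s3://{bucket}/{key}" for bucket, key in objects]}


# Hypothetical event with a single uploaded object.
sample_event = {
    "Records": [
        {
            "s3": {
                "bucket": {"name": "example-bucket"},
                "object": {"key": "youtube/raw/region%3Dus/US_category_id.json"},
            }
        }
    ]
}

result = lambda_handler(sample_event, None)
print(result)
```

Separating the event parsing from the handler keeps the parsing easy to test locally without any AWS access.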

Dataset Used

This Kaggle dataset contains statistics (CSV files) on daily popular YouTube videos collected over many months. Up to 200 trending videos are listed per day for each of several regions, and the data for each region is in its own file. Each record includes the video title, channel title, publication time, tags, views, likes and dislikes, description, and comment count. The data also includes a category_id field, whose meaning differs by region; the category names are in the JSON file associated with each region.

https://www.kaggle.com/datasets/datasnaek/youtube-new
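The relationship between the CSV files and the category JSON can be sketched as a simple lookup. The sample rows below are invented, the CSV is cut down to four columns, and the JSON is assumed to hold an `items` list with `id` and `snippet.title` fields; check the actual files before relying on this layout.

```python
import csv
import io
import json

# Made-up sample data standing in for one region's JSON and CSV files.
CATEGORY_JSON = """{"items": [
  {"id": "10", "snippet": {"title": "Music"}},
  {"id": "24", "snippet": {"title": "Entertainment"}}
]}"""

VIDEOS_CSV = """video_id,title,category_id,views
a1,Song premiere,10,1500
b2,Talk show clip,24,900
c3,Unlisted category,99,10
"""


def load_categories(text: str) -> dict:
    """Build a category_id -> category name lookup from the JSON text."""
    return {item["id"]: item["snippet"]["title"] for item in json.loads(text)["items"]}


def label_videos(csv_text: str, categories: dict) -> list:
    """Attach a readable category name to every CSV row."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Ids missing from the region's JSON are labeled rather than dropped.
        row["category"] = categories.get(row["category_id"], "Unknown")
        row["views"] = int(row["views"])
        rows.append(row)
    return rows


labeled = label_videos(VIDEOS_CSV, load_categories(CATEGORY_JSON))
for row in labeled:
    print(row["video_id"], row["category"], row["views"])
```

Because the mapping differs by region, each region's CSV should be joined with its own JSON file rather than with a single shared lookup.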

Architecture Diagram

Complete Tutorial

I have created a detailed 3+ hour tutorial on this project, in which you execute everything from start to finish:

https://youtu.be/yZKJFKu49Dk

More Repositories

1

uber-etl-pipeline-data-engineering-project

Jupyter Notebook
208
star
2

python-for-data-engineering

This repo contains all the code used in the Python for Data Engineering Course
Jupyter Notebook
201
star
3

stock-market-kafka-data-engineering-project

Jupyter Notebook
163
star
4

tokyo-olympic-azure-data-engineering-project

Jupyter Notebook
130
star
5

twitter-airflow-data-engineering-project

YouTube tutorial project
Python
92
star
6

apache-spark-with-data-bricks-for-data-engineering

apache-spark-with-databricks-for-data-engineering
Jupyter Notebook
45
star
7

amazon-web-scraping-python-project

Jupyter Notebook
44
star
8

Data-Engineer-Tutorial-Series

Jupyter Notebook
21
star
9

sql-for-data-engineering-course

Jupyter Notebook
16
star
10

Exam_Notes_Detection

During exams, students often share their notes via social media, and once the exams are over it becomes really difficult to delete all those images manually. To address this problem, I created a system that detects exam notes (pictures taken with a mobile camera) and deletes them.
Python
13
star
11

uber-data-engineering-mage-project

Uber Data Engineering Pipeline using Mage AI and BigQuery
Jupyter Notebook
13
star
12

ipl-data-analysis-apache-spark-project

Jupyter Notebook
12
star
13

data-warehouse-snowflake-for-data-engineering

PLpgSQL
11
star
14

Amazon_Website_Scraping_Scrapy

Uses the Scrapy Python library to scrape the Amazon website and store titles, ratings, and reviews.
Python
7
star
15

darshilparmar

About Me
6
star
16

python-tutorial

Jupyter Notebook
4
star
17

Face_Recognition_System_FaceNet

A facial recognition system is a technology capable of identifying or verifying a person from a digital image or a video frame from a video source. There are multiple methods by which facial recognition systems work, but in general they compare selected facial features from a given image with faces in a database.
Python
4
star
18

Keywords_and_keyphrases_extraction

Keyword extraction is tasked with the automatic identification of terms that best describe the subject of a document.
Jupyter Notebook
4
star
19

kafka-in-10min-video-code

Jupyter Notebook
4
star
20

data-engineering-sql-tutorial

3
star
21

Step-by-step-DataScience

Data Science Workshop
Jupyter Notebook
3
star
22

Loan-Predicition

AnalyticsVidhya Hackathon: Loan Prediction System
Jupyter Notebook
3
star
23

Clustering-Indian-Postal-code-based-on-most-visited-Venues

Clustering Indian Postal Code Based on Most visited Venues
Jupyter Notebook
3
star
24

Face_Landmarks_Detection

A facial landmarks detection system is a technology capable of detecting a person from a digital image or a video frame from a video source. There are multiple methods by which facial detection systems work, but in general they compare selected facial features from a given image with faces in a database.
Python
3
star
25

workflow-orchestration-apache-airflow-for-data-engineering

Python
3
star
26

scraping_tutorialspoint

Scraping all data from tutorialspoint
Python
2
star
27

food_classification-FoodPal

PHP
1
star
28

School-Management

School-Management diploma mini project
JavaScript
1
star
29

Blood-Donation

Blood Donation diploma mini project
JavaScript
1
star