  • Stars: 2
  • Language: Python
  • Created over 3 years ago
  • Updated over 2 years ago

Repository Details

Streaming event pipeline built around Apache Kafka and its ecosystem. Using public data from the Chicago Transit Authority, we will construct an event pipeline around Kafka that lets us simulate and display the status of train lines in real time.

More Repositories

1

uber-expenses-tracking

The goal of this project is to track Uber Rides and Uber Eats expenses through data engineering processes, using technologies such as Apache Airflow, AWS Redshift and Power BI.
Jupyter Notebook
79
star
2

apache-spark-docker

Dockerizing an Apache Spark Standalone Cluster
VBA
37
star
3

csv-schema-inference

A tool to automatically infer column data types in .csv files
Jupyter Notebook
27
star
4

data-engineer-challenge

Challenge Data Engineer
Python
22
star
5

pyspark-on-aws-emr

The goal of this project is to offer a ready-to-use AWS EMR template built on Spot Fleet and On-Demand Instances, so you can focus on writing PySpark code.
Python
21
star
6

pyDag

Scheduling Big Data Workloads and Data Pipelines in the Cloud with pyDag
Python
20
star
7

Dropout-Students-Prediction

The goal of this project is to identify students at risk of dropping out of school
HTML
15
star
8

data-engineering-challenge-th

Dockerizing a Python script for web scraping and consuming the scraped data using FastAPI (www.metroscubicos.com)
Python
12
star
9

D3JS-Dashboard

Building a responsive dashboard with D3.js and ASP.NET MVC from scratch (SQL Server, SSIS, REST API)
C#
12
star
10

wbz

A parallel implementation of the bzip2 data compressor in Python. The compression pipeline uses algorithms such as the Burrows–Wheeler transform (BWT) and move-to-front (MTF) to improve Huffman compression. For now, the tool focuses only on compressing .csv files and other files in tabular format.
Python
10
star
11

recommendation-system

Build a Content-Based Movie Recommender System (TF-IDF, BM25, BERT)
Python
8
star
12

docker-livy

Dockerizing and Consuming an Apache Livy environment
HTML
7
star
13

text-analysis-speeches-amlo

Text analysis of the speeches, conferences and interviews of the current president of Mexico
Jupyter Notebook
6
star
14

tf-idf

Term Frequency-Inverse Document Frequency from Scratch
Python
5
star
15

Huffman-decoding

A New Approach for Efficient Sequential Decoding of Static Huffman Codes
HTML
5
star
16

dataengineering-assignment

Prescreening Tasks for Data Engineer
Jupyter Notebook
5
star
17

distance-metrics

Distance metrics are among the most important parts of many machine learning algorithms, both supervised and unsupervised. They help us calculate and measure similarities between numerical values expressed as data points.
Jupyter Notebook
4
star
18

csv-estimate-rows

Python
3
star
19

csv-shuffler

A tool to automatically shuffle lines in .csv files
Python
3
star
20

livyc

Apache Spark as a Service with Apache Livy Client
Python
3
star
21

MachineLearning

The repository contains basic experiments using machine learning algorithms with Python
HTML
3
star
22

RESTful-APIs-Nodejs

Building fast, scalable and secure RESTful services with Node, Express and MongoDB
HTML
3
star
23

Moving-Average-Spark

How to Compute Moving Average with Spark
3
star
24

SparkSQL-with-Python

This repository has some examples of using Spark and SparkSQL with Python through PySpark
HTML
2
star
25

Wittline

Take a look at my repository
2
star
26

GPU-Programming-with-Python

GPU programming with Python: take advantage of the incredible computing power of your graphics processing unit (GPU). We will work with NVIDIA’s CUDA library.
2
star
27

csv-columnar

Python
2
star
28

apache-spark-course

Apache Spark with Python
Jupyter Notebook
2
star
29

Data-Analytics-with-R

Repository for data analytics course using R
HTML
2
star
30

Contextual-Data-Transforms

This repository contains the most important contextual data transformation algorithms, which help improve the compression rate achieved by statistical encoders. Ramses Alexander Coraspe Valdez
HTML
2
star
31

Computer-Vision-and-Deep-Learning

This repository contains information on the basic techniques and algorithms used in computer image processing, in addition to some projects related to pattern recognition using deep learning.
Python
2
star
32

csv-generator

Python
1
star
33

wittline.github.io

My GitHub profile
SCSS
1
star
34

Python

Software Analysis, Design and Construction with Python
HTML
1
star
35

model-catalog-grpc

A gRPC service to consume any machine learning model stored in a model catalog through a single endpoint.
1
star
36

csv-splitter

csv-splitter
Python
1
star
37

Python-recursion

This repository shows the implementation of the most common recursive algorithms
HTML
1
star
38

Multiprocessing

Improving the Performance in the Statistical Redistribution of Message Symbols using Architectural patterns for Parallel Programming
HTML
1
star
39

code_challenges

Scripts for different purposes
Python
1
star
40

burrows-wheeler-transform

Implementation of the Burrows–Wheeler transform algorithm in Python for data compression
Python
1
star