• Stars: 26
• Rank: 925,880 (top 19%)
• Created 11 months ago
• Updated 11 months ago

Repository Details

Some example projects for Data Engineers to build, end-to-end.

More Repositories

1. data-engineering-practice (Dockerfile, 1,634 stars): Data engineering practice problems.
2. dataEngineeringTemplate (Shell, 101 stars): Template for data engineering and data pipeline projects.
3. tinytimmy (Python, 45 stars): A simple, easy-to-use Data Quality (DQ) tool built with Python.
4. sniffer (Rust, 40 stars): A CSV and flat-file sniffer built in Rust.
5. unitTestPySpark (Python, 27 stars): How to unit test your PySpark code.
6. reepicheep (Rust, 25 stars): A `Rust`-based package to help manage complex medicine (pill) cycles.
7. lakescum (Python, 21 stars): A Python package to help Databricks Unity Catalog users read and query Delta Lake tables with Polars, DuckDB, or PyArrow.
8. PythonVsRustAWSLambda (Rust, 12 stars): Testing the runtime difference between Python and Rust for AWS Lambda.
9. GreatExpectationsWithDatabricks (Python, 12 stars): Getting Great Expectations set up to run on Databricks with Spark DataFrames.
10. RustForDataPipelines (Rust, 11 stars): Testing whether Rust can be used for a typical data engineering pipeline.
11. polarsVpandasOnAwsLambda (Python, 9 stars): Using Polars and Pandas on AWS Lambda to process data.
12. polars-DeltaLake (Python, 8 stars): Trying out the Polars DataFrame library with Delta Lake... feat. Python.
13. learnDataEngineering (Python, 8 stars): A sample project to learn data engineering.
14. PolarsVsPySpark (Python, 8 stars): Can Polars crunch 27 GB of data faster than PySpark?
15. RustOnApacheAirflow (Rust, 7 stars): The ultimate data engineering Chadstack. Apache Airflow running Rust. Bring it.
16. DuckdbAndDeltaLake (Python, 7 stars): Learning how to query a remote S3 Delta Lake with DuckDB.
17. DataFrameShowDown (Python, 6 stars): Polars vs Spark vs Pandas vs DataFusion. Guess who wins?
18. SysproInvoicing (Python, 5 stars): Using Python to invoice in the Syspro ERP system.
19. PandasVsPolars (Python, 4 stars): Trying some common functions in both Pandas and Polars.
20. GreatExpectationsWithSpark (HTML, 4 stars): Learning to set up a Great Expectations project using Apache Spark.
21. fine-tune-openLLaMA (HTML, 4 stars): Shows how to fine-tune the openLLaMA (7B) model on a GPU.
22. rustAsyncExample (Rust, 3 stars): A quick example of using Rust to do async HTTP requests/downloads.
23. RustDataFusion (Python, 3 stars): Trying out Rust's DataFusion and comparing it to Apache Spark.
24. AirflowVsDagster (Python, 3 stars): Comparing Apache Airflow to Dagster.
25. gRPCwithPython (Python, 3 stars): Introduction to gRPC with Python.
26. graphRS (Rust, 2 stars): Building a network/graph from scratch and understanding it with Rust.
27. PolarsDateTimeManipulation (Python, 2 stars): Polars date and time manipulation.
28. datafusion-sql-cli (Dockerfile, 2 stars): Playing around with DataFusion's SQL CLI and using it to make ETL tools.
29. delta-rs-example-writer (Rust, 2 stars): Trying out the Rust delta-rs Delta Table writer.
30. puddleglum (Rust, 2 stars): A Rust-based package for answering questions about S3 buckets and files.
31. DSAforTheRestOfUs (Rust, 1 star): Introduction to DSA (Data Structures and Algorithms) with Rust.
32. DuckDBvsPolars (Python, 1 star): Comparing the performance of DuckDB to Polars.
33. learningGolang (Go, 1 star): Learning Golang by processing CSV files.
34. kafkaClusterWithPython (Python, 1 star): Create a three-node Kafka cluster and interact with it using a Python client.
35. pyElasticsearch (Python, 1 star): Interacting with Elasticsearch to store books.
36. postgresInsertPerformance (Python, 1 star): Testing Postgres insert performance.
37. DataWarehouse_ForeignKeys (SQLPL, 1 star): Add foreign keys to hundreds or more data warehouse tables in SQL Server with dynamic SQL.
38. SparkHadoopCluster (1 star): Create your own Apache Spark cluster with Hadoop/HDFS installed.
39. IowaCornYields (Python, 1 star): Iowa corn yields using Python: Pandas vs RDBMS.
40. DataEngineeringWithFortran (1 star): Trying to use Fortran to write a data pipeline.
41. s3cloudStorage_Golang_Python_Rust (Go, 1 star): Golang, Rust, and Python working with S3 files.
42. PrefectIntroduction (Python, 1 star): Trying out Prefect as compared to Airflow.
43. airflow-kubernetes (1 star): Running Airflow inside Kubernetes.
44. testApacheArrow (Python, 1 star): Trying out Apache Arrow and comparing it to Polars.
45. pyarrow-v-duckdb-v-polars (Python, 1 star): Comparing PyArrow, DuckDB, and Polars for writing data pipelines.
46. sparklepop (Python, 1 star): SparklePop is a simple Python package designed to check the free disk space of an AWS RDS instance.
47. GolangVsRust (Go, 1 star): Writing a word counter in both Golang and Rust.
48. sparkMachineLearningExample (Python, 1 star): An example of a Spark machine learning pipeline in PySpark.
49. solaSearch (Rust, 1 star): A project to store and relate various ancient texts and make them available for public use and consumption.
50. TheBearVsTheDuck (Python, 1 star): Comparing DuckDB vs Polars for data pipelines.
51. RayonWithRustVsPython (Rust, 1 star): Trying out Rayon with Rust vs Python thread and process pools.
52. scrounger (Rust, 1 star): A `Rust`-based Python package meant as a faster alternative to `vulture` for finding dead and unused code in Python repositories.
53. GolangDataFrames (Go, 1 star): Playing with DataFrames in Golang and comparing them to Python.
54. pySparkSQLContext (Python, 1 star): Learning to use SQLContext with PySpark.
55. sparkShufflePerformance (Python, 1 star): Testing the performance of Spark shuffle configurations.