  • Stars: 1
  • Language
  • Created about 4 years ago
  • Updated almost 4 years ago

Repository Details

Running Airflow inside Kubernetes

More Repositories

1

data-engineering-practice

Data Engineering Practice Problems
Dockerfile
1,634
star
2

dataEngineeringTemplate

Template for Data Engineering and Data Pipeline projects
Shell
101
star
3

tinytimmy

A simple, easy-to-use Data Quality (DQ) tool built with Python.
Python
45
star
4

sniffer

CSV and flat-file sniffer built in Rust.
Rust
40
star
5

unitTestPySpark

How to unit test your PySpark code.
Python
27
star
6

DataEngineeringProjects

Some example projects for Data Engineers to build, end-to-end.
26
star
7

reepicheep

A `Rust`-based package to help manage complex medicine (pill) cycles.
Rust
25
star
8

lakescum

A Python package that helps Databricks Unity Catalog users read and query Delta Lake tables with Polars, DuckDB, or PyArrow.
Python
21
star
9

PythonVsRustAWSLambda

Testing the runtime difference between Python and Rust for AWS Lambda.
Rust
12
star
10

GreatExpectationsWithDatabricks

Getting Great Expectations set up to run on Databricks with Spark DataFrames.
Python
12
star
11

RustForDataPipelines

Testing out if Rust can be used for a normal Data Engineering Pipeline.
Rust
11
star
12

polarsVpandasOnAwsLambda

Using Polars and Pandas on AWS Lambda to process data.
Python
9
star
13

polars-DeltaLake

Trying out the Polars DataFrame library with Delta Lake ... feat. Python.
Python
8
star
14

learnDataEngineering

Sample Project to Learn Data Engineering
Python
8
star
15

PolarsVsPySpark

Can Polars crunch 27 GB of data faster than PySpark?
Python
8
star
16

RustOnApacheAirflow

The ultimate Data Engineering Chadstack. Apache Airflow running Rust. Bring it.
Rust
7
star
17

DuckdbAndDeltaLake

Learning how to query a remote S3 Delta Lake with DuckDB.
Python
7
star
18

DataFrameShowDown

Polars vs Spark vs Pandas vs DataFusion. Guess who wins?
Python
6
star
19

SysproInvoicing

Use Python to invoice in the Syspro ERP system.
Python
5
star
20

PandasVsPolars

Try some common functions between Pandas and Polars.
Python
4
star
21

GreatExpectationsWithSpark

Learning to set up a Great Expectations project using Apache Spark.
HTML
4
star
22

fine-tune-openLLaMA

This repo shows how to fine-tune the openLLaMA (7B) model on a GPU.
HTML
4
star
23

rustAsyncExample

A quick example of using Rust to do async HTTP requests/downloads.
Rust
3
star
24

RustDataFusion

Trying out Rust's DataFusion and comparing it to Apache Spark.
Python
3
star
25

AirflowVsDagster

Comparing Apache Airflow to Dagster
Python
3
star
26

gRPCwithPython

Introduction to gRPC with Python.
Python
3
star
27

graphRS

Building a Network/Graph from scratch, and understanding it with Rust.
Rust
2
star
28

PolarsDateTimeManipulation

Polars date and time manipulation
Python
2
star
29

datafusion-sql-cli

Playing around and making ETL tools with DataFusion's CLI SQL tool.
Dockerfile
2
star
30

delta-rs-example-writer

Trying out the Rust delta-rs Delta Table writer.
Rust
2
star
31

puddleglum

Rust-based package for answering questions about S3 buckets and files.
Rust
2
star
32

DSAforTheRestOfUs

Introduction to DSA (Data Structures and Algorithms) with Rust.
Rust
1
star
33

DuckDBvsPolars

Comparing the performance of DuckDB to Polars
Python
1
star
34

learningGolang

Learning Golang by processing CSV files.
Go
1
star
35

kafkaClusterWithPython

Create a 3-node Kafka cluster and interact with it using a Python client.
Python
1
star
36

pyElasticsearch

Interacting with Elasticsearch to store books.
Python
1
star
37

postgresInsertPerformance

Testing Postgres Insert Performance
Python
1
star
38

DataWarehouse_ForeignKeys

Add Foreign Keys in SQL Server to Hundreds+ Data Warehouse tables with Dynamic SQL
SQLPL
1
star
39

SparkHadoopCluster

Create your own Apache Spark cluster with Hadoop/HDFS installed.
1
star
40

IowaCornYields

Iowa Corn Yields using Python, Pandas vs RDBMS
Python
1
star
41

DataEngineeringWithFortran

Trying to use Fortran to write a data pipeline
1
star
42

s3cloudStorage_Golang_Python_Rust

Golang, Rust, and Python working with S3 files.
Go
1
star
43

PrefectIntroduction

Trying out Prefect as compared to Airflow.
Python
1
star
44

testApacheArrow

Trying out Apache Arrow and comparing it to Polars.
Python
1
star
45

pyarrow-v-duckdb-v-polars

Compare PyArrow, DuckDB, and Polars for writing data pipelines.
Python
1
star
46

sparklepop

SparklePop is a simple Python package designed to check the free disk space of an AWS RDS instance.
Python
1
star
47

GolangVsRust

Writing a word counter with both Golang and Rust.
Go
1
star
48

sparkMachineLearningExample

An example of a Spark Machine Learning Pipeline in PySpark.
Python
1
star
49

solaSearch

Project to store and relate various ancient texts and make them available for public use.
Rust
1
star
50

TheBearVsTheDuck

Compare DuckDB v Polars for Data Pipelines.
Python
1
star
51

RayonWithRustVsPython

Trying out Rayon with Rust vs. Python thread and process pools.
Rust
1
star
52

scrounger

A `Rust`-based Python package that serves as a faster alternative to `vulture` for finding dead and unused code in Python repositories.
Rust
1
star
53

GolangDataFrames

Playing with DataFrames in Golang and comparing them to Python.
Go
1
star
54

pySparkSQLContext

Learning to use SQLContext with PySpark.
Python
1
star
55

sparkShufflePerformance

Testing the performance of Spark shuffle configurations.
Python
1
star