• Stars: 45
• Rank: 624,037 (Top 13%)
• Language: Python
• License: Apache License 2.0
• Created over 1 year ago
• Updated about 1 year ago

Repository Details

A simple, easy-to-use Data Quality (DQ) tool built with Python.

More Repositories

1. data-engineering-practice (Dockerfile, 1,634 stars): Data Engineering Practice Problems.
2. dataEngineeringTemplate (Shell, 101 stars): Template for Data Engineering and Data Pipeline projects.
3. sniffer (Rust, 40 stars): CSV and flat-file sniffer built in Rust.
4. unitTestPySpark (Python, 27 stars): How to unit test your PySpark code.
5. DataEngineeringProjects (26 stars): Some example projects for Data Engineers to build, end-to-end.
6. reepicheep (Rust, 25 stars): A Rust-based package to help manage complex medicine (pill) cycles.
7. lakescum (Python, 21 stars): A Python package to help Databricks Unity Catalog users read and query Delta Lake tables with Polars, DuckDB, or PyArrow.
8. PythonVsRustAWSLambda (Rust, 12 stars): Testing the runtime difference between Python and Rust for AWS Lambda.
9. GreatExpectationsWithDatabricks (Python, 12 stars): Getting Great Expectations set up to run on Databricks with Spark DataFrames.
10. RustForDataPipelines (Rust, 11 stars): Testing whether Rust can be used for a normal Data Engineering pipeline.
11. polarsVpandasOnAwsLambda (Python, 9 stars): Using Polars and Pandas on AWS Lambda to process data.
12. polars-DeltaLake (Python, 8 stars): Trying out the Polars DataFrame library with Delta Lake (feat. Python).
13. learnDataEngineering (Python, 8 stars): Sample Project to Learn Data Engineering.
14. PolarsVsPySpark (Python, 8 stars): Can Polars crunch 27 GB of data faster than PySpark?
15. RustOnApacheAirflow (Rust, 7 stars): The ultimate Data Engineering Chadstack. Apache Airflow running Rust. Bring it.
16. DuckdbAndDeltaLake (Python, 7 stars): Learning how to query remote S3 Delta Lake tables with DuckDB.
17. DataFrameShowDown (Python, 6 stars): Polars vs Spark vs Pandas vs DataFusion. Guess who wins?
18. SysproInvoicing (Python, 5 stars): Using Python to invoice in the Syspro ERP system.
19. PandasVsPolars (Python, 4 stars): Trying some common functions in both Pandas and Polars.
20. GreatExpectationsWithSpark (HTML, 4 stars): Learning to set up a Great Expectations project using Apache Spark.
21. fine-tune-openLLaMA (HTML, 4 stars): Shows how to fine-tune the openLLaMA (7B) model on a GPU.
22. rustAsyncExample (Rust, 3 stars): A quick example of using Rust to do async HTTP requests/downloads.
23. RustDataFusion (Python, 3 stars): Trying out Rust's DataFusion and comparing it to Apache Spark.
24. AirflowVsDagster (Python, 3 stars): Comparing Apache Airflow to Dagster.
25. gRPCwithPython (Python, 3 stars): Introduction to gRPC with Python.
26. graphRS (Rust, 2 stars): Building a Network/Graph from scratch, and understanding it with Rust.
27. PolarsDateTimeManipulation (Python, 2 stars): Polars date and time manipulation.
28. datafusion-sql-cli (Dockerfile, 2 stars): Playing around with DataFusion's SQL CLI tool and making ETL tools with it.
29. delta-rs-example-writer (Rust, 2 stars): Trying out the Rust delta-rs Delta Table writer.
30. puddleglum (Rust, 2 stars): Rust-based package for answering questions about S3 buckets and files.
31. DSAforTheRestOfUs (Rust, 1 star): Introduction to DSA (Data Structures and Algorithms) with Rust.
32. DuckDBvsPolars (Python, 1 star): Comparing the performance of DuckDB to Polars.
33. learningGolang (Go, 1 star): Learning Golang by processing CSV files.
34. kafkaClusterWithPython (Python, 1 star): Creating a three-node Kafka cluster and interacting with it from a Python client.
35. pyElasticsearch (Python, 1 star): Interacting with Elasticsearch to store books.
36. postgresInsertPerformance (Python, 1 star): Testing Postgres insert performance.
37. DataWarehouse_ForeignKeys (SQLPL, 1 star): Adding foreign keys to hundreds or more data warehouse tables in SQL Server with dynamic SQL.
38. SparkHadoopCluster (1 star): Create your own Apache Spark cluster with Hadoop/HDFS installed.
39. IowaCornYields (Python, 1 star): Iowa corn yields using Python: Pandas vs. RDBMS.
40. DataEngineeringWithFortran (1 star): Trying to use Fortran to write a data pipeline.
41. s3cloudStorage_Golang_Python_Rust (Go, 1 star): Golang, Rust, and Python working with S3 files.
42. PrefectIntroduction (Python, 1 star): Trying out Prefect as compared to Airflow.
43. airflow-kubernetes (1 star): Running Airflow inside Kubernetes.
44. testApacheArrow (Python, 1 star): Trying out Apache Arrow and comparing it to Polars.
45. pyarrow-v-duckdb-v-polars (Python, 1 star): Comparing PyArrow, DuckDB, and Polars for writing data pipelines.
46. sparklepop (Python, 1 star): SparklePop is a simple Python package designed to check the free disk space of an AWS RDS instance.
47. GolangVsRust (Go, 1 star): Writing a word counter in both Golang and Rust.
48. sparkMachineLearningExample (Python, 1 star): An example of a Spark Machine Learning Pipeline in PySpark.
49. solaSearch (Rust, 1 star): A project to store and relate various ancient texts and make them available for public use and consumption.
50. TheBearVsTheDuck (Python, 1 star): Comparing DuckDB vs. Polars for data pipelines.
51. RayonWithRustVsPython (Rust, 1 star): Trying out Rayon in Rust vs. Python thread and process pools.
52. scrounger (Rust, 1 star): A Rust-based Python package, intended as a faster alternative to `vulture`, for finding dead and unused code in Python repositories.
53. GolangDataFrames (Go, 1 star): Playing with DataFrames in Golang and comparing them to Python.
54. pySparkSQLContext (Python, 1 star): Learning to use SQLContext with PySpark.
55. sparkShufflePerformance (Python, 1 star): Testing the performance of Spark shuffle configurations.