data-engineering-practice
Data Engineering Practice Problems
dataEngineeringTemplate
Template for Data Engineering and Data Pipeline projects
tinytimmy
A simple and easy to use Data Quality (DQ) tool built with Python.
sniffer
CSV and flat-file sniffer built in Rust.
unitTestPySpark
How to unit test your PySpark code
DataEngineeringProjects
Some example projects for Data Engineers to build, end-to-end.
reepicheep
A `Rust`-based package to help with complex medicine (pill) management cycles.
lakescum
A Python package to help Databricks Unity Catalog users read and query Delta Lake tables with Polars, DuckDB, or PyArrow.
PythonVsRustAWSLambda
Testing the runtime difference between Python and Rust for AWS Lambda.
GreatExpectationsWithDatabricks
Getting Great Expectations set up to run on Databricks with Spark DataFrames.
RustForDataPipelines
Testing whether Rust can be used for a normal Data Engineering pipeline.
polarsVpandasOnAwsLambda
Using Polars and Pandas on AWS Lambda to process data.
polars-DeltaLake
Trying out the Polars DataFrame library with Delta Lake ... feat. Python.
learnDataEngineering
Sample Project to Learn Data Engineering
PolarsVsPySpark
Can Polars crunch 27 GB of data faster than PySpark?
RustOnApacheAirflow
The ultimate Data Engineering Chadstack. Apache Airflow running Rust. Bring it.
DuckdbAndDeltaLake
Learning how to query a remote S3 Delta Lake with DuckDB.
DataFrameShowDown
Polars vs Spark vs Pandas vs DataFusion. Guess who wins?
SysproInvoicing
Use Python to invoice in the Syspro ERP system
PandasVsPolars
Trying some common functions in both Pandas and Polars.
GreatExpectationsWithSpark
Learning to set up a Great Expectations project using Apache Spark
fine-tune-openLLaMA
This repo shows how to fine-tune the openLLaMA (7b) model on a GPU.
rustAsyncExample
A quick example of using Rust to do async HTTP requests/downloads.
AirflowVsDagster
Comparing Apache Airflow to Dagster
gRPCwithPython
Introduction to gRPC with Python.
graphRS
Building a Network/Graph from scratch, and understanding it with Rust.
PolarsDateTimeManipulation
Polars date and time manipulation
datafusion-sql-cli
Playing around and making ETL tools with DataFusion's CLI SQL tool.
delta-rs-example-writer
Trying out the Rust delta-rs Delta Table writer.
puddleglum
Rust-based package for answering questions about S3 buckets and files
DSAforTheRestOfUs
Introduction to DSA (Data Structures and Algorithms) with Rust.
DuckDBvsPolars
Comparing the performance of DuckDB to Polars
learningGolang
Learning Golang by processing CSV files.
kafkaClusterWithPython
Create a 3-node Kafka cluster and interact with it using a Python client.
pyElasticsearch
Interacting with Elasticsearch to store books.
postgresInsertPerformance
Testing Postgres Insert Performance
DataWarehouse_ForeignKeys
Add Foreign Keys in SQL Server to hundreds of Data Warehouse tables with Dynamic SQL
SparkHadoopCluster
Create your own Apache Spark cluster with Hadoop/HDFS installed.
IowaCornYields
Iowa Corn Yields using Python, Pandas vs RDBMS
DataEngineeringWithFortran
Trying to use Fortran to write a data pipeline
s3cloudStorage_Golang_Python_Rust
Golang, Rust, and Python working with S3 files.
PrefectIntroduction
Trying out Prefect as compared to Airflow.
airflow-kubernetes
Running Airflow inside Kubernetes
testApacheArrow
Trying out Apache Arrow, compared to Polars.
pyarrow-v-duckdb-v-polars
Comparing PyArrow, DuckDB, and Polars for writing data pipelines.
sparklepop
SparklePop is a simple Python package designed to check the free disk space of an AWS RDS instance.
GolangVsRust
Writing a word counter with both Golang and Rust
sparkMachineLearningExample
An example of a Spark Machine Learning Pipeline in PySpark.
solaSearch
Project to store, relate, and make available for public use various ancient texts.
TheBearVsTheDuck
Comparing DuckDB vs Polars for Data Pipelines.
RayonWithRustVsPython
Trying out Rayon with Rust vs Python thread and process pools.
scrounger
A `Rust`-based Python package offering a faster alternative to `vulture` for finding dead and unused code in Python repositories.
GolangDataFrames
Playing with DataFrames in Golang, compared to Python.
pySparkSQLContext
Learning to use SQLContext with PySpark.
sparkShufflePerformance
Testing the performance of Spark shuffle configurations