• Stars: 37
• Rank: 720,807 (Top 15%)
• Language: Python
• Created about 2 years ago
• Updated about 2 years ago

Repository Details

Supporting materials/code examples for my course in data engineering for machine learning.

More Repositories

1. buenavista (Python, 230 stars)
   A Postgres Proxy Server in Python

2. exhibit (Java, 54 stars)
   A prototype of Hive UDFs/UDTFs that execute nested SQL queries within rows.

3. duckdbt (Python, 45 stars)
   The Modern Data Stack in a Python package

4. avro-json (Java, 29 stars)
   Utilities for converting to and from JSON from Avro records via Hadoop streaming or Hive.

5. geojson (Scala, 28 stars)
   Scala library for working with GeoJSON records using Esri's Geometry API for Java

6. target-duckdb (Python, 17 stars)
   A Singer.io target for DuckDB

7. driskill (Scala, 13 stars)
   Either[Hotel in Austin, Prototype of a Scala Distributed Collections API]

8. nba_monte_carlo (Python, 11 stars)
   The Modern Data Stack in a (Smaller) Box

9. lineage (R, 9 stars)
   An R package for tracking the transformations applied to the vectors in a data frame.

10. supernova (9 stars)
    A starter kit for working with supernova schemas.

11. mz-fastapi (Python, 8 stars)
    A FastAPI utility for building HTTP endpoints powered by Materialize TAIL queries

12. dbt-buenavista (Python, 6 stars)
    The dbt adapter for a Buena Vista database proxy server

13. hive-scd (Java, 6 stars)
    A new kind of slowly changing dimension pattern for Apache Hive.

14. crunch-demo (Java, 4 stars)
    A demo application for getting started with Apache Crunch.

15. dbt-mysql (Python, 3 stars)
    MySQL plugin for dbt

16. saferdd (Scala, 3 stars)
    Tools for working with dirty data in Apache Spark.

17. attribution (Java, 3 stars)
    MapReduce job for creating multitouch attribution models.

18. avroplay (Java, 3 stars)
    Me messing around with some Avro stuff

19. s3-demo (Dockerfile, 3 stars)
    Demo dbt-duckdb against localstack w/the new fsspec config options in version 1.4.1

20. hanukkahofdata (Python, 3 stars)
    My solutions to the 2023 Hanukkah of Data

21. cdh-mapreduce-ext (Java, 2 stars)
    Classes in the new mapreduce.* API that are not part of CDH3 yet.

22. avro-json-serde (Java, 1 star)
    A wrapper that uses the Hive AvroSerDe to deserialize data as JSON for use with Hive Streaming

23. hosprunner (R, 1 star)