  • Stars: 3
  • Rank 3,963,521 (Top 79 %)
  • Language: Python
  • License: Apache License 2.0
  • Created 12 months ago
  • Updated 11 months ago

Repository Details

My solutions to the 2023 Hanukkah of Data

More Repositories

1. buenavista (Python, 230 stars)
   A Postgres Proxy Server in Python
2. exhibit (Java, 54 stars)
   A prototype of Hive UDFs/UDTFs that execute nested SQL queries within rows.
3. duckdbt (Python, 45 stars)
   The Modern Data Stack in a Python package
4. de4ml (Python, 37 stars)
   Supporting materials/code examples for my course in data engineering for machine learning.
5. avro-json (Java, 29 stars)
   Utilities for converting to and from JSON from Avro records via Hadoop streaming or Hive.
6. geojson (Scala, 28 stars)
   Scala library for working with GeoJSON records using Esri's Geometry API for Java
7. target-duckdb (Python, 17 stars)
   A Singer.io target for DuckDB
8. driskill (Scala, 13 stars)
   Either[Hotel in Austin, Prototype of a Scala Distributed Collections API]
9. nba_monte_carlo (Python, 11 stars)
   The Modern Data Stack in a (Smaller) Box
10. lineage (R, 9 stars)
    An R package for tracking the transformations applied to the vectors in a data frame.
11. supernova (9 stars)
    A starter kit for working with supernova schemas.
12. mz-fastapi (Python, 8 stars)
    A FastAPI utility for building HTTP endpoints powered by Materialize TAIL queries
13. dbt-buenavista (Python, 6 stars)
    The dbt adapter for a Buena Vista database proxy server
14. hive-scd (Java, 6 stars)
    A new kind of slowly changing dimension pattern for Apache Hive.
15. crunch-demo (Java, 4 stars)
    A demo application for getting started with Apache Crunch.
16. dbt-mysql (Python, 3 stars)
    MySQL plugin for dbt
17. saferdd (Scala, 3 stars)
    Tools for working with dirty data in Apache Spark.
18. attribution (Java, 3 stars)
    MapReduce job for creating multitouch attribution models.
19. avroplay (Java, 3 stars)
    Me messing around with some Avro stuff
20. s3-demo (Dockerfile, 3 stars)
    Demo dbt-duckdb against localstack w/the new fsspec config options in version 1.4.1
21. cdh-mapreduce-ext (Java, 2 stars)
    Classes in the new mapreduce.* API that are not part of CDH3 yet.
22. avro-json-serde (Java, 1 star)
    A wrapper that uses the Hive AvroSerDe to deserialize data as JSON for use with Hive Streaming
23. hosprunner (R, 1 star)