  • Stars
    29
  • Rank 855,567 (Top 17 %)
  • Language
    Java
  • Created over 11 years ago
  • Updated almost 4 years ago

Repository Details

Utilities for converting Avro records to and from JSON via Hadoop Streaming or Hive.

More Repositories

1. buenavista
   A Postgres Proxy Server in Python
   Language: Python
   Stars: 199

2. exhibit
   A prototype of Hive UDFs/UDTFs that execute nested SQL queries within rows.
   Language: Java
   Stars: 54

3. duckdbt
   The Modern Data Stack in a Python package
   Language: Python
   Stars: 43

4. de4ml
   Supporting materials/code examples for my course in data engineering for machine learning.
   Language: Python
   Stars: 38

5. geojson
   Scala library for working with GeoJSON records using Esri's Geometry API for Java
   Language: Scala
   Stars: 28

6. target-duckdb
   A Singer.io target for DuckDB
   Language: Python
   Stars: 17

7. driskill
   Either[Hotel in Austin, Prototype of a Scala Distributed Collections API]
   Language: Scala
   Stars: 13

8. nba_monte_carlo
   The Modern Data Stack in a (Smaller) Box
   Language: Python
   Stars: 11

9. lineage
   An R package for tracking the transformations applied to the vectors in a data frame.
   Language: R
   Stars: 9

10. supernova
    A starter kit for working with supernova schemas.
    Stars: 9

11. mz-fastapi
    A FastAPI utility for building HTTP endpoints powered by Materialize TAIL queries
    Language: Python
    Stars: 8

12. hive-scd
    A new kind of slowly changing dimension pattern for Apache Hive.
    Language: Java
    Stars: 6

13. dbt-buenavista
    The dbt adapter for a Buena Vista database proxy server
    Language: Python
    Stars: 5

14. crunch-demo
    A demo application for getting started with Apache Crunch.
    Language: Java
    Stars: 4

15. dbt-mysql
    MySQL plugin for dbt
    Language: Python
    Stars: 3

16. attribution
    MapReduce job for creating multitouch attribution models.
    Language: Java
    Stars: 3

17. saferdd
    Tools for working with dirty data in Apache Spark.
    Language: Scala
    Stars: 3

18. avroplay
    Me messing around with some Avro stuff
    Language: Java
    Stars: 3

19. s3-demo
    Demo dbt-duckdb against localstack w/the new fsspec config options in version 1.4.1
    Language: Dockerfile
    Stars: 3

20. hanukkahofdata
    My solutions to the 2023 Hanukkah of Data
    Language: Python
    Stars: 3

21. cdh-mapreduce-ext
    Classes in the new mapreduce.* API that are not part of CDH3 yet.
    Language: Java
    Stars: 2

22. avro-json-serde
    A wrapper that uses the Hive AvroSerDe to deserialize data as JSON for use with Hive Streaming
    Language: Java
    Stars: 1

23. hosprunner
    Language: R
    Stars: 1