Josh Wills (@jwills)
  • Stars 567
  • Global Rank 51,790 (Top 2 %)
  • Followers 416
  • Following 26
  • Registered almost 14 years ago
  • Most used languages
    Python 39.1 %
    Java 34.8 %
    Scala 13.0 %
    R 8.7 %
    Dockerfile 4.3 %
  • Location 🇺🇸 United States
  • Country Total Rank 16,591
  • Country Ranking
    Scala 493
    Dockerfile 2,305
    Java 2,319
    R 2,457
    Python 3,433

Top repositories

1

buenavista

A Postgres Proxy Server in Python
Python
230
star
2

exhibit

A prototype of Hive UDFs/UDTFs that execute nested SQL queries within rows.
Java
54
star
3

duckdbt

The Modern Data Stack in a Python package
Python
45
star
4

de4ml

Supporting materials/code examples for my course in data engineering for machine learning.
Python
37
star
5

avro-json

Utilities for converting to and from JSON from Avro records via Hadoop streaming or Hive.
Java
29
star
6

geojson

Scala library for working with GeoJSON records using Esri's Geometry API for Java
Scala
28
star
7

target-duckdb

A Singer.io target for DuckDB
Python
17
star
8

driskill

Either[Hotel in Austin, Prototype of a Scala Distributed Collections API]
Scala
13
star
9

nba_monte_carlo

The Modern Data Stack in a (Smaller) Box
Python
11
star
10

lineage

An R package for tracking the transformations applied to the vectors in a data frame.
R
9
star
11

supernova

A starter kit for working with supernova schemas.
9
star
12

mz-fastapi

A FastAPI utility for building HTTP endpoints powered by Materialize TAIL queries
Python
8
star
13

dbt-buenavista

The dbt adapter for a Buena Vista database proxy server
Python
6
star
14

hive-scd

A new kind of slowly changing dimension pattern for Apache Hive.
Java
6
star
15

crunch-demo

A demo application for getting started with Apache Crunch.
Java
4
star
16

dbt-mysql

MySQL plugin for dbt
Python
3
star
17

saferdd

Tools for working with dirty data in Apache Spark.
Scala
3
star
18

attribution

MapReduce job for creating multitouch attribution models.
Java
3
star
19

avroplay

Me messing around with some Avro stuff
Java
3
star
20

s3-demo

Demo dbt-duckdb against localstack w/the new fsspec config options in version 1.4.1
Dockerfile
3
star
21

hanukkahofdata

My solutions to the 2023 Hanukkah of Data
Python
3
star
22

cdh-mapreduce-ext

Classes in the new mapreduce.* API that are not part of CDH3 yet.
Java
2
star
23

avro-json-serde

A wrapper that uses the Hive AvroSerDe to deserialize data as JSON for use with Hive Streaming
Java
1
star
24

hosprunner

R
1
star