
dimroc/etl-language-comparison

This repository has been archived on 18/Oct/2019
Stars: 188
Language: Erlang
Created over 10 years ago
Updated over 6 years ago


Count the number of times certain words were said in a particular neighborhood. Performed as a basic MapReduce job against 25M tweets. Implemented in different programming languages as an educational exercise.

Update

Please see the following blog posts for the latest updates:

ETL Language Showdown - Sept. 2014
ETL Language Showdown Part 2 - Now with Python - May 2015
ETL Language Showdown Part 3 - 10 Languages and growing - Nov. 2015

Wins

Analyses and discussions done here have led to the following language pull requests:

ETL Language Showdown

This repo implements the same MapReduce ETL (Extract-Transform-Load) task in multiple languages to compare language productivity, terseness and readability. The performance comparisons should not be taken seriously; if anything, they say more about my skill set in each language than about the language's performance capabilities.

The Task

Count the tweets that mention 'knicks' in their message, bucketed by neighborhood of origin. The ~1GB dataset for this task, sampled below, contains each tweet's message and its NYC neighborhood.

To get the dataset, run fetch_tweets in the repo directory, or download it here.

91	west-brighton	Brooklyn	Uhhh
121	turtle-bay-east-midtown	Manhattan	Say anything
175	morningside-heights	Manhattan	It feels half-cheating half-fulfilling to cite myself.
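The task maps naturally onto a map/reduce pair: the map step emits a count of 1 for the tweet's neighborhood when the message mentions 'knicks', and the reduce step sums those counts per neighborhood. Below is a minimal Python sketch of that shape, not one of the repo's implementations. It assumes each line is tab-separated like the sample above (id, neighborhood, borough, message) and that matching is a case-insensitive substring test; the function names are hypothetical.

```python
from collections import Counter
from functools import reduce
from typing import Iterable


def map_line(line: str) -> Counter:
    """Map step: return {neighborhood: 1} if the tweet mentions 'knicks', else an empty Counter."""
    # Assumed layout (inferred from the sample): id \t neighborhood \t borough \t message
    fields = line.rstrip("\n").split("\t", 3)
    if len(fields) < 4:
        return Counter()  # skip malformed lines
    _id, neighborhood, _borough, message = fields
    # Assumption: case-insensitive substring match, so "Knicks" and "KNICKS" both count.
    if "knicks" in message.lower():
        return Counter({neighborhood: 1})
    return Counter()


def reduce_counts(acc: Counter, partial: Counter) -> Counter:
    """Reduce step: add one partial result into the running per-neighborhood totals."""
    acc.update(partial)  # Counter.update adds counts rather than replacing them
    return acc


def count_knicks(lines: Iterable[str]) -> Counter:
    """Run map over every line, then fold the partial counts together."""
    return reduce(reduce_counts, map(map_line, lines), Counter())


# Made-up example lines in the assumed format:
sample = [
    "1\tturtle-bay-east-midtown\tManhattan\tLet's go Knicks!\n",
    "2\tturtle-bay-east-midtown\tManhattan\tknicks lost again\n",
    "3\tmorningside-heights\tManhattan\tSay anything\n",
]
print(count_knicks(sample))  # Counter({'turtle-bay-east-midtown': 2})
```

Because count_knicks accepts any iterable of lines, an open file handle can be passed directly, e.g. with open(path) as f: print(count_knicks(f).most_common(10)), which streams the file instead of loading it into memory. Note that a substring test also matches words such as "knickstape"; a stricter tokenizer would change the counts.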

Initial Assumption

These tasks are not run on Hadoop but do run concurrently. Performance numbers are moot since the CPU mostly sits idle waiting on disk IO.
UPDATE: Boy, was the IO-bound assumption wrong.
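To make "run concurrently" concrete, here is a hedged Python sketch that splits the input into chunks, maps each chunk on a worker pool and merges the partial counts. It assumes the same tab-separated layout as the sample (id, neighborhood, borough, message); the chunk size, worker count and function names are hypothetical. Threads keep the example self-contained, but since the job turned out not to be IO-bound, keep in mind that in CPython the GIL prevents pure-Python threads from running this CPU work in parallel. Swapping in ProcessPoolExecutor (with an if __name__ == "__main__": guard) is the usual way to use multiple cores.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from itertools import islice
from typing import Iterable, Iterator, List


def count_chunk(lines: List[str]) -> Counter:
    """Map step over one chunk: per-neighborhood counts of tweets mentioning 'knicks'."""
    counts = Counter()
    for line in lines:
        # Assumed layout: id \t neighborhood \t borough \t message
        fields = line.rstrip("\n").split("\t", 3)
        if len(fields) == 4 and "knicks" in fields[3].lower():
            counts[fields[1]] += 1
    return counts


def chunked(lines: Iterable, size: int) -> Iterator[List]:
    """Yield successive lists of at most `size` items."""
    it = iter(lines)
    while chunk := list(islice(it, size)):
        yield chunk


def count_knicks_concurrently(lines: Iterable[str], chunk_size: int = 10_000,
                              workers: int = 4) -> Counter:
    """Map chunks on a thread pool, then reduce the partial Counters into one total."""
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map submits every chunk up front and yields results in input order,
        # so the whole input is buffered in memory as chunks.
        for partial in pool.map(count_chunk, chunked(lines, chunk_size)):
            total.update(partial)  # reduce step: add partial counts
    return total
```

Chunking amortizes the per-task scheduling overhead over many lines; with one task per line, the pool bookkeeping would likely dominate the actual counting work.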

The Languages

Below are the languages implemented. Note that frameworks also play a big role; for example, the Scala implementation compares parallel collections to futures and to the Akka framework. Click through on each language to read more.

Language	Owner
Ruby
Golang	matttproud
Scala
Nim
Node
PHP
Erlang
Elixir	josevalim
Rust
Python
C#	mganss
shell	mganss
perl	sitaramc

count

An experiment with crowd counting. Trains a Keras model in Python for use with Rails and iOS Core ML.

iOS.ProjectMonitor

iPhone application that monitors the status of continuous integration builds

brunch-with-rails

Minimal brunch setup that builds its output into Rails' app/assets/(javascripts|stylesheets). Modeled after the Phoenix Framework's use of Brunch.

urbanevents

Tool to record and search tweeted media across cities

new_tweet_city

No longer hosted, reached end of life. Visualize all new york city tweets in real-time, flashing the originating neighborhood.

brunch-with-rails-demo

Sample Rails application that uses brunch-with-rails for a client side asset pipeline

mhi-iLab

collabcode

Collaborate with code! An experiment with CoffeeScript and the Zappa framework to build a site that leverages the Ace editor for easier remote pair programming. Has a few bugs and is built on now-outdated alpha frameworks ;/

ChainlinkMobile

A hackathon experiment to run chainlink on iOS. Uses chainlink branch @ https://github.com/dimroc/chainlink/tree/hackathon/gomobile

nyc_building_perimeters

Renders NYC neighborhoods and associated building perimeters in WebGL. Uses information from NYC OpenData.

manhattan_forum

(Decommissioned) An activity feed for your borough in NYC. Pictures and movies are tagged by their neighborhoods and posted to their borough's activity feed. Imagine a bootleg version of Instagram where you follow a borough and not a person.

ilab-nodejs

bdd-demo-finish

external_id

Generates an external ID to mask the record counts that auto-increment IDs otherwise reveal.

geth-blocks-unsubscribe

Demonstration of the go-ethereum RPC client blocking indefinitely when calling Unsubscribe.

nbc

revealdown

Web app for creating reveal.js presentations using Showdown.

nyc_shapefile_to_threejs

Renders New York City as a Three.js mesh converted from the Bytes of the Big Apple NYC shapefiles.

blog

Development blog for Dimitri Roche

beyonceorrihanna

ga-ruby-on-rails-for-devs

apg

Automatic password generator. Recreates the old apg (as installed via brew install apg) in Go, but with base58 as the default.

machine_learning_hoods

http://blog.dimroc.com/2016/01/13/machine-learning-neighborhoods/ Run datasets through AWS Machine Learning to train a model that can tell what neighborhood a comment belongs to