jldbc/coffee-quality-database

Stars: 226
Rank: 176,514 (top 4%)
Language: R
License: MIT License
Created almost 7 years ago; updated over 6 years ago

Building the Coffee Quality Institute Database

coffee-quality-database

Digitizing 1,340 coffee reviews

Data

These data contain reviews of 1312 arabica and 28 robusta coffee beans from the Coffee Quality Institute's trained reviewers. The features include:

Quality Measures

Aroma
Flavor
Aftertaste
Acidity
Body
Balance
Uniformity
Cup Cleanliness
Sweetness
Moisture
Defects

Bean Metadata

Processing Method
Color
Species (arabica / robusta)

Farm Metadata

Owner
Country of Origin
Farm Name
Lot Number
Mill
Company
Altitude
Region
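The following is a minimal sketch of working with these fields. It averages one quality measure within each group, for example arabica versus robusta. It assumes each review has already been loaded as a Python dict. The snake_case keys (`species`, `aroma`, `country_of_origin`, ...) and the function name `mean_by_group` are hypothetical and may not match the column headers in the data folder. The sample values are made up for illustration and are not taken from the dataset.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical snake_case names for the quality measures listed above.
QUALITY_MEASURES = [
    "aroma", "flavor", "aftertaste", "acidity", "body", "balance",
    "uniformity", "cup_cleanliness", "sweetness", "moisture", "defects",
]


def mean_by_group(rows, measure, group_key="species"):
    """Average one quality measure within each group (e.g. arabica vs. robusta)."""
    if measure not in QUALITY_MEASURES:
        raise ValueError(f"unknown measure: {measure}")
    buckets = defaultdict(list)
    for row in rows:
        value = row.get(measure)
        if value is not None:  # skip reviews where the measure was not recorded
            buckets[row[group_key]].append(float(value))
    return {group: mean(values) for group, values in buckets.items()}


# Made-up example rows, for illustration only.
rows = [
    {"species": "arabica", "country_of_origin": "Ethiopia", "aroma": 8.0, "flavor": 8.25},
    {"species": "arabica", "country_of_origin": "Colombia", "aroma": 7.5, "flavor": 7.75},
    {"species": "robusta", "country_of_origin": "India", "aroma": 7.0, "flavor": None},
]

print(mean_by_group(rows, "aroma"))  # {'arabica': 7.75, 'robusta': 7.0}
print(mean_by_group(rows, "aroma", group_key="country_of_origin"))
```

Missing values (`None`) are skipped rather than counted as zero. As a result, a group drops out of the result entirely if none of its reviews recorded that measure.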

The data folder contains both raw and cleaned data. The raw data are exactly as they were found on the CQI site. Because these human-recorded data use a variety of encodings, abbreviations, and units of measurement for farm names, altitude, region, and other fields, I recommend using the cleaned data as a starting point.
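Altitude illustrates the kind of cleanup involved. Below is a sketch of one way to normalize it. It assumes raw values look like `"1200-1800 m"`, `"1,500 masl"`, or `"4500 ft"`, but these formats are illustrative and are not an inventory of the raw file. The function strips thousands separators, takes the midpoint of a range, converts feet to meters (1 ft = 0.3048 m), and treats a bare number as meters. The name `altitude_to_meters` is hypothetical, and this is not necessarily how the repo's cleaned data were produced.

```python
import re

FEET_TO_METERS = 0.3048

# Assumed raw formats: "<n> [unit]" or "<low>-<high> [unit]" / "<low> to <high> [unit]".
_NUMBER = r"\d+(?:\.\d+)?"
_PATTERN = re.compile(
    rf"^\s*({_NUMBER})\s*(?:(?:-|to)\s*({_NUMBER}))?\s*(m|meters?|masl|msnm|ft|feet|foot)?\b",
    re.IGNORECASE,
)


def altitude_to_meters(raw):
    """Parse a free-text altitude into one value in meters, or None if unparseable."""
    if raw is None:
        return None
    text = raw.replace(",", "").strip()  # "1,500" -> "1500"
    match = _PATTERN.match(text)
    if not match:
        return None
    low = float(match.group(1))
    high = float(match.group(2)) if match.group(2) else low
    value = (low + high) / 2  # use the midpoint of a reported range
    unit = (match.group(3) or "m").lower()  # assumption: a bare number is in meters
    if unit in ("ft", "feet", "foot"):
        value *= FEET_TO_METERS
    return round(value, 1)


for raw in ["1200-1800 m", "1,500 masl", "4500 ft", "unknown"]:
    print(raw, "->", altitude_to_meters(raw))
```

Treating a unitless number as meters is an assumption. A stricter cleaner might return `None` for those rows and flag them for manual review.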

The site was scraped using a Selenium headless browser and Beautiful Soup. To replicate this or collect updated data, create a login for the CQI site and enter your credentials in the scraper.
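For readers who want to rebuild the scraper, here is a sketch of two of its pieces.

- It reads credentials from environment variables instead of editing them into the script. The names `CQI_USERNAME` and `CQI_PASSWORD` are hypothetical.
- It parses a review page's label/value table into a dict. It uses Python's standard-library `html.parser` in place of Beautiful Soup so it runs without extra dependencies.

The two-column `<tr><td>label</td><td>value</td></tr>` layout is an assumption, not a description of the CQI site's actual markup. In a Selenium-based scraper, the HTML string would typically come from the WebDriver's `page_source` after logging in.

```python
import os
from html.parser import HTMLParser


def load_credentials():
    """Read CQI login details from (hypothetical) environment variables."""
    user = os.environ.get("CQI_USERNAME")
    password = os.environ.get("CQI_PASSWORD")
    if not user or not password:
        raise RuntimeError("set CQI_USERNAME and CQI_PASSWORD before scraping")
    return user, password


class ReviewTableParser(HTMLParser):
    """Collect two-cell <tr> rows (label, value) into a dict; other rows are ignored."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._cells = None  # text of each cell in the row currently being read
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._cells = []
        elif tag in ("td", "th") and self._cells is not None:
            self._cells.append("")
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._cells is not None:
            if len(self._cells) == 2:
                label, value = (cell.strip() for cell in self._cells)
                self.fields[label.rstrip(":")] = value  # "Aroma:" -> "Aroma"
            self._cells = None

    def handle_data(self, data):
        # Text inside nested tags such as <b> still lands in the current cell.
        if self._in_cell and self._cells:
            self._cells[-1] += data


def parse_review(html):
    parser = ReviewTableParser()
    parser.feed(html)
    parser.close()
    return parser.fields


sample = (
    "<table>"
    "<tr><td>Aroma:</td><td>8.67</td></tr>"
    "<tr><td><b>Country of Origin</b></td><td> Ethiopia </td></tr>"
    "</table>"
)
print(parse_review(sample))  # {'Aroma': '8.67', 'Country of Origin': 'Ethiopia'}
```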

Source

These data were collected from the Coffee Quality Institute's review pages in January 2018.

Other projects by jldbc

pybaseball

Pull current and historical baseball statistics using Python (Statcast, Baseball Reference, FanGraphs)

bandits

Multi-Armed Bandit algorithms applied to the MovieLens 20M dataset

Tensorflow_ML_Algorithms

Implementations of machine learning algorithms in TensorFlow: MLP, RNN, autoencoder, PageRank, KNN, K-Means, logistic regression, and OLS regression

gutenberg

A content-based recommender system for books using the Project Gutenberg text corpus

numpy_neural_net

A simple neural network (multilayer perceptron) with backpropagation implemented in Python with NumPy

gunsandcrime

A replication of Marvell and Moody's economics experiment measuring the impact of gun ownership on crime rates, using the percentage of suicides committed with a gun, gun manufacturing, and survey data as proxies for gun ownership. Data set included.

field-goal-models

Modeling NFL Field Goal Probabilities in R

twitter-social-graph

Project to visualize a user's Twitter social graph

Sports-Econometrics

Analytics Projects from Sports Econometrics (EC3700) -- a course on advanced methods in cross-sectional econometrics with a focus on sports data

AuctionHouse

See how much advertisers are paying for your attention: https://chrome.google.com/webstore/detail/auctionhouse/hmjofiljabjmompfgllkpkbkfbpbpkcp

iPython-Notebooks

A collection of small side projects and analyses (Jupyter Notebook)

malicious-urls

Malicious URL classifier built with SVM, random forest, and logistic regression classifiers (Jupyter Notebook)

Saber

Misc. sabermetric and sports analytics projects (Jupyter Notebook)

boston_college_webcams

Pull photos from the Boston College webcams

Statistical-Learning

Coursework from Big Data (EC3389) -- a course on statistical learning theory with applications in Python (Jupyter Notebook)

Big-Data

Coursework from Big Data (CS3390) -- Machine Learning tasks performed using Hadoop, MapReduce, and Spark

Udacity-ML

Coursework from Udacity Machine Learning Engineer nanodegree

groupme-analytics

Data-mine your group chat (Jupyter Notebook)

NewsBot

Computer generated news headlines using Markov chains