  • Stars: 226
  • Language: R
  • License: MIT License
  • Created over 6 years ago
  • Updated over 6 years ago


Repository Details

Building the Coffee Quality Institute Database

coffee-quality-database

Digitizing 1,340 coffee reviews

Data

These data contain reviews of 1,312 arabica and 28 robusta coffee beans from the Coffee Quality Institute's trained reviewers. The features include:

Quality Measures

  • Aroma
  • Flavor
  • Aftertaste
  • Acidity
  • Body
  • Balance
  • Uniformity
  • Cup Cleanliness
  • Sweetness
  • Moisture
  • Defects

Bean Metadata

  • Processing Method
  • Color
  • Species (arabica / robusta)

Farm Metadata

  • Owner
  • Country of Origin
  • Farm Name
  • Lot Number
  • Mill
  • Company
  • Altitude
  • Region
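The feature groups above map naturally onto a typed record. The sketch below is one way to model a single review in Python; the class and field names are hypothetical snake_case renderings of the listed features, not the actual column names in the data folder, and the numeric types for the quality measures are an assumption.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class QualityMeasures:
    """Quality measures for one review (numeric types are assumed)."""
    aroma: float
    flavor: float
    aftertaste: float
    acidity: float
    body: float
    balance: float
    uniformity: float
    cup_cleanliness: float
    sweetness: float
    moisture: float
    defects: float


@dataclass
class BeanMetadata:
    species: str  # "arabica" or "robusta"
    processing_method: Optional[str] = None
    color: Optional[str] = None


@dataclass
class FarmMetadata:
    owner: Optional[str] = None
    country_of_origin: Optional[str] = None
    farm_name: Optional[str] = None
    lot_number: Optional[str] = None
    mill: Optional[str] = None
    company: Optional[str] = None
    altitude: Optional[str] = None  # kept as free text: units vary in the raw data
    region: Optional[str] = None


@dataclass
class CoffeeReview:
    quality: QualityMeasures
    bean: BeanMetadata
    farm: FarmMetadata


def species_counts(reviews: Iterable[CoffeeReview]) -> Counter:
    """Count reviews per species, e.g. to check the arabica/robusta split."""
    return Counter(r.bean.species for r in reviews)
```

Grouping the fields this way keeps the cupping scores separate from the free-text metadata, which is where most of the cleaning effort goes.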

The data folder contains both raw and cleaned data. The raw data are exactly as they appeared on the CQI site. Because these human-recorded data use inconsistent encodings, abbreviations, and units of measurement for farm names, altitude, region, and other fields, I recommend starting from the cleaned data.
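To illustrate the kind of normalization the raw fields need, here is a minimal sketch that converts a free-text altitude to meters. It is not the procedure used to produce the cleaned data; the input formats it handles (ranges such as "1200-1400 m", thousands separators, values tagged "ft" or "feet") are assumptions chosen for illustration.

```python
import re
from typing import Optional

METERS_PER_FOOT = 0.3048

# A number with optional thousands commas and an optional decimal part.
_NUMBER = re.compile(r"\d[\d,]*(?:\.\d+)?")
# "ft", "feet", or "foot" not embedded inside a longer word.
_FEET = re.compile(r"(?<![a-z])(?:ft|feet|foot)(?![a-z])")


def altitude_to_meters(raw: Optional[str]) -> Optional[float]:
    """Convert a free-text altitude to meters.

    Assumptions: a "low-high" range is reduced to its midpoint, commas are
    thousands separators, and anything not marked as feet is already meters.
    """
    if not raw:
        return None
    text = raw.lower()
    numbers = [float(n.replace(",", "")) for n in _NUMBER.findall(text)]
    if not numbers:
        return None
    first_two = numbers[:2]
    value = sum(first_two) / len(first_two)  # midpoint for ranges
    if _FEET.search(text):
        value *= METERS_PER_FOOT
    return value
```

For example, `altitude_to_meters("1200-1400 m")` returns `1300.0`. A real cleaning pass would also need rules for decimal commas and other locale-specific notations, which this sketch deliberately ignores.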

The site was scraped using a Selenium headless browser and Beautiful Soup. To replicate the scrape or collect updated data, create a login for the CQI site and enter your credentials in the scraper.
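In a Selenium plus Beautiful Soup scraper, the browser handles login and page rendering, and the HTML parser then pulls label/value pairs out of each review page. The sketch below shows only the parsing step. It uses the standard library's `html.parser` as a dependency-free stand-in for Beautiful Soup, and it assumes a hypothetical layout in which each field is a two-cell table row (`<td>Aroma</td><td>7.67</td>`); the real CQI markup may differ.

```python
from html.parser import HTMLParser
from typing import Dict, List


class LabelValueParser(HTMLParser):
    """Collect {label: value} pairs from two-cell table rows."""

    def __init__(self) -> None:
        super().__init__()
        self.pairs: Dict[str, str] = {}
        self._cells: List[str] = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._cells = []  # start a fresh row
        elif tag in ("td", "th"):
            self._in_cell = True
            self._cells.append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and len(self._cells) == 2:
            label, value = (cell.strip() for cell in self._cells)
            if label:
                self.pairs[label] = value

    def handle_data(self, data):
        # Text may arrive in several chunks, so append rather than assign.
        if self._in_cell:
            self._cells[-1] += data


def parse_review_table(html: str) -> Dict[str, str]:
    """Return the label/value pairs found in one review page's HTML."""
    parser = LabelValueParser()
    parser.feed(html)
    parser.close()
    return parser.pairs
```

With Selenium, the input would typically be the rendered page from `driver.page_source` after logging in; rows that do not have exactly two cells are skipped.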

Source

These data were collected from the Coffee Quality Institute's review pages in January 2018.

More Repositories

  • pybaseball (Python, 1,208 stars): Pull current and historical baseball statistics using Python (Statcast, Baseball Reference, FanGraphs)
  • bandits (Python, 52 stars): Multi-armed bandit algorithms applied to the MovieLens 20M dataset
  • Tensorflow_ML_Algorithms (Python, 52 stars): Implementations of machine learning algorithms in TensorFlow: MLP, RNN, autoencoder, PageRank, KNN, K-Means, logistic regression, and OLS regression
  • gutenberg (Python, 28 stars): A content-based recommender system for books using the Project Gutenberg text corpus
  • numpy_neural_net (Python, 27 stars): A simple neural network (multilayer perceptron) with backpropagation implemented in Python with NumPy
  • gunsandcrime (Stata, 10 stars): A replication of Marvell and Moody's economics experiment measuring the impact of gun ownership on crime rates, using percent suicides by gun, gun manufacturing, and survey data as proxies for gun ownership. Data set included.
  • field-goal-models (R, 9 stars): Modeling NFL field goal probabilities in R
  • twitter-social-graph (Python, 5 stars): Project to visualize a user's Twitter social graph
  • Sports-Econometrics (Stata, 5 stars): Analytics projects from Sports Econometrics (EC3700), a course on advanced methods in cross-sectional econometrics with a focus on sports data
  • AuctionHouse (JavaScript, 5 stars): See how much advertisers are paying for your attention https://chrome.google.com/webstore/detail/auctionhouse/hmjofiljabjmompfgllkpkbkfbpbpkcp
  • iPython-Notebooks (Jupyter Notebook, 4 stars): A collection of small side projects and analyses
  • malicious-urls (Jupyter Notebook, 3 stars): Malicious URL classifier built with SVM, random forest, and logistic regression classifiers
  • Saber (Jupyter Notebook, 3 stars): Misc. sabermetric and sports analytics projects
  • boston_college_webcams (Python, 2 stars): Pull photos from the Boston College webcams
  • Statistical-Learning (Jupyter Notebook, 2 stars): Coursework from Big Data (EC3389), a course on statistical learning theory with applications in Python
  • Big-Data (Python, 2 stars): Coursework from Big Data (CS3390): machine learning tasks performed using Hadoop, MapReduce, and Spark
  • Udacity-ML (HTML, 1 star): Coursework from the Udacity Machine Learning Engineer Nanodegree
  • groupme-analytics (Jupyter Notebook, 1 star): Data-mine your group chat
  • NewsBot (Python, 1 star): Computer-generated news headlines using Markov chains