Testing Benford's Law
This is a simple experiment by Jason Long and Bryce Thornton to test how many real-life, publicly available datasets satisfy Benford's Law.
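Benford's Law predicts that the leading digit d (1 to 9) of a value occurs with probability P(d) = log10(1 + 1/d). That means about 30.1% of values should start with 1 and only about 4.6% with 9. The Ruby sketch below computes these expected percentages. It also computes the largest gap, in percentage points, between them and an observed distribution. That gap is one simple, hypothetical yardstick; it is not necessarily the criterion the site uses to judge a dataset. The names benford_expected and max_deviation are illustrative.

```ruby
# Expected share (in percent) of each leading digit under Benford's Law:
# P(d) = log10(1 + 1/d), keyed "1".."9" like a dataset file's "values".
def benford_expected
  (1..9).map { |d| [d.to_s, 100 * Math.log10(1 + 1.0 / d)] }.to_h
end

# Hypothetical yardstick: the largest absolute difference, in percentage
# points, between observed leading-digit percentages and Benford's prediction.
# Digits missing from `observed` are treated as 0%.
def max_deviation(observed)
  benford_expected.map { |d, expected| (observed.fetch(d, 0.0) - expected).abs }.max
end
```

Applied to the values object of the example dataset file later in this README, max_deviation returns about 2.5. Most of that gap comes from the digit 1, which is 32.62% observed versus 30.10% expected.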
Contributing Datasets
If you find this idea interesting, we'd encourage you to help add more datasets to the site. We've intentionally kept the site as simple and lightweight as possible: there is no real backend. Each dataset is crunched in advance, and the results are stored in static JSON files.
To contribute a new dataset, you'll need to do two things:
1. Add the dataset to the JSON index file
The file js/datasets/index.json is a single JSON object that maps each dataset's key (a lowercase, hyphenated identifier) to its human-readable name:
{
"twitter-users-by-followers-count": "Twitter users by followers count",
"distance-of-stars-from-earth-in-light-years": "Distance of stars from Earth in light years",
"loan-amounts-on-kiva-org": "Loan amounts on kiva.org",
"total-number-of-print-materials-in-us-libraries": "Total number of print materials in US libraries",
"population-of-spanish-cities": "Population of Spanish cities"
}
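The example keys appear to be the display names lowercased, with every run of spaces or punctuation turned into a hyphen ("Loan amounts on kiva.org" becomes "loan-amounts-on-kiva-org"). We inferred that convention from the examples; it is not a stated rule. Assuming it holds, a small Ruby helper can derive a key and append the new entry to the index. The helper names dataset_key and add_to_index are hypothetical.

```ruby
require 'json'

# Hypothetical helper: derive an index key from a display name by lowercasing
# it, collapsing each run of non-alphanumeric characters into one hyphen, and
# trimming hyphens from both ends (the convention the example keys suggest).
# Note that non-ASCII letters are treated as separators too.
def dataset_key(name)
  name.downcase.gsub(/[^a-z0-9]+/, '-').gsub(/\A-|-\z/, '')
end

# Hypothetical helper: parse the index JSON text, add one entry at the end,
# and return the updated JSON text (Ruby hashes keep insertion order).
def add_to_index(index_json, name)
  index = JSON.parse(index_json)
  index[dataset_key(name)] = name
  JSON.pretty_generate(index)
end
```

To use it, read js/datasets/index.json, pass its contents and your dataset's name to add_to_index, and write the result back to the same file. JSON.pretty_generate re-indents the whole file with two spaces.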
2. Create a dataset JSON file
Add a new file to the /js/datasets/ directory whose name matches the key you used in step 1. The format looks like this:
{
"values": {
"1": 32.62,
"2": 16.66,
"3": 11.80,
"4": 9.26,
"5": 7.63,
"6": 6.55,
"7": 5.76,
"8": 5.14,
"9": 4.56
},
"num_records": "38,670,514",
"min_value": "1",
"max_value": "4,706,631",
"source": "http://www.infochimps.com/datasets/twitter-census-twitter-users-by-friends-count"
}
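In this file, values maps each leading digit (1 to 9) to the percentage of records that begin with that digit. The fields num_records, min_value and max_value summarize the dataset, and source links to where the data came from. Always include the source so that others can verify and reproduce the results.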
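If you already have a count of records per leading digit, a short Ruby sketch can assemble a hash in the layout above. The helper names build_dataset and with_commas are hypothetical. The formatting choices are assumptions modelled on the example file: percentages are rounded to two decimals, and the summary fields are strings with thousands separators.

```ruby
require 'json'

# Format an integer with thousands separators, e.g. 38670514 -> "38,670,514".
def with_commas(n)
  sign = n.negative? ? '-' : ''
  sign + n.abs.to_s.reverse.scan(/\d{1,3}/).join(',').reverse
end

# Hypothetical helper: build a dataset-file hash from per-digit counts.
# counts: { 1 => n1, 2 => n2, ... }; digits with no records may be omitted.
# Integer min/max values get thousands separators; anything else is stored
# with to_s, since the example file keeps these fields as strings.
def build_dataset(counts, min_value:, max_value:, source:)
  total = counts.values.sum
  raise ArgumentError, 'no records counted' if total.zero?

  format_value = ->(v) { v.is_a?(Integer) ? with_commas(v) : v.to_s }
  # Percentage of records per leading digit, rounded to two decimals.
  values = (1..9).map { |d| [d.to_s, (100.0 * counts.fetch(d, 0) / total).round(2)] }.to_h

  {
    'values' => values,
    'num_records' => with_commas(total),
    'min_value' => format_value.call(min_value),
    'max_value' => format_value.call(max_value),
    'source' => source
  }
end
```

Passing the result to JSON.pretty_generate and writing it into /js/datasets/ (assuming the file is named after the key plus .json) gives the layout shown above. One difference: JSON serialization drops trailing zeros, so 11.80 is written as 11.8, which is the same number.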
Crunching the data
Generating Benford statistics is a fairly straightforward process, and we've written a simple Ruby class you can use if you like.
First, grab a copy of the class from here: https://gist.github.com/1044174
Then require it in your script. For example, this counts the values in the tenth column of a CSV file:
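require 'csv'
# benford_counter.rb is the class from the gist above, saved next to this script
require_relative 'benford_counter'

counter = BenfordCounter.new
# Count the value in the tenth column (index 9) of every row
CSV.foreach("spain.txt") do |row|
  counter.count(row[9])
end
p counter.results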
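The gist's BenfordCounter is not reproduced here. If you'd rather not depend on it, the following hypothetical stand-in, LeadingDigitTally, follows the same count/results pattern as the script above. Its behavior is our assumption, not a description of the gist. It skips blank, zero and non-numeric cells, treats commas as thousands separators, and returns percentages keyed "1" to "9". It needs Ruby 2.6 or later for Float(..., exception: false).

```ruby
# Hypothetical stand-in for the gist's BenfordCounter; not its actual code.
class LeadingDigitTally
  def initialize
    @counts = Hash.new(0) # leading digit ("1".."9") => number of values
  end

  # Record one value. Commas are stripped as thousands separators; blank,
  # non-numeric, zero and non-finite values are skipped because they have
  # no leading digit.
  def count(value)
    number = Float(value.to_s.delete(','), exception: false)
    return if number.nil? || number.zero? || !number.finite?

    # The first nonzero digit of the magnitude is the leading digit,
    # e.g. 0.0042 -> "4" and 1.5e-05 -> "1".
    @counts[number.abs.to_s[/[1-9]/]] += 1
  end

  def total
    @counts.values.sum
  end

  # Percentage of counted values per leading digit, rounded to two decimals.
  def results
    n = total
    (1..9).map do |d|
      share = n.zero? ? 0.0 : 100.0 * @counts[d.to_s] / n
      [d.to_s, share.round(2)]
    end.to_h
  end
end
```

Replacing BenfordCounter.new with LeadingDigitTally.new in the script above should then print a hash shaped like the values object of a dataset file.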
Additional Tools
fweez contributed the Linux file size dataset and wrote a Python script for tallying the sizes of files in a directory.
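fweez's script is written in Python and is not reproduced here. The following hypothetical Ruby sketch shows the same idea; file_sizes and leading_digit_counts are our own names. It collects the size in bytes of every regular file under a directory using the standard Find module, then tallies the sizes by leading digit. Empty files are skipped because zero has no leading digit.

```ruby
require 'find'

# Sizes in bytes of all regular files under dir, searched recursively.
# Find.find does not descend into symlinked directories.
def file_sizes(dir)
  sizes = []
  Find.find(dir) { |path| sizes << File.size(path) if File.file?(path) }
  sizes
end

# Tally file sizes by leading digit ("1".."9" => number of files),
# skipping zero-byte files.
def leading_digit_counts(sizes)
  sizes.reject(&:zero?).group_by { |s| s.to_s[0] }.transform_values(&:size)
end
```

For example, leading_digit_counts(file_sizes('/usr')) returns a hash mapping each leading digit to the number of files under /usr whose size starts with it.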
Updating JavaScript and CSS
We're using CoffeeScript for the JavaScript and Sass/Compass for the CSS.
Once CoffeeScript is installed (see the CoffeeScript docs), run this command from the project root to watch for and compile changes:
coffee --watch -o js/ --compile js/coffee/*.coffee
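Note that the only file you should edit is /js/coffee/app.coffee. The compiled /js/app.js file is generated by CoffeeScript (the -o js/ option writes it to js/), so any changes made to it directly are lost the next time it is compiled.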
To make changes to the CSS, install Sass and Compass (see the Compass docs), then edit /css/sass/screen.scss. You can watch for and compile changes by running this command from the project root:
compass watch
To compile a production-ready compressed version:
compass compile --output-style compressed --force
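Both compass watch and compass compile read their settings from a config.rb file in the project root. If the project does not already have one, a minimal sketch for the layout described above might look like this, with Sass sources in /css/sass and compiled CSS in /css. The directory values are assumptions based on the paths in this README. The option names are standard Compass configuration properties.

```ruby
# Hypothetical config.rb for this project's layout (an assumption, not the
# repository's actual file).
http_path = "/"
css_dir = "css"       # where compiled CSS is written
sass_dir = "css/sass" # where screen.scss lives

# Readable output while developing; the production command above overrides
# this with --output-style compressed.
output_style = :expanded
line_comments = true
```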