RIP Simpsons World
The Simpsons has moved from Simpsons World to Disney+, so the code in this repo no longer works.
The Simpsons by the Data
Code in support of this post: The Simpsons by the Data
It's a Rails app, but isn't intended to be run as a server. It processes data from Simpsons World, Wikipedia, and IMDb, and populates a PostgreSQL database called simpsons_development. The database contains 4 primary tables: episodes, script_lines, characters, and locations.
Instructions
Assumes you have Ruby and PostgreSQL installed
git clone git@github.com:toddwschneider/flim-springfield.git
cd flim-springfield/
createdb simpsons_development
bundle exec rake db:migrate
bundle exec rake import_data
bundle exec rake jobs:work
It takes about 45 minutes to process everything with one worker
Analysis
R code to analyze the data lives in the analysis/ folder
Caveats/areas for improvement
- I deduped some character names when they're printed in different ways, e.g. "TROY" is the same as "Troy McClure", but I certainly did not dedupe all 6000+ characters that appear in the scripts
- Similarly, I manually assigned genders to the top 320 or so characters, who collectively account for 86% of the show's dialogue
- I did not dedupe any locations
- Simpsons World is not available in all countries, so the code might not run depending on where you're located
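
To illustrate the character-name deduping mentioned in the first caveat, here is a minimal Ruby sketch of an alias lookup. It is not the repo's actual implementation: the constant CHARACTER_ALIASES and the methods canonical_character_name and line_counts_by_character are hypothetical names made up for this example, and the only alias taken from this README is "TROY" mapping to "Troy McClure".

```ruby
# Hypothetical alias table. Only the TROY entry comes from the README;
# a real table would need one entry per variant spelling found in the scripts.
CHARACTER_ALIASES = {
  "TROY" => "Troy McClure"
}.freeze

# Returns the canonical name for a raw script name.
# Unknown names are returned unchanged apart from trimmed whitespace.
def canonical_character_name(raw_name)
  cleaned = raw_name.to_s.strip
  CHARACTER_ALIASES.fetch(cleaned.upcase, cleaned)
end

# Counts lines per canonical character, given one raw name per script line.
def line_counts_by_character(raw_names)
  raw_names.each_with_object(Hash.new(0)) do |name, counts|
    counts[canonical_character_name(name)] += 1
  end
end
```

For example, line_counts_by_character(["TROY", "Troy McClure", "Homer Simpson"]) groups the first two entries under "Troy McClure". Any name that is not in the alias table stays a separate character, which is why undeduped variants can remain among the 6000+ names.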