Tweitgeist v1.2.0
Tweitgeist analyses the Twitter Spritzer hose and computes the top trending hashtags in real time using RedStorm/Storm. Besides being a cool Storm example, what makes it interesting is that this architecture will work at full Twitter Firehose scale without much modification.
- See the slideshare presentation about Twitter Big Data and Tweitgeist.
- See the live demo on http://tweitgeist.colinsurprenant.com/
There are three components:
- The Twitter Spritzer stream reader, which pushes messages into a Redis queue
- The RedStorm analyser, which reads the Twitter stream queue, computes the trending hashtags and outputs the top N list to a Redis queue every 5 seconds
- The viewer UI for the visualization
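To make the analyser's job more concrete, here is a minimal in-memory sketch of counting hashtags over a rolling time window and ranking the top N. It is not the Tweitgeist topology code: the names `extract_hashtags` and `RollingHashtagCounter`, the 60-second window and the sample tweets are all assumptions made for illustration, and no Storm or Redis is involved.

```ruby
# Illustrative sketch only; names and window size are hypothetical,
# not taken from the Tweitgeist source.

# Pull the hashtags out of a tweet's text, lowercased so that
# "#Storm" and "#storm" are counted together.
def extract_hashtags(text)
  text.scan(/#(\w+)/).flatten.map(&:downcase)
end

# Counts hashtags seen within the last `window` seconds.
class RollingHashtagCounter
  def initialize(window)
    @window = window
    @events = []            # [timestamp, hashtag] pairs, oldest first
    @counts = Hash.new(0)   # hashtag => occurrences inside the window
  end

  def add(hashtag, now)
    @events << [now, hashtag]
    @counts[hashtag] += 1
  end

  # Expire events older than the window, then return the n most
  # frequent hashtags as [hashtag, count] pairs (ties broken by name).
  def top(n, now)
    while !@events.empty? && @events.first[0] <= now - @window
      _, tag = @events.shift
      @counts[tag] -= 1
      @counts.delete(tag) if @counts[tag].zero?
    end
    @counts.sort_by { |tag, count| [-count, tag] }.first(n)
  end
end

counter = RollingHashtagCounter.new(60)
sample = [
  [0,  "Learning #Storm with #JRuby"],
  [10, "#storm topologies are fun"],
  [20, "Queues in #Redis"]
]
sample.each do |time, text|
  extract_hashtags(text).each { |tag| counter.add(tag, time) }
end

p counter.top(2, 30)   # prints [["storm", 2], ["jruby", 1]]
p counter.top(2, 75)   # prints [["redis", 1]]
```

In a real topology this work would be spread across several Storm components and the ranked list would be published on a timer; the sketch keeps everything in one object and passes timestamps explicitly to show the counting and expiry logic in isolation.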
Dependencies
This has been tested on OS X 10.6+ and Ubuntu Linux 11.10 & 12.04, using JRuby 1.6.x for the RedStorm topology and Ruby 1.9.x for the Twitter Spritzer hose reader.
Installation
- Redis is required
- RVM is highly recommended as you will need to work with both Ruby/JRuby and different gemsets.
RedStorm backend
- requires JRuby 1.6.x
- set JRuby in 1.9 mode by default
$ export JRUBY_OPTS=--1.9
- install the RedStorm gem using bundler with the supplied Gemfile
$ bundle install
- run RedStorm installation
$ bundle exec redstorm install
- package the topology required gems
$ bundle exec redstorm bundle topology
- if you plan on running the topology on a cluster, package the topology jar
$ bundle exec redstorm jar lib/tweitgeist/
Twitter Spritzer stream reader
- requires Ruby 1.9.x
- install required gems using bundler with the supplied Gemfile
$ bundle install
Viewer
- requires Node.js
$ sudo apt-get install nodejs
- requires npm
$ sudo apt-get install npm
- install CoffeeScript if you want to modify the Node.js server
$ npm install -g coffee-script
- install other dependencies
$ cd lib/viewer
$ npm install .
Usage overview
RedStorm backend
- requires JRuby 1.6.x
- set JRuby in 1.9 mode by default
$ export JRUBY_OPTS=--1.9
RedStorm backend in local mode.
$ bundle exec redstorm local lib/tweitgeist/storm/tweitgeist_topology.rb
RedStorm backend in remote cluster mode.
- add your cluster info to ~/.storm/storm.yaml (see setting up a Storm development environment)
- make sure the bin/ directory of your locally installed Storm distribution is in your $PATH
$ bundle exec redstorm cluster lib/tweitgeist/storm/tweitgeist_topology.rb
Twitter Spritzer stream reader
- requires Ruby 1.9.x
- edit config/twitter_reader.rb to add your credentials
$ ruby lib/tweitgeist/twitter/twitter_reader.rb
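The reader hands tweets to the analyser through a Redis queue. As a rough illustration of that hand-off, the sketch below serializes each tweet to JSON and pushes it onto a list-style queue, from which a consumer pops messages in arrival order. `InMemoryQueue` is a hypothetical stand-in whose `lpush` and `rpop` methods mirror the Redis LPUSH and RPOP list commands; the key name `tweets` and the message fields are assumptions, not taken from the Tweitgeist source, and the actual reader may use different commands.

```ruby
require 'json'

# Hypothetical in-memory stand-in for a Redis list, so the sketch
# runs without a Redis server. lpush adds at the head, rpop removes
# from the tail, which together give first-in, first-out ordering.
class InMemoryQueue
  def initialize
    @lists = Hash.new { |hash, key| hash[key] = [] }
  end

  def lpush(key, value)
    @lists[key].unshift(value)
    @lists[key].size
  end

  def rpop(key)
    @lists[key].pop
  end
end

# Producer side: serialize the tweet and push it onto the queue.
def enqueue_tweet(queue, key, tweet)
  queue.lpush(key, JSON.generate(tweet))
end

# Consumer side: pop the oldest message, or return nil if the queue is empty.
def dequeue_tweet(queue, key)
  raw = queue.rpop(key)
  raw && JSON.parse(raw)
end

queue = InMemoryQueue.new
enqueue_tweet(queue, "tweets", { "text" => "first #storm tweet" })
enqueue_tweet(queue, "tweets", { "text" => "second #redis tweet" })

puts dequeue_tweet(queue, "tweets")["text"]   # prints first #storm tweet
puts dequeue_tweet(queue, "tweets")["text"]   # prints second #redis tweet
```

Decoupling the reader from the analyser through a queue means either side can be restarted or replaced independently, which is also why the reader can run on plain Ruby while the topology runs on JRuby.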
Viewer
$ coffee server.coffee --port 8080 --host 127.0.0.1 --redis-port 6379 --redis-host 127.0.0.1
or, with simulated data if Redis is not available:
$ coffee server.coffee --port 8080 --host 127.0.0.1 --mock
Author
Colin Surprenant, @colinsurprenant, https://github.com/colinsurprenant
Contributors
Francois Lafortune, @quickredfox, https://github.com/quickredfox
Nicholas Brochu, @nbrochu, https://github.com/nbrochu
License
Tweitgeist is distributed under the Apache License, Version 2.0.