Overview
Inviso is a lightweight tool for searching Hadoop jobs, visualizing their performance, and viewing cluster utilization.
Design and Components
REST API for Job History: REST endpoint to load an entire job history file as a JSON object.
ElasticSearch: Search over jobs and correlate Hadoop jobs for Pig and Hive scripts.
Python Scripts: Scripts to index job configurations into ElasticSearch for querying. These scripts can accommodate a pub/sub model for use with SQS or some other queuing service to better distribute the load or allow other systems to know about job events.
Web UI: Provides an interface to search and visualize jobs and cluster data.
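The pub/sub model mentioned for the Python scripts can be pictured with a small sketch. This is not Inviso's code: the in-memory `queue.Queue` stands in for SQS or another queuing service, and `publish_job_event`, `drain_job_events`, and the message fields `job_id` and `history_path` are hypothetical names chosen for illustration.

```python
import json
import queue


def publish_job_event(q, job_id, history_path):
    """Publisher side: announce that a job's history is ready to be indexed.

    The message layout (job_id, history_path) is an assumption for this sketch.
    """
    q.put(json.dumps({"job_id": job_id, "history_path": history_path}))


def drain_job_events(q, index_job):
    """Consumer side: pass every queued event to index_job and return the count.

    index_job is any callable taking the decoded event dict, for example a
    function that writes the job configuration to ElasticSearch.
    """
    handled = 0
    while True:
        try:
            message = q.get_nowait()
        except queue.Empty:
            return handled
        index_job(json.loads(message))
        handled += 1
```

Because the publisher and the consumer only share the queue, several indexers could drain it in parallel, and other systems could subscribe to the same job events without changes to the publisher.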
Requirements
- JDK 1.7+
- Apache Tomcat (7+)
- ElasticSearch (1.0+)
- Hadoop 2 Cluster
  - Log aggregation must be enabled for task log linking to work
  - A specific version of Hadoop may need to be set in the gradle build file
  - Some functionality is available for Hadoop 1, but requires more configuration
QuickStart
Inviso is easy to set up given a Hadoop cluster. For a quick preview, it is easiest to configure Inviso on the NameNode/ResourceManager host.
- Pull down required resources and stage them
> wget http://<mirror>/.../apache-tomcat-7.0.55.tar.gz
> tar -xzf apache-tomcat-7.0.55.tar.gz
> rm -r apache-tomcat-7.0.55/webapps/*
> wget http://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-1.3.2.tar.gz
> tar -xzf elasticsearch-1.3.2.tar.gz
- Clone the Inviso repository and build the Java project
> git clone https://github.com/Netflix/inviso.git
> cd inviso
> ./gradlew assemble
> cd ..
- Copy the WAR file and link the static web pages
> cp inviso/trace-mr2/build/libs/inviso#mr2#v0.war apache-tomcat-7.0.55/webapps/
> ln -s `pwd`/inviso/web-ui/public apache-tomcat-7.0.55/webapps/ROOT
- Start ElasticSearch and create Indexes
> ./elasticsearch-1.3.2/bin/elasticsearch -d
> curl -XPUT http://localhost:9200/inviso -d @inviso/elasticsearch/mappings/config-settings.json
{"acknowledged":true}
> curl -XPUT http://localhost:9200/inviso-cluster -d @inviso/elasticsearch/mappings/cluster-settings.json
{"acknowledged":true}
- Start Tomcat
> ./apache-tomcat-7.0.55/bin/startup.sh
- Build virtual environment and index some jobs
> virtualenv venv
> source venv/bin/activate
> pip install -r inviso/jes/requirements.txt
> cd inviso/jes/
> cp settings_default.py settings.py
> python jes.py
> python index_cluster_stats.py
# Run from cron or in a loop
> while true; do sleep 60s; python jes.py; done&
> while true; do sleep 60s; python index_cluster_stats.py; done&
- Navigate to http://hostname:8080/
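The "run in a cron or loop" step above can also be sketched in Python. This is a minimal sketch under stated assumptions, not part of Inviso: `run_periodically` is a hypothetical helper, and unlike the two background shell loops it runs both scripts one after the other in a single loop.

```python
import subprocess
import time


def run_periodically(commands, interval_seconds=60, iterations=None,
                     run=subprocess.call, sleep=time.sleep):
    """Sleep for interval_seconds, run each command in order, and repeat.

    iterations=None loops forever, like `while true` in the shell version.
    run and sleep are injectable so the loop can be exercised without
    starting processes or waiting. Returns the exit codes of the last round.
    """
    codes = []
    count = 0
    while iterations is None or count < iterations:
        sleep(interval_seconds)
        codes = [run(command) for command in commands]
        count += 1
    return codes


# Example (loops forever; run from inviso/jes/ with the virtualenv active):
# run_periodically([["python", "jes.py"], ["python", "index_cluster_stats.py"]])
```

Running the scripts sequentially keeps indexing runs from overlapping. A cron entry is the more robust choice on a long-lived host, since it survives logouts and restarts.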
QuickStart - Docker Version
Alternatively, Inviso can be started with Docker. If Docker is already installed, run the following command:
docker run -d -p 8080:8080 savaki/inviso
This launches Inviso in a container and publishes it on port 8080 of the Docker host.
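The web UI may take a moment to respond after the container (or Tomcat) starts. The following is a hedged sketch of a readiness check, not something shipped with Inviso: `http_ok` and `wait_until_up` are hypothetical helpers, and `http://localhost:8080/` assumes the container runs on the local machine.

```python
import time
import urllib.request


def http_ok(url, timeout=5):
    """Return True if url answers with a successful HTTP response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status < 400
    except OSError:
        # URLError and HTTPError are OSError subclasses, as are refused
        # connections and socket timeouts.
        return False


def wait_until_up(url, attempts=30, delay_seconds=2,
                  probe=http_ok, sleep=time.sleep):
    """Poll url until probe succeeds; give up after the given attempts."""
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            sleep(delay_seconds)
    return False


# Example (assumed local Docker host):
# wait_until_up("http://localhost:8080/")
```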
Enjoy!