  • Stars: 120
  • Rank 295,983 (Top 6 %)
  • Language: Python
  • License: BSD 3-Clause "New...
  • Created almost 11 years ago
  • Updated over 1 year ago

Repository Details

Data Server for Topic Models

Termite is a visual analysis tool for exploring the output of statistical topic models.

This repository contains:

  • a web server based on the web2py framework
  • helper scripts to download various datasets
  • helper scripts to download and setup various topic modeling tools
  • helper scripts to build topic models
  • helper scripts to import topic model outputs into the server

The web server includes various interactive visualizations:

  • term-topic matrix
  • group-in-a-box visualization
  • scatter plot

This software is distributed under the BSD-3 license.

Contributors and Credits

The Termite Data Server is developed and maintained by Jason Chuang with contributions from:

  • Ashley Jin on the initial implementation of the Termite Data Server, the term-topic matrix visualization, and various data processing scripts
  • Alison Smith on the group-in-a-box visualization
  • Michael Freeman on the scatter plot visualization
  • Peter Enns on the web server upload functionality
  • Leo Claudino on data processing for interactive topic models
  • Yuening Hu on data processing for interactive topic models
  • Molly Roberts on data processing for structural topic models

Termite relies on the following software. We thank their respective authors for developing and distributing these tools.

Launch this data server

Currently, this data server can import topic models from:

We are in the process of adding support for:

The data server can be deployed on various platforms supported by web2py. However, the copy included in the repository is customized for Apple's OSX.

Preparations

At the time of writing, the following three tools need to be installed when this repository is first cloned. Execute the following commands at the root of the repository.

bin/setup_corenlp.sh
bin/setup_mallet.sh
make -C utils/corenlp

Start the web server

To launch this data server, execute the following command. A dialogue box will appear. Click on "start server" to proceed.

./start_server.sh

Build a topic model

Several demos are included in this repository.

Executing the following command will download the 20newsgroups dataset (18828 documents), build an LDA topic model with 20 latent topics using MALLET, and launch the web server.

./demo.py 20newsgroups

Executing the following command will download the InfoVis dataset (449 documents with metadata), build an LDA topic model with 20 latent topics using MALLET, and launch the web server.

./demo.py infovis

To build an example topic model on the InfoVis dataset using Gensim:

./demo.py infovis gensim

More generally, to build a topic model on a given dataset using a given tool:

./demo.py [dataset] [tool]

To see more demo options:

./demo.py --help
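The `./demo.py [dataset] [tool]` pattern can also be driven from a script. The following is a minimal sketch, assuming only the positional command-line form shown in this README. The helper names `build_demo_command` and `run_demo` are hypothetical and not part of the repository, and the sketch is written for Python 3 rather than the Python 2.7 that the server scripts target.

```python
import subprocess


def build_demo_command(dataset, tool=None, script="./demo.py"):
    """Return the argument list for `./demo.py [dataset] [tool]`.

    Hypothetical helper: only the positional form shown in this README
    is assumed; run `./demo.py --help` for the actual options.
    """
    if not dataset:
        raise ValueError("a dataset name is required")
    command = [script, dataset]
    if tool is not None:
        # The tool argument is optional, as in `./demo.py infovis`.
        command.append(tool)
    return command


def run_demo(dataset, tool=None):
    """Run the demo from the repository root; raises if it exits non-zero."""
    subprocess.check_call(build_demo_command(dataset, tool))
```

Under these assumptions, `run_demo("infovis", "gensim")` would invoke the same command as `./demo.py infovis gensim`.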

The resulting topic model(s) will be available at:

http://127.0.0.1:8075/
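Building a model can take a while, so a script that opens the address above may want to wait until the server accepts connections. The following is a rough sketch using only the Python 3 standard library. The function name `wait_for_server` and the timeout values are assumptions for illustration; only the host and port come from the address above.

```python
import socket
import time


def wait_for_server(host="127.0.0.1", port=8075, timeout=30.0, interval=0.5):
    """Poll until a TCP connection to host:port succeeds or timeout elapses.

    Returns True once a connection is accepted, False if the deadline
    passes first. This checks reachability only, not that a model is loaded.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

A caller could, for example, open a browser only after `wait_for_server()` returns `True`.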

Active Research Project

This is an active research project. While we would like to support as many users as possible, we are constrained by available resources. The system requirements, known issues, and API format below are intended for those developing additional visualizations or incorporating additional models into the data server.

System requirements

  • Python 2.7 for web2py, server scripts, and other helper scripts
  • Java for MALLET
  • [Optional] NumPy 1.3, SciPy 0.7 for Gensim
  • [Optional] R for Structural Topic Models
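As a quick check against the list above, the following sketch reports which of these tools are visible from the current environment. It is an illustration only: the function name `check_requirements` is hypothetical, it assumes Java and R are installed as `java` and `R` executables on the PATH, and it runs under Python 3, so the reported Python version is that of the interpreter running the check, not necessarily the Python 2.7 used by web2py. It does not verify the NumPy or SciPy versions.

```python
import importlib.util
import shutil
import sys


def check_requirements():
    """Report which listed requirements are visible from this interpreter."""
    return {
        # Version of the interpreter running this check (major.minor).
        "python": "%d.%d" % sys.version_info[:2],
        # Executables are looked up on the PATH (assumed names).
        "java": shutil.which("java") is not None,
        "R": shutil.which("R") is not None,
        # Optional packages: importable or not, without importing them.
        "numpy": importlib.util.find_spec("numpy") is not None,
        "scipy": importlib.util.find_spec("scipy") is not None,
    }


if __name__ == "__main__":
    for name, status in sorted(check_requirements().items()):
        print("%-8s %s" % (name, status))
```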

Known issues

The web server is based on the web2py framework. While web2py is designed to work on Windows, Mac, and most Unix platforms, we have only tested the system on OSX. The framework will not work under Cygwin on Windows.

API format

A primary goal of developing this data server is to provide a common API (application programming interface), so that multiple topic model visualizations can interact with any number of topic modeling tools, and with other visualizations.

All API calls to this web server are in the following format.

http:// [server] / [dataset] / [model] / [attribute]

The string [server] is the base portion of the URL, such as http://localhost:8080 when running on a local machine. As multiple projects can be hosted on the same server, [dataset] is a string matching [A-Za-z0-9_]+ that uniquely identifies a project. A web-based visualization can access the content of a topic model by specifying the remaining URL [model]/[attribute], such as lda/TermTopicMatrix to retrieve the term-topic matrix, or treetm/TermTopicConstraints to send user-defined constraints to the server.
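The URL scheme described above can be sketched as a small helper. This is an illustration under stated assumptions rather than code from the repository: the function name `build_api_url` is hypothetical, only the [dataset] segment is validated (against the [A-Za-z0-9_]+ pattern given above), and nothing is assumed about the format of the server's response.

```python
import re

# Pattern for the [dataset] segment, as stated in the API description.
DATASET_PATTERN = re.compile(r"[A-Za-z0-9_]+")


def build_api_url(server, dataset, model, attribute):
    """Compose [server]/[dataset]/[model]/[attribute] into one URL."""
    if not DATASET_PATTERN.fullmatch(dataset):
        raise ValueError("dataset must match [A-Za-z0-9_]+: %r" % dataset)
    # Drop any trailing slash on the base so segments are joined exactly once.
    return "/".join([server.rstrip("/"), dataset, model, attribute])
```

For instance, assuming a dataset identifier of `infovis`, the call `build_api_url("http://localhost:8080", "infovis", "lda", "TermTopicMatrix")` produces `http://localhost:8080/infovis/lda/TermTopicMatrix`.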

License

Copyright (c) 2013, Leland Stanford Junior University
Copyright (c) 2014, University of Washington
All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  • Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
  • Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
  • Neither the name of the nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
