Datagramas
A Python library for your Jupyter Notebook that helps you use and scaffold visualizations with d3.js. It works with Python 3.4 or newer.
NOTE We are currently updating this library to the 1.0.0 version. Please help us test if installation works ok :)
Overview
Datagramas is a visualization development support tool and a visualization library at the same time. Initially I implemented it to help me develop visualizations in the context of my doctoral thesis. I was researching the mixture of algorithms and visualizations, and therefore I was always iterating over algorithm and visualization design. Hence, currently datagramas supports some visualizations that I needed to implement in my doctoral thesis, plus other examples I found to be interesting to explore.
The main objective of datagramas is to provide an environment to bootstrap visualization implementations, and use scaffolding through templates to be able to reuse the visualizations and to explore data in the Jupyter Notebook.
An important aspect of datagramas is that it works with standard scientific Python data-structures: pandas DataFrames and NetworkX graphs. By using datagramas to develop your visualization, you do not need to worry about data structures and formats. For instance, have you ever found an example visualization that seemed to be what you wanted, but the data structure used was arbitrarily chosen by the developer? And that structure was completely different from what you were using/expecting? By using datagramas there are no arbitrary choices - you use DataFrames and specify which columns will be mapped to the visualization, and that's it.
Examples / Documentation
In addition to this readme, the following notebooks serve as examples/documentation:
Initialization / Installation
First, install the python package (please note that the requirements are needed to run datagramas, but not the examples):
$ pip install -r requirements.txt
$ python setup.py install
Then make a symbolic link in your IPython profile to the datagramas libs folder:
$ cd ~/.jupyter/custom
$ ln -s ~/path_to_datagramas/datagramas/libs/ datagramas
And finally, edit the custom.js file and add the following lines (if there is no such file, create it):
require.config({
    paths: {
        "sankey": "/custom/datagramas/d3-sankey/sankey",
        "cartogram": "/custom/datagramas/d3-cartogram/cartogram",
        "d3": "/custom/datagramas/d3/d3.min",
        "leaflet": "/custom/datagramas/leaflet/leaflet",
        "topojson": "/custom/datagramas/topojson/topojson.min",
        "parsets": "/custom/datagramas/d3-parsets-1.2.4/d3.parsets",
        "datagramas": "/custom/datagramas/datagramas",
        "force_edge_bundling": "/custom/datagramas/d3-force-bundling/d3.ForceEdgeBundling",
        "legend": "/custom/datagramas/d3-legend/d3-legend.min",
        "cloud": "/custom/datagramas/d3-cloud/d3.layout.cloud",
        "cola": "/custom/datagramas/cola/cola.min",
        "d3-geo-projection": "/custom/datagramas/d3-geo-projection/d3.geo.projection.min",
        "d3-tip": "/custom/datagramas/d3-tip/index"
    },
    shim: {
        "sankey": {
            "exports": "d3.sankey",
            "deps": ["d3"]
        },
        "cartogram": {
            "exports": "d3.cartogram",
            "deps": ["d3"]
        },
        "cola": {
            "exports": "cola",
            "deps": ["d3"]
        },
        "parsets": {
            "exports": "d3.parsets",
            "deps": ["d3"]
        },
        "legend": {
            "exports": "d3.legend",
            "deps": ["d3"]
        },
        "d3-geo-projection": {
            "exports": "d3.geo.projection",
            "deps": ["d3"]
        },
        "d3-tip": {
            "exports": "d3.tip",
            "deps": ["d3"]
        }
    },
});
require(['datagramas'], function(datagramas) {
    datagramas.add_css('/custom/datagramas/datagramas.css');
});
This code can be generated by the Python function datagramas.init_javascript_code(path='/custom/datagramas').
It makes Jupyter load the Javascript code that datagramas needs every time you open a notebook file.
If you use an older version of Jupyter Notebook, note that you will need to add the "/static" prefix to those URLs.
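As a rough illustration of the note about older Jupyter versions, the sketch below builds the paths section of require.config with an optional prefix. The helper build_require_paths and the shortened library table are assumptions made for this example; they are not part of datagramas, whose own generator is datagramas.init_javascript_code.

```python
import json

# A shortened copy of the library table from the configuration above.
LIBRARIES = {
    "d3": "d3/d3.min",
    "sankey": "d3-sankey/sankey",
    "topojson": "topojson/topojson.min",
    "datagramas": "datagramas",
}


def build_require_paths(base="/custom/datagramas", prefix=""):
    """Return a RequireJS `paths` mapping, optionally prefixed (e.g. "/static").

    Hypothetical helper written for this README; not a datagramas function.
    """
    return {
        name: "{}{}/{}".format(prefix, base, relative)
        for name, relative in LIBRARIES.items()
    }


if __name__ == "__main__":
    # Print the mapping as JSON so it can be pasted into custom.js.
    print(json.dumps(build_require_paths(prefix="/static"), indent=4))
```

With prefix="/static" every URL starts with /static/custom/datagramas/; with the default empty prefix the URLs match the configuration shown above.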
Visualization Modules
All visualizations in datagramas are regular Python modules (see the datagramas/visualizations folder).
A module is composed of a configuration file (__init__.py) and several template and style files.
The Let's Make Scaffold a Barchart example notebook contains a basic visualization that showcases some of these concepts.
Currently, datagramas includes the following visualizations (in alphabetical order):
- cartogram of a TopoJSON topology and a pandas DataFrame.
- cartography of a Topo/GeoJSON geometry, pandas DataFrames for marks and area colors, and NetworkX graphs over the map.
- circlepack of a NetworkX tree.
- flow (Sankey diagram) of a NetworkX graph.
- force directed layout of a NetworkX graph.
- parcoords - parallel coordinates with a pandas DataFrame.
- parsets - parallel sets with a pandas DataFrame.
- treemap of a NetworkX tree.
- wordcloud of a pandas DataFrame.
The Basic Notebook Examples notebook showcases the usage of most of those visualizations.
The Let's Make a Map Too (and a Cartogram!) notebook showcases the usage of cartogram and cartography.
Template Files
The following are the template files used by datagramas when rendering a visualization:
- template.js: the main template of each visualization module. Think of this file as the body of a draw() function in a typical visualization module.
- template.css (optional)
- functions.js (optional)
When datagramas renders your visualization, it embeds those files into a bigger visualization that follows the reusable chart pattern by Mike Bostock.
Module Configuration
A visualization module must contain a dictionary named VISUALIZATION_CONFIG in its __init__.py file, with at least some of the following elements:
- Options: these are values that influence how the visualization is rendered. For instance, the cartography visualization has the following options:
'options': {
'leaflet': False,
'background_color': False,
'graph_bundle_links': False
}
When rendering, if you call cartography(geometry=topojson), the geometry you specified will be rendered like any other visualization: just a plain SVG with a white background. But if you call cartography(geometry=topojson, leaflet=True), the visualization will be rendered as a slippy map using leaflet.
- Data: this element indicates which data variables will be available to the visualization. For instance, the cartogram visualization has the following setup:
'data': {
'geometry': None,
'area_dataframe': None,
}
This means that a cartogram can be called with a TopoJSON geometry (which you should load from a .js file) and a pandas DataFrame. In your visualization code, these variables will be available as _data_geometry and _data_area_dataframe.
- Variables: the elements of this dictionary are directly translated into variables available in the template file. For instance, in the barchart example mentioned above, these are the variables:
'variables': {
'width': 960,
'height': 500,
'padding': {'left': 30, 'top': 20, 'right': 30, 'bottom': 30},
'x': 'x',
'y': 'y',
'y_axis_ticks': 10,
'y_label': None,
'rotate_label': True,
}
All these variables are available in the template file, prefixed with an underscore (e.g., _width). Moreover, you can modify them when rendering by using keyword arguments: barchart(dataframe=df, x='letter', y='frequency').
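To make the override behaviour concrete, here is a minimal sketch of how defaults and keyword arguments could be merged and then exposed under underscore-prefixed names. The function resolve_variables is hypothetical and written only for this README; datagramas may implement the step differently.

```python
# A subset of the barchart defaults listed above.
DEFAULT_VARIABLES = {
    'width': 960,
    'height': 500,
    'x': 'x',
    'y': 'y',
    'y_label': None,
}


def resolve_variables(defaults, **overrides):
    """Merge keyword overrides into the defaults and prefix each name with '_'.

    Hypothetical helper: it only illustrates the naming convention.
    """
    merged = dict(defaults)
    merged.update(overrides)
    return {'_' + name: value for name, value in merged.items()}


# Mirrors barchart(dataframe=df, x='letter', y='frequency').
template_variables = resolve_variables(DEFAULT_VARIABLES, x='letter', y='frequency')
print(template_variables['_x'], template_variables['_width'])  # letter 960
```

Variables that are not overridden keep their default value, so _width is still 960 here.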
- Auxiliary Variables: these are Javascript variables that are available to the template code, but are not reachable from Python or from the public JS interface. You can use them to maintain state in the visualization or to cache results. This is an example from the cartography visualization:
'auxiliary': {
# a set to save mark positions. since there are two possible sources of positions, we need to do this.
'mark_positions',
# the list of available features from the geometry source.
'available_feature_ids',
# the list of colors per area
'area_colors'
}
Those variables are available as auxiliary.var_name (e.g., auxiliary.mark_positions).
- Read-only Properties: these are JS variables that are available in Javascript through getters. For instance, in the cartography visualization you can have a Leaflet instance, among other variables:
'read_only': {
# leaflet
'L',
'map',
# the map projection. this could be used to add other things on top of the visualization.
'projection',
# here we save the geometry specified - it can be either GeoJSON or TopoJSON.
'geometry'
}
If your reusable chart is called chart, then, from Javascript, you can access those variables through their getters (e.g., chart.L()).
- Mapped Attributes: these are mappings between data attributes (e.g., a column in your dataframe) and visualization attributes (e.g., the radius of a circle, which the code names ratio). For instance, in the force visualization these are the mapped attributes:
'attributes': {
'node_ratio': {'min': 8, 'max': 16, 'value': None, 'scale': 'linear'},
'link_opacity': {'min': 0.5, 'max': 1.0, 'value': None, 'scale': 'linear'},
'link_width': {'min': 0.5, 'max': 1.0, 'value': None, 'scale': 'linear'},
}
This means that, in JS, you will have a variable available named _var_name (e.g., _node_ratio). This variable will be a function that, when called with a datum, will return the corresponding value according to the range and scale (which could be linear, sqrt, or a number - used with d3.scale.pow()) defined in the parameters.
Following the force example, in Python you can specify a node_ratio when calling the visualization in three ways (note that g is a NetworkX graph):
- datagramas.force(graph=g, node_ratio=15): all nodes will have ratio 15.
- datagramas.force(graph=g, node_ratio='size'): node ratio will be proportional to the size node attribute, using the default minimum and maximum values, and the default scale.
- datagramas.force(graph=g, node_ratio={'value': 'size', 'scale': 'sqrt', 'max': 32}): node ratio will be proportional to the size node attribute, with sqrt scale, with a maximum value of 32.
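The three call styles above can be sketched in plain Python. This is not the datagramas implementation (the real mapping happens in Javascript with d3 scales); normalize_attribute and make_mapper are hypothetical names, and the domain bounds are passed in by hand for the example.

```python
NODE_RATIO_DEFAULT = {'min': 8, 'max': 16, 'value': None, 'scale': 'linear'}


def normalize_attribute(default, argument):
    """Expand a number, an attribute name, or a partial dict into a full spec."""
    spec = dict(default)
    if isinstance(argument, dict):
        spec.update(argument)  # only the given keys are overridden
    else:
        spec['value'] = argument  # a constant number or an attribute name
    return spec


def make_mapper(spec, domain_min, domain_max):
    """Return a function that maps a datum (a dict) to a value in [min, max].

    Assumes a non-negative domain, since fractional exponents are applied.
    """
    value = spec['value']
    if not isinstance(value, str):
        return lambda datum: value  # same constant for every datum

    # 'linear' and 'sqrt' are power scales with exponents 1 and 0.5.
    exponent = {'linear': 1.0, 'sqrt': 0.5}.get(spec['scale'])
    if exponent is None:
        exponent = float(spec['scale'])  # numeric exponent, as with d3.scale.pow()

    def mapper(datum):
        low, high = domain_min ** exponent, domain_max ** exponent
        t = (datum[value] ** exponent - low) / ((high - low) or 1.0)
        return spec['min'] + t * (spec['max'] - spec['min'])

    return mapper


spec = normalize_attribute(NODE_RATIO_DEFAULT, {'value': 'size', 'scale': 'sqrt', 'max': 32})
ratio = make_mapper(spec, domain_min=0, domain_max=100)
print(ratio({'size': 25}))  # 20.0
```

With a sqrt scale, a node of size 25 in a 0 to 100 domain sits halfway through the range, which is 20 for a minimum of 8 and a maximum of 32.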
- Colorables: these are mappings between data attributes and colors. For instance, the force visualization defines the following colorables:
'colorables': {
'node_color': {'value': 'steelblue', 'palette': None, 'scale': None, 'legend': False, 'n_colors': None},
'link_color': {'value': 'grey', 'palette': None, 'scale': None, 'legend': False, 'n_colors': None}
}
In a similar way to mapped attributes, you can specify a color directly, or override the dictionary for each colorable:
- datagramas.force(graph=g, link_color='purple'): all links will be colored purple.
- datagramas.force(graph=g, link_color={'value': 'source.bipartite', 'palette': 'Set2', 'scale': 'ordinal'}): all links will be colored according to the source.bipartite attribute of each link (this translates to the bipartite attribute of the source node of each link - yes, you can use dot notation).
Note that, since we cannot tell a color string apart from a column/attribute name, mapping a color to an attribute requires the dictionary form.
If the palette is a string, it must be recognized by the function seaborn.color_palette.
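The dot notation and the ordinal scale can be illustrated with a small Python sketch. The helpers resolve_path and ordinal_colors are hypothetical, and the two hex colors are placeholders standing in for a palette that seaborn.color_palette would normally supply.

```python
def resolve_path(datum, path):
    """Follow a dotted path such as 'source.bipartite' through nested dicts."""
    current = datum
    for key in path.split('.'):
        current = current[key]
    return current


def ordinal_colors(links, path, palette):
    """Give each distinct attribute value one palette color, cycling if needed."""
    categories = sorted({resolve_path(link, path) for link in links})
    lookup = {
        category: palette[index % len(palette)]
        for index, category in enumerate(categories)
    }
    return [lookup[resolve_path(link, path)] for link in links]


# Each link carries its source and target node, as in a force layout.
links = [
    {'source': {'id': 'a', 'bipartite': 0}, 'target': {'id': 'x', 'bipartite': 1}},
    {'source': {'id': 'x', 'bipartite': 1}, 'target': {'id': 'b', 'bipartite': 0}},
]
print(ordinal_colors(links, 'source.bipartite', ['#66c2a5', '#fc8d62']))
# ['#66c2a5', '#fc8d62']
```

Each link takes the color assigned to the bipartite value of its source node, which is what the 'source.bipartite' value requests.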
- Objects: these are d3js objects wrapped in a Python class.
Extra Functions
Your __init__.py file can define auxiliary functions and attributes.
Datagramas particularly supports the following one:
- PROCESS_CONFIG(config): where config is the current instance of the VISUALIZATION_CONFIG dictionary.
Among other uses, this function could be used to handle dependencies. For instance, if you specify leaflet=True in cartography, leaflet is added as a dependency. Or, if you specify a projection name (through the projection_name variable), a d3js object is added to the current visualization objects.
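A minimal sketch of such a function follows. The 'requirements' key is an assumption made for this example: this README does not say where datagramas stores dependencies, so check an existing module such as cartography before copying it.

```python
def PROCESS_CONFIG(config):
    """Add leaflet as a dependency when the `leaflet` option is enabled."""
    # ASSUMPTION: dependencies are kept in a list under the hypothetical
    # 'requirements' key; the key used by datagramas itself may differ.
    if config.get('options', {}).get('leaflet'):
        requirements = config.setdefault('requirements', [])
        if 'leaflet' not in requirements:
            requirements.append('leaflet')


# A stand-in for the config instance built from cartography(..., leaflet=True).
config = {'options': {'leaflet': True, 'background_color': False}}
PROCESS_CONFIG(config)
print(config['requirements'])  # ['leaflet']
```

The function changes the dictionary in place and is safe to call twice, since the dependency is only appended when it is missing.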
Scaffolding
Until now, we have explained how datagramas allows you to code and render visualizations. They are already usable in the Jupyter Notebook, but you may want to export a visualization into a reusable chart that you can use in your projects. If that is the case, Datagramas includes that functionality through a method called scaffold.
For example, if you look at the barchart example you will find this notebook cell:
barchart(x='letter', y='frequency').scaffold(filename='./scaffolded_barchart.js')
This line creates a file named scaffolded_barchart.js which you can import into your projects. This chart uses the reusable chart pattern mentioned in the Template Files section. In the "In the Wild" section at the end you can find a couple of links with scaffolded visualizations.
Credits
Datagramas bundles the following Javascript libraries (see the datagramas/libs subfolder):
- d3.js
- d3.sankey
- d3.layout.cloud
- d3.ForceEdgeBundling
- d3.parsets
- topojson
- leaflet
- cartogram.js
- WebCola
- d3-legend
- d3-geo-projection
- d3-tip
The file datagramas/libraries.py specifies library versions and other meta-data.
Datagramas also contains snippets of code from:
- D3 Plus: we use the color text function.
- Utility Javascript functions from Stack Overflow users, acknowledged in datagramas/libs/datagramas.js.
Next Steps?
In no particular order:
- Add events to all included visualizations (currently only a few of them support events).
- Facet data with small-multiples or visualization widgets (in a similar way to seaborn's FacetGrid).
- Improve the legend support. Currently legend positioning is not smart, and legend activation is not automatic for charts.
- Support other bundled layouts/plugins with d3.js.
- Support layers in the cartography module.
About the (old) name
The first official version of Datagramas was titled "Matta" in honor of Roberto Matta. Curiously, he has a painting named "ojo con los desarrolladores" (desarrolladores is Spanish for developers).
In the Wild
- 2|S: Los Dos Santiagos: this is a project where we scaffolded many visualizations (Sankey, TopoJSON, Force Edge Bundle) to visualize transport data in Santiago, Chile. All visualizations in the page were scaffolded with datagramas! Note: the site is in Spanish.
- Twitter Data Portraits: this visualization was implemented in datagramas for my doctoral thesis. I needed a way to visualize Twitter profiles and the output of a recommender algorithm. Since the data used in the visualization was constantly changing (because algorithms were being developed), I needed a more dynamic way to implement the visualization than always editing JS/HTML files and then reloading everything, including re-execution of algorithms.
Versioning
Datagramas uses semantic versioning. We (will) start with 1.0.0.
Testing
There is no automated testing. However, the example notebooks pretty much cover everything. Feel free to contribute in this aspect!