Repository Details

A workshop on data visualization in Python with notebooks and exercises for following along. Slides contain all solutions.

Beyond the Basics: Data Visualization in Python

The human brain excels at finding patterns in visual representations, which is why data visualizations are essential to any analysis. Done right, they bridge the gap between those analyzing the data and those consuming the analysis. However, learning to create impactful, aesthetically pleasing visualizations can be challenging. This session will equip you with the skills to make customized visualizations for your data using Python.

While there are many plotting libraries to choose from, the prolific Matplotlib library is always a great place to start. Since various Python data science libraries use Matplotlib under the hood, familiarity with Matplotlib itself gives you the flexibility to fine-tune the resulting visualizations (e.g., by adding annotations or animating them). This session will also introduce interactive visualizations using HoloViz, which provides a higher-level plotting API capable of using Matplotlib and Bokeh (a Python library for generating interactive, JavaScript-powered visualizations) under the hood.

Workshop Outline

This is a workshop on data visualization in Python first delivered at ODSC West 2021 and subsequently at ODSC East 2022, PyCon Italia 2022, ODSC Europe 2022, EuroPython 2022, ODSC West 2022, the Toronto Machine Learning Summit (TMLS) 2022, PyCon US 2023, ODSC East 2023, and PyCon Italia 2023. It's divided into the following sections:

Section 1: Getting Started With Matplotlib

We will begin by familiarizing ourselves with Matplotlib. Moving beyond the default options, we will explore how to customize various aspects of our visualizations. By the end of this section, you will be able to generate plots using the Matplotlib API directly, as well as customize the plots that libraries like pandas and Seaborn create for you.
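As a minimal sketch of what moving beyond the defaults can look like, the following builds a plot with the Matplotlib API directly and then customizes the title, axis labels, spines, and an annotation. The data, labels, and variable names are made up for illustration and are not taken from the workshop notebooks.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch also runs outside a notebook
import matplotlib.pyplot as plt
import numpy as np

# Made-up data: a damped oscillation
x = np.linspace(0, 10, 200)
y = np.exp(-x / 5) * np.cos(2 * x)

# Create the Figure and Axes explicitly rather than relying on implicit state
fig, ax = plt.subplots(figsize=(8, 3))
ax.plot(x, y, color="tab:blue", linewidth=2, label="signal")

# Customize beyond the defaults
ax.set_title("Damped oscillation")
ax.set_xlabel("time")
ax.set_ylabel("amplitude")
for side in ("top", "right"):
    ax.spines[side].set_visible(False)

# Annotate the highest point of the curve
peak = np.argmax(y)
ax.annotate(
    "peak",
    xy=(x[peak], y[peak]),
    xytext=(x[peak] + 1, y[peak]),
    arrowprops={"arrowstyle": "->"},
)
ax.legend()
```

The same `Axes` methods apply to plots created by pandas, since `DataFrame.plot()` returns a Matplotlib `Axes` object (or an array of them when subplots are requested).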

Section 2: Moving Beyond Static Visualizations

Static visualizations are limited in how much information they can show. To move beyond these limitations, we can create animated and/or interactive visualizations. Animations make it possible for our visualizations to tell a story through movement of the plot components (e.g., bars, points, lines). Interactivity makes it possible to explore the data visually by hiding and displaying information based on user interest. In this section, we will focus on creating animated visualizations using Matplotlib before moving on to create interactive visualizations in the next section.
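To make the idea of animating plot components concrete, here is a small sketch using Matplotlib's `FuncAnimation`, in which a line is revealed point by point. The sine-wave data and the `update` function name are assumptions chosen for illustration; the workshop's own animations may be structured differently.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch also runs outside a notebook
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.animation import FuncAnimation

# Made-up data: one period of a sine wave
x = np.linspace(0, 2 * np.pi, 100)
y = np.sin(x)

fig, ax = plt.subplots()
(line,) = ax.plot([], [])  # start with an empty line that each frame fills in
ax.set_xlim(0, 2 * np.pi)
ax.set_ylim(-1.1, 1.1)

def update(frame):
    """Show the first `frame` points of the curve."""
    line.set_data(x[:frame], y[:frame])
    return (line,)

# Each frame number is passed to update(); interval is the delay in milliseconds
anim = FuncAnimation(
    fig, update, frames=range(1, len(x) + 1), interval=50, blit=True
)
```

The animation is only rendered when it is displayed or saved, for example with `anim.save("sine.gif", writer="pillow")` if Pillow is installed.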

Section 3: Building Interactive Visualizations for Data Exploration

Interactive visualizations provide the most value when we are exploring our data. Without having to create multiple iterations of the same plot, we can use mouse actions (e.g., click, hover, and zoom) to explore different aspects and subsets of the data. In this section, we will learn how to use a few of the libraries in the HoloViz ecosystem, with Bokeh as the backend, to create interactive visualizations for exploring our data.


Prerequisites

You should have basic knowledge of Python and be comfortable working in Jupyter Notebooks. Check out this notebook for a crash course in Python or work through the official Python tutorial for a more formal introduction. The environment we will use for this workshop comes with JupyterLab, which is pretty intuitive, but be sure to familiarize yourself with using notebooks in JupyterLab and with its additional functionality. In addition, a basic understanding of pandas will be beneficial, but is not required; reviewing the first section of my pandas workshop will be sufficient.


Setup Instructions

You can work through the notebooks locally or in your browser. Pick the installation option that makes sense for you.

Local Installation

Warning: It is highly recommended that you use your personal laptop for the installation.

  1. Install Anaconda/Miniconda or Mambaforge, if not already installed.

  2. Fork this repository using the Fork button on GitHub.

  3. Navigate to your fork, and click the Code button.

  4. Clone your forked repository using the desired method from the Local tab.

  5. Create and activate a conda virtual environment (on Windows, these commands should be run in Anaconda Prompt):

    $ cd python-data-viz-workshop
    ~/python-data-viz-workshop$ conda env create --file environment.yml
    ~/python-data-viz-workshop$ conda activate data_viz_workshop
    (data_viz_workshop) ~/python-data-viz-workshop$

    Note: If you installed Mambaforge or have already installed mamba in your base environment, you can change conda env create to mamba env create.

  6. Launch JupyterLab:

    (data_viz_workshop) ~/python-data-viz-workshop$ jupyter lab

  7. Navigate to the 0-check_your_env.ipynb notebook in the notebooks/ folder.

  8. Run the notebook to confirm everything is set up properly.
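For a rough idea of what an environment check involves, the sketch below reports the installed version of a few packages using only the standard library. It is not the contents of 0-check_your_env.ipynb, and the package list is a hypothetical stand-in; the repository's environment.yml remains the source of truth for what is required.

```python
from importlib import metadata

# Hypothetical package list for illustration; see environment.yml for the real one
REQUIRED = ["matplotlib", "pandas", "jupyterlab"]

def check_packages(names):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions

# Print one status line per package
for name, version in check_packages(REQUIRED).items():
    status = version if version is not None else "MISSING"
    print(f"{name}: {status}")
```

Run it with the data_viz_workshop environment activated; any line reporting MISSING suggests the environment was not created or activated correctly.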

Cloud Options

GitHub Codespaces

Open in GitHub Codespaces

The GitHub Codespaces setup provides a pre-configured machine with Jupyter Notebooks running in Visual Studio Code in your browser. You will need a GitHub account and available quota (all users get more than enough free monthly quota to be able to run this workshop). Note that this will take a while to build. It's recommended that you click the badge above to build the codespace in advance of the workshop and then stop the codespace until the workshop, at which point you can simply resume and pick up where you left off.

Note that if you want to save your changes, you will need to fork the repository before creating the codespace. You will then be able to commit your changes directly from the codespace.

  1. Fork this repository using the Fork button on GitHub.

  2. Navigate to your fork, and click the Code button.

  3. Launch the codespace from your fork by clicking on the + or Create codespace on main button in the Codespaces tab.

  4. Stop the codespace until the session starts (click the name to resume).

Binder

Depending on server availability, you can use this Binder environment, which does not require the creation of a GitHub account. There is no guarantee that you will be able to access this during the workshop.


About the Author

Stefanie Molin (@stefmolin) is a software engineer and data scientist at Bloomberg in New York City, where she tackles tough problems in information security, particularly those revolving around data wrangling/visualization, building tools for gathering data, and knowledge sharing. She is also the author of Hands-On Data Analysis with Pandas, which is currently in its second edition and has been translated into Korean. She holds a bachelor of science degree in operations research from Columbia University's Fu Foundation School of Engineering and Applied Science, as well as a master’s degree in computer science, with a specialization in machine learning, from Georgia Tech. In her free time, she enjoys traveling the world, inventing new recipes, and learning new languages spoken among both people and computers.

Related Content

All examples herein were developed exclusively for this workshop. Hands-On Data Analysis with Pandas contains additional examples and exercises, as do this blog post and this workshop on pandas.
