

Cell-by-cell testing for production Jupyter notebooks in JupyterLab



Overview

nbcelltests is designed for writing tests for linearly executed notebooks. Its primary use is for unit testing reports.

Installation

Python package installation: pip install nbcelltests

To use nbcelltests in JupyterLab, you will also need the lab and server extensions. These are typically installed automatically alongside nbcelltests, so you should not need to do anything special to use them. The lab extension requires a rebuild of JupyterLab. You'll be prompted to do this the first time you start JupyterLab after installing nbcelltests, or you can do it manually with jupyter lab build. Note that, as with any lab extension, you must have Node.js installed.

To see what extensions you have, check the output of jupyter labextension list (look for jupyterlab_celltests), and jupyter serverextension list (look for nbcelltests). If for some reason you need to manually install the extensions, you can do so as follows:

jupyter labextension install jupyterlab_celltests
jupyter serverextension enable --py nbcelltests

(Note: if installing into a virtual or conda environment, you may wish to add --sys-prefix to the serverextension command.)

"Linearly executed notebooks?"

When notebooks are converted into HTML/PDF/email reports, they are executed once, top to bottom. They are expected to contain as little code as reasonably possible and to focus primarily on plotting and markdown. Libraries for this kind of workflow include Papermill, JupyterLab Emails, etc.

Doesn't this already exist?

nbval is a great product (we leverage it in this project), and I recommend using it for notebook regression tests. However, it only supports testing for unexpected failures or simple output-equality checks.

So why do I want this again?

This doesn't necessarily help you if your data sources go down, but it's likely you'll notice that anyway. Where this comes in handy is:

  • when the environment (e.g. package versions) is changing in your system
  • when you play around in the notebook (e.g. nonlinear execution) but aren't sure whether your reports will still generate
  • when your software lifecycle systems have a hard time dealing with notebooks (they can't be linted or audited as code unless you integrate nbdime or convert them to scripts with nbconvert, they are tough to test, and it's tough to ensure that what works today still works tomorrow)

So what does this do?

Given a notebook, you can write mocks and assertions for individual cells. You can then generate a testing script for the notebook, which lets you hook it into your testing system and thereby unit test your report.

Writing tests

When you write tests for a cell, we create a new method on a unittest class corresponding to the index of your cell. That method also includes the cumulative tests for all previous cells, to mimic what has happened so far in the notebook's linear execution. You can write whatever mocks and asserts you like, and you can call %cell to inject the contents of the cell into your test. The tests themselves are stored in the cell metadata, similar to cell tags, slide information, etc.
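The cumulative structure can be sketched in plain Python. This is not the script nbcelltests generates. It is a minimal illustration that assumes a hypothetical NOTEBOOK list of (cell source, test source) pairs, where a test line consisting only of %cell is replaced by that cell's code. Each attached test_code_cell_N method replays the expanded tests of cells 1..N in one shared namespace, which mimics top-to-bottom execution.

```python
import unittest

# Hypothetical notebook: (cell source, test source) pairs. A test line that is
# exactly "%cell" marks where the cell's own code is injected.
NOTEBOOK = [
    ("import math", "%cell\nassert math.pi > 3"),
    ("radius = 2", "%cell\nassert radius == 2"),
    ("area = math.pi * radius ** 2", "%cell\nassert round(area, 2) == 12.57"),
]


def expand(cell_source: str, test_source: str) -> str:
    """Substitute the cell's code for every test line that is exactly '%cell'."""
    return "\n".join(
        cell_source if line.strip() == "%cell" else line
        for line in test_source.splitlines()
    )


def make_test(index: int):
    """Build a test method that replays cells 0..index in one namespace."""

    def test(self):
        namespace: dict = {}
        for cell_source, test_source in NOTEBOOK[: index + 1]:
            exec(expand(cell_source, test_source), namespace)

    return test


class TestNotebook(unittest.TestCase):
    """Test methods are attached below, one per cell."""


# 1-based names, echoing the test_code_cell_N entries that nbcelltests reports.
for i in range(len(NOTEBOOK)):
    setattr(TestNotebook, f"test_code_cell_{i + 1}", make_test(i))
```

If this is saved as a module, python -m unittest picks it up. Because every method replays all earlier cells, a failing assertion in cell 1 also fails the tests for cells 2 and 3. That is the point of the cumulative design.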

Running tests

You can run the tests offline from an .ipynb file, or you can execute them from the browser and view the results as an HTML report generated by the pytest-html plugin.
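One way to hook notebooks into an existing test system is to shell out to the two CLI commands used in the example further down: nbcelltests test and nbcelltests lint. The helper below is a sketch and is not part of nbcelltests. The names notebook_commands and run_all are hypothetical. The helper also assumes that a failing run exits with a non-zero status, which this README does not state.

```python
import shutil
import subprocess
from pathlib import Path


def notebook_commands(root: str) -> list[list[str]]:
    """Build `nbcelltests test` and `nbcelltests lint` commands for each notebook under root."""
    commands = []
    for nb in sorted(Path(root).rglob("*.ipynb")):
        if ".ipynb_checkpoints" in nb.parts:
            continue  # skip JupyterLab autosave copies
        for subcommand in ("test", "lint"):
            commands.append(["nbcelltests", subcommand, str(nb)])
    return commands


def run_all(root: str) -> bool:
    """Run every command; return False if any of them exits non-zero."""
    if shutil.which("nbcelltests") is None:
        raise RuntimeError("nbcelltests not found; install it with: pip install nbcelltests")
    ok = True
    for cmd in notebook_commands(root):
        # Assumption: nbcelltests signals failures through a non-zero exit code.
        if subprocess.run(cmd).returncode != 0:
            print("FAILED:", " ".join(cmd))
            ok = False
    return ok
```

For example, a CI step could call run_all("examples") and fail the build when the call returns False.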

Extra Tests

  • Max number of lines per cell
  • Max number of cells per notebook
  • Max number of function definitions per notebook
  • Max number of class definitions per notebook
  • Percentage of cells tested
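The first four checks above amount to counting things in the notebook's JSON. The sketch below approximates them with the standard library (json, ast) and mimics the report format from the lint output shown further down. It is an illustration, not nbcelltests' implementation. The default limits of 10 only mirror that example output. Counting non-blank lines is an assumption. The percentage-of-cells-tested check is omitted because this README does not describe the metadata layout that check would read.

```python
import ast
import json


def load_notebook(path: str) -> dict:
    """An .ipynb file is JSON, so it can be read without any Jupyter dependency."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def code_cells(nb: dict) -> list[str]:
    """Source of each code cell (nbformat stores it as a string or a list of lines)."""
    sources = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            src = cell.get("source", "")
            sources.append("".join(src) if isinstance(src, list) else src)
    return sources


def status(actual: int, limit: int) -> str:
    return "PASSED" if actual <= limit else "FAILED"


def extra_checks(nb: dict, max_lines: int = 10, max_cells: int = 10,
                 max_functions: int = 10, max_classes: int = 10) -> list[str]:
    cells = code_cells(nb)
    report = []
    functions = classes = 0
    for i, src in enumerate(cells, start=1):
        # Assumption: only non-blank lines count toward the per-cell limit.
        lines = sum(1 for line in src.splitlines() if line.strip())
        report.append(f"{status(lines, max_lines)}: Checking lines in cell "
                      f"(max={max_lines}; actual={lines}) (Cell {i})")
        try:
            tree = ast.parse(src)
        except SyntaxError:
            continue  # e.g. cells using IPython magics; skipped in this sketch
        # ast.walk also visits methods nested inside classes, so they count as functions.
        for node in ast.walk(tree):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                functions += 1
            elif isinstance(node, ast.ClassDef):
                classes += 1
    report.append(f"{status(len(cells), max_cells)}: Checking cells per notebook "
                  f"(max={max_cells}; actual={len(cells)})")
    report.append(f"{status(functions, max_functions)}: Checking functions per notebook "
                  f"(max={max_functions}; actual={functions})")
    report.append(f"{status(classes, max_classes)}: Checking classes per notebook "
                  f"(max={max_classes}; actual={classes})")
    return report
```

Usage: print("\n".join(extra_checks(load_notebook("examples/Example.ipynb")))).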

Example

The examples below use the committed examples/Example.ipynb notebook, modified so that cell 0 has its import statement copied 10 times (to trigger test and lint failures):

Tests

The following output is generated by running nbcelltests test examples/Example.ipynb

examples/_Example_test.py::TestNotebook::test_cell_coverage PASSED  [ 20%]
examples/_Example_test.py::TestNotebook::test_code_cell_1 PASSED    [ 40%]
examples/_Example_test.py::TestNotebook::test_code_cell_2 PASSED    [ 60%]
examples/_Example_test.py::TestNotebook::test_code_cell_3 PASSED    [ 80%]
examples/_Example_test.py::TestNotebook::test_code_cell_4 PASSED    [100%]

Lint

The following output is generated by running nbcelltests lint examples/Example.ipynb

PASSED: Checking lines in cell (max=10; actual=2) (Cell 1)
PASSED: Checking lines in cell (max=10; actual=1) (Cell 2)
PASSED: Checking lines in cell (max=10; actual=1) (Cell 3)
PASSED: Checking lines in cell (max=10; actual=1) (Cell 4)
PASSED: Checking cells per notebook (max=10; actual=4)
PASSED: Checking functions per notebook (max=10; actual=0)
PASSED: Checking classes per notebook (max=10; actual=0)
FAILED: Checking lint:
	examples/Example.ipynb (in /var/folders/s3/1mjw0y192zg3450tkkn1yfnm0000gn/T/tmpp91li59p.py):32:1: F821 undefined name 'test3'
	examples/Example.ipynb (in /var/folders/s3/1mjw0y192zg3450tkkn1yfnm0000gn/T/tmpp91li59p.py):32:6: W291 trailing whitespace

NB: In JupyterLab, notebooks are lint-checked in-process using the version of Python that is running JupyterLab itself. A notebook intended to run with a Python 2 kernel could therefore generate syntax errors during lint checking.
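This happens because an in-process linter parses cell source with the grammar of the interpreter it runs in, not the grammar of the notebook's kernel. Here, the standard library's ast.parse stands in for that parsing step. This is an illustration of the mechanism, not how nbcelltests invokes its linter, and parses_on_this_interpreter is a hypothetical helper.

```python
import ast


def parses_on_this_interpreter(source: str) -> bool:
    """True if `source` is valid syntax for the Python running this code."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


python2_cell = 'print "hello"'   # valid in Python 2, a SyntaxError in Python 3
python3_cell = 'print("hello")'
```

Running under Python 3, parses_on_this_interpreter(python2_cell) returns False even though the cell would run fine on a Python 2 kernel.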

Development

See CONTRIBUTING.md for guidelines.

License

This software is licensed under the Apache 2.0 license. See the LICENSE and AUTHORS files for details.
