  • License
    MIT License
  • Created over 6 years ago
  • Updated almost 6 years ago

Jupyter Tips, Tricks, Best Practices with Sample Code for Productivity Boost

Making the Best of Jupyter

Tips, Tricks, Best Practices with Sample Code for Productivity Boost

Found useful by Nobel Laureates and more:

"..., this looks very helpful"

  • Economics Nobel Laureate 2018, Dr. Paul Romer on Twitter

Getting Started Right

  • Start your Jupyter server under supervisor or tmux rather than directly from an ssh or bash session. The server is then more stable and does not die unexpectedly when the session drops. Consider writing the server logs to a file rather than stdout
    • This is especially useful when working inside Docker with docker attach, where you might not see much of the log output
  • Consider using an ssh client with tabbed sessions, such as MobaXterm Personal Portable Edition
  • Refer to How to Tunnel using SSH (with illustrations) to tunnel to a remote Jupyter notebook

Debugging

  • When you see an error, you can run %debug in a new cell to activate the IPython debugger. Standard keyboard shortcuts such as c for continue, n for next, q for quit apply
  • Use from IPython.core.debugger import set_trace to set IPython debugger breakpoints, the same way you would for pdb in PyCharm
from IPython.core.debugger import set_trace

def foobar(n):
    x = 1337
    y = x + n
    set_trace() #this one triggers the debugger
    return y

foobar(3)

Returns:

> <ipython-input-9-04f82805e71f>(7)foobar()
      5     y = x + n
      6     set_trace() #this one triggers the debugger
----> 7     return y
      8 
      9 foobar(3)

ipdb> q
Exiting Debugger.

Preference Note: If I already have an exception, I prefer %debug because it takes me straight to the exact line where the code breaks, whereas with set_trace() I have to step through line by line

  • When editing imported code, use %load_ext autoreload followed by %autoreload 2. The autoreload extension reloads modules automatically before executing the code typed at the IPython prompt.

This makes the following workflow possible:

In [1]: %load_ext autoreload

In [2]: %autoreload 2  # set autoreload flag to 2. Why? This reloads modules every time before executing the typed Python code

In [3]: from foo import some_function

In [4]: some_function()
Out[4]: 42

In [5]: # open foo.py in an editor and change some_function to return 43

In [6]: some_function()
Out[6]: 43
  • When using print(out_var) on a nested list or dictionary, consider print(json.dumps(out_var, indent=2)) instead (after import json). It pretty-prints the output, provided the contents are JSON-serializable.
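
A minimal sketch of the pretty-printing tip. The out_var and mixed dictionaries are made-up examples, and default=str is just one way to cope with values json cannot serialize on its own:

```python
import json
from pathlib import Path

# Hypothetical nested structure standing in for out_var
out_var = {"model": "resnet", "params": {"lr": 0.01, "layers": [64, 128]}}

# indent=2 puts each key and list item on its own indented line
pretty = json.dumps(out_var, indent=2)
print(pretty)

# Values that are not JSON-serializable (a Path here) raise TypeError by default;
# default=str falls back to str() for them
mixed = {"path": Path("data") / "train.csv"}
print(json.dumps(mixed, indent=2, default=str))
```

If the structure holds many non-JSON types, pprint.pprint from the standard library may be a simpler fit.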

Programming Sugar

  • Execute shell commands from inside your notebook. For example, use !ls *.csv to check which files are available in your working folder, or !pwd to check your current working directory
    • You can cd {PATH} where PATH is a Python variable. Similarly, you can do PATH = !pwd to capture the current directory and use relative paths instead of absolute ones
    • Both pwd and !pwd work, with a mild preference for !pwd because it signals to other readers that this is a shell command
    • Shell commands are nice, but we discourage their use because they make it difficult to refactor the notebook into a script later. For instance, cd ../../ in Jupyter could also be done with os.chdir()
  • Running Jupyter from an environment does NOT mean that the shell invoked by ! will have the same environment variables
    • Running !pip install foo (or conda install bar) will use the pip that is on the path of the sh shell, which might differ from the one in whatever bash shell environment you use
  • If you want to install a package while inside Jupyter and !pip install foo doesn't seem to do it, try:
import sys
!{sys.executable} -m pip install foo  # sys.executable points to the python that is running in your kernel 
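
For the point above about shell commands being hard to refactor, here is a rough sketch of plain-Python counterparts that survive a move from notebook to script. The temporary folder and file names are made up so the snippet runs anywhere:

```python
import os
import tempfile
from pathlib import Path

# Throwaway folder with hypothetical file names
workdir = Path(tempfile.mkdtemp())
for name in ("train.csv", "test.csv", "notes.txt"):
    (workdir / name).touch()

start_dir = Path.cwd()   # plain-Python counterpart of !pwd
os.chdir(workdir)        # plain-Python counterpart of cd {PATH}

# Plain-Python counterpart of !ls *.csv
csv_files = sorted(p.name for p in Path.cwd().glob("*.csv"))
print(csv_files)  # ['test.csv', 'train.csv']

os.chdir(start_dir)      # return to where we started
```

In many cases you can skip os.chdir() entirely and pass full paths such as workdir / "train.csv" around instead, which avoids hidden state.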

Search Magic

Use the Search Magic file - no need to pip install. Download and use the file.

In [1]: from search_magic import SearchMagic
In [2]: get_ipython().register_magics(SearchMagic)

In [3]: %create_index
In [4]: %search tesseract
Out[4]: Cell Number -> 2
        Notebook -> similarity.ipynb
        Notebook Execution Number -> 2

Jupyter Kungfu

  • If you hit Shift + Tab in a cell after writing a function, it will display the function's docstring in a tooltip, with options to expand the tooltip or dock it at the bottom of the screen
  • Use ?func_name() to view the docstrings of functions, classes, etc. For example:
?str.replace() 

Returns:

Docstring:
S.replace(old, new[, count]) -> str

Return a copy of S with all occurrences of substring
old replaced by new.  If the optional argument count is
given, only the first count occurrences are replaced.
Type:      method_descriptor
  • List all the variables/functions in a module: module_name.*?. For instance: pd.*?. Additionally, this works with prefixes: pd.read_*? and pd.*_csv? will also work
  • Show the docstring and source code for a function/class using: pd.read_csv??
  • Press h (in command mode) to view keyboard shortcuts
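
The ? and ?? helpers only exist inside IPython. If you need similar lookups in a plain script, a rough approximation with the standard library could look like the sketch below. It uses the json module instead of pandas purely to stay dependency-free:

```python
import fnmatch
import inspect
import json

# Rough counterpart of ?str.replace: fetch the docstring
doc = inspect.getdoc(str.replace)
print(doc.splitlines()[0])

# Rough counterpart of json.dump*?: wildcard search over a module's names
matches = fnmatch.filter(dir(json), "dump*")
print(matches)

# Rough counterpart of json.dumps??: the source code (works for pure-Python code only)
source = inspect.getsource(json.dumps)
print(source.splitlines()[0])
```

This is not how IPython implements its introspection, only a way to get comparable information outside a notebook.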

Sanity Checks

  • If your imports are failing, check your notebook kernel, shown in gray at the top right
  • Consider using conda instead of pip with virtualenv or similar, because conda ensures that package versions are consistent. conda is not just a Python package manager. Check conda (vs pip): Myths and Misconceptions by Jake VanderPlas
  • The cell type can be changed to markdown and plain text too
    • Some people convert code cells to markdown when they don't want to execute them but don't want to comment them out either
  • Consider downloading a notebook as a Python file and then pushing it to GitHub for code review, or use nbdime

nbdime

Selective Diff/Merge Tool for jupyter notebooks

Install it first:

pip install -e git+https://github.com/jupyter/nbdime#egg=nbdime

The install should automatically configure nbdime for Jupyter Notebook. If something doesn't work, see installation.

Then put the following into ~/.jupyter/nbdime_config.json:

{

  "Extension": {
    "source": true,
    "details": false,
    "outputs": false,
    "metadata": false
  },

  "NbDiff": {
    "source": true,
    "details": false,
    "outputs": false,
    "metadata": false
  },

  "NbDiffDriver": {
    "source": true,
    "details": false,
    "outputs": false,
    "metadata": false
  },

  "NbMergeDriver": {
    "source": true,
    "details": false,
    "outputs": false,
    "metadata": false
  },

  "dummy": {}
}

Change the outputs value to true if you want to see output diffs too.

Markdown Printing

Including markdown in your code's output is very useful. Use this to highlight parameters, performance notes and so on. This enables colors, bold text, etc.

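from IPython.display import Markdown, display

def printmd(string, color=None):
    """Render a Markdown string as cell output, optionally in a CSS color."""
    if color is not None:
        # Wrap in a span only when a color is given, so color=None does not emit 'color:None'
        string = "<span style='color:{}'>{}</span>".format(color, string)
    display(Markdown(string))

printmd("**bold and blue**", color="blue")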

Find currently running cell

Add this snippet to the start of your notebook. Press Alt+I to find the cell being executed right now. This does not work if you have enabled vim bindings:

%%javascript
// Go to Running cell shortcut
Jupyter.keyboard_manager.command_shortcuts.add_shortcut('Alt-I', {
    help : 'Go to Running cell',
    help_index : 'zz',
    handler : function (event) {
        setTimeout(function() {
            // Find running cell and click the first one
            if ($('.running').length > 0) {
                //alert("found running cell");
                $('.running')[0].scrollIntoView();
            }}, 250);
        return false;
    }
});

Better Mindset

  • IMPORTANT: Frequently rewrite each cell's logic into functions. These functions can be moved to separate .py files at regular intervals. Your notebook run should consist mainly of function calls.
    • This prevents your notebook from becoming a giant pudding of global variables
  • If particular cells take too long to run, add the %%time cell magic as a warning and runtime logger
  • If you are on Py3.6+, please use f-strings! f"This is iteration: {iter_number}" is much more readable than the .format() syntax
  • Any code that is used in more than 3 notebooks should be moved to .py files (such as utils.py) and imported, for example with from xxx_imports import *
  • Quite often, we re-run the same code cell. Instead, refactor that cell into a function and call that function repeatedly to prevent accidental edits
  • Use pathlib instead of os.path wherever possible for more readable code. Here is a beginner friendly tutorial. If you just want to review, refer to the crisp tutorial or official docs
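
To make the f-string and pathlib suggestions concrete, here is a small side-by-side sketch. The iter_number value and the data/raw/train.csv path are made up for illustration:

```python
import os.path
from pathlib import Path

iter_number = 7  # hypothetical loop counter

# Same message, two syntaxes
with_format = "This is iteration: {}".format(iter_number)
with_fstring = f"This is iteration: {iter_number}"
print(with_fstring)  # This is iteration: 7

# Same path, two APIs (hypothetical folder and file names)
old_style = os.path.join("data", "raw", "train.csv")
new_style = Path("data") / "raw" / "train.csv"

# pathlib exposes the pieces of a path as attributes
print(new_style.name)         # train.csv
print(new_style.suffix)       # .csv
print(new_style.parent.name)  # raw
```

Both path styles produce the same string on a given OS, so you can migrate one call site at a time.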

Plotting and Visualization

  • Always have %matplotlib inline to ensure that the plots are rendered inside the notebook
  • Use separate plotting functions instead of repeating plt.plot code, to avoid code bloat. Using subplots from the Matplotlib OO API is usually neater than adding more plt.plot calls
def show_img(im, figsize=None, ax=None, title=None):
    """Show an image in grayscale on ax, creating a new figure if ax is not given."""
    import matplotlib.pyplot as plt
    if ax is None: fig, ax = plt.subplots(figsize=figsize)
    ax.imshow(im, cmap='gray')
    if title is not None: ax.set_title(title)
    ax.get_xaxis().set_visible(True)
    ax.get_yaxis().set_visible(True)
    return ax

def draw_rect(ax, bbox):
    """Draw one red, unfilled rectangle on ax; bbox is (x, y, width, height)."""
    import matplotlib.patches as patches
    x, y, w, h = bbox
    return ax.add_patch(patches.Rectangle((x, y), w, h, fill=False, edgecolor='red', lw=2))

show_img is a reusable plotting function which can be easily extended to plot one-off images as well as to use subplots properly. In the example below, I use a single figure and add new images as subplots using the neater axes.flat syntax:

import matplotlib.pyplot as plt

fig, axes = plt.subplots(1, 2, figsize=(6, 2))
ax = show_img(char_img, ax=axes.flat[0], title='char_img_line_cropping:\n'+str(char_img.shape))
ax = show_img(char_bg_mask, ax=axes.flat[1], title='Bkg_mask:\n'+str(char_bg_mask.shape))

# If you are working on an image segmentation task, you can easily add red rectangles per subplot.
# char_bounding_boxes is assumed to be an iterable of (x, y, w, h) tuples, one per character
for bbox in char_bounding_boxes:
    draw_rect(ax, bbox)  # adds one red bounding box per character

Please don't overdo Cell Magic

  • Don't use alias and alias_magic unless extremely helpful. Aliases make your code difficult for other developers to read
  • Don't leave %%timeit in your code. Why? Because it re-runs the cell many times (for fast code, up to 100,000 loops or more) and then reports statistics over repeated runs, such as the best of 3 runtimes. This is not always needed. Instead use %%time or add average times in inline comments
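
To see why %%timeit is expensive, here is a rough sketch of the same idea with the standard library timeit module, where the loop and repeat counts are explicit. The build_squares function is a made-up workload, and the counts are picked by hand here, whereas the magic chooses its loop count automatically:

```python
import timeit

def build_squares(n=1000):
    # Hypothetical workload standing in for a notebook cell
    return [i * i for i in range(n)]

LOOPS = 1000   # executions per measurement
REPEATS = 3    # number of measurements

# Total work done: LOOPS * REPEATS = 3,000 executions of the workload
runs = timeit.repeat(build_squares, number=LOOPS, repeat=REPEATS)

best_per_loop = min(runs) / LOOPS
print(f"best of {REPEATS}: {best_per_loop * 1e6:.1f} us per loop")
```

A single %%time run executes the cell once, which is usually enough for a slow cell in a notebook you share with others.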
