Repository Details

Code for the book "High Performance Python 2e" by Micha Gorelick and Ian Ozsvald with O'Reilly

High Performance Python 2e: The Code

This repository contains the code from "High Performance Python 2e" by Micha Gorelick and Ian Ozsvald with O'Reilly Media. Each directory contains the examples from the chapter in addition to other interesting code on the subject.

You can find out more about the authors here:

Errata

Errata can be filed here: https://www.oreilly.com/cs/catalog/create/errata/?b=68228 (no login required, just a form with a few details). You can also check the confirmed errata here: https://www.oreilly.com/catalog/errata.csp?isbn=0636920268505 or file a bug on this repo, whatever's easiest.

Topics Covered

This book ranges in topic from native Python to external modules to writing your own modules. Code is shown running on one CPU, multiple coroutines, multiple CPUs and multiple computers. Throughout this exploration, the focus stays on keeping development time short and on learning from profiling output in order to direct optimizations.

The following topics are covered in the code repo:

  • Chapter 1: Understanding Performant Programming

    • What are the elements of a computer's architecture?
    • What are some common alternate computer architectures?
    • How does Python abstract the underlying computer architecture?
    • What are some of the hurdles to making performant Python code?
    • What strategies can help you become a highly performant programmer?
  • Chapter 2: Profiling

    • How can I identify speed and RAM bottlenecks in my code?
    • How do I profile CPU and memory usage?
    • What depth of profiling should I use?
    • How can I profile a long-running application?
    • What's happening under the hood with CPython?
    • How do I keep my code correct while tuning performance?
  • Chapter 3: Lists and Tuples

    • What are lists and tuples good for?
    • What is the complexity of a lookup in a list/tuple?
    • How is that complexity achieved?
    • What are the differences between lists and tuples?
    • How does appending to a list work?
    • When should I use lists and tuples?
  • Chapter 4: Dictionaries and Sets

    • What are dictionaries and sets good for?
    • How are dictionaries and sets the same?
    • What is the overhead when using a dictionary?
    • How can I optimize the performance of a dictionary?
    • How does Python use dictionaries to keep track of namespaces?
  • Chapter 5: Iterators

    • How do generators save memory?
    • When is the best time to use a generator?
    • How can I use itertools to create complex generator workflows?
    • When is lazy evaluation beneficial, and when is it not?
  • Chapter 6: Matrix and Vector Computation

    • What are the bottlenecks in vector calculations?
    • What tools can I use to see how efficiently the CPU is doing my calculations?
    • Why is numpy better at numerical calculations than pure Python?
    • What are cache-misses and page-faults?
    • How can I track the memory allocations in my code?
    • How does Pandas work and how can I make it faster?
  • Chapter 7: Compiling to C

    • How can I have my Python code run at compiled speeds?
    • What is the difference between a JIT compiler and an AOT compiler?
    • What tasks can compiled Python code perform faster than native Python?
    • Why do type annotations speed up compiled Python code?
    • What is a GPU and how can I use it?
    • When are GPUs useful?
    • How can I write modules for Python using C or Fortran?
  • Chapter 8: Concurrency

    • What is concurrency and how is it helpful?
    • What is the difference between concurrency and parallelism?
    • How does async/await work?
    • Which tasks can be done concurrently and which can't?
    • When is the right time to take advantage of concurrency?
    • How can concurrency speed up my programs?
  • Chapter 9: Multiprocessing

    • What does the multiprocessing module offer?
    • What's the difference between processes and threads?
    • How do I choose the right size for a process pool?
    • How do I use nonpersistent queues for work processing?
    • What are the costs and benefits of interprocess communication?
    • How can I process numpy data with many CPUs?
    • How would I use Joblib to simplify parallelised and cached scientific work?
    • Why do I need locking to avoid data loss?
  • Chapter 10: Clusters and Job Queues

    • Why are clusters useful?
    • What are the costs of clustering?
    • How can I convert a multiprocessing solution into a clustered solution?
    • How does an IPython cluster work?
    • How can I parallelise Pandas using Dask and Swifter?
    • How does NSQ help with making robust production systems?
    • What is Docker and how can I use it in my workflow?
  • Chapter 11: Using Less RAM

    • Why should I use less RAM?
    • Why are numpy and array better for storing lots of numbers?
    • How can lots of text be efficiently stored in RAM?
    • How can I store huge volumes of text for machine learning when I don't have enough RAM?
    • When can sparse arrays beat normal dense arrays?
    • How could I count (approximately!) to 10^76 using just 1 byte?
    • What is the landscape of Bloom Filters, HLLs and KMVs?
    • When should I use a probabilistic data structure?
  • Chapter 12: Lessons from the Field (no code)

    • Some stories from the field on high performance Python
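
As a small taste of the Chapter 11 question about counting (approximately!) to roughly 10^76 with a single byte, here is a minimal sketch of a Morris-style probabilistic counter. It is an illustration written for this README and not the code from the chapter directory; the class name `MorrisCounter` and its methods are hypothetical. The idea is to store only an exponent between 0 and 255 and to increment it with probability 2^-exponent, so the stored byte tracks the logarithm of the count.

```python
import random


class MorrisCounter:
    """Approximate counter that keeps only an exponent in the range 0..255.

    Hypothetical illustration; not the implementation shipped with the book.
    """

    def __init__(self, rng=None):
        self.exponent = 0  # small enough to fit in one byte
        self._rng = rng if rng is not None else random.Random()

    def add(self):
        # Bump the exponent with probability 2**-exponent, capped at 255.
        if self.exponent < 255 and self._rng.random() < 2.0 ** -self.exponent:
            self.exponent += 1

    def estimate(self):
        # Estimate of how many times add() has been called.
        return 2 ** self.exponent - 1


counter = MorrisCounter(random.Random(42))  # seeded only to make runs repeatable
for _ in range(100_000):
    counter.add()
print(counter.exponent, counter.estimate())
```

The largest value this sketch can report is 2^255 - 1, which is about 5.8 x 10^76. The price is accuracy: a single counter is only right to within a factor of a few, so it suits cases where the order of magnitude is what matters.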

Using the code base

This code base is a live document and should be freely commented on and used. It is distributed with a license that amounts to "don't use the code for profit"; however, read the provided license file for the law-jargon. Feel free to share, fork and comment on the code!

If any errors are found, or you have a bone to pick with how we go about doing things, leave an issue on this repo! Just keep in mind that all code was written for educational purposes and sometimes this means favouring readability over "the right thing" (although in Python these two things are generally one and the same!).
