Generator Tricks for Systems Programmers (Tutorial)

Copyright (C) 2008, 2018
David M. Beazley

Originally Presented at PyCon 2008, March 13, 2008, Chicago, Illinois.

Revised and updated for Python 3.7, October 29, 2018.

Introduction

This tutorial discusses various techniques for using generator functions and generator expressions in the context of systems programming. This topic loosely includes files, file systems, text parsing, network programming, and programming with threads.

Presentation Slides (PDF). Also available at https://speakerdeck.com/dabeaz/generator-tricks-for-systems-programmers-version-3-dot-0

Come to Chicago and take a Course.

Code Samples

The examples directory contains the code samples and data files used in the tutorial. The presentation slides give the filename that goes with each example. This tutorial assumes the use of Python 3.4 or newer.

All of the example programs should be executed within the examples directory. Certain programs require additional support programs to be running. This is indicated in the descriptions below.

Part 2 : Processing Data Files

  • nongenlog.py. Calculate the number of bytes transferred in an Apache server log using a simple for-loop. Does not use generators.
  • genlog.py. Calculate the number of bytes transferred in an Apache server log using a series of generator expressions.
  • makebig.py. Make a large access-log file for performance testing. This will create a file "big-access-log". For the numbers used in the presentation, I used python makebig.py 2000.
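As a rough illustration of the generator-expression approach described above (a minimal sketch, not the contents of genlog.py), the byte total can be computed by chaining lazy stages. The function name `total_bytes` and the sample lines are made up for this example, and it assumes the byte count is the last whitespace-separated field, with `-` meaning no data.

```python
def total_bytes(lines):
    """Sum the last (bytes) column of access-log lines."""
    # Each stage is lazy: nothing is processed until sum() pulls values through.
    bytecolumn = (line.rsplit(None, 1)[1] for line in lines)
    sizes = (int(x) for x in bytecolumn if x != '-')
    return sum(sizes)


# Hypothetical sample lines standing in for a real access log.
sample = [
    '81.107.39.38 - - [24/Feb/2008:00:08:59 -0600] "GET /ply/ HTTP/1.1" 200 7587',
    '81.107.39.38 - - [24/Feb/2008:00:09:00 -0600] "GET /favicon.ico HTTP/1.1" 404 133',
    '66.249.65.37 - - [24/Feb/2008:00:09:10 -0600] "GET /robots.txt HTTP/1.1" 304 -',
]

print(total_bytes(sample))   # prints 7720
```

Because an open file is itself an iterable of lines, the same function should work on `open("access-log")` without loading the whole file into memory.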

Part 3 : Fun with Files and Directories

  • genfind.py. An example of using pathlib and the rglob() method to yield filenames matching a given filename pattern.
  • genopen.py. A generator function that opens a sequence of filenames and yields the resulting file objects.
  • gencat.py. A generator function that concatenates a sequence of generators into a single sequence.
  • gengrep.py. A generator that greps a series of lines for those that match a regex pattern.
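The four pieces in this part are meant to be stacked into a pipeline. The following is a minimal sketch of how that might look; the function names (`gen_find`, `gen_open`, `gen_cat`, `gen_grep`, `grep_tree`) are chosen for illustration and are not guaranteed to match the sample files.

```python
import re
from pathlib import Path


def gen_find(filepat, top):
    """Yield paths under `top` whose names match a glob pattern."""
    yield from Path(top).rglob(filepat)


def gen_open(paths):
    """Yield one open text file per path, closing it once the consumer moves on."""
    for path in paths:
        with open(path, 'rt') as f:
            yield f


def gen_cat(sources):
    """Concatenate a sequence of iterables into a single stream of items."""
    for src in sources:
        yield from src


def gen_grep(pattern, lines):
    """Yield only the lines that match a regular expression."""
    pat = re.compile(pattern)
    return (line for line in lines if pat.search(line))


def grep_tree(pattern, filepat, top):
    """Hypothetical helper: grep every matching file under a directory tree."""
    return gen_grep(pattern, gen_cat(gen_open(gen_find(filepat, top))))
```

For example, `grep_tree(r" 404 ", "access-log*", "www")` would lazily yield matching lines from every such file under a directory named `www` (both the pattern and the directory name are placeholders).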

Part 4 : Parsing and Processing Data

  • bytesgen.py. Example that finds out how many bytes were transferred for a specific file in a whole directory of log files.
  • retuple.py. Parse a sequence of lines into a sequence of tuples using regular expressions.
  • redict.py. Parse a sequence of lines into a sequence of dictionaries with named fields.
  • fieldmap.py. Remap fields in a sequence of dictionaries.
  • linesdir.py. Generate lines from files in a directory.
  • apachelog.py. Parse an Apache log file.
  • query404.py. Find the set of all documents that are broken (404).
  • largefiles.py. Find all requests that transferred over a megabyte.
  • largest.py. Find the largest document.
  • hosts.py. Find unique host IP addresses.
  • downloads.py. Find number of downloads of a specific file.
  • robots.py. Find out who has been hitting robots.txt.
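The parsing steps listed above (lines to tuples, tuples to dictionaries, then field remapping) can be sketched as one small pipeline. This is an illustrative version only: the regular expression, the column names, and the sample lines are assumptions made for the example rather than copies of apachelog.py.

```python
import re

# Assumed pattern for the Apache common log format.
LOGPAT = re.compile(
    r'(\S+) (\S+) (\S+) \[(.*?)\] "(\S+) (\S+) (\S+)" (\S+) (\S+)'
)
COLNAMES = ('host', 'referrer', 'user', 'datetime',
            'method', 'request', 'proto', 'status', 'bytes')


def field_map(dicts, name, func):
    """Apply func to one named field of every dictionary in the stream."""
    for d in dicts:
        d[name] = func(d[name])
        yield d


def apache_log(lines):
    """Turn raw log lines into dictionaries with typed status/bytes fields."""
    matches = (LOGPAT.match(line) for line in lines)
    tuples = (m.groups() for m in matches if m)      # skip unparseable lines
    log = (dict(zip(COLNAMES, t)) for t in tuples)
    log = field_map(log, 'status', int)
    log = field_map(log, 'bytes', lambda s: int(s) if s != '-' else 0)
    return log


# Hypothetical sample data.
sample = [
    '81.107.39.38 - - [24/Feb/2008:00:08:59 -0600] "GET /ply/ HTTP/1.1" 200 7587',
    '81.107.39.38 - - [24/Feb/2008:00:09:00 -0600] "GET /favicon.ico HTTP/1.1" 404 133',
    'not a log line',
    '66.249.65.37 - - [24/Feb/2008:00:09:10 -0600] "GET /robots.txt HTTP/1.1" 304 -',
]

log = list(apache_log(sample))
broken = {r['request'] for r in log if r['status'] == 404}
hosts = {r['host'] for r in log}
```

Once the records are dictionaries, queries such as "all 404 documents" or "unique hosts" reduce to ordinary set and generator expressions, as in the last two lines.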

Part 5 : Processing Infinite Data

  • follow.py. Follow a log-file in real-time like tail -f in Unix. To run this program, you need to have a log-file to work with. Run the program runservers.py to start a simulated web-server. This will write a series of log lines for you to follow.
  • realtime404.py. Print all 404 requests as they happen in real-time on a log file.
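The idea of following a growing file can be sketched with a generator that seeks to the end of the file and then polls for new lines. This is a minimal version written for illustration; the `poll` parameter and the `gen_404` helper are assumptions, and the sketch assumes the writer appends whole lines.

```python
import os
import time


def follow(thefile, poll=0.1):
    """Yield lines appended to an open file, in the spirit of `tail -f`."""
    thefile.seek(0, os.SEEK_END)        # skip everything already in the file
    while True:
        line = thefile.readline()
        if not line:
            time.sleep(poll)            # nothing new yet; wait and retry
            continue
        yield line


def gen_404(lines):
    """Hypothetical filter: keep only lines that report a 404 status."""
    return (line for line in lines if ' 404 ' in line)
```

Note that `follow()` never terminates on its own, so a loop such as `for line in gen_404(follow(open("access-log"))): print(line, end="")` runs until it is interrupted.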

Part 6 : Feeding the Pipeline

Part 7 : Extending the Pipeline

  • genpickle.py. Turn sequences of objects into a sequence of pickles.
  • sendto.py. Send a sequence of items to a remote machine via a socket. Uses genpickle above.
  • receivefrom.py. Receive a sequence of items from a socket. Uses genpickle above.
  • genqueue.py. Consume items on a queue.
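To make the pickling and queue ideas above concrete, here is a small sketch. It is not the code from the sample files: the function names are illustrative, an in-memory `io.BytesIO` stands in for a socket file, and using `StopIteration` as the end-of-stream marker on the queue is just one possible convention.

```python
import io
import pickle
import queue
import threading


def gen_pickle(source):
    """Turn a sequence of objects into a sequence of pickled byte strings."""
    for item in source:
        yield pickle.dumps(item)


def gen_unpickle(infile):
    """Yield objects read back from a binary file-like object until EOF."""
    while True:
        try:
            yield pickle.load(infile)
        except EOFError:
            return


def send_to_queue(source, q):
    """Feed items into a queue, then post a sentinel to mark the end."""
    for item in source:
        q.put(item)
    q.put(StopIteration)


def gen_from_queue(q):
    """Consume items from a queue until the sentinel arrives."""
    while True:
        item = q.get()
        if item is StopIteration:
            break
        yield item


# Round trip through a byte stream (stand-in for a network connection).
items = [{'status': 404, 'request': '/x'}, (1, 2), 'text']
stream = io.BytesIO(b''.join(gen_pickle(items)))
received = list(gen_unpickle(stream))

# Hand items from a producer thread to a consumer through a queue.
q = queue.Queue()
threading.Thread(target=send_to_queue, args=(range(5), q)).start()
consumed = list(gen_from_queue(q))
```

Keep in mind that unpickling data from an untrusted source is unsafe, so a pipeline like this only makes sense between machines you control.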

Part 8 : Advanced Data Routing

  • genmulti.py. Generate items from more than one generator at once (multiplexing).
  • broadcast.py. Broadcast a sequence of items to a collection of consumers.
  • netsend.py. Send items to another host on the network. Requires a receiver (use receivefrom.py above).
  • thrsend.py. Send items to multiple consumer threads.
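Multiplexing and broadcasting can be sketched as follows. This is one possible approach, assuming each source is drained by its own thread into a shared queue and that consumers are any objects with a `send()` method; the `Collector` class is a hypothetical consumer added only for the demonstration.

```python
import queue
import threading


def gen_multiplex(sources):
    """Yield items from several iterables as they become available."""
    sources = list(sources)
    q = queue.Queue()
    done = object()                     # private end-of-source marker

    def drain(src):
        for item in src:
            q.put(item)
        q.put(done)

    for src in sources:
        threading.Thread(target=drain, args=(src,), daemon=True).start()

    remaining = len(sources)
    while remaining:
        item = q.get()
        if item is done:
            remaining -= 1
        else:
            yield item


def broadcast(source, consumers):
    """Send every item from source to each consumer in turn."""
    for item in source:
        for c in consumers:
            c.send(item)


class Collector:
    """Hypothetical consumer that simply records what it is sent."""
    def __init__(self):
        self.items = []

    def send(self, item):
        self.items.append(item)


merged = sorted(gen_multiplex([range(3), range(10, 13)]))
a, b = Collector(), Collector()
broadcast(['x', 'y'], [a, b])
```

The order in which multiplexed items arrive depends on thread scheduling, which is why the example sorts the result before using it.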

Part 9 : Various Programming Tricks (And Debugging)

  • gentrace.py. Example of debugging a generator component.
  • storelast.py. Store the last value of a generator (for access later in the processing pipeline).
  • genshutdown.py. Simple example of shutting down a generator.
  • shutdownevt.py. Shutting down a generator with an event.
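The tricks in this part can be illustrated with a few short helpers. The names below (`trace`, `StoreLast`, `countdown`) and the `report` parameter are assumptions made for this sketch and may differ from the sample files.

```python
def trace(source, report=print):
    """Pass items through unchanged while reporting each one for debugging."""
    for item in source:
        report(item)
        yield item


class StoreLast:
    """Iterator wrapper that remembers the most recent item it produced."""
    def __init__(self, source):
        self.source = iter(source)
        self.last = None

    def __iter__(self):
        return self

    def __next__(self):
        item = next(self.source)
        self.last = item
        return item


def countdown(n, log):
    """Generator with cleanup that runs when it is closed early."""
    try:
        while n > 0:
            yield n
            n -= 1
    finally:
        log.append('closed')            # runs on close() or normal exhaustion


seen = []
traced = list(trace([1, 2, 3], report=seen.append))

lines = StoreLast(['a', 'b', 'c'])
for line in lines:
    if line == 'b':
        break

events = []
g = countdown(5, events)
first = next(g)
g.close()                               # raises GeneratorExit inside countdown
```

Calling `close()` on a suspended generator raises `GeneratorExit` at the paused `yield`, so a `try`/`finally` block is a simple place to put shutdown logic.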

Part 10 : Parsing and Printing

No sample programs.

Part 11 : Co-routines

Bug Reports

Bug reports and pull requests to the sample code are welcome. Comments to [email protected].
