phraug2
A new version of phraug (pronounced "frog") with improved command-line argument parsing, thanks to jofusa.
This is a set of simple Python scripts for pre-processing large files: things like splitting and format conversion. The name phraug comes from a great book, Made to Stick, by Chip and Dan Heath.
See http://fastml.com/processing-large-files-line-by-line/ for the basic idea.
There's always at least one input file and usually one or more output files. An input file always stays unchanged.
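To illustrate the line-by-line idea, here is a minimal sketch of a format conversion that reads one row at a time, so memory use does not grow with file size, and opens the input read-only. It assumes a numeric CSV with a header line and the label in the first column. `csv_to_libsvm` is a hypothetical helper written for this example, not one of the phraug scripts.

```python
import csv


def csv_to_libsvm(input_path, output_path, label_index=0, skip_headers=True):
    """Hypothetical example: convert a numeric CSV to libsvm format, line by line.

    Assumes every non-label column is numeric. Only the current row is held
    in memory, and the input file is opened read-only, so it stays unchanged.
    """
    with open(input_path, newline="") as i, open(output_path, "w") as o:
        reader = csv.reader(i)
        if skip_headers:
            next(reader, None)  # drop the header line, if there is one
        for row in reader:
            label = row[label_index]
            features = [v for n, v in enumerate(row) if n != label_index]
            # libsvm is a sparse format: write only non-zero values,
            # as index:value pairs with 1-based indices
            pairs = [f"{n}:{v}" for n, v in enumerate(features, start=1)
                     if float(v) != 0.0]
            o.write(" ".join([label] + pairs) + "\n")
```

For example, `csv_to_libsvm("train.csv", "train.libsvm")` turns a row `1,0,2.5,0` into the line `1 2:2.5`.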
For documentation:
- try calling a script with -h; most will display usage information
- see the phraug docs
- see http://fastml.com/introducing-phraug/
Example:
    >python split.py
    usage: split.py [-h] [-p PROBABILITY] [-r RANDOM_SEED] [-s] [-c]
                    input_file output_file1 output_file2
    split.py: error: too few arguments

    >python split.py -h
    usage: split.py [-h] [-p PROBABILITY] [-r RANDOM_SEED] [-s] [-c]
                    input_file output_file1 output_file2

    split a file into two randomly, line by line.

    positional arguments:
      input_file            path to an input file
      output_file1          path to the first output file
      output_file2          path to the second output file

    optional arguments:
      -h, --help            show this help message and exit
      -p PROBABILITY, --probability PROBABILITY
                            probability of writing to the first file (default 0.9)
      -r RANDOM_SEED, --random_seed RANDOM_SEED
                            random seed
      -s, --skip_headers    skip the header line
      -c, --copy_headers    copy the header line to both output files
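The usage message above is the kind that Python's `argparse` generates. Here is a minimal sketch of how a script with this interface could be written. It is not the actual phraug2 source: the function names `split` and `main` are hypothetical, and the header handling is inferred from the help text alone.

```python
import argparse
import random


def split(input_file, output_file1, output_file2, probability=0.9,
          random_seed=None, skip_headers=False, copy_headers=False):
    """Hypothetical sketch: send each line to output_file1 with the given
    probability, otherwise to output_file2. The input file is only read."""
    rng = random.Random(random_seed)  # a fixed seed makes the split reproducible
    with open(input_file) as i, open(output_file1, "w") as o1, \
            open(output_file2, "w") as o2:
        if skip_headers or copy_headers:
            header = i.readline()  # consume the header line
            if copy_headers:       # ...and optionally copy it to both outputs
                o1.write(header)
                o2.write(header)
        for line in i:  # stream the file; one line in memory at a time
            if rng.random() < probability:
                o1.write(line)
            else:
                o2.write(line)


def main(argv=None):
    parser = argparse.ArgumentParser(
        description="split a file into two randomly, line by line.")
    parser.add_argument("input_file", help="path to an input file")
    parser.add_argument("output_file1", help="path to the first output file")
    parser.add_argument("output_file2", help="path to the second output file")
    parser.add_argument("-p", "--probability", type=float, default=0.9,
                        help="probability of writing to the first file (default 0.9)")
    parser.add_argument("-r", "--random_seed", type=int, help="random seed")
    parser.add_argument("-s", "--skip_headers", action="store_true",
                        help="skip the header line")
    parser.add_argument("-c", "--copy_headers", action="store_true",
                        help="copy the header line to both output files")
    args = parser.parse_args(argv)
    split(args.input_file, args.output_file1, args.output_file2,
          args.probability, args.random_seed, args.skip_headers,
          args.copy_headers)
```

To run it as a script, call `main()` under the usual `if __name__ == "__main__":` guard. Then `python split.py train.csv train_a.csv train_b.csv -p 0.8 -r 1 -c` would put roughly 80% of the lines in the first file, with the header in both. Because each line is assigned independently, the split ratio is approximate rather than exact.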