You can read the introductory blog post or try it live at https://shrynk.ai
Features
- Compress your data smartly based on Machine Learning
- Takes User Requirements in the form of weights for size, write_time and read_time
- Trains & caches a model based on compression methods available in the system, using packaged data
- CLI for compressing and decompressing
- Works with CSV, JSON and Bytes in general
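To make the weights concrete, here is a minimal sketch of how weights for size, write_time and read_time could rank already-measured results. It is an illustration only: shrynk predicts the method with a trained model, and the function name pick_compression and all numbers below are hypothetical.

```python
def pick_compression(results, size=1.0, write=1.0, read=1.0):
    """Return the method with the lowest weighted, normalized cost."""
    weights = {"size": size, "write_time": write, "read_time": read}
    # Scale each metric by its best (smallest) value so units do not matter.
    best = {m: min(row[m] for row in results.values()) for m in weights}

    def cost(row):
        return sum(w * row[m] / best[m] for m, w in weights.items())

    return min(results, key=lambda name: cost(results[name]))


# Hypothetical benchmark numbers (bytes, seconds), invented for illustration.
results = {
    "gzip": {"size": 400, "write_time": 0.020, "read_time": 0.010},
    "bz2": {"size": 300, "write_time": 0.060, "read_time": 0.030},
    "none": {"size": 1000, "write_time": 0.005, "read_time": 0.004},
}

print(pick_compression(results, size=1, write=0, read=0))  # bz2
print(pick_compression(results, size=0, write=1, read=0))  # none
```

With only the size weight set, the smallest output wins; with only the write weight set, the fastest writer wins, even if it does not compress at all.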
CLI
shrynk compress myfile.json # will yield e.g. myfile.json.gz or myfile.json.bz2
shrynk decompress myfile.json.gz # will yield myfile.json
shrynk compress myfile.csv --size 0 --write 1 --read 0
shrynk benchmark myfile.csv # shows benchmark results
shrynk benchmark --predict myfile.csv # will also show the current prediction
shrynk benchmark --save --predict myfile.csv # will add the result to the training data too
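The naming convention in the comments above (a suffix is appended on compress and stripped on decompress) can be sketched with the Python standard library. This is not shrynk's implementation: the helpers compress and decompress below are hypothetical, and the suffix is passed in by hand rather than predicted.

```python
import bz2
import gzip
import os
import tempfile

# Map a file suffix to the stdlib module that handles it.
OPENERS = {".gz": gzip, ".bz2": bz2}


def compress(path, suffix=".gz"):
    """Write path + suffix and return the new file name."""
    target = path + suffix
    with open(path, "rb") as src, OPENERS[suffix].open(target, "wb") as dst:
        dst.write(src.read())
    return target


def decompress(path):
    """Strip the compression suffix and write the original bytes back."""
    target, suffix = os.path.splitext(path)
    with OPENERS[suffix].open(path, "rb") as src, open(target, "wb") as dst:
        dst.write(src.read())
    return target


with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, "myfile.json")
    with open(original, "w") as f:
        f.write('{"a": [1, 2, 3]}')
    packed = compress(original, ".bz2")
    os.remove(original)  # keep only the compressed copy
    restored = decompress(packed)
    with open(restored) as f:
        roundtrip_text = f.read()
    packed_name = os.path.basename(packed)
    restored_name = os.path.basename(restored)

print(packed_name, restored_name)  # myfile.json.bz2 myfile.json
```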
Usage in Docker
To try shrynk out quickly, you can use the official Docker image from Docker Hub. This avoids interfering with an existing Python installation.
You can also build the image from scratch by going to the docker folder of the repository and running docker build -t shrynk . and then using shrynk instead of kootenpv/shrynk in the commands below.
In the following commands, replace ~/Downloads with the folder you want to share with the container (where the file you want to compress is).
# To see help
docker run --rm -v ~/.shrynk:/root/.shrynk -v ~/Downloads:/data kootenpv/shrynk shrynk --help
# To compress a file called train.csv in your ~/Downloads folder
docker run --rm -v ~/.shrynk:/root/.shrynk -v ~/Downloads:/data kootenpv/shrynk \
shrynk compress /data/train.csv
# To benchmark and predict the train.csv file in your ~/Downloads folder
docker run --rm -v ~/.shrynk:/root/.shrynk -v ~/Downloads:/data kootenpv/shrynk \
shrynk benchmark --predict /data/train.csv
Usage in Python
Installation:
pip install shrynk
Then in Python:
import pandas as pd
from shrynk import save, load
# save dataframe compressed
my_df = pd.DataFrame({"a": [1]})
file_path = save(my_df, "mypath.csv")
# e.g. mypath.csv.bz2
# load compressed file
loaded_df = load(file_path)
If you just want the prediction, you can use infer:
import pandas as pd
from shrynk import infer
infer(pd.DataFrame({"a": [1]}))
# {"engine": "csv", "compression": "bz2"}
Add your own data
If you want more control you can do the following:
import pandas as pd
from shrynk import PandasCompressor
df = pd.DataFrame({"a": [1, 2, 3]})
pdc = PandasCompressor("default")
pdc.run_benchmarks(df) # adds data to the default
pdc.train_model(size=3, write=1, read=1)
pdc.predict(df)
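To give a feel for the kind of measurements a benchmark run produces, the sketch below times three standard-library codecs on raw bytes and records size, write_time and read_time for each. It is an assumption-laden illustration, not what run_benchmarks does internally: the benchmark function and the CODECS table are hypothetical, and no model is trained here.

```python
import bz2
import gzip
import lzma
import time

# Codecs from the standard library; each exposes compress() and decompress().
CODECS = {"gzip": gzip, "bz2": bz2, "xz": lzma}


def benchmark(payload):
    """Measure compressed size plus compress and decompress time per codec."""
    results = {}
    for name, codec in CODECS.items():
        start = time.perf_counter()
        packed = codec.compress(payload)
        write_time = time.perf_counter() - start

        start = time.perf_counter()
        restored = codec.decompress(packed)
        read_time = time.perf_counter() - start

        assert restored == payload  # sanity check: the round trip is lossless
        results[name] = {
            "size": len(packed),
            "write_time": write_time,
            "read_time": read_time,
        }
    return results


# A small, repetitive CSV-like payload, made up for the example.
payload = b"a,b\n" + b"1,2\n" * 10_000
bench = benchmark(payload)
for name, row in bench.items():
    print(name, row)
```

The timings vary from run to run and machine to machine, which is one reason to collect benchmarks on your own data and system before training.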