  • Stars: 109
  • Rank 319,077 (Top 7 %)
  • Language: Python
  • Created about 5 years ago
  • Updated over 1 year ago

Repository Details

Using Machine Learning to learn how to Compress ⚡

You can read the introductory blog post or try it live at https://shrynk.ai

Features

  • ✓ Compress your data smartly based on Machine Learning
  • ✓ Takes User Requirements in the form of weights for size, write_time and read_time
  • ✓ Trains & caches a model based on compression methods available in the system, using packaged data
  • ✓ CLI for compressing and decompressing
  • ✓ Works with CSV, JSON and Bytes in general
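
To make the weighting idea more concrete, here is a minimal sketch of how weights for size, write time and read time could be turned into a choice. It is not shrynk's actual model (which is trained on benchmark data); the function name `pick_compression` and all numbers are assumptions made up for illustration.

```python
def pick_compression(results, size=1.0, write=1.0, read=1.0):
    """Return the candidate name with the lowest weighted, normalized cost.

    `results` maps a name to a dict with "size", "write_time" and "read_time".
    Hypothetical helper, not part of shrynk.
    """
    weights = {"size": size, "write_time": write, "read_time": read}

    def normalized(metric, value):
        # Scale each metric to [0, 1] across candidates so units do not matter.
        values = [row[metric] for row in results.values()]
        low, high = min(values), max(values)
        return 0.0 if high == low else (value - low) / (high - low)

    def cost(row):
        return sum(w * normalized(m, row[m]) for m, w in weights.items())

    return min(results, key=lambda name: cost(results[name]))


# Made-up benchmark numbers (bytes, seconds), for illustration only.
example = {
    "csv+gzip": {"size": 400, "write_time": 0.02, "read_time": 0.010},
    "csv+bz2": {"size": 300, "write_time": 0.08, "read_time": 0.030},
    "csv+none": {"size": 1000, "write_time": 0.01, "read_time": 0.005},
}
```

With these invented numbers, weighting only size favors the smallest output, while weighting only write time favors the uncompressed variant. A mixed weighting such as `size=3, write=1, read=1` lands on a compromise.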

CLI

shrynk compress myfile.json       # will yield e.g. myfile.json.gz or myfile.json.bz2
shrynk decompress myfile.json.gz  # will yield myfile.json

shrynk compress myfile.csv --size 0 --write 1 --read 0

shrynk benchmark myfile.csv                  # shows benchmark results
shrynk benchmark --predict myfile.csv        # will also show the current prediction
shrynk benchmark --save --predict myfile.csv # will add the result to the training data too
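
For intuition about what a benchmark of this kind measures, the following sketch times a few standard-library codecs on raw bytes. It is an assumption-laden stand-in, not the code behind `shrynk benchmark`: the helper `benchmark_bytes` is hypothetical, the codec list is limited to what Python ships with, and the "write" and "read" times are in-memory compress and decompress times rather than disk I/O.

```python
import bz2
import gzip
import lzma
import time

# Standard-library codecs only; shrynk itself may consider other methods.
CODECS = {
    "gz": (gzip.compress, gzip.decompress),
    "bz2": (bz2.compress, bz2.decompress),
    "xz": (lzma.compress, lzma.decompress),
}


def benchmark_bytes(data):
    """Measure compressed size, compress time and decompress time per codec."""
    results = {}
    for name, (compress, decompress) in CODECS.items():
        start = time.perf_counter()
        packed = compress(data)
        write_time = time.perf_counter() - start

        start = time.perf_counter()
        restored = decompress(packed)
        read_time = time.perf_counter() - start

        # Guard against a codec that does not round-trip the input.
        if restored != data:
            raise ValueError(f"{name} round trip failed")
        results[name] = {
            "size": len(packed),
            "write_time": write_time,
            "read_time": read_time,
        }
    return results
```

Calling `benchmark_bytes(open("myfile.csv", "rb").read())` returns one row per codec. Timings vary from run to run, so a real benchmark would likely repeat each measurement.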

Usage in Docker

To try shrynk out quickly, you can use the official Docker image from DockerHub. This avoids interfering with an existing Python installation.

You can also build the image from scratch: go to the docker folder of the repository, run docker build -t shrynk . and then use shrynk instead of kootenpv/shrynk in the commands below.

In the following commands, replace ~/Downloads with the folder you want to share with the container (where the file you want to compress is).

# To see help
docker run --rm -v ~/.shrynk:/root/.shrynk -v ~/Downloads:/data kootenpv/shrynk shrynk --help

# To compress a file called train.csv in your ~/Downloads folder
docker run --rm -v ~/.shrynk:/root/.shrynk -v ~/Downloads:/data kootenpv/shrynk \
   shrynk compress /data/train.csv

# To benchmark and predict the train.csv file in your ~/Downloads folder
docker run --rm -v ~/.shrynk:/root/.shrynk -v ~/Downloads:/data kootenpv/shrynk \
   shrynk benchmark --predict /data/train.csv

Usage in Python

Installation:

pip install shrynk

Then in Python:

import pandas as pd
from shrynk import save, load

# save dataframe compressed
my_df = pd.DataFrame({"a": [1]})
file_path = save(my_df, "mypath.csv")
# e.g. mypath.csv.bz2

# load compressed file
loaded_df = load(file_path)
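
Since the saved path carries the compression in its extension (e.g. mypath.csv.bz2), a loader can pick the decompressor from the suffix. The sketch below shows that idea with the standard library only. It is an assumption about how such a loader might work, not shrynk's implementation, and `read_compressed_bytes` is a hypothetical name.

```python
import bz2
import gzip
import lzma
from pathlib import Path

# Map file suffixes to stdlib openers; unknown suffixes fall back to plain open.
OPENERS = {".gz": gzip.open, ".bz2": bz2.open, ".xz": lzma.open}


def read_compressed_bytes(path):
    """Read a file, decompressing it when the suffix names a known codec."""
    path = Path(path)
    opener = OPENERS.get(path.suffix, open)
    with opener(path, "rb") as handle:
        return handle.read()
```

The returned bytes could then be passed on to a CSV or JSON parser, depending on the remaining extension.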

If you just want the prediction, you can also:

import pandas as pd
from shrynk import infer

infer(pd.DataFrame({"a": [1]}))
# {"engine": "csv", "compression": "bz2"}

Add your own data

If you want more control you can do the following:

import pandas as pd
from shrynk import PandasCompressor

df = pd.DataFrame({"a": [1, 2, 3]})

pdc = PandasCompressor("default")
pdc.run_benchmarks(df) # adds data to the default

pdc.train_model(size=3, write=1, read=1)

pdc.predict(df)

More Repositories

1

whereami

Uses WiFi signals 📶 and machine learning to predict where you are
Python
5,100
star
2

yagmail

Send email in Python conveniently for gmail using yagmail
Python
2,639
star
3

neural_complete

A neural network trained to help writing neural network code using autocomplete
Python
1,152
star
4

gittyleaks

💧 Find sensitive information for a git repo
Python
741
star
5

sky

🌅 next generation web crawling using machine intelligence
Python
328
star
6

contractions

Fixes contractions such as `you're` to `you are`
Python
308
star
7

access_points

Scan your WiFi and get access point information and signal quality
Python
187
star
8

textsearch

Find strings/words in text; convenience and C speed 🎆
Python
126
star
9

brightml

Convenient Machine-Learned Auto Brightness (Linux)
Python
120
star
10

loco

Share localhost through SSH. Local/Remote port forwarding made safe and easy.
Python
106
star
11

cliche

Build a simple command-line interface from your functions 💻
Python
105
star
12

tok

Fast and customizable tokenization 🚀
Python
64
star
13

just

Just is a wrapper to automagically read/write a file based on extension
Python
50
star
14

aserve

Easily mock an API ☕
Python
50
star
15

spacy_api

Server/Client around Spacy to load spacy only once
Python
46
star
16

xtoy

Automated Machine Learning: go from 'X' to 'y' without effort.
Python
46
star
17

requests_viewer

View requests objects with style
Python
42
star
18

cant

For those who can't remember how to get a result
Python
34
star
19

aioyagmail

makes sending emails very easy by doing all the magic for you, asynchronously
Python
29
star
20

sysdm

Scripts as a service. Builds on systemd (for Linux)
Python
21
star
21

deep_eye2mouse

Move the mouse by your webcam + eyes
Python
20
star
22

reddit_ml_challenge

Reddit Machine Learning: Tagging Challenge
Python
19
star
23

inthenews.io

Get the latest and greatest in news (on Python)
CSS
19
star
24

crtime

Get creation time of files for any platform - no external dependencies ⏰
Python
16
star
25

natura

Find currencies / money talk in natural text
Python
15
star
26

rebrand

✨ Refactor your software using programming language independent, case-preserving string replacement 💄
Python
15
star
27

emacs-kooten-theme

Dark color theme by kootenpv
Emacs Lisp
14
star
28

justdb

Just a thread/process-safe, file-based, fast, database.
Python
8
star
29

fastlang

Fast Detection of Language without Dependencies
Python
7
star
30

quickpip

A template for creating a quick, maintainable and high quality pypi project
Python
7
star
31

xdb

Ambition: Single API for any database in Python
Python
6
star
32

nostalgia_chrome

Self tracking your online life!
Python
5
star
33

cnn_basics

NLP using CNN on Cornell Movie Ratings
Python
4
star
34

kootenpv.github.io

Pascal van Kooten's website hosted on github.io
CSS
3
star
35

gittraffic

Save your gittraffic data so it won't get lost!
Python
3
star
36

flymake-solidity

flymake for solidity, using flymake-easy: live feedback on writing solidity contracts
Emacs Lisp
3
star
37

ppm

Safe password manager
C
2
star
38

automl_presentation

Example code for the presentation "Automated Machine Learning"
Python
2
star
39

dot_access

Makes nested python objects easy to go through
Python
1
star
40

feedview

View a feed url with `feedview <URL>`
Python
1
star
41

PassMan

android app for ppm
C
1
star
42

mockle

Automatic Mocking by Pickles
Python
1
star
43

emoji-picker

Python
1
star