Compress
This is a tool for automatically creating typing shortcuts from a corpus of your own writing! I use these shortcuts mainly for email and Slack.
This repo parses a corpus of text and suggests which shortcuts will save you the most letters while typing. It then generates config files for Autokey, a Linux program that implements keyboard shortcuts!
It also contains a tool for optionally parsing a Slack Data Export of your messages to create a corpus.
What phrases should I abbreviate?
The code looks through the corpus to find common n-grams that can be replaced with much shorter phrases. The suggestions are ranked by [characters saved] * [frequency of phrase].
I was surprised that very short and frequent words topped this list, such as the -> t, instead of longer phrases that I use a lot, such as what do you think -> wdytk.
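The ranking described above can be sketched roughly as follows. This is a minimal illustration, not the repo's actual code: the function names, the `max_n` and `top` parameters, and the use of word initials as a stand-in abbreviation are all assumptions made for the example.

```python
import re
from collections import Counter


def count_ngrams(text, max_n=4):
    """Count every word n-gram (1 to max_n words long) in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            counts[" ".join(words[i:i + n])] += 1
    return counts


def initials(phrase):
    """Stand-in abbreviation (an assumption): first letter of each word."""
    return "".join(word[0] for word in phrase.split())


def rank_phrases(text, max_n=4, top=5):
    """Rank n-grams by (characters saved) * (frequency of phrase)."""
    scored = []
    for phrase, freq in count_ngrams(text, max_n).items():
        saved = len(phrase) - len(initials(phrase))
        scored.append((saved * freq, phrase, freq))
    # Highest score first; ties are broken alphabetically.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return scored[:top]
```

Because the score multiplies savings by frequency, a three-letter word that appears hundreds of times can easily outrank a long phrase that appears only a handful of times.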
Just reading through the results was amusing, since it showed how repetitive some of my writing is :)
How to pick abbreviations?
This largely comes down to preferences and heuristics for generating memorable abbreviations for different phrases. Some of my design philosophies were:
- The abbrev cannot be a word that I want to type. Right now this is done with a blacklist, but I should change it to use my actual corpus.
- The goal is being memorable. 1st letter is top choice, and 1st letter + last letter is next choice.
- More common phrases get priority for more memorable abbrevs.
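As a rough sketch of how these three rules could fit together (an illustration only, not the repo's implementation; `candidates`, `assign_abbrevs`, and `blocked_words` are hypothetical names, and phrases are assumed to arrive already sorted from most to least common):

```python
def candidates(phrase):
    """Yield abbreviation candidates from most to least preferred."""
    first_letters = "".join(word[0] for word in phrase.split())
    yield first_letters               # top choice: first letter(s)
    yield first_letters + phrase[-1]  # next choice: add the last letter


def assign_abbrevs(phrases_by_priority, blocked_words):
    """Give each phrase the best candidate that is still available.

    blocked_words plays the role of the blacklist of real words that
    must stay typeable.
    """
    result = {}
    used = set()
    for phrase in phrases_by_priority:
        for cand in candidates(phrase):
            if cand in blocked_words or cand in used:
                continue
            if len(cand) >= len(phrase):
                continue  # an abbreviation must actually save characters
            result[phrase] = cand
            used.add(cand)
            break
    return result
```

Processing phrases in priority order means the most common phrase claims the most memorable abbreviation, and later phrases fall back to their next choice or get none at all.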
I also like to make "families" of abbrevs to make them more memorable. This is currently done as a manual post-process step. Some example heuristics for this are:
- Plurals should have the same abbrev as the singular, but with an "s". For example, robot -> r and robots -> rs.
- If a word has an abbrev, a phrase that contains that word should contain the abbrev. For example:
the -> t
robot -> r
the robot -> tr
- Think about how the abbrevs of similar words can be similar as well. For example:
some -> s
someone -> sn
something -> st
sometime -> sti
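The first two family heuristics are mechanical enough that they could in principle be automated. The sketch below is purely hypothetical (the README describes this step as manual); `family_abbrev` and `word_abbrevs` are names invented for the example, and the plural check is a naive "ends with s" test.

```python
def family_abbrev(phrase, word_abbrevs):
    """Build a phrase abbreviation from the abbreviations of its words.

    Returns None if any word has no known abbreviation.
    """
    parts = []
    for word in phrase.split():
        if word in word_abbrevs:
            parts.append(word_abbrevs[word])
        elif word.endswith("s") and word[:-1] in word_abbrevs:
            # Plural: reuse the singular's abbreviation and append "s".
            parts.append(word_abbrevs[word[:-1]] + "s")
        else:
            return None
    return "".join(parts)
```

The third heuristic (some, someone, something, sometime) depends on judgment about which words feel related, so it is left out of this sketch.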
Instructions
- Run install.sh to install dependencies. Currently tested on Python 3.10.12
- Put any corpus of your text that you want to compress in data/corpus/*.txt
- If you want to use your Slack history as a corpus:
  - Export it to a folder called data/slack_export. Only Slack workspace admins can do this (and it only exports public channels).
  - Change USERNAME_TO_EXPORT at the top of the file to your Slack username.
  - Run parse_slack.py. This will generate a new corpus document in data/corpus/
  - DELETE YOUR SLACK EXPORT WITH srm
- Run find_suggested_phrases.py. This will generate a list of the top 200 suggested shortcuts to output/suggested_shortcuts.yaml
- Edit or add any shortcuts that you want, then copy the file to shortcuts.yaml.
  - This is a manual step so you can customize it without it being blown out every time you run the script again.
  - It's also saved in git even though it's an output so that I can keep it in sync across multiple of my computers :)
  - If you're starting out, I suggest just going with 10-20 shortcuts to make it easier to remember them
- Run generate_autokeys.py to convert shortcuts.yaml into actual config files for autokey.
- Install Autokey
  - Right now, Autokey is only supported on Linux with X11, not Wayland
- Check that your autokey config is located at ~/.config/autokey/data/My Phrases/. If it is somewhere else, change reload.sh:8 to point to your config location
- From now on when you edit shortcuts.yaml you can re-generate and reload autokey with reload.sh
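For the optional Slack step above, here is a minimal sketch of what turning an export into a corpus might look like. It is not the repo's parse_slack.py. It assumes the usual export layout (a users.json file at the top level, plus one folder per channel containing per-day JSON files of message objects with "user" and "text" fields), and `build_corpus` is a hypothetical name.

```python
import json
from pathlib import Path


def build_corpus(export_dir, username):
    """Collect the text of every message written by `username`."""
    export_dir = Path(export_dir)

    # Assumed layout: users.json lists users with "id" and "name" fields.
    users = json.loads((export_dir / "users.json").read_text(encoding="utf-8"))
    user_ids = {user["id"] for user in users if user.get("name") == username}

    lines = []
    # Assumed layout: <channel>/<date>.json, each a list of message objects.
    for day_file in sorted(export_dir.glob("*/*.json")):
        for message in json.loads(day_file.read_text(encoding="utf-8")):
            if message.get("user") in user_ids and message.get("text"):
                lines.append(message["text"])
    return "\n".join(lines)
```

The returned string could then be written to a .txt file under data/corpus/ so that the suggestion step picks it up alongside any other corpus files.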
Notes
Autokey uses simulated keyboard input to replace your abbreviations with the full phrases. I tried several Chrome extensions, but this worked much more reliably without conflicting with sites' own JavaScript.
The config files I generate are set to only apply when Chrome is in focus because that's where I do most of my English typing. I found that keeping this active in the terminal and VS Code caused way more problems than it solved, because my abbreviations overlapped with common short Linux commands and variable names, e.g. t.