Human Programming Interface 🧑👽🤖

If you're in a hurry, feel free to jump straight to the demos.

  • see SETUP for the installation/configuration guide
  • see DEVELOPMENT for the development guide
  • see DESIGN for the design goals
  • see MODULES for module-specific setup
  • see MODULE_DESIGN for some thoughts on structuring modules, and possibly extending HPI
  • see exobrain/HPI for some of my raw thoughts and todos on the project

TLDR: I'm using the HPI (Human Programming Interface) package as a means of unifying, accessing and interacting with all of my personal data.

HPI is a Python package (named my), a collection of modules for:

  • social networks: posts, comments, favorites
  • reading: e-books and pdfs
  • annotations: highlights and comments
  • todos and notes
  • health data: sleep, exercise, weight, heart rate, and other body metrics
  • location
  • photos & videos
  • browser history
  • instant messaging

The package hides the gory details of locating data, parsing, error handling and caching. You simply 'import' your data and get to work with familiar Python types and data structures.

  • Here's a short example to give you an idea: "which subreddits do I find the most interesting?"
    import my.reddit.all
    from collections import Counter
    Counter(s.subreddit for s in my.reddit.all.saved()).most_common(4)

    [('orgmode', 62), ('emacs', 60), ('selfhosted', 51), ('QuantifiedSelf', 46)]

I consider my digital trace an important part of my identity (#extendedmind). Usually the data is siloed, and accessing it is inconvenient and borderline frustrating. This feels very wrong.

In contrast, once the data is available as Python objects, I can easily plug it into existing tools, libraries and frameworks. It makes building new tools considerably easier and opens up new ways of interacting with the data.

I tried different things over the years and I think I'm getting to the point where other people can also benefit from my code by 'just' plugging in their data, and that's why I'm sharing this.

Imagine if all your life was reflected digitally and available at your fingertips. This library is my attempt to achieve this vision.

Table of contents:
  • Why?
  • How does a Python package help?
    • Why don't you just put everything in a massive database?
  • What's inside?
  • How do you use it?
  • Ad-hoc and interactive
    • What were my music listening stats for 2018?
    • What are the most interesting Slate Star Codex posts I've read?
    • Accessing exercise data
    • Book reading progress
    • Messenger stats
    • Which month in 2020 did I make the most git commits in?
    • Querying Roam Research database
  • How does it get input data?
  • Q & A
    • Why Python?
    • Can anyone use it?
    • How easy is it to use?
    • What about privacy?
    • But should I use it?
    • Would it suit me?
    • What it isn't?
  • HPI Repositories
  • Related links
  • –

Why?

The main reason that led me to develop this is my dissatisfaction with the current situation:

  • Our personal data is siloed and trapped across cloud services and various devices

    Even when it's possible to access it via an API, it's hardly useful unless you're an experienced programmer willing to invest time and infrastructure.

  • We have insane amounts of data scattered across the cloud, yet we're left at the mercy of those who collect it to provide something useful based on it

    Integrations of data across silo boundaries are almost non-existent. There is so much potential and it's all wasted.

  • I'm not willing to wait until some vaporware project reinvents the whole computing model from scratch

    As a programmer, I'm in a position to do something right now, even if it's not perfect and fully consistent.

I've written a lot about it here, so allow me to simply quote:

  • search and information access
    • Why can't I search over all of my personal chat history with a friend, whether it's ICQ logs from 2005 or Whatsapp logs from 2019?
    • Why can't I have incremental search over my tweets? Or browser bookmarks? Or over everything I've ever typed/read on the Internet?
    • Why can't I search across my watched youtube videos, even though most of them have subtitles, hence allowing for full-text search?
    • Why can't I see the places my friends recommended to me on Google Maps (or any other maps app)?
  • productivity
    • Why can't my Google Home add shopping list items to Google Keep? Let alone other todo-list apps.
    • Why can't I create a task in my todo list or calendar from a conversation on Facebook Messenger/Whatsapp/VK.com/Telegram?
  • journaling and history
    • Why do I have to lose all my browser history if I decide to switch browsers?
    • Why can't I see all the places I traveled to on a single map, with photos alongside?
    • Why can't I see what my heart rate (i.e. excitement) and speed were side by side with the video I recorded on GoPro while skiing?
    • Why can't I easily transfer all my books and metadata if I decide to switch from Kindle to PocketBook or vice versa?
  • consuming digital content
    • Why can't I see stuff I highlighted on Instapaper as an overlay on top of the web page?
    • Why can't I have a single 'read it later' list, unifying everything saved on Reddit/Hackernews/Pocket?
    • Why can't I use my todo app instead of the 'Watch later' playlist on youtube?
    • Why can't I 'follow' some user on Hackernews?
    • Why can't I see if I've already run across a Youtube video because my friend sent me a link months ago?
    • Why can't I have uniform music listening stats based on my Last.fm/iTunes/Bandcamp/Spotify/Youtube?
    • Why am I forced to use Spotify's music recommendation algorithm with no option to try something else?
    • Why can't I easily see which books/music/art were recommended by my friends or by specific Twitter/Reddit/Hackernews users?
    • Why doesn't my otherwise perfect Hackernews app for Android share saved posts/comments with the website?
  • health and body maintenance
    • Why can't I tell if I was more sedentary than usual during the past week and whether I need to compensate by doing a bit more exercise?
    • Why can't I see the impact of aerobic exercise on my resting HR?
    • Why can't I have a dashboard for all of my health data (food, exercise and sleep) to see baselines and trends?
    • Why can't I see the impact of temperature or CO2 concentration in my room on my sleep?
    • Why can't I see how holidays (as in, not going to work) impact my stress levels?
    • Why can't I take my Headspace app data and see how/if meditation impacts my sleep?
    • Why can't I run a short snippet of code and check some random health advice on the Internet against my health data?
  • personal finance
    • Why am I forced to manually copy transactions from different banking apps into a spreadsheet?
    • Why can't I easily match my Amazon/Ebay orders with my bank transactions?
  • Why can't I do anything when I'm offline or have a wonky connection?
  • tools for thinking and learning
    • Why, when something like a 'mind palace' is literally possible with VR technology, don't we see any in use?
    • Why can't I easily convert select Instapaper highlights or new foreign words I encountered on my Kindle into Anki flashcards?
  • mediocre interfaces
    • Why do I have to suffer from poor management and design decisions in UI changes, even if the interface is not the main reason I'm using the product?
    • Why can't I leave priorities and notes on my saved Reddit/Hackernews items?
    • Why can't I leave private notes on Deliveroo restaurants/dishes, so I'd remember what to order/not to order next time?
    • Why do people have to suffer from the Google Inbox shutdown?
  • communication and collaboration
    • Why can't I easily share my web or book highlights with a friend? Or just make highlights in select books public?
    • Why can't I easily find out another person's expertise without interrogating them, just by looking at what they read instead?
  • backups
    • Why do I have to think about it and actively invest time and effort?
  • I'm tired of having to use multiple different messengers and social networks
  • I'm tired of shitty bloated interfaces

    Why do we have to be at the mercy of their developers, designers and product managers? If we had our data at hand, we could fine-tune interfaces for our needs.

  • I'm tired of the mediocre search experience

    Text search is something computers do exceptionally well. Yet, often it's not available offline, it's not incremental, everyone reinvents their own query language, and so on.

  • I'm frustrated by the poor information exploration and processing experience

    While for many people services like Reddit or Twitter are simply time killers (and I don't judge), some want to use them efficiently, as a source of information and research. The modern bookmarking experience makes that far from easy.

You can dismiss this as a list of first-world problems, and you would be right: they are. But the major reason I want to solve them is to get better at learning and working with knowledge, so I can be better at solving the real problems.

How does a Python package help?

When I started solving some of these problems for myself, I noticed a common pattern: the hardest bit is actually getting your data in the first place. It's inherently error-prone and frustrating.

But once you have the data in a convenient representation, working with it is pleasant – you get to explore and build instead of fighting with yet another stupid REST API.

This package knows how to find data on your filesystem, deserialize it and normalize it to a convenient representation. You have the full power of the programming language to transform the data and do whatever comes to mind.
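
To make that more concrete, here is roughly what such a module boils down to. This is a simplified sketch rather than an actual module from the package (real modules also deal with user config, merging multiple exports, caching and error handling):

# simplified sketch of the pattern an HPI-style module follows:
# locate raw exports on disk, parse them, and expose typed Python objects
import json
from datetime import datetime
from pathlib import Path
from typing import Iterator, NamedTuple

class Highlight(NamedTuple):
    dt: datetime
    url: str
    text: str

def highlights(export_dir: Path = Path('~/data/highlights').expanduser()) -> Iterator[Highlight]:
    # each JSON file is a snapshot produced by some export script
    for path in sorted(export_dir.glob('*.json')):
        for raw in json.loads(path.read_text()):
            yield Highlight(
                dt=datetime.fromisoformat(raw['created']),
                url=raw['url'],
                text=raw['text'],
            )

Consumers don't care where the files came from or what format they were in; they just call highlights() and get typed objects back.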

Why don't you just put everything in a massive database?

Glad you've asked! I wrote a whole post about it.

In short: while databases are efficient and easy to read from, often they aren't flexible enough to fit your data. You're probably going to end up writing code anyway.

While working with your data, you'll inevitably notice common patterns and code repetition, which you'll probably want to extract somewhere. That's where a Python package comes in.

What's inside?

Here's an (incomplete) list of the modules:

=my.bluemaestro= - Bluemaestro temperature/humidity/pressure monitor
=my.body.blood= - Blood tracking (manual org-mode entries)
=my.body.exercise.all= - Combined exercise data
=my.body.exercise.cardio= - Cardio data, filtered from various data sources
=my.body.exercise.cross_trainer= - My cross trainer exercise data, arbitrated from different sources (mainly, Endomondo and manual text notes)
=my.body.weight= - Weight data (manually logged)
=my.calendar.holidays= - Holidays and days off work
=my.coding.commits= - Git commits data for repositories on your filesystem
=my.demo= - Just a demo module for testing and documentation purposes
=my.emfit= - Emfit QS sleep tracker
=my.endomondo= - Endomondo exercise data
=my.fbmessenger= - Facebook Messenger messages
=my.foursquare= - Foursquare/Swarm checkins
=my.github.all= - Unified Github data (merged from GDPR export and periodic API updates)
=my.github.gdpr= - Github data (uses official GDPR export)
=my.github.ghexport= - Github data: events, comments, etc. (API data)
=my.hypothesis= - Hypothes.is highlights and annotations
=my.instapaper= - Instapaper bookmarks, highlights and annotations
=my.kobo= - Kobo e-ink reader: annotations and reading stats
=my.lastfm= - Last.fm scrobbles
=my.location.google= - Location data from Google Takeout
=my.location.home= - Simple location provider, serving as a fallback when more detailed data isn't available
=my.materialistic= - Materialistic app for Hackernews
=my.orgmode= - Programmatic access and queries to org-mode files on the filesystem
=my.pdfs= - PDF documents and annotations on your filesystem
=my.photos.main= - Photos and videos on your filesystem, their GPS and timestamps
=my.pinboard= - Pinboard bookmarks
=my.pocket= - Pocket bookmarks and highlights
=my.polar= - Polar articles and highlights
=my.reddit= - Reddit data: saved items/comments/upvotes/etc.
=my.rescuetime= - Rescuetime (phone activity tracking) data
=my.roamresearch= - Roam data
=my.rss.all= - Unified RSS data, merged from different services I used historically
=my.rss.feedbin= - Feedbin RSS reader
=my.rss.feedly= - Feedly RSS reader
=my.rtm= - Remember The Milk tasks and notes
=my.runnerup= - Runnerup exercise data (TCX format)
=my.smscalls= - Phone calls and SMS messages
=my.stackexchange.gdpr= - Stackexchange data (uses official GDPR export)
=my.stackexchange.stexport= - Stackexchange data (uses API via stexport)
=my.taplog= - Taplog app data
=my.time.tz.main= - Timezone data provider, used to localize timezone-unaware timestamps for other modules
=my.time.tz.via_location= - Timezone data provider, guesses timezone based on location data (e.g. GPS)
=my.twitter.all= - Unified Twitter data (merged from the archive and periodic updates)
=my.twitter.archive= - Twitter data (uses official twitter archive export)
=my.twitter.twint= - Twitter data (tweets and favorites). Uses Twint data export.
=my.vk.vk_messages_backup= - VK data (exported by Totktonada/vk_messages_backup)

Some modules are private, and need a bit of cleanup before merging:

=my.workouts= - Exercise activity, from Endomondo and manual logs
=my.sleep.manual= - Subjective sleep data, manually logged
=my.nutrition= - Food and drink consumption data, logged manually from different sources
=my.money= - Expenses and shopping data
=my.webhistory= - Browsing history (part of promnesia)

How do you use it?

Mainly I use it as a data provider for my scripts, tools, and dashboards.

Also, check out my infrastructure map. It might be helpful for understanding my vision for HPI.

Instant search

Typical search interfaces make me unhappy: they are siloed, slow, awkward to use and don't work offline. So I built my own ways around it! I write about it in detail here.

In essence, I'm mirroring most of my online data, like chat logs and comments, as plaintext. I can look over it in any text editor, and incrementally search over all of it with a single keypress.

orger

orger is a tool that helps you generate an org-mode representation of your data.

It lets you benefit from the existing tooling and infrastructure around org-mode, the most famous being Emacs.

I'm using it for:

  • searching, overviewing and navigating the data
  • creating tasks straight from the apps (e.g. Reddit/Telegram)
  • spaced repetition via org-drill

Orger comes with some existing modules, but it should be easy to adapt your own data source if you need something else.

I write about it in detail here and here.
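
To give a flavour of what "an org-mode representation of your data" means, here is a tiny hand-rolled sketch that dumps Hypothes.is annotations (via the my.hypothesis module used later in this post) into an org outline. It uses plain string formatting rather than orger's actual API, so treat it as an illustration only:

import my.hypothesis

def render_org() -> str:
    # one org heading per annotated page, with a link and the number of highlights
    lines = ['* Hypothes.is annotations']
    for page in my.hypothesis.pages():
        lines.append(f'** [[{page.url}][{page.title}]] ({len(page.highlights)} highlights)')
    return '\n'.join(lines)

print(render_org())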

promnesia

promnesia is a browser extension I'm working on to escape silos by unifying annotations and browsing history from different data sources.

I've been using it for more than a year now, and I'm working on the final touches to properly release it for other people.

dashboard

As a big fan of #quantified-self, I'm working on a personal health, sleep and exercise dashboard, built from various data sources.

I'm working on making it public; you can see some screenshots here.

timeline

Timeline is a #lifelogging project I'm working on.

I want to see all my digital history, search it, filter it, easily jump to a specific point in time and see the context around it. That way it works as a sort of external memory.

Ideally, it would look similar to Andrew Louis's Memex, or might even reuse his interface if he open sources it. I highly recommend watching his talk for inspiration.

Ad-hoc and interactive

What were my music listening stats for 2018?

A single import away from getting the tracks you listened to:

from my.lastfm import scrobbles
list(scrobbles())[200: 205]
[Scrobble(raw={'album': 'Nevermind', 'artist': 'Nirvana', 'date': '1282488504', 'name': 'Drain You'}),
 Scrobble(raw={'album': 'Dirt', 'artist': 'Alice in Chains', 'date': '1282489764', 'name': 'Would?'}),
 Scrobble(raw={'album': 'Bob Dylan: The Collection', 'artist': 'Bob Dylan', 'date': '1282493517', 'name': 'Like a Rolling Stone'}),
 Scrobble(raw={'album': 'Dark Passion Play', 'artist': 'Nightwish', 'date': '1282493819', 'name': 'Amaranth'}),
 Scrobble(raw={'album': 'Rolled Gold +', 'artist': 'The Rolling Stones', 'date': '1282494161', 'name': "You Can't Always Get What You Want"})]

Or, as a pretty Pandas frame:

import pandas as pd
df = pd.DataFrame([{
    'dt': s.dt,
    'track': s.track,
} for s in scrobbles()]).set_index('dt')
df[200: 205]
                                                                       track
dt                                                                          
2010-08-22 14:48:24+00:00                                Nirvana ā€” Drain You
2010-08-22 15:09:24+00:00                           Alice in Chains ā€” Would?
2010-08-22 16:11:57+00:00                   Bob Dylan ā€” Like a Rolling Stone
2010-08-22 16:16:59+00:00                               Nightwish ā€” Amaranth
2010-08-22 16:22:41+00:00  The Rolling Stones ā€” You Can't Always Get What...

We can use the calmap library to plot a GitHub-style music listening activity heatmap:

import matplotlib.pyplot as plt
plt.figure(figsize=(10, 2.3))

import calmap
df = df.set_index(df.index.tz_localize(None)) # calmap expects tz-unaware dates
calmap.yearplot(df['track'], how='count', year=2018)

plt.tight_layout()
plt.title('My music listening activity for 2018')
plot_file = 'hpi_files/lastfm_2018.png'
plt.savefig(plot_file)
plot_file

https://beepb00p.xyz/hpi_files/lastfm_2018.png

This isn't necessarily very insightful data, but it's fun to look at now and then!

What are the most interesting Slate Star Codex posts I've read?

A friend asked me if I could recommend the posts I found interesting on Slate Star Codex. With a few lines of Python I can quickly find the posts I engaged with most, i.e. the ones I annotated the most on Hypothesis.

from my.hypothesis import pages
from collections import Counter
# count highlights per Slate Star Codex page
cc = Counter({(p.title + ' ' + p.url): len(p.highlights) for p in pages() if 'slatestarcodex' in p.url})
cc.most_common(10)
The Anti-Reactionary FAQ http://slatestarcodex.com/2013/10/20/the-anti-reactionary-faq/  32
Reactionary Philosophy In An Enormous, Planet-Sized Nutshell https://slatestarcodex.com/2013/03/03/reactionary-philosophy-in-an-enormous-planet-sized-nutshell/  17
The Toxoplasma Of Rage http://slatestarcodex.com/2014/12/17/the-toxoplasma-of-rage/  16
What Universal Human Experiences Are You Missing Without Realizing It? https://slatestarcodex.com/2014/03/17/what-universal-human-experiences-are-you-missing-without-realizing-it/  16
Meditations On Moloch http://slatestarcodex.com/2014/07/30/meditations-on-moloch/  12
Universal Love, Said The Cactus Person http://slatestarcodex.com/2015/04/21/universal-love-said-the-cactus-person/  11
Untitled http://slatestarcodex.com/2015/01/01/untitled/  11
Considerations On Cost Disease https://slatestarcodex.com/2017/02/09/considerations-on-cost-disease/  10
In Defense of Psych Treatment for Attempted Suicide http://slatestarcodex.com/2013/04/25/in-defense-of-psych-treatment-for-attempted-suicide/  9
I Can Tolerate Anything Except The Outgroup https://slatestarcodex.com/2014/09/30/i-can-tolerate-anything-except-the-outgroup/  9

Accessing exercise data

E.g. see use of my.workouts here.

Book reading progress

I publish my reading stats on Goodreads so other people can see what I'm reading/have read, but Kobo lacks integration with Goodreads. I'm using kobuddy to access my Kobo data, and I've got a regular task that reminds me to sync my progress once a month.

The task looks like this:

* TODO [#C] sync [[https://goodreads.com][reading progress]] with kobo
  DEADLINE: <2019-11-24 Sun .+4w -0d>
[[eshell: python3 -c 'import my.kobo; my.kobo.print_progress()']]

With a single Enter keypress on the inlined eshell: command I can print the progress and fill in the completed books on Goodreads, e.g.:

A_Mathematician's_Apology by G. H. Hardy
Started : 21 Aug 2018 11:44
Finished: 22 Aug 2018 12:32

Fear and Loathing in Las Vegas: A Savage Journey to the Heart of the American Dream (Vintage) by Thompson, Hunter S.
Started : 06 Sep 2018 05:54
Finished: 09 Sep 2018 12:21

Sapiens: A Brief History of Humankind by Yuval Noah Harari
Started : 09 Sep 2018 12:22
Finished: 16 Sep 2018 07:25

Inadequate Equilibria: Where and How Civilizations Get Stuck by Eliezer Yudkowsky
Started : 31 Jul 2018 22:54
Finished: 16 Sep 2018 07:25

Albion Dreaming by Andy Roberts
Started : 20 Aug 2018 21:16
Finished: 16 Sep 2018 07:26

Messenger stats

How much do I chat on Facebook Messenger?

from my.fbmessenger import messages

import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({'dt': m.dt, 'messages': 1} for m in messages())
df.set_index('dt', inplace=True)

df = df.resample('M').sum() # by month
df = df.loc['2016-01-01':'2019-01-01'] # past subset for determinism

fig, ax = plt.subplots(figsize=(15, 5))
df.plot(kind='bar', ax=ax)

# todo wonder if that vvv can be less verbose...
x_labels = df.index.strftime('%Y %b')
ax.set_xticklabels(x_labels)

plot_file = 'hpi_files/messenger_2016_to_2019.png'
plt.tight_layout()
plt.savefig(plot_file)
plot_file

https://beepb00p.xyz/hpi_files/messenger_2016_to_2019.png

Which month in 2020 did I make the most git commits in?

If you like the shell or just want to quickly grab/convert some information from HPI, it also comes with a JSON query interface, so you can export the data or pipe it to your heart's content:

# stream JSON objects as they're read, ordered by their 'datetime' attribute
# and limited to 2020; then extract the datetime, group by month and graph it
$ hpi query my.coding.commits.commits --stream \
    --order-type datetime \
    --after '2020-01-01' --before '2021-01-01' \
    | jq '.committed_dt' -r \
    | cut -d'-' -f-2 | sort | uniq -c | awk '{print $2,$1}' | sort -n | termgraph
2020-01: ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 458.00
2020-02: ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 440.00
2020-03: ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 545.00
2020-04: ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 585.00
2020-05: ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 518.00
2020-06: ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 755.00
2020-07: ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 467.00
2020-08: ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 449.00
2020-09: ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 1.03 K
2020-10: ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 791.00
2020-11: ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 474.00
2020-12: ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 383.00

See the query docs for more examples.
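
For comparison, here is a rough Python equivalent of the pipeline above. It's a sketch: it assumes the objects yielded by my.coding.commits.commits() expose committed_dt as a datetime, which is what the jq filter above relies on:

from collections import Counter
import my.coding.commits

# count commits per month in 2020, mirroring the cut/sort/uniq part of the pipeline
by_month = Counter(
    c.committed_dt.strftime('%Y-%m')
    for c in my.coding.commits.commits()
    if c.committed_dt.year == 2020
)
for month, count in sorted(by_month.items()):
    print(month, count)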

Querying Roam Research database

I've got some code examples here.

How does it get input data?

If you're curious about any of the specific data sources I'm using, I've written them up in detail.

Also see the "Data flow" documentation, with some nice diagrams explaining specific examples.

In short:

  • The data is periodically synchronized from the services (cloud or not) to the local filesystem

    As a result, you get JSON/sqlite files (or other formats, depending on the service) on your disk.

    Once you have them, it's trivial to back them up and synchronize them to other computers/phones, if necessary.

    To schedule periodic sync, I'm using cron.

  • The my. package only accesses the data on the filesystem

    That makes it extremely fast, reliable, and fully offline capable.

As you can see, in such a setup the data lags behind 'realtime'. I consider that a necessary sacrifice to make everything fast and resilient.

In theory, it's possible to make the system almost realtime by having a service that sucks in data continuously (rather than periodically), but that's also harder.
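
To illustrate the 'periodic sync' half of this flow, here is a sketch of the general pattern (not one of the actual export tools I use, and the paths are made up): a scheduled job dumps whatever the service's API returns into a timestamped file, and the corresponding HPI module later reads those files.

import json
from datetime import datetime, timezone
from pathlib import Path

EXPORT_DIR = Path('~/data/exports/someservice').expanduser()

def fetch_everything() -> list:
    # placeholder: call the service's API / export endpoint here
    raise NotImplementedError

def sync() -> None:
    # append-only, timestamped snapshots; the HPI module reads all of them and merges
    EXPORT_DIR.mkdir(parents=True, exist_ok=True)
    ts = datetime.now(timezone.utc).strftime('%Y%m%dT%H%M%SZ')
    (EXPORT_DIR / f'export_{ts}.json').write_text(json.dumps(fetch_everything()))

if __name__ == '__main__':
    sync()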

Q & A

Why Python?

I don't consider Python unique as a language suitable for such a project. It just happens to be the one I'm most comfortable with. I do have some reasons that I think make it specifically good, but explaining them is out of this post's scope.

In addition, Python offers a very rich ecosystem for data analysis, which we can use to our benefit.

That said, I've never seen anything similar in other programming languages, and I would be really interested, so please send me links if you know of any. I've heard LISPs are great for data? ;)

Overall, I wish FFIs were a bit more mature, so we didn't have to think about specific programming languages at all.

Can anyone use it?

Yes!
  • you can plug in your own data (see the config sketch below)
  • most modules are isolated, so you can use only the ones you want
  • everything is easily extensible

    From simply adding new modules to any dynamic hackery you can imagine within Python.
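
For a rough idea of what 'plugging in your own data' looks like: modules find your exports via a user config, which is itself just Python. Here is a minimal sketch; the exact location and attribute names below are illustrative, so see SETUP for the authoritative configuration guide:

# a user config module that HPI modules read their settings from
# (paths and attribute names here are made up for illustration)

class hypothesis:
    # directory containing JSON exports of your Hypothes.is data
    export_path = '/data/exports/hypothesis/'

class lastfm:
    # scrobbles exported from Last.fm
    export_path = '/data/exports/lastfm/'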

How easy is it to use?

The whole setup requires some basic programmer literacy:
  • installing/running and potentially modifying Python code
  • using symlinks
  • potentially running cron jobs

If you have any ideas on making the setup simpler, please let me know!

What about privacy?

The modules contain no data, only code to operate on the data.

Everything is *local first*: the input data is on your filesystem. If you're truly paranoid, you can even wrap it in a Docker container.

There is still the question of whether you trust yourself to even keep all the data on your disk, but that is out of the scope of this post.

If you'd rather keep some code private too, that's also trivial to achieve with a private subpackage.

But should I use it?

Sure, maybe you can achieve a perfect system where you can instantly find and recall anything that you've done. Do you really want it? Wouldn't that, like, make you less human?

I'm not a gatekeeper of what it means to be human, but I don't think that the shortcomings of the human brain are what make us human.

So I can't answer that for you. I certainly want it though. I'm quite open about my goals – I'd happily get merged/augmented with a computer to enhance my thinking and analytical abilities.

While at the moment we don't even remotely understand what such merging or "mind uploading" would entail exactly, I can already delegate some tasks, like long-term memory, information lookup, and data processing, to a computer. They can already handle it really well.

What about those people who have perfect recall and wish they didn't?

Sure, maybe it sucks. At the moment, though, my recall is far from perfect, and that only annoys me. I at least want to have a choice, and digital tools give me that choice.

Would it suit me?

Probably, at least to some extent.

First, our lives are different, so our APIs might be different too. This is more of a demonstration of what I'm using, although I did put effort into making it as modular and extensible as possible, so other people can use it too. It's easy to modify the code and add extra methods and modules. You can even keep all your modifications private.

But after all, we all share many similar activities and use the same products, so there is a huge overlap. I'm not sure how far we can stretch it and keep modules generic enough to be used by multiple people. But let's give it a try, perhaps? :)

Second, interacting with your data through code is the central idea of the project. That kind of cuts off people without technical skills, and even many people capable of coding who dislike the idea of writing code outside of work.

It might be possible to expose some no-code interfaces, but I still feel that wouldn't be enough.

I'm not sure whether it's a solvable problem at this point, but I'm happy to hear any suggestions!

What it isn't?

  • It's not vaporware

    The project is a little crude, but it's real and working. I've been using it for a long time now, and find it fairly sustainable to keep using for the foreseeable future.

  • It's not going to be another silo

    While I don't have anything against commercial use (and I believe any work in this area will benefit all of us), I'm not planning to build a product out of it.

    I really hope it can grow into or inspire some mature open source system.

    Please take my ideas and code and build something cool from it!

HPI Repositories

One of HPI's core goals is to be as extensible as possible. The goal here isn't to become a monorepo that supports every possible data source/website to the point where it's no longer maintainable, but hopefully you get a few modules 'for free'.

If you want to write modules for personal use but don't want to merge them in here, you're free to maintain them locally in a separate directory to avoid any merge conflicts. Entire HPI repositories can even be published separately and installed into the single my Python package (for more info on this, see MODULE_DESIGN).

Other HPI Repositories:

If you want to create your own modules or override something here, you can use the template.

Related links

Similar projects:

Other links:

–

Open to any feedback and thoughts!

Also, don't hesitate to raise an issue or reach out to me personally if you want to try using it and find the instructions confusing. Your questions will help me make it simpler!

In the near future I will write more about:

  • specific technical decisions and patterns
  • challenges I had to solve
  • more use cases and demos – it's impossible to fit everything in one post!

But I'm happy to answer any questions on these topics now!
