  • Stars: 720
  • Rank 62,908 (Top 2 %)
  • Language: TypeScript
  • License: MIT License
  • Created almost 5 years ago
  • Updated 22 days ago

Repository Details

An algorithm to optimize database queries that run multiple times https://pubkey.github.io/event-reduce/

Event-Reduce


  1. You make a query to the database, which returns the result in 100 milliseconds.
  2. A write event occurs on the database and changes some data.
  3. To get the new version of the query's results, you now have three options:
    a. Run the query over the database again, which takes another 100 milliseconds.
    b. Write complex code that somehow merges the incoming event with the old state.
    c. Use Event-Reduce to calculate the new results on the CPU, without disk I/O, nearly instantly.


Efficiency

In the browser demo you can see that, for randomly generated events, about 94% of them could be optimized by EventReduce. In real-world usage, with non-random events, this share can be even higher. Across the different implementations in common browser databases, new query results are displayed up to 12 times faster after a write occurs.

How they do it

EventReduce uses 19 different state functions to 'describe' an event+previousResults combination. A state function is a function that returns a boolean value like isInsert(), wasResultsEmpty(), sortParamsChanged() and so on.
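To make the idea of a state function concrete, here is a minimal sketch. The types `Doc`, `WriteEvent` and `StateInput`, the helper `describeState()` and the function `wasInResults()` are assumptions made for illustration; they are not taken from the library, whose real signatures may differ.

```typescript
// Hypothetical, simplified shapes; the real library's types may differ.
type Doc = { id: string; age: number };

type WriteEvent =
  | { operation: 'INSERT'; id: string; doc: Doc; previous: null }
  | { operation: 'UPDATE'; id: string; doc: Doc; previous: Doc }
  | { operation: 'DELETE'; id: string; doc: null; previous: Doc };

interface StateInput {
  changeEvent: WriteEvent;
  previousResults: Doc[];
}

// A state function answers exactly one yes/no question about the input.
type StateFunction = (input: StateInput) => boolean;

const isInsert: StateFunction = (input) =>
  input.changeEvent.operation === 'INSERT';

const wasResultsEmpty: StateFunction = (input) =>
  input.previousResults.length === 0;

// Illustrative extra state function, not named in the text above.
const wasInResults: StateFunction = (input) =>
  input.previousResults.some((d) => d.id === input.changeEvent.id);

const stateFunctions: StateFunction[] = [isInsert, wasResultsEmpty, wasInResults];

// Evaluating every state function yields one bit per function.
function describeState(input: StateInput): string {
  return stateFunctions.map((fn) => (fn(input) ? '1' : '0')).join('');
}

const exampleInput: StateInput = {
  changeEvent: { operation: 'INSERT', id: 'a', doc: { id: 'a', age: 30 }, previous: null },
  previousResults: []
};

console.log(describeState(exampleInput)); // "110"
```

With 19 such functions, the bit string would have 19 positions instead of three.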

Also there are 16 different action functions. An action function gets the event+previousResults and modifies the results array in a given way like insertFirst(), replaceExisting(), insertAtSortPosition(), doNothing() and so on.
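The following sketch shows what a few action functions could look like. Only the function names come from the text above; the `ActionInput` shape, the `compare` parameter and the binary-search implementation are assumptions for illustration and not the library's actual code.

```typescript
// Hypothetical, simplified shapes; the real library's types may differ.
type Doc = { id: string; age: number };

interface ActionInput {
  doc: Doc;                            // document carried by the write event
  previousResults: Doc[];              // modified in place by the action
  compare: (a: Doc, b: Doc) => number; // sort comparator of the query
}

type ActionFunction = (input: ActionInput) => void;

// The event does not affect the results at all.
const doNothing: ActionFunction = () => {};

const insertFirst: ActionFunction = ({ doc, previousResults }) => {
  previousResults.unshift(doc);
};

const replaceExisting: ActionFunction = ({ doc, previousResults }) => {
  const index = previousResults.findIndex((d) => d.id === doc.id);
  if (index !== -1) previousResults[index] = doc;
};

const insertAtSortPosition: ActionFunction = ({ doc, previousResults, compare }) => {
  // Binary search for the first position whose element sorts after doc.
  let low = 0;
  let high = previousResults.length;
  while (low < high) {
    const mid = (low + high) >>> 1;
    if (compare(previousResults[mid], doc) <= 0) low = mid + 1;
    else high = mid;
  }
  previousResults.splice(low, 0, doc);
};

const byAge = (a: Doc, b: Doc): number => a.age - b.age;
const sortedResults: Doc[] = [{ id: 'a', age: 20 }, { id: 'c', age: 40 }];

insertAtSortPosition({ doc: { id: 'b', age: 30 }, previousResults: sortedResults, compare: byAge });
console.log(sortedResults.map((d) => d.id)); // [ 'a', 'b', 'c' ]
```

None of these actions touches the database; each works only on the in-memory results array.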

For each of our 2^19 state combinations, we calculate which action function gives the same results that the database would return when the full query is executed again.

From these state-action combinations we create a big truth table, which is used to create a binary decision diagram (BDD). The BDD is then optimized to call as few state functions as possible to determine the correct action for an incoming event-results combination.
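As a rough illustration of the step from truth table to decision diagram, the sketch below uses a toy table over three state functions (the real table has 2^19 = 524,288 rows). The table contents, the state function `matchesQuery` and the helpers `build()` and `resolve()` are invented for this example. The only reduction applied is "skip a question when both answers lead to the same outcome", which is far simpler than a full BDD optimization.

```typescript
type ActionName = 'doNothing' | 'insertFirst' | 'insertAtSortPosition';

// Toy truth table; bit order: [isInsert, wasResultsEmpty, matchesQuery].
// These rows are illustrative, not the library's real table.
const truthTable: Record<string, ActionName> = {
  '000': 'doNothing',
  '001': 'doNothing',
  '010': 'doNothing',
  '011': 'doNothing',
  '100': 'doNothing',
  '101': 'insertAtSortPosition',
  '110': 'doNothing',
  '111': 'insertFirst'
};

type DecisionNode =
  | ActionName
  | { level: number; whenFalse: DecisionNode; whenTrue: DecisionNode };

function build(prefix: string, depth: number): DecisionNode {
  if (prefix.length === depth) return truthTable[prefix];
  const whenFalse = build(prefix + '0', depth);
  const whenTrue = build(prefix + '1', depth);
  // Reduction: if both branches are identical, this state function
  // never needs to be called on this path.
  if (JSON.stringify(whenFalse) === JSON.stringify(whenTrue)) return whenFalse;
  return { level: prefix.length, whenFalse, whenTrue };
}

// Walks the diagram and counts how many state values had to be looked at.
function resolve(node: DecisionNode, state: boolean[]): { action: ActionName; calls: number } {
  let calls = 0;
  while (typeof node !== 'string') {
    calls++;
    node = state[node.level] ? node.whenTrue : node.whenFalse;
  }
  return { action: node, calls };
}

const decisionTree = build('', 3);

console.log(resolve(decisionTree, [false, true, true]));
// { action: 'doNothing', calls: 1 }
console.log(resolve(decisionTree, [true, false, true]));
// { action: 'insertAtSortPosition', calls: 3 }
```

In this toy table, every event that is not an insert resolves after a single question, which is the kind of saving the optimized BDD aims for.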

The resulting optimized BDD is then shipped as the EventReduce algorithm and can be used in different programming languages and implementations. The programmer does not need to know about all this optimization and can directly use three simple functions, as shown in the JavaScript implementation.

When to use this

You can use this to

  • reduce the latency until a change to the database updates your application
  • make observing query results more scalable by doing less disk I/O
  • reduce the bandwidth when streaming realtime query results from the backend to the client
  • create a better form of caching where instead of invalidating the cache on write, you update its content
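The last point, updating a cache instead of invalidating it, can be sketched as follows. This is a hand-written merge for one hypothetical query ("age >= 18, sorted by age, then id"), not the EventReduce algorithm itself; EventReduce would instead pick the cheapest action for each event. All names here are assumptions for illustration.

```typescript
type Doc = { id: string; age: number };
type WriteEvent = {
  operation: 'INSERT' | 'UPDATE' | 'DELETE';
  id: string;
  doc: Doc | null; // null for deletes
};

// Hypothetical cached query: "age >= 18, sorted by age, then id".
const matchesQuery = (doc: Doc): boolean => doc.age >= 18;
const compareDocs = (a: Doc, b: Doc): number =>
  a.age - b.age || (a.id < b.id ? -1 : a.id > b.id ? 1 : 0);

// Updates the cached results from the event alone, without querying the database.
function updateCachedResults(cached: Doc[], event: WriteEvent): Doc[] {
  // Drop any previous version of the written document.
  const next = cached.filter((d) => d.id !== event.id);
  // Re-add the new version if it exists and still matches the query.
  if (event.operation !== 'DELETE' && event.doc && matchesQuery(event.doc)) {
    next.push(event.doc);
    next.sort(compareDocs);
  }
  return next;
}

let cachedResults: Doc[] = [{ id: 'a', age: 20 }, { id: 'b', age: 40 }];

cachedResults = updateCachedResults(cachedResults, {
  operation: 'INSERT', id: 'c', doc: { id: 'c', age: 30 }
});
cachedResults = updateCachedResults(cachedResults, {
  operation: 'UPDATE', id: 'a', doc: { id: 'a', age: 10 }
});

console.log(cachedResults.map((d) => d.id)); // [ 'c', 'b' ]
```

The cache stays valid after both writes, and no query was run against the database.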

Limitations

  • EventReduce only works with queries that have a predictable sort order for any given documents. (You can make any query predictable by adding the primary key as the last sort parameter.)

  • EventReduce can be used with relational databases, but not on relational queries that run over multiple tables/collections. (You can use views as a workaround, so that you query over only one table.) In theory, Event-Reduce could also be used for relational queries, but I did not need this so far. Also, it takes about one week on an average machine to run all optimizations, and having more state functions looks like an NP problem.
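The first limitation, a predictable sort order, can be illustrated with a small sketch. The comparators and documents below are made up for the example, and it assumes a stable `Array.prototype.sort` (guaranteed since ES2019). Without a tie-breaker, two documents with equal sort values keep whatever order they arrived in; with the primary key as the last sort parameter, the order is the same every time.

```typescript
type Doc = { id: string; age: number };

// Sorts by age only: documents with the same age have no defined order.
const byAge = (a: Doc, b: Doc): number => a.age - b.age;

// Sorts by age, then by primary key: every pair of documents has a defined order.
const byAgeThenId = (a: Doc, b: Doc): number =>
  a.age - b.age || (a.id < b.id ? -1 : a.id > b.id ? 1 : 0);

const alice: Doc = { id: 'alice', age: 30 };
const bob: Doc = { id: 'bob', age: 30 };

const sortedIds = (docs: Doc[], compare: (a: Doc, b: Doc) => number): string =>
  [...docs].sort(compare).map((d) => d.id).join(',');

// Result depends on the input order:
console.log(sortedIds([alice, bob], byAge), sortedIds([bob, alice], byAge));
// alice,bob bob,alice

// Result is the same for any input order:
console.log(sortedIds([alice, bob], byAgeThenId), sortedIds([bob, alice], byAgeThenId));
// alice,bob alice,bob
```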

Implementations

At the moment there is only the JavaScript implementation, which you can install via npm. Pull requests for other languages are welcome.

FAQ

Is this something like materialized views?
Yes and no. Materialized views solve a similar problem, but in a different way and with different trade-offs. When you have many users, all subscribing to different queries, you cannot create that many views, because they are all recalculated on each write access to the database. EventReduce, however, scales better because it does not affect write performance, and the calculation is done when the fresh query results are requested, not beforehand.

Is this something like event sourcing or CQRS?
No. Event sourcing is mostly used to calculate a current state by applying the full event stream to the starting state. This allows for things like time travel. EventReduce solves a completely different (performance) problem and only shares some common keywords, like "event".

Isn't this optimization already done by database engines?
No. I tested EventReduce with many common databases like MongoDB, MySQL and Postgres. Each of them had better performance with Event-Reduce than when just observing the event stream and running the queries again. If you understand what exactly Event-Reduce does, it becomes clear that this optimization cannot be done by pull-based databases, because they are missing information.

Isn't this the same as product XY?
No. EventReduce is not a product; it is not comparable to any database or streaming backend out there. EventReduce is an algorithm with a specific input and output, nothing more, nothing less. You can use EventReduce without having to change your underlying data infrastructure.
