  • Language
    Rust
  • License
    Apache License 2.0
  • Created about 6 years ago
  • Updated 3 months ago

Repository Details

A tool to compress some state in a Synapse instance's database

Compress Synapse State Tables

This workspace contains experimental tools that attempt to reduce the number of rows in the state_groups_state table of a Synapse PostgreSQL database.

Automated tool: synapse_auto_compressor

Introduction:

This tool is significantly simpler to use than the manual tool (described below). It scans through all of the rows in the state_groups database table from the start. When it finds a group that hasn't been compressed, it runs the compressor for a while on that group's room, saving where it got up to. After compressing a number of these chunks it stops, saving where it got up to for the next run of the synapse_auto_compressor.

It creates three extra tables in the database:

  • state_compressor_state, which stores the information needed to stop and start the compressor for each room

  • state_compressor_progress, which stores the most recently compressed state group for each room

  • state_compressor_total_progress, which stores how far through the state_groups table the compressor has scanned

The tool can be run manually when you are running out of space, or be scheduled to run periodically.

Building

This tool requires cargo to be installed. See https://www.rust-lang.org/tools/install for instructions on how to do this.

This project follows Synapse's deprecation policy for Rust: it assumes a recent stable version of Rust and the ability to fetch a more recent one if necessary.

To build synapse_auto_compressor, clone this repository and navigate to the synapse_auto_compressor/ subdirectory. Then execute cargo build.

This will create an executable and store it in synapse_auto_compressor/target/debug/synapse_auto_compressor.

Example usage

$ synapse_auto_compressor -p postgresql://user:pass@localhost/synapse -c 500 -n 100

Running Options

  • -p [POSTGRES_LOCATION] Required The configuration for connecting to the Postgres database. This should be of the form "postgresql://username:password@mydomain.com/database" or a key-value pair string: "user=username password=password dbname=database host=mydomain.com". See https://docs.rs/tokio-postgres/0.7.2/tokio_postgres/config/struct.Config.html for the full details.

  • -c [CHUNK_SIZE] Required The number of state groups to work on at once. All of the entries from state_groups_state are requested from the database for the state groups being worked on, so small chunk sizes may be needed on machines with low memory. Note: if the compressor fails to find space savings on the chunk as a whole (which may well happen in rooms with lots of backfill) then the entire chunk is skipped.

  • -n [CHUNKS_TO_COMPRESS] Required CHUNKS_TO_COMPRESS chunks of size CHUNK_SIZE will be compressed. The higher this number is set to, the longer the compressor will run for.

  • -d [LEVELS] Sizes of each new level in the compression algorithm, as a comma-separated list. The first entry in the list is for the lowest, most granular level, with each subsequent entry being for the next highest level. The number of entries in the list determines the number of levels that will be used. The sum of the sizes of the levels affects the performance of fetching the state from the database, as the sum of the sizes is the upper bound on the number of iterations needed to fetch a given set of state. [defaults to "100,50,25"]
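As a rough illustration of the last point, the sketch below parses a levels string and sums the sizes to obtain the upper bound on fetch iterations described above. The helper name max_fetch_iterations is hypothetical and is not part of the tool; only the comma-separated format and the default "100,50,25" come from the option description.

```rust
/// Parse a comma-separated levels string (e.g. "100,50,25") and return the
/// sum of the level sizes, which the option description gives as the upper
/// bound on the number of iterations needed to fetch a given set of state.
/// Hypothetical helper for illustration, not the tool's own code.
fn max_fetch_iterations(levels: &str) -> Result<u64, std::num::ParseIntError> {
    levels
        .split(',')
        // Tolerate spaces around each entry; a non-numeric entry is an error.
        .map(|entry| entry.trim().parse::<u64>())
        .sum()
}
```

For the default levels, max_fetch_iterations("100,50,25") evaluates to Ok(175). Larger or more numerous levels raise that bound.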

Scheduling the compressor

The automatic tool may put some strain on the database, so it might be best to schedule it to run at a quiet time for the server. This could be done by creating an executable script and scheduling it with something like cron.
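When sizing a scheduled run, it can help to know the most state groups a single invocation will work through. Going by the option descriptions above, that is at most CHUNK_SIZE multiplied by CHUNKS_TO_COMPRESS. A minimal sketch, using the hypothetical helper name max_groups_per_run:

```rust
/// Upper bound on the number of state groups handled in one run:
/// the chunk size (-c) times the number of chunks to compress (-n).
/// Returns None if the product would overflow a u64.
/// Hypothetical helper for illustration, not part of the tool.
fn max_groups_per_run(chunk_size: u64, chunks_to_compress: u64) -> Option<u64> {
    chunk_size.checked_mul(chunks_to_compress)
}
```

With the values from the example usage (-c 500 -n 100) this gives 50,000 state groups per run.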

Manual tool: synapse_compress_state

Introduction

A manual tool that reads in the rows from state_groups_state and state_group_edges tables for a specified room and calculates the changes that could be made that (hopefully) will significantly reduce the number of rows.

This tool currently does not write to the database by default, so should be safe to run. If the -o option is specified then SQL will be written to the given file that would change the tables to match the calculated state. (Note that if -t is given then each change to a particular state group is wrapped in a transaction). If you do wish to send the changes to the database automatically then the -c flag can be set.

The SQL generated is safe to apply against the database with Synapse running. This is because the state_groups and state_groups_state tables are append-only: once written to the database, they are never modified. There is therefore no danger of a modification racing against a running Synapse. Further, this script makes its changes within atomic transactions, and each transaction should not affect the results from any of the queries that Synapse performs.

The tool will also ensure that the generated state deltas do give the same state as the existing state deltas before generating any SQL.
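To make the idea of "the same state" concrete, here is a simplified sketch of such a check. It assumes a toy model in which each state group has an optional predecessor and a set of delta rows, and the full state of a group is obtained by applying deltas from the start of its chain. The Group type and the helper names are hypothetical; this is not the tool's actual implementation.

```rust
use std::collections::{BTreeMap, HashMap};

/// Toy model of a state group: an optional predecessor (an edge) plus the
/// rows stored for this group (its delta). Types are illustrative only.
struct Group {
    prev: Option<u64>,
    delta: BTreeMap<String, String>, // state key -> event id
}

/// Convenience constructor for the examples.
fn group(prev: Option<u64>, rows: &[(&str, &str)]) -> Group {
    Group {
        prev,
        delta: rows.iter().map(|(k, v)| (k.to_string(), v.to_string())).collect(),
    }
}

/// Resolve the full state of `id` by walking back to the start of the chain
/// and applying deltas oldest first, so newer groups override older ones.
/// Assumes every predecessor exists and the chain has no cycles.
fn resolve(groups: &HashMap<u64, Group>, id: u64) -> BTreeMap<String, String> {
    let mut chain = Vec::new();
    let mut current = Some(id);
    while let Some(group_id) = current {
        chain.push(group_id);
        current = groups[&group_id].prev;
    }
    let mut state = BTreeMap::new();
    for group_id in chain.into_iter().rev() {
        for (key, event) in &groups[&group_id].delta {
            state.insert(key.clone(), event.clone());
        }
    }
    state
}

/// True if both maps contain the same groups and each resolves to the same state.
fn same_state(old: &HashMap<u64, Group>, new: &HashMap<u64, Group>) -> bool {
    old.len() == new.len()
        && old
            .keys()
            .all(|id| new.contains_key(id) && resolve(old, *id) == resolve(new, *id))
}

/// Total number of delta rows stored across all groups.
fn row_count(groups: &HashMap<u64, Group>) -> usize {
    groups.values().map(|g| g.delta.len()).sum()
}
```

In this model, two groups that each store two rows and share one of them can be rewritten so that the second group points at the first and stores only the row that differs. The row count drops from four to three, while same_state still returns true.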

Building

This tool requires cargo to be installed. See https://www.rust-lang.org/tools/install for instructions on how to do this.

To build synapse_compress_state, clone this repository and then execute cargo build.

This will create an executable and store it in target/debug/synapse_compress_state.

Example usage

$ synapse_compress_state -p "postgresql://localhost/synapse" -r '!some_room:example.com' -o out.sql -t
Fetching state from DB for room '!some_room:example.com'...
Got initial state from database. Checking for any missing state groups...
Number of state groups: 73904
Number of rows in current table: 2240043
Number of rows after compression: 165754 (7.40%)
Compression Statistics:
  Number of forced resets due to lacking prev: 34
  Number of compressed rows caused by the above: 17092
  Number of state groups changed: 2748
New state map matches old one

# It's finished, so we can now go and rewrite the DB
$ psql synapse < out.sql
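The percentage in the output above is the number of rows after compression expressed as a share of the rows in the current table. A small sketch of that arithmetic, with the hypothetical helper name remaining_percent:

```rust
/// Rows remaining after compression as a percentage of the original rows.
/// Returns None when there were no rows to begin with.
/// Hypothetical helper for illustration, not part of the tool.
fn remaining_percent(rows_before: u64, rows_after: u64) -> Option<f64> {
    if rows_before == 0 {
        return None;
    }
    Some(rows_after as f64 / rows_before as f64 * 100.0)
}
```

For the run shown, remaining_percent(2240043, 165754) formatted to two decimal places is 7.40, which corresponds to 2,074,289 rows removed.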

Running Options

  • -p [POSTGRES_LOCATION] Required The configuration for connecting to the Postgres database. This should be of the form "postgresql://username:password@mydomain.com/database" or a key-value pair string: "user=username password=password dbname=database host=mydomain.com". See https://docs.rs/tokio-postgres/0.7.2/tokio_postgres/config/struct.Config.html for the full details.

  • -r [ROOM_ID] Required The room to process. This is the value found in the rooms table of the database, not the common name for the room. It should look like: "!wOlkWNmgkAZFxbTaqj:matrix.org".

  • -b [MIN_STATE_GROUP] The state group to start processing from (non-inclusive).

  • -n [GROUPS_TO_COMPRESS] How many groups to load into memory to compress (starting from the 1st group in the room or the group specified by -b).

  • -l [LEVELS] Sizes of each new level in the compression algorithm, as a comma-separated list. The first entry in the list is for the lowest, most granular level, with each subsequent entry being for the next highest level. The number of entries in the list determines the number of levels that will be used. The sum of the sizes of the levels affects the performance of fetching the state from the database, as the sum of the sizes is the upper bound on the number of iterations needed to fetch a given set of state. [defaults to "100,50,25"]

  • -m [COUNT] If the compressor cannot save this many rows from the database then it will stop early.

  • -s [MAX_STATE_GROUP] If a max_state_group is specified then only state groups with IDs lower than this number can be compressed.

  • -o [FILE] File to output the SQL transactions to (for later running on the database).

  • -t If this flag is set then each change to a particular state group is wrapped in a transaction. This should be done if you wish to apply the changes while Synapse is still running.

  • -c If this flag is set then the changes the compressor makes will be committed to the database. This should be safe to use while Synapse is running, as it wraps the changes to every state group in its own transaction (as if the transaction flag was set).

  • -g If this flag is set then output the node and edge information for the state_group directed graph built up from the predecessor state_group links. These can be looked at in something like Gephi (https://gephi.org).

Running tests

There are integration tests for these tools stored in compressor_integration_tests/.

To run the integration tests, you first need to start up a Postgres database for the library to talk to. There is a docker-compose file that sets one up with all of the correct tables. The tests can therefore be run as follows:

$ cd compressor_integration_tests/
$ docker-compose up -d
$ cargo test --workspace
$ docker-compose down

Using the synapse_compress_state library

If you want to use the compressor in another project, it is recommended that you use jemalloc (https://github.com/tikv/jemallocator).

To prevent the progress bars from being shown, use the no-progress-bars feature. (See synapse_auto_compressor/Cargo.toml for an example)

Troubleshooting

Connecting to database

From local machine

If you set up Synapse using the instructions on https://matrix-org.github.io/synapse/latest/postgres.html you should have a username and password to use to log in to the Postgres database. To run the compressor from the machine where Postgres is running, the URL will be the following:

postgresql://synapse_user:synapse_password@localhost/synapse
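Both connection string forms accepted by the -p option can be assembled from the same four values. The sketch below shows this with two hypothetical helpers. It does no escaping, so it assumes the values contain no characters that are special in a URL or in a key-value string (a password with such characters would need percent-encoding in the URL form).

```rust
/// Build the URL form of the connection string.
/// Hypothetical helper; performs no escaping of its inputs.
fn connection_url(user: &str, password: &str, host: &str, database: &str) -> String {
    format!("postgresql://{}:{}@{}/{}", user, password, host, database)
}

/// Build the equivalent key-value form of the connection string.
/// Hypothetical helper; performs no escaping of its inputs.
fn connection_key_value(user: &str, password: &str, host: &str, database: &str) -> String {
    format!(
        "user={} password={} dbname={} host={}",
        user, password, database, host
    )
}
```

Calling connection_url("synapse_user", "synapse_password", "localhost", "synapse") reproduces the URL shown above.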

From remote machine

If you wish to connect from a different machine, you'll need to edit your Postgres settings to allow remote connections. This requires updating pg_hba.conf and the listen_addresses setting in postgresql.conf.

Printing debugging logs

The amount of output the tools produce can be altered by setting the RUST_LOG environment variable to a log level such as debug, or to a per-package filter.

To get more logs when running the synapse_auto_compressor tool try the following:

$ RUST_LOG=debug synapse_auto_compressor -p postgresql://user:pass@localhost/synapse -c 50 -n 100

If you want to suppress all the debugging info you are getting from the Postgres client then try:

RUST_LOG=synapse_auto_compressor=debug,synapse_compress_state=debug synapse_auto_compressor [etc.]

This will only print the debugging information from those two packages. For more info see https://docs.rs/env_logger/0.9.0/env_logger/.
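The filter value used above is a comma-separated list of package=level pairs. As a small sketch, such a value can be assembled from a list of package names; rust_log_filter is a hypothetical helper and not part of either tool.

```rust
/// Join package names into a RUST_LOG filter of the form
/// "pkg_a=level,pkg_b=level". Hypothetical helper for illustration.
fn rust_log_filter(packages: &[&str], level: &str) -> String {
    packages
        .iter()
        .map(|package| format!("{}={}", package, level))
        .collect::<Vec<_>>()
        .join(",")
}
```

Passing the two package names from this workspace with the level debug yields the same string as in the command above.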

Building difficulties

Building the openssl-sys dependency crate requires OpenSSL development tools to be installed, and building on Linux will also require pkg-config.

This can be done on Ubuntu with: $ apt-get install libssl-dev pkg-config

Note that building requires quite a lot of memory and out-of-memory errors might not be obvious. It's recommended you only build these tools on machines with at least 2GB of RAM.

Auto Compressor skips chunks when running on already compressed room

If you have used the compressor before, with certain config options, the automatic tool will produce lots of warnings of the form: The compressor tried to increase the number of rows in ...

To fix this, ensure that the chunk_size is set to at least the L1 level size (so if the level sizes are "100,50,25" then the chunk_size should be at least 100).

Note: if the level sizes being used when rerunning are different to when run previously this might lead to less efficient compression and thus chunks being skipped, but this shouldn't be a large problem.
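The chunk-size rule above can be expressed as a simple pre-flight check. This is a minimal sketch with a hypothetical helper name; it treats the first entry of the levels list as the L1 size, as described for the levels option.

```rust
/// Returns true when the chunk size is at least the first (L1) level size.
/// An empty levels list is treated as having no constraint, which is an
/// assumption made for this sketch rather than documented tool behaviour.
fn chunk_size_covers_first_level(chunk_size: u64, levels: &[u64]) -> bool {
    match levels.first() {
        Some(&first_level) => chunk_size >= first_level,
        None => true,
    }
}
```

With levels "100,50,25", a chunk size of 100 passes the check and a chunk size of 50 does not.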

Compressor is trying to increase the number of rows

Backfilling can lead to issues with compression. The synapse_auto_compressor will skip chunks it can't reduce the size of and so this should help jump over the backfilled state_groups. Lots of state resolution might also impact the ability to use the compressor.

To examine the state_group hierarchy run the manual tool on a room with the -g option and look at the graphs.
