Docker Compose project for the MusicBrainz Server with replication, search, and development setup

MusicBrainz mirror server with search and replication

This repo contains everything needed to run a MusicBrainz mirror server with search and replication in Docker.

Prerequisites

Recommended hardware/VM

  • CPU: 16 threads (or 2 without indexed search), x86-64 architecture
  • RAM: 16 GB (or 4 without indexed search)
  • Disk Space: 200 GB (or 100 without indexed search)
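
As a rough pre-flight check, the sketch below compares a Linux host against the recommended figures above (the values for indexed search). The function names and the use of /proc/meminfo and df are illustrative assumptions; nothing like this ships with the repository.

```shell
# Report whether a measured value meets a recommended minimum.
# Usage: check_at_least LABEL HAVE NEED
check_at_least() {
    if [ "$2" -ge "$3" ]; then
        echo "$1: ok ($2 >= $3)"
    else
        echo "$1: below recommendation ($2 < $3)"
        return 1
    fi
}

# Hypothetical helper: measure the current Linux host and compare it with
# the recommended values for a mirror with indexed search.
check_host() {
    threads=$(getconf _NPROCESSORS_ONLN)
    # MemTotal is reported in kB; round to the nearest whole GiB.
    ram_gb=$(awk '/^MemTotal:/ { printf "%d", $2 / 1048576 + 0.5 }' /proc/meminfo)
    # Free space (in GiB, rounded down) on the filesystem of the current directory.
    disk_gb=$(df -Pk . | awk 'NR == 2 { printf "%d", $4 / 1048576 }')
    status=0
    check_at_least "CPU threads" "$threads" 16 || status=1
    check_at_least "RAM (GB)" "$ram_gb" 16 || status=1
    check_at_least "Free disk (GB)" "$disk_gb" 200 || status=1
    return "$status"
}
```

Call check_host from a directory on the filesystem that will hold the Docker data, and treat the result as indicative only.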

Required software

If you use Docker Desktop on macOS you may need to increase the amount of memory available to containers from the default of 2GB:

  • Preferences > Resources > Memory

If you use Ubuntu 19.10 or later, the above requirements can be set up by running:

sudo apt-get update && \
sudo apt-get install docker.io docker-compose git && \
sudo systemctl enable --now docker.service

If you use UFW to manage your firewall:

  • ufw-docker or any other way to fix the Docker and UFW security flaw.

External documentation

Components version

  • Current MB Branch: v-2023-08-07
  • Current DB_SCHEMA_SEQUENCE: 28
  • Postgres Version: 12 (can be changed by setting the environment variable POSTGRES_VERSION)
  • MB Solr search server: 3.4.2 (can be changed by setting the environment variable MB_SOLR_VERSION)
  • Search Index Rebuilder: 3.0.1

Installation

This section is about installing a MusicBrainz mirror server with locally indexed search and automatically replicated data.

Download this repository and change the current working directory with:

git clone https://github.com/metabrainz/musicbrainz-docker.git
cd musicbrainz-docker

If you want to mirror the Postgres database only (neither the website nor the web API), change the base configuration with the following command. Run it as a first step, because it replaces any configuration set earlier:

admin/configure with alt-db-only-mirror

Build Docker images

Docker images for composed services should be built once using:

sudo docker-compose build

Create database

⚙️ Postgres shared buffers are set to 2GB by default. Before running this step, you should consider modifying your memory settings to give your database a sufficient amount of RAM, otherwise your database could run very slowly.

Download latest full data dumps and create the database with:

sudo docker-compose run --rm musicbrainz createdb.sh -fetch

Build materialized tables

This is an optional step.

MusicBrainz Server makes use of materialized (or denormalized) tables in production to improve the performance of certain pages and features. These tables duplicate primary table data and can take up several additional gigabytes of space, so they're optional but recommended. If you don't populate these tables, the server will generally fall back to slower queries in their place.

If you wish to configure the materialized tables, you can run:

sudo docker-compose exec musicbrainz bash -c './admin/BuildMaterializedTables --database=MAINTENANCE all'

Start website

Make the local website available at http://localhost:5000 with:

sudo docker-compose up -d

At this point the local website will show data loaded from the dumps only. For indexed search and replication, keep going!
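
The containers can take a while to start answering requests after docker-compose up returns. A minimal sketch for waiting until the website responds, assuming curl is installed on the host; retry and wait_for_website are hypothetical helper names, and the attempt count and delay are arbitrary:

```shell
# Retry a command until it succeeds or the attempts run out.
# Usage: retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry() {
    attempts=$1
    delay=$2
    shift 2
    n=1
    until "$@"; do
        if [ "$n" -ge "$attempts" ]; then
            return 1
        fi
        n=$((n + 1))
        sleep "$delay"
    done
}

# Hypothetical helper: succeed once the local website answers over HTTP.
# The default URL is the one given in this section; pass another URL as the
# first argument if you changed the host or port.
wait_for_website() {
    retry 30 10 curl --silent --fail --output /dev/null "${1:-http://localhost:5000}"
}
```

For example, wait_for_website && echo "website is up" polls for up to about five minutes before giving up.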

Set up search indexes

Depending on your available resources in CPU/RAM vs. bandwidth:

  • Either build search indexes manually from the installed database:

    sudo docker-compose exec indexer python -m sir reindex

    ⚙️ Java heap for Solr is set to 2GB by default. Before running this step, you should consider modifying your memory settings to give your search server a sufficient amount of RAM, otherwise your search server could run very slowly.

    (This option is known to take 4½ hours with 16 CPU threads and 16 GB RAM.)

    To index cores individually, rather than all at once, add --entity-type CORE (any number of times) to the command above. For example: sudo docker-compose exec indexer python -m sir reindex --entity-type artist --entity-type release

  • Or download pre-built search indexes based on the latest data dump:

    sudo docker-compose run --rm musicbrainz fetch-dump.sh search
    sudo docker-compose run --rm search load-search-indexes.sh

    (This option downloads 30GB of Zstandard-compressed archives from FTP.)
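
When building indexes manually for a subset of cores, the --entity-type option is repeated once per core. The sketch below shows how that command line could be assembled. reindex_command is a hypothetical helper, not part of this repository, and it only prints the command so that it can be reviewed before being run:

```shell
# Print the reindex command for the given cores, one --entity-type per core.
# With no arguments, print the command that reindexes all cores at once.
# Usage: reindex_command [CORE...]
reindex_command() {
    cmd="sudo docker-compose exec indexer python -m sir reindex"
    for core in "$@"; do
        cmd="$cmd --entity-type $core"
    done
    echo "$cmd"
}
```

For example, reindex_command artist release prints the same command as the example given for the manual option above.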

⚠️ Search indexes are not included in replication. You will have to rebuild search indexes regularly to keep them up to date. This can be done manually with the commands above, with Live Indexing (see below), or with a scheduled cron job. Here's an example cron job that can be added to the /etc/crontab file on your server:

0 1 * * 7 YOUR_USER_NAME cd ~/musicbrainz-docker && /usr/bin/docker-compose exec -T indexer python -m sir reindex

At this point indexed search works on the local website/webservice. For replication, keep going!

Enable replication

Set replication token

First, copy your MetaBrainz access token (see instructions for generating a token) and paste it when prompted by the following command:

admin/set-replication-token

The token will be written to the file local/secrets/metabrainz_access_token.

Then, grant access to the token for replication with:

admin/configure add replication-token
sudo docker-compose up -d
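
Before running replication, it can be worth confirming that the token file was actually written. A minimal sketch, assuming the default path mentioned above; token_is_set is a hypothetical helper, and it deliberately never prints the token itself:

```shell
# Succeed if the token file exists and contains at least one non-blank character.
# Usage: token_is_set [FILE]
token_is_set() {
    file=${1:-local/secrets/metabrainz_access_token}
    [ -s "$file" ] && grep -q '[^[:space:]]' "$file"
}
```

For example: token_is_set && echo "token present" || echo "token missing". This only checks that the file is non-empty, not that the token is valid.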

Run replication once

Run replication script once to catch up with latest database updates:

sudo bash -c 'docker-compose exec musicbrainz replication.sh &' && \
sudo docker-compose exec musicbrainz /usr/bin/tail -f mirror.log

Schedule replication

Enable replication as a cron job of root user in musicbrainz service container with:

admin/configure add replication-cron
sudo docker-compose up -d

By default, it replicates data every day at 3 am UTC. To change that, see advanced configuration.

You can view the replication log file while it is running with:

sudo docker-compose exec musicbrainz tail --follow mirror.log

You can view the replication log file after it is done with:

sudo docker-compose exec musicbrainz tail mirror.log.1

Enable live indexing

⚠️ Live updating of search indexes on a mirror server is not stable yet and should be considered an experimental feature. Do not use it if you don't want to get your hands dirty.

  1. Disable replication cron job if you enabled it:

    admin/configure rm replication-cron
    sudo docker-compose up -d
  2. Make indexer go through AMQP setup with:

    sudo docker-compose exec indexer python -m sir amqp_setup
    admin/create-amqp-extension
    admin/setup-amqp-triggers install
  3. Build search indexes if they either have not been built or are outdated.

  4. Make indexer watch reindex messages with:

    admin/configure add live-indexing-search
    sudo docker-compose up -d
  5. Re-enable the replication cron job if you disabled it in step 1:

    admin/configure add replication-cron
    sudo docker-compose up -d

Advanced configuration

Local changes

Preferably, do not locally change any file tracked by git. Check that your working tree is clean with:

git status

Git is set to ignore the following, which you are encouraged to write to:

  • .env file,
  • any new file under local directory.

Docker environment variables

There are many ways to set environment variables in Docker Compose; the most convenient here is probably to edit the hidden file .env.

You can then check values to be passed to containers using:

sudo docker-compose config

Finally, make Compose pick up configuration changes with:

sudo docker-compose up -d
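
Appending to .env with echo works, but repeated runs leave duplicate definitions behind. A minimal sketch of an idempotent alternative; set_env_var is a hypothetical helper, not something this repository provides:

```shell
# Set KEY=VALUE in an env file, replacing any existing definition of KEY.
# Usage: set_env_var KEY VALUE [FILE]   (FILE defaults to .env)
set_env_var() {
    key=$1
    value=$2
    file=${3:-.env}
    touch "$file"
    tmp=$(mktemp)
    # Keep every line that does not define KEY; grep exits 1 if none remain.
    grep -v "^${key}=" "$file" > "$tmp" || true
    printf '%s=%s\n' "$key" "$value" >> "$tmp"
    # Write back in place so the file keeps its owner and permissions.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}
```

For example, set_env_var MUSICBRAINZ_SERVER_PROCESSES 20 followed by sudo docker-compose up -d. It is probably best to leave COMPOSE_FILE to admin/configure rather than setting it this way.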

Customize web server host:port

By default, the web server listens at http://localhost:5000

This can be changed using the two Docker environment variables MUSICBRAINZ_WEB_SERVER_HOST and MUSICBRAINZ_WEB_SERVER_PORT.

If MUSICBRAINZ_WEB_SERVER_PORT is set to 80 (http), then the port number will not appear in the base URL of the web server.

If set to 443 (https), then the port number will not appear either, but a separate reverse proxy is required to handle https correctly.
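
The rule above can be illustrated with a few lines of shell. This is only a sketch of the described behaviour, not the server's actual code; base_url is a hypothetical name, and using the https scheme for port 443 is an assumption drawn from the "(https)" note above:

```shell
# Print the base URL implied by a host and port, following the rule above:
# the port is omitted for 80 (http) and 443 (https).
# Usage: base_url HOST PORT
base_url() {
    case $2 in
        80)  echo "http://$1" ;;
        443) echo "https://$1" ;;
        *)   echo "http://$1:$2" ;;
    esac
}
```

With the defaults, base_url localhost 5000 prints http://localhost:5000.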

Customize the number of processes for MusicBrainz Server

By default, MusicBrainz Server uses 10 plackup processes at once.

This number can be changed using the Docker environment variable MUSICBRAINZ_SERVER_PROCESSES.

Customize download server

By default, data dumps and pre-built search indexes are downloaded from http://ftp.eu.metabrainz.org/pub/musicbrainz.

The download server can be changed using the Docker environment variable MUSICBRAINZ_BASE_DOWNLOAD_URL.

For backwards compatibility reasons an FTP server can be specified using the MUSICBRAINZ_BASE_FTP_URL Docker environment variable. Note that support for this variable is deprecated and will be removed in a future release.

See the list of download servers for alternative download sources.

Customize replication schedule

By default, there is no crontab file in musicbrainz service container.

If you followed the steps to schedule replication, then the crontab file used by musicbrainz service is bound to default/replication.cron.

This can be changed by creating a custom crontab file under local/ directory, and finally setting the Docker environment variable MUSICBRAINZ_CRONTAB_PATH to its path.

Customize search indexer configuration

By default, the configuration file used by indexer service is bound to default/indexer.ini.

This can be changed by creating a custom configuration file under local/ directory, and finally setting the Docker environment variable SIR_CONFIG_PATH to its path.

Docker Compose overrides

In Docker Compose, it is possible to override the base configuration using multiple Compose files.

Some overrides are available under compose directory. Feel free to write your own overrides under local directory.

The helper script admin/configure is able to:

  • list available compose files, with a descriptive summary
  • show the value of COMPOSE_FILE variable in Docker environment
  • set/update COMPOSE_FILE in .env file with a list of compose files
  • set/update COMPOSE_FILE in .env file with added or removed compose files

Try admin/configure help for more information.
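
Under the hood, COMPOSE_FILE is a list of compose file paths joined by a separator. To see what is currently stored in .env without going through the helper, a sketch like the following may do. It assumes the default ":" separator used by Compose on Linux and macOS; list_compose_files is a hypothetical name:

```shell
# Print each compose file listed in COMPOSE_FILE, one per line.
# Assumes the default ":" path separator used by Compose on Linux and macOS.
# Usage: list_compose_files [ENV_FILE]   (ENV_FILE defaults to .env)
list_compose_files() {
    sed -n 's/^COMPOSE_FILE=//p' "${1:-.env}" | tr ':' '\n'
}
```

This is read-only; admin/configure remains the intended way to change the list.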

Publish ports of all services

To publish ports of services db, mq, redis and search (in addition to musicbrainz) on the host, simply run:

admin/configure add publishing-all-ports
sudo docker-compose up -d

If you are running a database only mirror, run this instead:

admin/configure add publishing-db-port
sudo docker-compose up -d

Modify memory settings

By default, the db and search services each have about 2GB of RAM. You may want to set more or less memory for either of these services, depending on your available resources or on your priorities.

For example, to set 4GB to each of db and search services, create a file local/compose/memory-settings.yml as follows:

version: '3.1'

# Description: Customize memory settings

services:
  db:
    command: postgres -c "shared_buffers=4GB" -c "shared_preload_libraries=pg_amqp.so"
  search:
    environment:
      - SOLR_HEAP=4g

See postgres for more configuration parameters and options to pass to db service, and solr.in.sh for more environment variables to pass to search service.

Then enable it by running:

admin/configure add local/compose/memory-settings.yml
sudo docker-compose up -d
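
To try different sizes without editing the file by hand, the override shown above can be generated from a single number. A minimal sketch that reproduces the same YAML with the size substituted; write_memory_settings is a hypothetical helper, and giving both services the same amount is just a simplification:

```shell
# Write a memory-settings override giving SIZE_GB gigabytes to db and search.
# Usage: write_memory_settings SIZE_GB [FILE]
write_memory_settings() {
    size=$1
    file=${2:-local/compose/memory-settings.yml}
    mkdir -p "$(dirname "$file")"
    cat > "$file" <<EOF
version: '3.1'

# Description: Customize memory settings

services:
  db:
    command: postgres -c "shared_buffers=${size}GB" -c "shared_preload_libraries=pg_amqp.so"
  search:
    environment:
      - SOLR_HEAP=${size}g
EOF
}
```

After write_memory_settings 8, enable the file with admin/configure add and restart with sudo docker-compose up -d, as described above.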

Test setup

If you just need a small server with sample data to test your own SQL queries and/or MusicBrainz Web Service calls, you can run the below commands instead of following the above installation:

git clone https://github.com/metabrainz/musicbrainz-docker.git
cd musicbrainz-docker
admin/configure add musicbrainz-standalone
sudo docker-compose build
sudo docker-compose run --rm musicbrainz createdb.sh -sample -fetch
sudo docker-compose up -d

The two differences are:

  1. Sample data dump is downloaded instead of full data dumps,
  2. MusicBrainz Server runs in standalone mode instead of mirror mode.

Build search indexes and Enable live indexing are the same.

Replication is not applicable to test setup.

Development setup

Required disk space is much less than for the normal setup: 15GB to be safe.

The below sections are optional depending on which service(s) you are coding.

Local development of MusicBrainz Server

For local development of MusicBrainz Server, you can run the below commands instead of following the above installation:

git clone https://github.com/metabrainz/musicbrainz-server.git
MUSICBRAINZ_SERVER_LOCAL_ROOT=$PWD/musicbrainz-server
git clone https://github.com/metabrainz/musicbrainz-docker.git
cd musicbrainz-docker
echo MUSICBRAINZ_DOCKER_HOST_IPADDRCOL=127.0.0.1: >> .env
echo MUSICBRAINZ_SERVER_LOCAL_ROOT="$MUSICBRAINZ_SERVER_LOCAL_ROOT" >> .env
admin/configure add musicbrainz-dev
sudo docker-compose build
sudo docker-compose run --rm musicbrainz createdb.sh -sample -fetch
sudo docker-compose up -d

The main differences are:

  1. Sample data dump is downloaded instead of full data dumps,
  2. MusicBrainz Server runs in standalone mode instead of mirror mode,
  3. Development mode is enabled (except Catalyst debug),
  4. JavaScript and resources are automatically recompiled on file changes,
  5. MusicBrainz Server is automatically restarted on Perl file changes,
  6. MusicBrainz Server code is in musicbrainz-server/ directory,
  7. Ports are published to the host only (through MUSICBRAINZ_DOCKER_HOST_IPADDRCOL).

After changing code in musicbrainz-server/, it can be run as follows:

sudo docker-compose restart musicbrainz

Build search indexes and Enable live indexing are the same.

Replication is not applicable to development setup.

Simply restart the container when checking out a new branch.

Local development of Search Index Rebuilder

This is very similar to the above but for Search Index Rebuilder (SIR):

  1. Set the variable SIR_LOCAL_ROOT in the .env file
  2. Run admin/configure add sir-dev
  3. Run sudo docker-compose up -d

Notes:

Local development of MusicBrainz Solr

The situation is quite different for this service as it doesn't depend on any other. Its development relies on the schema instead. See mb-solr and mmd-schema.

However, other services depend on it, so it is useful to run a local version of mb-solr in search service for integration tests:

  1. Run build.sh from your mb-solr local working copy to build an image of metabrainz/mb-solr with a custom tag.
  2. Set MB_SOLR_VERSION in .env to this custom tag.
  3. Run sudo docker-compose up -d

Helper scripts

There are two directories with helper scripts:

  • admin/ contains helper scripts to be run from the host. For more information, use the --help option:

    admin/check-search-indexes --help
    admin/delete-search-indexes --help

    See also:

  • build/musicbrainz/scripts/ contains helper scripts to be run from the container attached to the service musicbrainz. Most of these scripts are not for direct use, except createdb.sh and recreatedb.sh (documented below).

Recreate database

If you need to recreate the database, you will need to enter the postgres password set in postgres.env:

  • sudo docker-compose run --rm musicbrainz recreatedb.sh

or to fetch new data dumps before recreating the database:

  • sudo docker-compose run --rm musicbrainz recreatedb.sh -fetch

Recreate database with indexed search

If you need to recreate the database with indexed search,

admin/configure rm replication-cron # if replication is enabled
sudo docker-compose stop
sudo docker-compose run --rm musicbrainz fetch-dump.sh both
admin/purge-message-queues
sudo docker-compose run --rm search load-search-indexes.sh --force
sudo docker-compose run --rm musicbrainz recreatedb.sh
sudo docker-compose up -d
admin/setup-amqp-triggers install
admin/configure add replication-cron
sudo docker-compose up -d

you will need to enter the postgres password set in postgres.env when recreatedb.sh runs.

Update

Check your working tree is clean with:

git status

Check your currently checked out version:

git describe --dirty

Check releases for update instructions.

Issues

If anything doesn't work, check the troubleshooting page.

If you still don’t have a solution, please create an issue with versions info:

echo MusicBrainz Docker: `git describe --always --broken --dirty --tags` && \
echo Docker Compose: `docker-compose version --short` && \
sudo docker version -f 'Docker Client/Server: {{.Client.Version}}/{{.Server.Version}}'
