Server components for Echoprint

Please note that this code is now deprecated.

Please see the latest version on Spotify's GitHub.

Echoprint is an open source music fingerprint and resolving framework powered by The Echo Nest. The code generator (a library that converts PCM samples from a microphone or file into Echoprint codes) is MIT licensed and free for any use. The server component that stores and resolves queries is Apache licensed and free for any use. The data for resolving to millions of songs is free for any use, provided any changes or additions are merged back to the community.

Read more about Echoprint here.

What is included

The Echoprint server is a custom component for Apache Solr that indexes Echoprint codes and hash times. To keep the index fast, the Echoprint codes are stored in a Tokyo Tyrant key/value store. We also include the Python API layer code needed to match tracks based on the response from the custom component, as well as a demo (non-production) API meant to illustrate how to set up and run the Echoprint service.

Non-included requirements for the server:

Additional non-included requirements for the demo:

  • web.py

What's inside

API/ - python libraries for querying and ingesting into the Echoprint server
API/api.py - web.py sample API wrapper for evaluation
API/fp.py - main python module for Echoprint
API/solr.py - Solr's python module (with slight enhancements)

examples/lookup.py - an example fingerprint and lookup of a query

Hashr/ - java project for a custom solr field type to handle Echoprint data

solr/ - complete solr install with Hashr already in the right place and with the right schema and config to make it work.

util/ - Utilities for importing and evaluating Echoprint
util/fastingest.py - import codes into the database
util/bigeval.py - evaluate the search accuracy of the database

How to run the server

  1. Start the Solr server from the echoprint-server/solr/solr directory:

     cd echoprint-server/solr/solr
     java -Dsolr.solr.home=/home/path/to/echoprint-server/solr/solr/solr/ -Djava.awt.headless=true -jar start.jar
    

    If you run this server somewhere other than localhost, update the pointer to it in fp.py:

     _fp_solr = solr.SolrConnection("http://localhost:8502/solr/fp")
    
  2. Start the Tokyo Tyrant server.

     ttservctl start
    

    Again, if the location of the TT server differs, update fp.py:

     _tyrant_address = ['localhost', 1978]
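
Before using fp.py it can help to confirm that both services accept connections. The following is a minimal sketch, assuming the default host and ports shown above (8502 for Solr, 1978 for Tokyo Tyrant). The helpers `is_port_open` and `check_services` are hypothetical and are not part of the Echoprint server code; the sketch is written for Python 3.

```python
import socket

def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable hosts.
        return False

# Defaults taken from the fp.py settings shown above; adjust if yours differ.
SERVICES = {
    "Solr": ("localhost", 8502),
    "Tokyo Tyrant": ("localhost", 1978),
}

def check_services(services=SERVICES):
    """Map each service name to whether its port accepts connections."""
    return {name: is_port_open(host, port)
            for name, (host, port) in services.items()}
```

Calling `check_services()` returns a dict such as `{"Solr": True, "Tokyo Tyrant": False}`. An open port only shows that something is listening there, not that the Solr core or the Tyrant database is correctly configured.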
    

Running in Python

fp.py has all the methods you'll need.

>>> import fp
>>> fp.ingest({"track_id": "my_track_id", "fp": "123 40 123 60 123 80 123 90 123 110 123 130", "length": "120", "codever": "4.12"})
>>> fp.commit()
>>> r = fp.best_match_for_query("123 40 124 60 125 80 126 90 127 110 128 130 129 60 123 40 127 50")
>>> r.message()
'query code length is too small'
>>> example_code = "eJwty7kNADAMw8BVNILl-Mv-iwWCU11D0g_CQA-USIwoXNEg5YBH3o3-0sil7AHIrAyw"
>>> r = fp.best_match_for_query(example_code)
>>> r.message()
'OK (match type 3)'
>>> r.TRID
'my_track_id'
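
The plain-text fingerprint passed to fp.ingest in the session above appears to alternate hash codes and time offsets ("123 40 123 60 ..."), which would match the "codes and hash times" that the Solr component indexes. Here is a minimal sketch of splitting such a string into pairs, assuming that alternating layout. `parse_code_string` is a hypothetical helper for illustration and is not part of fp.py.

```python
def parse_code_string(code_string):
    """Split "code time code time ..." into a list of (code, time) int pairs.

    Assumes the string alternates hash codes and time offsets; this helper
    is illustrative only and is not part of fp.py.
    """
    tokens = code_string.split()
    if len(tokens) % 2 != 0:
        raise ValueError("expected an even number of tokens")
    values = [int(t) for t in tokens]
    # Even positions are codes, odd positions are their time offsets.
    return list(zip(values[0::2], values[1::2]))

pairs = parse_code_string("123 40 123 60 123 80")
```

Under this assumption, `pairs` holds three tuples that share the code 123 and differ only in their time offsets.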

Running the example API server

  1. Run the api.py webserver as a test

     cd API
     python api.py 8080
    
  2. Ingest codes with http://localhost:8080/ingest:

    POST the following variables:

     fp_code : packed code from codegen
     track_id : if you want your own track_ids. If you don't give one we'll generate one.
     length : the length of the track in seconds
     codever : the version of the codegen
     artist : the artist of the track (optional)
     release : the release of the track (optional)
     track : the track name (optional)
    

    For example:

     curl http://localhost:8080/ingest -d "fp_code=eJx1W...&track_id=thisone&length=300&codever=4.12"
    
  3. Query with http://localhost:8080/query?fp_code=XXX

    POST or GET the following:

     fp_code : packed code from codegen
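
The two endpoints above can also be called from Python. This is a sketch using only the standard library (Python 3), assuming the demo api.py is listening on localhost:8080 as in step 1. The function names `build_ingest_request` and `build_query_request` are hypothetical; the field names are the ones listed above.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://localhost:8080"  # demo api.py address from step 1

def build_ingest_request(fp_code, length, codever, track_id=None,
                         artist=None, release=None, track=None,
                         base_url=BASE_URL):
    """Build a POST request for /ingest; optional fields are sent only if set."""
    fields = {"fp_code": fp_code, "length": length, "codever": codever,
              "track_id": track_id, "artist": artist,
              "release": release, "track": track}
    body = urlencode({k: v for k, v in fields.items() if v is not None})
    return Request(base_url + "/ingest", data=body.encode("ascii"),
                   method="POST")

def build_query_request(fp_code, base_url=BASE_URL):
    """Build a GET request for /query with the packed code as a parameter."""
    return Request(base_url + "/query?" + urlencode({"fp_code": fp_code}))
```

To send a request, pass it to `urllib.request.urlopen(req)` and read the response body. Building the request separately from sending it makes the encoding easy to inspect without a running server.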
    

Generating and importing data

  1. Download and compile the echoprint-codegen

  2. Generate a list of files to fingerprint

     find /music -name "*.mp3" > music_to_ingest
    
  3. Generate fingerprint codes for your files

     ./echoprint-codegen -s < music_to_ingest > allcodes.json
    
  4. Ingest the generated json.

     python fastingest.py [-b] allcodes.json
    

    The -b flag creates a file named bigeval.json that can be used to evaluate the accuracy of the fingerprint and server (see below).

The fastingest script is very memory intensive. For large dump files you may run out of memory while processing them. If this is the case, then you can split the dumps into smaller chunks using the splitdata.py script:

python splitdata.py ~/Downloads/echoprint-dump*.json

This will create 5 new dump files, input-1.json, input-2.json, etc. Import each of them with fastingest as described above.
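
To illustrate the idea of splitting a dump, here is a minimal sketch. It is not the actual splitdata.py implementation: it assumes each dump file is a single JSON array of records, and the names `split_records` and `split_dump` are hypothetical. Note that it still loads the whole input file once, so it only reduces the memory needed by the later ingest step.

```python
import json

def split_records(records, parts=5):
    """Split a list into `parts` contiguous chunks of near-equal size."""
    size, extra = divmod(len(records), parts)
    chunks, start = [], 0
    for i in range(parts):
        # The first `extra` chunks each take one additional record.
        end = start + size + (1 if i < extra else 0)
        chunks.append(records[start:end])
        start = end
    return chunks

def split_dump(path, parts=5, prefix="input"):
    """Write the JSON array in `path` out as prefix-1.json .. prefix-N.json."""
    with open(path) as f:
        records = json.load(f)
    names = []
    for i, chunk in enumerate(split_records(records, parts), start=1):
        name = f"{prefix}-{i}.json"
        with open(name, "w") as out:
            json.dump(chunk, out)
        names.append(name)
    return names
```

With the defaults, `split_dump("allcodes.json")` writes input-1.json through input-5.json to the current directory.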

Using the community data

Fingerprint data is publicly available under the Echoprint Database License. To use this data, download it from http://echoprint.me/data/

Use the fastingest.py tool to import this data as above:

python fastingest.py [-b] ~/Downloads/echoprint-dump*.json

You can run fastingest many times on one or more machines, as long as you update the configuration information for Solr and Tokyo Tyrant in fp.py.

Evaluating fingerprint accuracy

We provide an evaluation tool, bigeval, that can be used to test the accuracy of the fingerprint and server.

Run bigeval.py without any arguments to get a usage statement. The following command tests 1000 random files:

python bigeval.py -c 1000

For every 10 files tested, bigeval will print out a line that looks like this.

PR 0.0875 CAR 0.9125 FAR 0.0000 FRR 0.0875 {'tn': 0, 'err-api': 0, 'fp-a': 1, 'tp': 73, 'err-codegen': 0, 'fp-b': 0, 'err-data': 0, 'total': 80, 'fn': 6, 'err-munge': 0}

This is what the fields mean:

PR           "probability of error"  a weighted measure of the overall goodness of the FP
CAR          "correct accept rate"   probability that you will correctly identify a known song
FAR          "false accept rate"     probability that you will report a song as present when it is not
FRR          "false reject rate"     probability that you will report a song as absent when it is present
err-api      API error               # of times the API had a timeout or error
err-data     data problem            # of times our datastore had an issue (missing data is the biggest culprit)
err-codegen  codegen fail            # of times codegen did not return properly with data
err-munge    munger err              # of times the munging process (downsampling, filtering, re-encoding etc) did not generate a playable file
fp-a         false pos A             we had a false positive where the wrong song was identified
fp-b         false pos B             we said a song was there that was not actually there
tp           true pos                correct song chosen
tn           true neg                song correctly identified as not there
fn           false neg               song there but we said it wasn't
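
The rates in the sample line can be reproduced from the raw counts. The sketch below is inferred from that sample line rather than taken from bigeval.py. It assumes that queries for known songs are tp + fn + fp-a and that queries for unknown songs are tn + fp-b. PR is left out because the weighting it uses is not stated here. The function name `rates` is hypothetical.

```python
def rates(counts):
    """Compute CAR, FAR and FRR from a bigeval-style counts dict.

    Assumption (inferred from the sample output, not read from bigeval.py):
    known-song queries are tp + fn + fp-a, unknown-song queries are tn + fp-b.
    """
    known = counts["tp"] + counts["fn"] + counts["fp-a"]
    unknown = counts["tn"] + counts["fp-b"]
    car = counts["tp"] / known if known else 0.0
    frr = (counts["fn"] + counts["fp-a"]) / known if known else 0.0
    far = counts["fp-b"] / unknown if unknown else 0.0
    return {"CAR": car, "FAR": far, "FRR": frr}

# Counts copied from the sample output line above.
sample = {"tn": 0, "err-api": 0, "fp-a": 1, "tp": 73, "err-codegen": 0,
          "fp-b": 0, "err-data": 0, "total": 80, "fn": 6, "err-munge": 0}
result = rates(sample)
```

For the sample counts this gives CAR 73/80 = 0.9125, FRR 7/80 = 0.0875 and FAR 0, which agrees with the printed line.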

If an error occurs during matching, a message describing the error is printed. Use the -p flag to print extra information about the scores obtained from Solr when an error occurs, which shows how the server is choosing its winner. Use -1 file to test a single file and print its score information.

A number of munge parameters are available to bigeval. These parameters alter the input file before generating a fingerprint, to simulate noisy signals. Run bigeval.py --help to see the available options. These options require mpg123 and ffmpeg to be installed.

You can test for true negatives by creating a list of tracks that you know are not in the database:

find /new_music -type f > new_music

Name the file new_music and put it in the same directory as bigeval.py.

Notes

  • You can run Echoprint in "local" mode, which uses a Python dict to store and index codes instead of Solr. In practice, this mode can store and index about 100K tracks in 1 GB or so. It is only useful for small-scale testing. Each fp.py method takes an optional "local" kwarg.
