• This repository has been archived on 27/Jun/2024
  • Stars
    168
  • Rank 225,507 (Top 5 %)
  • Language
    Python
  • License
    MIT License
  • Created over 5 years ago
  • Updated 5 months ago

Repository Details

Python library to extract data from rateyourmusic.com.

rymscraper

rymscraper is an unofficial Python API to extract data from rateyourmusic.com (👍 consider supporting them!).

⚠️ Excessive use of rymscraper can get your IP address banned by rateyourmusic for a few days.
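
One way to lower the risk described in the warning above is to space out requests. The following is a minimal sketch, not part of rymscraper: `fetch_politely`, `fake_fetch` and the delay values are hypothetical, and nothing here is known about what rateyourmusic actually tolerates. In real use, `fetch` would wrap a call such as `network.get_artist_infos(name=...)`.

```python
import random
import time
from typing import Callable, Iterable, List


def fetch_politely(
    names: Iterable[str],
    fetch: Callable[[str], dict],
    min_delay: float = 5.0,  # assumed value, tune to your own usage
    max_delay: float = 10.0,  # assumed value, tune to your own usage
) -> List[dict]:
    """Call fetch(name) for each name, pausing a random time between calls."""
    results = []
    for index, name in enumerate(names):
        if index > 0:
            # Sleep before every call except the first one.
            time.sleep(random.uniform(min_delay, max_delay))
        results.append(fetch(name))
    return results


def fake_fetch(name: str) -> dict:
    """Offline stand-in for a real scraping call."""
    return {"Name": name}


# Tiny delays here so the sketch runs quickly.
artists = fetch_politely(["Air", "M83"], fake_fetch, min_delay=0.0, max_delay=0.01)
```

The random jitter avoids a perfectly regular request pattern; whether that is enough to avoid a ban is not something this sketch can guarantee.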

Requirements

  • beautifulsoup4
  • lxml
  • requests
  • pandas
  • selenium with geckodriver
  • tqdm

Installation

Classic installation

python setup.py install

Installation in a virtualenv with pipenv

pipenv install '-e .'

Example

The library returns data as Python dicts, which can easily be converted to CSV or JSON.

>>> import pandas as pd
>>> from rymscraper import rymscraper, RymUrl

>>> network = rymscraper.RymNetwork()

Artist

>>> artist_infos = network.get_artist_infos(name="Daft Punk")
>>> # or network.get_artist_infos(url="https://rateyourmusic.com/artist/daft-punk")
>>> import json
>>> json.dumps(artist_infos, indent=2, ensure_ascii=False)
{
    "Name": "Daft Punk",
    "Formed": "1993, Paris, Île-de-France, France",
    "Disbanded": "22 February 2021",
    "Members": "Thomas Bangalter (programming, synthesizer, keyboards, drum machine, guitar, bass, vocals, vocoder, talk box), Guy-Manuel de Homem-Christo (programming, synthesizer, keyboards, drums, drum machine, guitar)",
    "Related Artists": "Darlin'",
    "Notes": "See also: Discovered: A Collection of Daft Funk Samples",
    "Also Known As": "Draft Ponk",
    "Genres": "French House, Film Score, Disco, Electronic, Synthpop, Electroclash"
}
>>> # you can easily convert all returned values to a pandas dataframe
>>> df = pd.DataFrame([artist_infos])
>>> df[['Name', 'Formed', 'Disbanded']]
     Name                              Formed         Disbanded
Daft Punk  1993, Paris, Île-de-France, France  22 February 2021
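
If pandas is not needed, the same dict can be written to JSON or CSV with the standard library alone. This is a small sketch that uses a hardcoded subset of the output shown above instead of a live call, so it runs offline; the variable names `json_text` and `csv_text` are illustrative.

```python
import csv
import io
import json

# Subset of the artist dict shown above, hardcoded so no request is made.
artist_infos = {
    "Name": "Daft Punk",
    "Formed": "1993, Paris, Île-de-France, France",
    "Disbanded": "22 February 2021",
}

# JSON: ensure_ascii=False keeps accented characters readable.
json_text = json.dumps(artist_infos, indent=2, ensure_ascii=False)

# CSV: one header row plus one data row; values containing commas are quoted.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(artist_infos.keys()))
writer.writeheader()
writer.writerow(artist_infos)
csv_text = buffer.getvalue()
```

To write to disk instead, open a file with `newline=""` and `encoding="utf-8"` and pass it to `csv.DictWriter` in place of the in-memory buffer.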

You can also extract several artists at once:

>>> # several artists
>>> list_artists_infos = network.get_artists_infos(names=["Air", "M83"])
>>> # or network.get_artists_infos(urls=["https://rateyourmusic.com/artist/air", "https://rateyourmusic.com/artist/m83"])
>>> df = pd.DataFrame(list_artists_infos)

Album

>>> # name field should use the format Artist - Album name (not ideal but it works for now)
>>> album_infos = network.get_album_infos(name="XTC - Black Sea")
>>> # or network.get_album_infos(url="https://rateyourmusic.com/release/album/xtc/black-sea/")
>>> df = pd.DataFrame([album_infos])
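
Since the `name` argument expects the "Artist - Album name" format, a pair of small helpers can keep that convention in one place. These functions are hypothetical and not part of rymscraper; they only build and split the string.

```python
from typing import Tuple


def format_album_name(artist: str, album: str) -> str:
    """Build the 'Artist - Album name' string expected by the name argument."""
    return f"{artist.strip()} - {album.strip()}"


def split_album_name(name: str) -> Tuple[str, str]:
    """Split 'Artist - Album name' on the first ' - ' separator."""
    artist, separator, album = name.partition(" - ")
    if not separator:
        raise ValueError(f"Expected 'Artist - Album name', got: {name!r}")
    return artist.strip(), album.strip()


album_name = format_album_name("XTC", "Black Sea")
```

Splitting on the first " - " is a simplification: an artist whose own name contains " - " would be split in the wrong place, in which case passing a URL is the safer option.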

You can also extract several albums at once:

>>> # several albums
>>> list_album_infos = network.get_albums_infos(names=["Ride - Nowhere", "Electrelane - Axes"])
>>> # or network.get_albums_infos(urls=["https://rateyourmusic.com/release/album/ride/nowhere/", "https://rateyourmusic.com/release/album/electrelane/axes/"])
>>> df = pd.DataFrame(list_album_infos)

Album Timeline

Number of ratings per day:

>>> album_timeline = network.get_album_timeline(url="https://rateyourmusic.com/release/album/feu-chatterton/palais-dargile/")
>>> df = pd.DataFrame(album_timeline)
>>> import datetime
>>> df["Date"] = df["Date"].apply(lambda x: datetime.datetime.strptime(x, "%d %b %Y"))
>>> df["Date"].groupby(df["Date"].dt.to_period("D")).count().plot(kind="bar")

[Figure: timeline_plot, bar chart of the number of ratings per day]
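
The per-day count can also be computed without plotting. The sketch below uses a hypothetical three-row sample: only the "Date" key and its "%d %b %Y" format come from the example above, and any other fields the real timeline returns are left out.

```python
import pandas as pd

# Hypothetical sample; a real timeline comes from network.get_album_timeline().
album_timeline = [
    {"Date": "12 Mar 2021"},
    {"Date": "12 Mar 2021"},
    {"Date": "13 Mar 2021"},
]

df = pd.DataFrame(album_timeline)
# Parse the date strings into datetime64 values.
df["Date"] = pd.to_datetime(df["Date"], format="%d %b %Y")
# Count how many ratings fall on each calendar day.
ratings_per_day = df.groupby(df["Date"].dt.date).size()
```

`ratings_per_day` is a Series indexed by day, which can then be exported or passed to `.plot(kind="bar")`.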

Chart

>>> # (slow for very long charts)
>>> rym_url = RymUrl.RymUrl() # default: top of all-time. See examples/get_chart.py source code for more options.
>>> chart_infos = network.get_chart_infos(url=rym_url, max_page=3)
>>> df = pd.DataFrame(chart_infos)
>>> df[['Rank', 'Artist', 'Album', 'RYM Rating', 'Ratings']]
Rank                         Artist                                              Album RYM Rating Ratings
   1                      Radiohead                                        OK Computer       4.23   67360
   2                     Pink Floyd                                 Wish You Were Here       4.29   46534
   3                   King Crimson                   In the Court of the Crimson King       4.30   42784
   4                      Radiohead                                              Kid A       4.21   55999
   5            My Bloody Valentine                                           Loveless       4.24   47394
   6                 Kendrick Lamar                                To Pimp a Butterfly       4.27   41040
   7                     Pink Floyd                          The Dark Side of the Moon       4.20   55535
   8                    The Beatles                                         Abbey Road       4.25   42739
   9  The Velvet Underground & Nico                      The Velvet Underground & Nico       4.24   44002
  10                    David Bowie  The Rise and Fall of Ziggy Stardust and the Sp...       4.26   37963
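
Once a chart is in a DataFrame, it can be sorted or aggregated like any other table. The sketch below copies the first three rows of the output above and stores the values as strings, which is an assumption: the real column types may differ, in which case the `pd.to_numeric` step is harmless.

```python
import pandas as pd

# First three rows of the chart above, hardcoded so the sketch runs offline.
chart_infos = [
    {"Rank": "1", "Artist": "Radiohead", "Album": "OK Computer",
     "RYM Rating": "4.23", "Ratings": "67360"},
    {"Rank": "2", "Artist": "Pink Floyd", "Album": "Wish You Were Here",
     "RYM Rating": "4.29", "Ratings": "46534"},
    {"Rank": "3", "Artist": "King Crimson", "Album": "In the Court of the Crimson King",
     "RYM Rating": "4.30", "Ratings": "42784"},
]

df = pd.DataFrame(chart_infos)
# Convert the numeric columns so that sorting and sums behave numerically.
for column in ["Rank", "RYM Rating", "Ratings"]:
    df[column] = pd.to_numeric(df[column])

# Album with the highest average rating among the sampled rows.
best_rated = df.sort_values("RYM Rating", ascending=False).iloc[0]["Album"]
total_ratings = int(df["Ratings"].sum())
```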

Discography

>>> discography_infos = network.get_discography_infos(name="Aufgang", complementary_infos=True)
>>> # or network.get_discography_infos(url="https://rateyourmusic.com/artist/aufgang")
>>> df = pd.DataFrame.from_records(discography_infos)
>>> # don't forget to close and quit the browser (prevent memory leaks)
>>> network.browser.close()
>>> network.browser.quit()
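
To make sure the browser is closed even when a scraping call raises, the two cleanup calls can be wrapped in a context manager. This is a sketch under assumptions: `open_network`, `FakeNetwork` and `FakeBrowser` are hypothetical stand-ins so the example runs without selenium. In real use the factory would be `rymscraper.RymNetwork`.

```python
from contextlib import contextmanager


class FakeBrowser:
    """Stand-in for the selenium browser; it only records calls."""

    def __init__(self):
        self.calls = []

    def close(self):
        self.calls.append("close")

    def quit(self):
        self.calls.append("quit")


class FakeNetwork:
    """Stand-in for rymscraper.RymNetwork() so the sketch runs offline."""

    def __init__(self):
        self.browser = FakeBrowser()


@contextmanager
def open_network(factory):
    """Yield a network object and always close and quit its browser."""
    network = factory()
    try:
        yield network
    finally:
        network.browser.close()
        network.browser.quit()


try:
    with open_network(FakeNetwork) as network:
        raise RuntimeError("simulated scraping failure")
except RuntimeError:
    pass

# The cleanup ran even though the body of the with block raised.
cleanup_calls = network.browser.calls
```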

Example Scripts

Some scripts are included in the examples folder.

  • get_artist_infos.py : extract information about one or several artists, by name or URL, into a CSV file.
  • get_chart.py : extract information about the albums appearing in a chart, by name, year or URL, into a CSV file.
  • get_discography.py : extract the discography of one or several artists, by name or URL, into a CSV file.
  • get_album_infos.py : extract information about one or several albums, by name or URL, into a CSV file.
  • get_album_timeline.py : extract the timeline of an album into a JSON file.

Usage

python get_artist_infos.py -a "u2,xtc,brad mehldau"
python get_artist_infos.py --file_artist artist_list.txt

python get_chart.py -g rock
python get_chart.py -g ambient -y 2010s -c France --everything

python get_discography.py -a magma
python get_discography.py -a "the new pornographers, ween, stereolab" --complementary_infos --separate_export

python get_album_infos.py -a "ride - nowhere"
python get_album_infos.py --file_url urls_list.txt --no_headless

python get_album_timeline.py -a "ride - nowhere"
python get_album_timeline.py -u "https://rateyourmusic.com/release/album/feu-chatterton/palais-dargile/"
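
The `--file_artist` option reads a text file with one artist per line. A short sketch for generating such a file from a Python list follows; the temporary directory is only used to keep the example self-contained, and the artist names are the ones from the first command above.

```python
import tempfile
from pathlib import Path

artists = ["u2", "xtc", "brad mehldau"]

# One artist per line, as expected by --file_artist.
list_path = Path(tempfile.gettempdir()) / "artist_list.txt"
list_path.write_text("\n".join(artists) + "\n", encoding="utf-8")
```

The resulting path can then be passed to the script, for example `python get_artist_infos.py --file_artist <path to artist_list.txt>`.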

Help

python get_artist_infos.py -h
usage: get_artist_infos.py [-h] [--debug] [-u URL] [--file_url FILE_URL]
                           [--file_artist FILE_ARTIST] [-a ARTIST] [-s]
                           [--no_headless]

Scraper rateyourmusic (artist version).

optional arguments:
  -h, --help            show this help message and exit
  --debug               Display debugging information.
  -u URL, --url URL     URLs of the artists to extract (separated by comma).
  --file_url FILE_URL   File containing the URLs to extract (one by line).
  --file_artist FILE_ARTIST
                        File containing the artists to extract (one by line).
  -a ARTIST, --artist ARTIST
                        Artists to extract (separated by comma).
  -s, --separate_export
                        Also export the artists in separate files.
  --no_headless         Launch selenium in foreground (background by default).

python get_chart.py -h
usage: get_chart.py [-h] [--debug] [-u URL] [-g GENRE] [-y YEAR] [-c COUNTRY]
                    [-p PAGE] [-e] [--no_headless]

Scraper rateyourmusic (chart version).

optional arguments:
  -h, --help            show this help message and exit
  --debug               Display debugging information.
  -u URL, --url URL     Chart URL to parse.
  -g GENRE, --genre GENRE
                        Chart Option : Genre (use + if you need a space).
  -y YEAR, --year YEAR  Chart Option : Year.
  -c COUNTRY, --country COUNTRY
                        Chart Option : Country.
  -p PAGE, --page PAGE  Number of page to extract. If not set, every pages
                        will be extracted.
  -e, --everything      Chart Option : Extract Everything / All Releases
                        (otherwise only albums).
  --no_headless         Launch selenium in foreground (background by default).

python get_discography.py -h
usage: get_discography.py [-h] [--debug] [-u URL] [--file_url FILE_URL]
                          [--file_artist FILE_ARTIST] [-a ARTIST] [-s] [-c]
                          [--no_headless]

Scraper rateyourmusic (discography version).

optional arguments:
  -h, --help            show this help message and exit
  --debug               Display debugging information.
  -u URL, --url URL     URLs to extract (separated by comma).
  --file_url FILE_URL   File containing the URLs to extract (one by line).
  --file_artist FILE_ARTIST
                        File containing the artists to extract (one by line).
  -a ARTIST, --artist ARTIST
                        Artists to extract (separated by comma).
  -s, --separate_export
                        Also export the artists in separate files.
  -c, --complementary_infos
                        Extract complementary informations for each releases
                        (slower, more requests on rym).
  --no_headless         Launch selenium in foreground (background by default).

python get_album_infos.py -h
usage: get_album_infos.py [-h] [--debug] [-u URL] [--file_url FILE_URL]
                          [--file_album_name FILE_ALBUM_NAME] [-a ALBUM_NAME]
                          [-s] [--no_headless]

Scraper rateyourmusic (album version).

optional arguments:
  -h, --help            show this help message and exit
  --debug               Display debugging information.
  -u URL, --url URL     URL to extract (separated by comma).
  --file_url FILE_URL   File containing the URLs to extract (one by line).
  --file_album_name FILE_ALBUM_NAME
                        File containing the name of the albums to extract (one
                        by line, format Artist - Album).
  -a ALBUM_NAME, --album_name ALBUM_NAME
                        Albums to extract (separated by comma, format Artist -
                        Album).
  -s, --separate_export
                        Also export the artists in separate files.
  --no_headless         Launch selenium in foreground (background by default).

python get_album_timeline.py -h
usage: get_album_timeline.py [-h] [--debug] [-u URL] [-a ALBUM_NAME]
                             [--no_headless]

Scraper rateyourmusic (album timeline version).

optional arguments:
  -h, --help            show this help message and exit
  --debug               Display debugging information.
  -u URL, --url URL     URL to extract.
  -a ALBUM_NAME, --album_name ALBUM_NAME
                        Album to extract (format Artist - Album).
  --no_headless         Launch selenium in foreground (background by default).
