  • Language
    Jupyter Notebook
  • License
    GNU General Publi...
  • Created over 4 years ago
  • Updated over 1 year ago

Repository Details

Luca Pappalardo's code for working with and plotting Wyscout data

Exploring spatio-temporal soccer events using public event data

Tutorial supported by EU project SoBigData++ RI (Grant Agreement 871042).

A video version of the tutorial is available on YouTube at:

The code has been developed by:

and explores the events in an open collection of soccer-logs described in the following paper (please cite it if you use the public data or the code in this folder):

  • (PCR2019) Pappalardo, L., Cintia, P., Rossi, A. et al. A public data set of spatio-temporal match events in soccer competitions. Nature Scientific Data 6, 236 (2019). https://doi.org/10.1038/s41597-019-0247-7

Data collection

The soccer-logs have been collected and provided by Wyscout. Data collection is performed by expert video analysts (the operators), who are trained specifically in data collection for soccer, using proprietary software (the tagger). The tagger has been developed and improved over several years and is continuously updated to improve its performance.

To guarantee the accuracy of data collection, the events in each match are tagged from the match video by three operators using the tagger: one operator per team and one supervisor responsible for the output of the whole match. Optionally, for near-live data delivery, a team of four operators is used, one of whom speeds up the collection of complex events that need additional, specific attributes or a quick review. Further details on data collection can be found in the data paper (PCR2019).

Data Records

The data sets are released under the CC BY 4.0 License and are publicly available on figshare:

The data cover the 2017/2018 season of five national soccer competitions in Europe: the Spanish, Italian, English, German, and French first divisions. In addition, there are data on the World Cup 2018 and the European Cup 2016, which are competitions for national teams. In total, we provide seven data sets, containing information about competitions, matches, teams, players, events, referees, and coaches.

Each data set is provided in JSON (JavaScript Object Notation) format. The following table lists the competitions we make available, with the number of matches, events, and players for each. In total, the data cover 1,941 matches, 3,251,294 events, and 4,299 players.

Competition               #matches     #events   #players
Spanish first division         380     628,659        619
English first division         380     643,150        603
Italian first division         380     647,372        686
German first division          306     519,407        537
French first division          380     632,807        629
World cup 2018                  64     101,759        736
European cup 2016               51      78,140        552
Total                        1,941   3,251,294      4,299
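
The totals in the last row can be checked against the per-competition rows. The following is a minimal Python sketch that uses only the figures from the table; the variable names are illustrative. The #matches and #events columns sum to the reported totals, while the #players column sums to 4,362 rather than 4,299. One plausible reading, which is an assumption and not stated in the text, is that the reported total counts each player once even when they appear in more than one competition.

```python
# Per-competition figures copied from the table above:
# (competition, matches, events, players)
rows = [
    ("Spanish first division", 380, 628_659, 619),
    ("English first division", 380, 643_150, 603),
    ("Italian first division", 380, 647_372, 686),
    ("German first division", 306, 519_407, 537),
    ("French first division", 380, 632_807, 629),
    ("World cup 2018", 64, 101_759, 736),
    ("European cup 2016", 51, 78_140, 552),
]

# Column sums
total_matches = sum(r[1] for r in rows)
total_events = sum(r[2] for r in rows)
players_column_sum = sum(r[3] for r in rows)

print(total_matches)       # 1941, equal to the reported total
print(total_events)        # 3251294, equal to the reported total
print(players_column_sum)  # 4362, larger than the reported 4,299
```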

Outline of the tutorial

  • Import libraries
  • Load public datasets
  • How are the data collected?
  • Structure of data
    • Players
    • Competitions
    • Matches
    • Events
  • Basic statistics on events
    • Frequency of events by type
    • Distribution of number of events per match
    • Plot events on the field
      • Static plot
      • Interactive plot
  • Spatial distribution of events
  • Intra-match evolution
  • Advanced statistics
    • Passing networks
    • Flow centrality
    • PlayeRank algorithm
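
Three of the steps in the outline (loading a data set, counting events by type, and building a passing network) can be illustrated on a toy example. The sketch below is not the tutorial's code. The sample records, the field names (`eventName`, `playerId`, `teamId`, `matchId`), and the rule that a pass is received by the player of the next event of the same team are all assumptions made for illustration. It uses only the Python standard library.

```python
import json
from collections import Counter

# Hypothetical sample in the spirit of an events data set.
# Real files are much larger; the field names here are assumptions.
sample_json = """
[
  {"matchId": 1, "teamId": 10, "playerId": 101, "eventName": "Pass"},
  {"matchId": 1, "teamId": 10, "playerId": 102, "eventName": "Pass"},
  {"matchId": 1, "teamId": 10, "playerId": 103, "eventName": "Shot"},
  {"matchId": 1, "teamId": 20, "playerId": 201, "eventName": "Duel"},
  {"matchId": 1, "teamId": 10, "playerId": 101, "eventName": "Pass"},
  {"matchId": 1, "teamId": 10, "playerId": 102, "eventName": "Pass"},
  {"matchId": 1, "teamId": 20, "playerId": 202, "eventName": "Pass"}
]
"""

# Load the events (for a file on disk, json.load(open(path)) would be used instead)
events = json.loads(sample_json)

# Frequency of events by type
event_counts = Counter(e["eventName"] for e in events)


def passing_network(events, team_id):
    """Count passes per (sender, receiver) pair for one team.

    Assumption: the receiver of a pass is the player of the next event,
    provided that event belongs to the same team and a different player.
    """
    edges = Counter()
    for current, following in zip(events, events[1:]):
        if (
            current["eventName"] == "Pass"
            and current["teamId"] == team_id
            and following["teamId"] == team_id
            and following["playerId"] != current["playerId"]
        ):
            edges[(current["playerId"], following["playerId"])] += 1
    return edges


network = passing_network(events, team_id=10)

print(event_counts.most_common())
print(dict(network))
```

Each key of `network` is a directed edge (sender, receiver) and each value is the number of passes along that edge, which is the weighted edge list a graph library would need to draw the network or to compute centrality measures on it.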