Soccer xG

A Python package for training and analyzing expected goals (xG) models in soccer.

About

This repository contains the code and models for our blog post series on the analysis of xG models.

In particular, it contains code for experimenting with an exhaustive set of features and machine learning pipelines for predicting xG values from soccer event stream data. Since we rely on the SPADL language as input format, soccer_xg currently supports event streams provided by Opta, Wyscout, and StatsBomb.
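
Two of the most common xG features are a shot's distance and angle to goal. As a minimal sketch, assuming SPADL's 105 m x 68 m pitch with the attacked goal centered at (105, 34) and a standard 7.32 m goal width, such features could be computed as follows. The `shot_features` helper is hypothetical and is not part of soccer_xg:

```python
import math

# Assumed pitch geometry (SPADL convention: 105 m x 68 m, attacking left to right).
FIELD_LENGTH = 105.0
FIELD_WIDTH = 68.0
GOAL_WIDTH = 7.32


def shot_features(x: float, y: float) -> tuple[float, float]:
    """Return (distance to goal center in meters, goal opening angle in radians)."""
    goal_y = FIELD_WIDTH / 2
    dx = FIELD_LENGTH - x
    distance = math.hypot(dx, goal_y - y)
    # Angle subtended by the two goalposts as seen from the shot location.
    upper_post = math.atan2(goal_y + GOAL_WIDTH / 2 - y, dx)
    lower_post = math.atan2(goal_y - GOAL_WIDTH / 2 - y, dx)
    return distance, abs(upper_post - lower_post)


# A shot from the penalty spot, 11 m in front of the goal center.
distance, angle = shot_features(94.0, 34.0)
print(f"distance={distance:.2f} m, angle={angle:.3f} rad")
```

Closer, more central shots get a smaller distance and a wider angle, which is why these two values carry much of the signal in a basic xG model.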

Getting started

The recommended way to install soccer_xg is to simply use pip:

$ pip install soccer-xg

Subsequently, a basic xG model can be trained and applied with the code below:

from itertools import product
from soccer_xg import XGModel, DataApi

# load the data
provider = 'wyscout_opensource'
leagues = ['ENG', 'ESP', 'ITA', 'GER', 'FRA']
seasons = ['1718']
api = DataApi([f"data/{provider}/spadl-{provider}-{league}-{season}.h5"
        for (league, season) in product(leagues, seasons)])
# load the default pipeline
model = XGModel()
# train the model
model.train(api, training_seasons=[('ESP', '1718'), ('ITA', '1718'), ('GER', '1718')])
# validate the model
model.validate(api, validation_seasons=[('ENG', '1718')])
# predict xG values
model.estimate(api, game_ids=[2500098])
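
The list comprehension passed to DataApi expands to one HDF5 path per league-season pair. The standalone sketch below reproduces just that step, without requiring soccer_xg, so you can check which files are expected on disk; the `paths` variable is introduced here for illustration only:

```python
from itertools import product

provider = "wyscout_opensource"
leagues = ["ENG", "ESP", "ITA", "GER", "FRA"]
seasons = ["1718"]

# One SPADL file per (league, season) combination.
paths = [
    f"data/{provider}/spadl-{provider}-{league}-{season}.h5"
    for league, season in product(leagues, seasons)
]

for path in paths:
    print(path)
```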

Although this default pipeline is suitable for computing xG, it is by no means the best possible model. The notebook 4-creating-custom-xg-pipelines illustrates how you can train your own xG models. Alternatively, you can use one of the four pipelines from our blog post series, which can be loaded with:

XGModel.load_model('openplay_logreg_basic')
XGModel.load_model('openplay_xgboost_basic')
XGModel.load_model('openplay_logreg_advanced')
XGModel.load_model('openplay_xgboost_advanced')

Note that these models are meant to predict xG values for shots from open play only. To compute xG values for all shot types, you will have to combine them with a pipeline for penalties and one for free kicks:

from soccer_xg import xg

openplay_model = xg.XGModel.load_model('openplay_xgboost_advanced') # custom pipeline for open play shots
penalty_model = xg.PenaltyXGModel() # default pipeline for penalties
freekick_model = xg.FreekickXGModel() # default pipeline for free kicks

model = xg.XGModel()
model.model = [openplay_model, penalty_model, freekick_model]
model.train(api, training_seasons=...)
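
Conceptually, combining pipelines means routing each shot to the estimator that matches its type. The sketch below illustrates only that idea in plain Python and is not how soccer_xg implements it. The type labels are assumed to follow SPADL's action type names, while the `route_shot` helper and the placeholder estimators (with arbitrary return values) are hypothetical:

```python
from typing import Callable, Dict


# Hypothetical stand-ins for fitted pipelines. The returned values are
# arbitrary placeholders, not real xG estimates.
def openplay_estimator(shot: dict) -> float:
    return 0.10


def penalty_estimator(shot: dict) -> float:
    return 0.75


def freekick_estimator(shot: dict) -> float:
    return 0.05


# Map each shot type to the estimator responsible for it.
ESTIMATORS: Dict[str, Callable[[dict], float]] = {
    "shot": openplay_estimator,
    "shot_penalty": penalty_estimator,
    "shot_freekick": freekick_estimator,
}


def route_shot(shot: dict) -> float:
    """Pick the estimator that matches the shot's type and apply it."""
    try:
        estimator = ESTIMATORS[shot["type_name"]]
    except KeyError:
        raise ValueError(f"unsupported shot type: {shot.get('type_name')!r}")
    return estimator(shot)


print(route_shot({"type_name": "shot_penalty"}))
```

Keeping one estimator per shot type lets each pipeline use features that make sense for that situation, for example ignoring distance for penalties, which are always taken from the same spot.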

For developers

Create venv and install deps

make init

Install git precommit hook

make precommit_install

Run linters, autoformat, tests etc.

make pretty lint test

Bump new version

make bump_major
make bump_minor
make bump_patch

Research

If you make use of this package in your research, please use the following citation:

@inproceedings{robberechts2020data,
  title={How data availability affects the ability to learn good xG models},
  author={Robberechts, Pieter and Davis, Jesse},
  booktitle={International Workshop on Machine Learning and Data Mining for Sports Analytics},
  pages={17--27},
  year={2020},
  organization={Springer}
}

License

Copyright (c) DTAI - KU Leuven – All rights reserved.
Licensed under the Apache License, Version 2.0
Written by Pieter Robberechts, 2020
