• Stars
    star
    282
  • Rank 145,582 (Top 3 %)
  • Language
    Python
  • License
    MIT License
  • Created over 2 years ago
  • Updated 3 months ago


Repository Details

ProphitBet is a Machine Learning Soccer Bet prediction application. It analyzes the form of teams, computes match statistics and predicts the outcomes of a match using Advanced Machine Learning (ML) methods. The supported algorithms in this application are Neural Networks, Random Forests & Ensemble Models.

ProphitBet - Soccer Bets Predictor

ProphitBet is a Machine Learning Soccer Bet prediction application. The name is a combination of "Profit" & "Prophet". It analyzes the form of teams with stunning visualizations, computes statistics from previous matches of a selected league and predicts the outcomes of a match using Advanced Machine Learning (ML) methods. The supported algorithms in this application are Neural Networks, Random Forests & Ensemble models. Additionally, users may analyze the features of the models and adjust the models accordingly. The application downloads soccer data for multiple leagues from football-data (https://www.football-data.co.uk/). It can also parse upcoming fixtures from Footystats (https://footystats.org/) and predict the upcoming matches of a league. There is also an auto-save feature, which saves the trained models so that users can re-load them on the next run. Finally, the application requires an internet connection in order to download the league data.

Stunning Graphical Interface

The user interface is pretty simple: Every action can be done via a menu-bar on the top of the application. There are 5 available menus:

  • Application: Create/Load/Delete Leagues
  • Analysis: Data Analysis & Feature Importance
  • Model: Train/Evaluate Models & Predict Matches
  • Theme: Select a Theme for the Application Window
  • Help: Additional Resources to Read about Machine Learning Topics

Also, 4 custom themes have been added and can be selected via the "Theme" menu. The themes are:

  1. Breeze-Light
  2. Breeze-Dark
  3. Forest-Light
  4. Forest-Dark

gui

gui

League Statistics

For each league, the application computes several statistics (features) about the teams, including their form, the performance of the last N matches, etc. The stats are computed for both the home team and the away team. More specifically:

  1. Home Wins (HW): Number of wins of the home team in its last N home matches
  2. Home Losses (HL): Number of losses of the home team in its last N home matches
  3. Home Goal Forward (HGF): Sum of goals that the home team scored in its last N home matches
  4. Home Goal Against (HGA): Sum of goals that the away teams scored against the home team in its last N home matches
  5. Home G-Goal Difference Wins (HGD-W): Number of wins of the home team by at least G goals in its last N home matches (e.g. ${HG - AG \geq 2}$)
  6. Home G-Goal Difference Losses (HGD-L): Number of losses of the home team by at least G goals in its last N home matches (e.g. ${AG - HG \geq 2}$)
  7. Home Win Rate (HW%): Total win rate of the home team at home since the start of the league
  8. Home Loss Rate (HL%): Total loss rate of the home team at home since the start of the league
  9. Away Wins (AW): Number of wins of the away team in its last N away matches
  10. Away Losses (AL): Number of losses of the away team in its last N away matches
  11. Away Goal Forward (AGF): Sum of goals that the away team scored in its last N away matches
  12. Away Goal Against (AGA): Sum of goals that the home teams scored against the away team in its last N away matches
  13. Away G-Goal Difference Wins (AGD-W): Number of wins of the away team by at least G goals in its last N away matches (e.g. ${AG - HG \geq 2}$)
  14. Away G-Goal Difference Losses (AGD-L): Number of losses of the away team by at least G goals in its last N away matches (e.g. ${HG - AG \geq 2}$)
  15. Away Win Rate (AW%): Total win rate of the away team away from home since the start of the league
  16. Away Loss Rate (AL%): Total loss rate of the away team away from home since the start of the league

Each column can be added to or removed from a league when the league is created.
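As a rough illustration of the home-team statistics listed above, here is a minimal sketch. It is not taken from the ProphitBet source: the function name `home_stats`, the dictionary keys (`home`, `away`, `hg`, `ag`) and the oldest-to-newest ordering of the matches are assumptions made for this example, and HGA is interpreted here as goals conceded in home matches.

```python
def home_stats(matches, team, n, g=2):
    """Compute HW, HL, HGF, HGA, HGD-W and HGD-L for `team` over its last `n` home matches.

    `matches` is assumed to be ordered from oldest to newest (hypothetical layout).
    """
    # Keep only the matches that `team` played at home, then take the most recent n.
    home_matches = [m for m in matches if m["home"] == team][-n:]
    stats = {"HW": 0, "HL": 0, "HGF": 0, "HGA": 0, "HGD-W": 0, "HGD-L": 0}
    for m in home_matches:
        diff = m["hg"] - m["ag"]  # positive means the home team won
        stats["HGF"] += m["hg"]
        stats["HGA"] += m["ag"]
        if diff > 0:
            stats["HW"] += 1
        elif diff < 0:
            stats["HL"] += 1
        if diff >= g:
            stats["HGD-W"] += 1  # win by at least g goals
        elif -diff >= g:
            stats["HGD-L"] += 1  # loss by at least g goals
    return stats


matches = [
    {"home": "A", "away": "B", "hg": 3, "ag": 0},
    {"home": "C", "away": "A", "hg": 1, "ag": 1},
    {"home": "A", "away": "C", "hg": 0, "ag": 2},
    {"home": "A", "away": "B", "hg": 2, "ag": 1},
]
print(home_stats(matches, "A", n=3))
```

The away-team statistics would follow the same pattern with the roles of `home` and `away` swapped.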

Leagues

ProphitBet provides 11 main soccer leagues and 2 extras, which are downloaded from https://www.football-data.co.uk/. More specifically, these leagues are:

  • Premier League (England)
  • Premiership (Scotland)
  • Bundesliga I (Germany)
  • Serie A (Italy)
  • La Liga (Spain)
  • Ligue I (France)
  • Eredivisie (Netherlands)
  • Jupiler Pro League (Belgium)
  • Liga I (Portugal)
  • Super Lig (Turkey)
  • Super League (Greece)
  • Serie A (Brazil)
  • Allsvenskan (Sweden)

You can add additional leagues by modifying the database/leagues.csv configuration file. In order to add a new league, you need to specify:

  1. Country (The country of the league, e.g. Russia)
  2. League Name (The name of the league e.g. Premier League)
  3. Base Url (The link to the .csv file from football-data, e.g. https://www.football-data.co.uk/new/RUS.csv)
  4. Year Start (The earliest year for which ProphitBet will collect data, e.g. 2015)
  5. League Type (Since it's an extra league, it always has to be "extra")
  6. Fixtures Url (The fixture's url from footystats, which will be used to parse upcoming matches, e.g. https://footystats.org/russia/russian-premier-league)
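The six fields above could be assembled into a new row as in the sketch below. The column names and their order in `FIELDS` are hypothetical: check the header of your own database/leagues.csv before appending anything. The helper names `make_league_row` and `append_league` are illustrative, and the example works on an in-memory string rather than on the real file.

```python
import csv
import io

# Hypothetical column names, following the order of the six fields listed above.
FIELDS = ["country", "league_name", "base_url", "year_start", "league_type", "fixtures_url"]


def make_league_row(country, league_name, base_url, year_start, fixtures_url):
    """Build a row for an additional league; the league type is always "extra"."""
    return {
        "country": country,
        "league_name": league_name,
        "base_url": base_url,
        "year_start": year_start,
        "league_type": "extra",
        "fixtures_url": fixtures_url,
    }


def append_league(csv_text, row):
    """Return `csv_text` with `row` appended as a new CSV line."""
    buffer = io.StringIO()
    buffer.write(csv_text)
    if csv_text and not csv_text.endswith("\n"):
        buffer.write("\n")
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writerow(row)
    return buffer.getvalue()


existing = ",".join(FIELDS) + "\n"
row = make_league_row(
    "Russia",
    "Premier League",
    "https://www.football-data.co.uk/new/RUS.csv",
    2015,
    "https://footystats.org/russia/russian-premier-league",
)
updated = append_league(existing, row)
print(updated)
```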

Feature Correlation Analysis

ProphitBet provides a heatmap for the computed stats, which shows the correlations between the columns. This is particularly useful when analyzing the quality of the training data. The correlation is described by a numeric value ${r \in [-1.0, 1.0]}$. The closer $r$ is to zero, the weaker the correlation between 2 columns. The closer it is to 1.0 or -1.0, the stronger the correlation. Ideally, a feature is good if its correlation with the other features is close to zero ($r=0$).

correlation heatmap analysis
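To make the value $r$ concrete, the sketch below computes the correlation between two stat columns by hand. It assumes the heatmap shows the Pearson correlation coefficient, which the text above does not state explicitly; the function name `pearson_r` and the sample values are made up for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long numeric columns."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # Raises ZeroDivisionError if either column is constant (zero variance).
    return cov / math.sqrt(var_x * var_y)


# Made-up example: with N = 3 and no draws, wins and losses mirror each other exactly.
home_wins = [3, 1, 2, 0]
home_losses = [0, 2, 1, 3]
print(pearson_r(home_wins, home_losses))
```

Two columns that mirror each other like this carry the same information, which is why a strong correlation between features is treated as undesirable.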

Feature Importance Analysis

ProphitBet also comes with a built-in module for "interpretability". In case you are wondering which stats are the most important, there are 4 methods provided:

  1. Ensemble Learning (https://www.knowledgehut.com/blog/data-science/bagging-and-random-forest-in-machine-learning)
  2. Variance Analysis (https://corporatefinanceinstitute.com/resources/knowledge/accounting/variance-analysis/)
  3. Univariate Analysis (https://link.springer.com/referenceworkentry/10.1007/978-94-007-0753-5_3110)
  4. Recursive Feature Elimination (https://bookdown.org/max/FES/recursive-feature-elimination.html)

feature-importance-analysis

Class Distribution Analysis

The training dataset of several leagues contains imbalanced classes, which means that the number of matches that ended in a win for the home team is a lot larger than the number of matches that ended in a win for the away team. This often leads models to overestimate their prediction probabilities and to be biased towards the home team. ProphitBet provides the Target Distribution Plot to detect such leagues, as well as several tools to deal with the imbalance, including:

  1. Noise Injection (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2771718/)
  2. Output Probability Calibration (https://davidrosenberg.github.io/ttml2021/calibration/2.calibration.pdf)

class distribution
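The imbalance described above can be checked with a simple count of the match outcomes. This is only a sketch: the labels "H", "D" and "A" (home win, draw, away win) and the function name `class_distribution` are assumptions for this example, and the results are invented.

```python
from collections import Counter


def class_distribution(results):
    """Return the share of home wins, draws and away wins in a list of outcomes."""
    counts = Counter(results)
    total = sum(counts.values())
    return {label: counts[label] / total for label in ("H", "D", "A")}


# Invented outcomes of 10 matches: home wins dominate.
results = ["H"] * 5 + ["D"] * 3 + ["A"] * 2
print(class_distribution(results))
```

A league where the "H" share is far above the "A" share is a candidate for the balancing tools listed above.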

Training Deep Neural Networks

A detailed description of neural networks can be found in the link below: https://www.investopedia.com/terms/n/neuralnetwork.asp

deep neural networks

Training Random Forests

A detailed description of random forests can be found in the link below: https://www.section.io/engineering-education/introduction-to-random-forest-in-machine-learning/

random forests

The Ensemble Model

This type of model combines the predictions of a Neural Network & a Random Forest. Typically, a well-tuned Random Forest makes predictions similar to those of a Neural Network. However, there are some cases where the 2 models output different probabilities (e.g. the Random Forest might give a higher probability that the outcome is Home). In that case, the ensemble model can be used, which averages the output probabilities of both models and decides on the predicted outcome.
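The averaging step can be sketched in a few lines. The function name `ensemble_predict`, the label order ("H", "D", "A") and the probability values are assumptions for this example and are not taken from the application's code.

```python
def ensemble_predict(rf_probs, nn_probs, labels=("H", "D", "A")):
    """Average the probabilities of two models and pick the most likely outcome."""
    avg = [(r + n) / 2 for r, n in zip(rf_probs, nn_probs)]
    best = max(range(len(avg)), key=avg.__getitem__)
    return labels[best], avg


# The Random Forest favours Home, the Neural Network favours Away.
rf_probs = [0.50, 0.30, 0.20]
nn_probs = [0.30, 0.30, 0.40]
label, avg = ensemble_predict(rf_probs, nn_probs)
print(label, avg)
```

Here the two models disagree, and the averaged probabilities settle on the home win because the Random Forest is more confident in its pick than the Neural Network is in its own.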

Evaluating Models

Before using a trained model, it is wise to first evaluate the model on unseen matches. This should reveal the quality of the model training, as well as its output probabilities. You can compare the probabilities of the random forest with the neural network's probabilities and choose the most confident and well-trained model. Additionally, you can request an analytical report of the accuracy of the classifiers for specific odd intervals (e.g. the accuracy between 1.0 and 1.3, 1.3 and 1.6, etc., for the home or away team).

model evaluation
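The per-interval accuracy report mentioned above could look roughly like the following sketch. The record layout `(odd, predicted, actual)`, the function name `accuracy_by_odds` and the upper edge of 2.0 are assumptions for illustration; only the 1.0, 1.3 and 1.6 boundaries come from the text.

```python
def accuracy_by_odds(records, edges=(1.0, 1.3, 1.6, 2.0)):
    """Accuracy per odd interval [lo, hi); records are (odd, predicted, actual) tuples."""
    buckets = {(lo, hi): [0, 0] for lo, hi in zip(edges, edges[1:])}
    for odd, predicted, actual in records:
        for (lo, hi), tally in buckets.items():
            if lo <= odd < hi:
                tally[0] += predicted == actual  # correct predictions
                tally[1] += 1                    # all predictions in this interval
                break
    # Intervals without any match report None instead of dividing by zero.
    return {k: (c / t if t else None) for k, (c, t) in buckets.items()}


records = [
    (1.20, "H", "H"),
    (1.25, "H", "D"),
    (1.40, "H", "H"),
    (1.50, "H", "H"),
    (2.50, "H", "A"),  # outside every interval, so it is ignored
]
print(accuracy_by_odds(records))
```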

Outcome Predictions

In order to request a prediction for a match, you need to select the home and away teams, as well as the bookmaker odds. You should use both models to make a prediction. If both models agree, then the prediction is more likely to be reliable. If the models disagree, then it's best to avoid betting on that match.

match predictions
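The agreement rule above can be written as a small check. This is a sketch only: the function name `agreed_outcome`, the label order and the probabilities are hypothetical, and the rule is the document's advice rather than a guarantee of a good bet.

```python
def agreed_outcome(rf_probs, nn_probs, labels=("H", "D", "A")):
    """Return the outcome both models pick, or None if they disagree."""
    rf_pick = labels[max(range(len(labels)), key=lambda i: rf_probs[i])]
    nn_pick = labels[max(range(len(labels)), key=lambda i: nn_probs[i])]
    return rf_pick if rf_pick == nn_pick else None


print(agreed_outcome([0.6, 0.25, 0.15], [0.5, 0.3, 0.2]))  # both models pick Home
print(agreed_outcome([0.6, 0.25, 0.15], [0.3, 0.3, 0.4]))  # models disagree
```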

Fixture Parsing

An alternative way to predict multiple matches at once is to use the "Fixture Parsing" option. When you click on that option, it will open the browser and ask you to download the specified fixture page from footystats.org. This can be done by pressing Ctrl + S or by right-clicking and choosing the "Save As" option. Then, you will need to specify the filepath of the downloaded fixture and the application will automatically parse and predict the upcoming matches for you. You may also choose to export these predictions to a csv file, which you can open with Microsoft Excel.

fixture parsing & upcoming match prediction

Requirements

A requirements.txt file has been added to the project directory. However, the following table also presents the required libraries. Check the requirements.txt file for library versions.

| Library/Module | Download Url | Installation |
| --- | --- | --- |
| Python Language | https://www.python.org/ | Download from website |
| Numpy | https://numpy.org/ | pip install numpy |
| Pandas | https://pandas.pydata.org/ | pip install pandas |
| Matplotlib | https://matplotlib.org/ | pip install matplotlib |
| Seaborn | https://seaborn.pydata.org/ | pip install seaborn |
| Scikit-Learn | https://scikit-learn.org/stable/ | pip install scikit-learn |
| Imbalanced-Learn | https://imbalanced-learn.org/stable/ | pip install imbalanced-learn |
| XGBoost | https://xgboost.readthedocs.io/en/stable/ | pip install xgboost |
| Tensorflow | https://www.tensorflow.org/ | pip install tensorflow |
| Tensorflow-Addons | https://www.tensorflow.org/addons | pip install tensorflow_addons |
| TKinter | https://docs.python.org/3/library/tkinter.html | pip install tk |
| Optuna | https://optuna.org/ | pip install optuna |
| Fuzzy-Wuzzy | https://pypi.org/project/fuzzywuzzy/ | pip install fuzzywuzzy |

To run pip commands on Windows, open CMD by pressing Windows Key + R or by typing cmd in the search bar. On Linux, you can use the terminal.

Instructions (How to Run)

  1. Download & install Python. During the installation, select the "Add to Path" option. Python 3.9 is recommended.
  2. After installing Python, install the above libraries using pip (e.g. pip install numpy). These modules can be installed via cmd (on Windows) or the terminal (on Linux). IMPORTANT: To install the correct versions, add "==" followed by the version number after the package name, as listed in the requirements.txt file. For example, to install tensorflow 2.9.1, you can use: pip install tensorflow==2.9.1.
  3. On Windows, you can double-click the main.py file. Alternatively (both Windows & Linux), you can open cmd or a terminal in the project directory and run: python main.py.

Common Errors

  1. Cannot install tensorflow. Sometimes, it requires visual studio to be installed. Download the community edition which is free here: https://pypi.org/project/py-stringmatching
  2. pip command was not found in terminal. In this case, you forgot to select the "Add to Path" option during the installation of Python. Uninstall Python and repeat instructions 1-3.
  3. File main.py was not found. When you open the command line (cmd) on Windows, or the terminal on Linux, the default directory is the home directory, not the ProphitBet directory. You need to navigate to the ProphitBet directory, where the main.py file exists. To do that, you can use the cd command, e.g. if ProphitBet was downloaded to the "Downloads" folder, type cd Downloads/ProphitBet-Soccer-Bets-Predictor and then type python main.py
  4. python command not found on Linux. This is because the python command is python3 on Linux systems.
  5. Parsing date is wrong when trying to parse fixtures from the html file. The html file has many fixtures and each fixture has a date. You need to specify the correct date of the fixture you are requesting, so that the parser can identify the fixtures of that date and grab the matches. You need to specify the date before importing the fixture file into the program.

Supported Platforms

  1. Windows
  2. Linux
  3. Mac

Open An Issue

In case there is an error with the application, open a Github Issue so that I can get informed and resolve the issue if required.

Known Issues

  1. The Neural Network Training Dialog is too tall and, as a result, the "Train" button cannot be displayed.

Solution: You can press "ENTER" button to start training. The same applies to Random Forest Training Dialog, as well as the tuning dialogs.

Release (2023/01/19)

  • Improved Graphical User Interface (GUI)
  • Added Custom Themes
  • Added "Ensemble" Model
  • Training can now start by pressing "ENTER" button
  • Added option for SVM-Smote resampling method (https://imbalanced-learn.org/stable/references/generated/imblearn.over_sampling.SVMSMOTE.html). It requires imbalanced-learn to be installed
  • Replaced py_stringmatching library, which was a bit confusing to install, with fuzzywuzzy (Check requirements.txt)
  • Fixtures are now imported, even if odds are missing. You can also manually add them or edit them
  • Fixed Bugs (Leagues not updating, Fixtures not being imported, etc.)
  • Added Weighting method to Random Forest.
  • Neural Networks may now have different activation, regularization or batch normalization option on each layer separately.
  • Added more metrics (F1-Score, Precision, Recall)
  • Tuning may now focus on maximizing F1-Score, Precision and Recall of a specified target (Home, Draw or Away).
  • Updated Documentation!

Release (2022/08/30)

  • Fixed a bug in Evaluation Filters
  • Fixed Fixture Parser
  • Added 2 new statistic features (columns): HGA, AGA
  • Neural Network now supports different noise ranges for each odd (1/x/2)
  • Neural Network may now add noise only to favorite teams (teams with odd < 2.0)

Release (2022/09/19)

  • Fixed a bug where several leagues would not be updated
  • Fixed a bug in evaluation filters

Release (2022/11/05)

  • Improved Model's Training
  • Added more training parameters, including, Dropout layers, Batch Normalization, Optimizers, Learning Rate, Regularizers
  • Model may now achieve higher accuracies
  • Added option to automatically search for best parameters, using OPTUNA package (Requires the installation of optuna, see instructions)
  • Updated Documentation: Added more detailed instructions + common errors and how to deal with them

Training Parameters

Release (2022/11/12)

  • Fixed a bug where leagues wouldn't be updated up to last day

Release (2023/02/19)

  • Smaller windows sizes
  • Better parameter selection for neural network tuning
  • Train Dialogs may now initiate training by hitting "ENTER" button
  • Small bug fixes

Release (2023/04/01)

  • Fixed a bug where model could not be saved during training
  • Fixed a bug where validation accuracy was not properly monitored during tuning
  • Increased number of available Trials to 2000
  • Added more options, including layers of neural network during training
  • Updated documentation

Contribution

If you liked the app and would like to contribute, you are welcome to make changes to the code and open a pull request! Usually, it takes 1-3 days for me to review the changes and accept them or reply to you if there is something wrong.

Donation

A lot of people request more football training data (e.g. corners, shots, cards, injuries, etc.). I found a football API that does provide such data: https://footystats.org/api/

However, such data are not available for free. I would like your help to improve the quality of the training data that the algorithms use. Higher quality data should increase the performance of the algorithms and give better predictions in every league. Additionally, more options could be supported (e.g. Under 2.5 or Over 2.5, Num Corners, etc.). Anyone who liked the app and would like to contribute can donate any amount. The money will be spent on a monthly subscription to the footystats API, which will allow me to download a lot more data.

Donate

or via QR-Code:

Donation

Currently Donated Money

5.0€/20.0€

Top Donators

  • Agnieszka Fidura (5€)
