scrapeOP
A Python package for scraping oddsportal.com
This repository contains a set of functions to scrape whatever league you wish. To use them :
- Clone the repository or download it
- Set your Chrome driver location at line 20 in functions.py. NB : the path is written with double slashes
- Open FinalScraper.py and use one of the functions to scrape. It is as simple as that!
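The note about double slashes most likely refers to escaping backslashes in a Windows path inside a Python string literal. That reading is an assumption, and so are the variable name `DRIVER_LOCATION` and the example path below; neither is taken from functions.py. A minimal sketch:

```python
# Hypothetical variable name and path, for illustration only;
# use whatever name line 20 of functions.py actually defines.
# In a normal string literal, each "\\" stands for one literal backslash.
DRIVER_LOCATION = "C:\\Users\\me\\chromedriver.exe"

# A raw string expresses the same path without doubling the backslashes.
DRIVER_LOCATION_RAW = r"C:\Users\me\chromedriver.exe"

# Both literals produce the same string.
print(DRIVER_LOCATION == DRIVER_LOCATION_RAW)
```

Either form should work wherever a path string is expected; the doubled form is simply the one the note above describes.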
ℹ️ Functionalities :
- Multiple sports supported : soccer, basketball, esports, darts, tennis, baseball, rugby, american football, hockey [list to be expanded soon!]
- Multiple scraping modes : historical odds, current season only, upcoming games, specific season only
- Collects all available bookmakers' odds for each game
- Collects the final result
- Automatically sorts the data by date
- 14/11/2020 : You can now scrape the opening odds instead of the closing odds. To do this, change line 26 in functions.py to 'OPENING'
Sport | Historical data (multiple seasons) | Current season only | Specific season | Next games |
---|---|---|---|---|
American Football | ✔️ | | | |
Baseball | | | | |
Basketball | ✔️ | ✔️ | ✔️ | ✔️ |
Darts | ✔️ | ✔️ | | |
eSports | ✔️ | ✔️ | ✔️ | |
Handball | | | | |
Hockey | ✔️ | | | |
Rugby | ✔️ | | | |
Soccer | ✔️ | | | |
Tennis | ✔️ | ✔️ | | |
Volleyball | | | | |
1. `scrape_oddsportal_historical(sport = 'soccer', country = 'france', league = 'ligue-1', start_season = '2010-2011', nseasons = 5, current_season = 'yes', max_page = 25)`
2. `scrape_oddsportal_current_season(sport = 'soccer', country = 'finland', league = 'veikkausliiga', season = '2020', max_page = 25)`
3. `scrape_oddsportal_specific_season(sport = 'soccer', country = 'finland', league = 'veikkausliiga', season = '2019', max_page = 25)`
4. `scrape_oddsportal_next_games(sport = 'tennis', country = 'germany', league = 'exhibition-bett1-aces-berlin-women', season = '2020')`
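To illustrate how `start_season` and `nseasons` might combine in the first call, here is a sketch of a hypothetical helper, `season_labels`, that expands them into one label per season. It is not taken from functions.py; it assumes labels are either 'YYYY-YYYY' (as in '2010-2011') or a single 'YYYY' (as in '2020'), and that consecutive seasons simply shift every year by one.

```python
def season_labels(start_season, nseasons):
    """Expand a starting season label into `nseasons` consecutive labels.

    Hypothetical helper, not part of the package: assumes labels look like
    'YYYY-YYYY' (e.g. '2010-2011') or a single 'YYYY' (e.g. '2020').
    """
    # Split the label into its year components, e.g. [2010, 2011] or [2020].
    years = [int(part) for part in start_season.split("-")]
    labels = []
    for offset in range(nseasons):
        # Shift every year component by the same offset and rejoin.
        labels.append("-".join(str(year + offset) for year in years))
    return labels


print(season_labels("2010-2011", 5))
# ['2010-2011', '2011-2012', '2012-2013', '2013-2014', '2014-2015']
```

Under these assumptions, `start_season = '2010-2011'` with `nseasons = 5` would cover the seasons 2010-2011 through 2014-2015.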
Running the code prints output to the console, and the scraped data is finally saved in .csv format.
Additional documentation for each function can be found in the functions.py script.
- Medium article : https://medium.com/analytics-vidhya/how-covid-19-prevented-me-from-being-a-millionnaire-in-2020-5b2144e8bdef
- Full paper : https://seb943.github.io/Data/Paper_Exploiting_bookmakers_biases.pdf
You can also read the functions.py source code to understand the mechanics and adapt the code to your own purposes. In the functions.py script, I distinguish 4 types of sports according to the sport-related format of the outcome (either 1X2 or 12) and of the score : tennis-like, football-like, baseball-like and hockey-like (the format is different for hockey on the oddsportal website).
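As an illustration of the two outcome formats, the sketch below derives a 1X2 label (home win, draw, away win) and a 12 label (no draw possible) from a final score. The function names `outcome_1x2` and `outcome_12` are hypothetical and the logic follows the usual betting notation; it is not the code used in functions.py.

```python
def outcome_1x2(home_score, away_score):
    """Return '1' (home win), 'X' (draw) or '2' (away win)."""
    if home_score > away_score:
        return "1"
    if home_score < away_score:
        return "2"
    return "X"


def outcome_12(home_score, away_score):
    """Return '1' or '2' for sports whose market has no draw outcome."""
    if home_score == away_score:
        # A tie cannot be expressed in a two-way (12) market.
        raise ValueError("a 12 market has no draw outcome")
    return "1" if home_score > away_score else "2"


print(outcome_1x2(2, 2))  # X
print(outcome_12(3, 1))   # 1
```

Sports such as soccer are typically quoted as 1X2, while sports such as tennis are quoted as 12 because a match always has a winner.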
With the emergence of sports analytics and machine learning, it has become possible for anyone to create data-based betting strategies that take into consideration both market-related figures (odds values, variations, differentials between bookmakers) and sport-related performance metrics for any team. The minimal data required for this task is the historical results and betting odds (closing odds are usually preferred), which you can then use to build Machine Learning and Deep Learning models that infer win probabilities, and to analyze whether a given team is being undervalued or overvalued by a given bookmaker. The oddsportal website [1] is one of the largest publicly accessible odds databases; however, its format and architecture are not very pleasant to deal with, so one needs a bit of time to build tools to collect the data from the website. This package offers an easy-to-use interface (a sort of unofficial API) to collect odds and save the data in an easy-to-read csv format.
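As a small worked example of the first step in such an analysis, the sketch below converts one bookmaker's decimal odds into implied probabilities and estimates the bookmaker margin. The function name `implied_probabilities` and the odds values are made up for illustration, the odds are assumed to be in decimal format, and proportional normalization is only one of several ways to remove the margin.

```python
def implied_probabilities(decimal_odds):
    """Convert decimal odds into normalized implied probabilities.

    Returns (probabilities, margin), where `margin` is the amount by which
    the raw inverse odds sum exceeds 1 (the bookmaker's overround).
    """
    # The raw implied probability of each outcome is the inverse of its odds.
    raw = [1.0 / odd for odd in decimal_odds]
    total = sum(raw)
    # Rescale proportionally so that the probabilities sum to 1.
    probabilities = [p / total for p in raw]
    return probabilities, total - 1.0


# Illustrative 1X2 odds (home, draw, away); not real data.
probs, margin = implied_probabilities([2.0, 3.2, 4.0])
print([round(p, 4) for p in probs])
print(round(margin, 4))
```

Comparing these probabilities with a model's own estimates is one way to flag outcomes that a bookmaker may be undervaluing or overvaluing.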
[1] https://www.oddsportal.com/
NB : This package is intended for educational use only, not for any commercial purpose. I am not affiliated in any way with the oddsportal website.
BTC : 3PkoHLXmXsL8kBrFu7GQ8kpmzPBmNK6m8B
ETH : 0xFdbB5aF291cB7e711D62c1E4a8B58d0EbD423F9C