Repository Details

Machine learned bracketology

March-Madness-ML

Applying machine learning to March Madness. Check out my first repo here and my associated blog post. I've tried to make this repository extensible enough so that I can use it from year to year.

Overview

In this project, I hope to use machine learning to create a model that can predict the winner of a game between two teams. This way, I can try to predict the winner of the NCAA Basketball Tournament (and hopefully get a perfect bracket LOL). I've separated this project into a few components. Since I like to do this every year, I wanted to keep the code general enough to work from year to year; you'll just have to add new data for the current year.

  • Data: The Data folder contains different CSVs that show team stats, regular season game results, etc. It will contain data that I've scraped, data from Kaggle, and a folder that contains precomputed xTrain and yTrain matrices so that we don't have to keep recomputing the training set.
  • DataPreprocessing.py: Script where we create our training matrices.
  • MarchMadness.py: Script where we apply machine learning models to the training set. We can also create our Kaggle submissions here.
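The precomputed xTrain and yTrain folder amounts to a compute-once cache. A minimal sketch of that pattern follows; it is not taken from this repository. The helper name load_or_build, the pickle format, and the cache path are all assumptions for illustration, and the repo may store its matrices differently.

```python
import pickle
from pathlib import Path


def load_or_build(cache_path, build_fn):
    """Return the object cached at cache_path, building and saving it on a miss.

    cache_path and build_fn are hypothetical: build_fn stands in for whatever
    expensive step produces the training matrices.
    """
    cache_path = Path(cache_path)
    if cache_path.exists():
        # Cache hit: skip the expensive recomputation.
        with cache_path.open("rb") as f:
            return pickle.load(f)
    result = build_fn()
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    with cache_path.open("wb") as f:
        pickle.dump(result, f)
    return result
```

Deleting the cached file forces a rebuild, which is what you would want after adding a new season of data.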

Requirements and Installation

  • Python 3
  • pipenv for managing virtualenv and pip package dependencies.

What To Do Every March

  • Download the data files from Kaggle, which normally runs a competition each year (look for the one for the current year). The CSV files show the results of games since 1985, information on conferences, tourney seed history, etc. It's important to download this data every year because Kaggle adds data from the most recently completed season, so you'll have a bit more training data. Download the files and replace the ones in here with the new versions.
  • We also want the advanced rating statistics from Basketball Reference. Go to https://www.sports-reference.com/cbb/seasons/2019-ratings.html, replace 2019 with whatever year you're looking at, and choose to get the table as a CSV (available in one of the dropdowns). Disregard the first line and start with the line that begins with "Rk,School..". Copy everything from there into a new text doc in Sublime (or any text editor), save it as a CSV, and then upload it to this folder.
  • We also want the regular season statistics from Basketball Reference. Go to https://www.sports-reference.com/cbb/seasons/2019-school-stats.html, replace 2019 with whatever year you're looking at, and choose to get the table as a CSV (available in one of the dropdowns). Disregard the first line and start with the line that begins with "Rk,School..". Copy everything from there into a new text doc in Sublime (or any text editor), save it as a CSV, and then upload it to this folder.
    • For both of the above steps, make sure that the column names are the same from year to year! In 2019, Basketball Reference made some small changes to the column names (X3P to 3PA, for example).
  • Run DataPreprocessing.py in order to get the most up to date training matrices.
  • Run MarchMadness.py.
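The manual CSV cleanup and the column-name check described above could be scripted. The following is a minimal sketch, not code from this repository. It assumes the export has extra lines above a header row that begins with "Rk,School". The function names and the rename table are hypothetical; the table holds only the X3P to 3PA example mentioned above.

```python
import csv
import io

# Hypothetical rename table: maps a column name seen in one year's export
# to the name used in the other years' files. Extend it as needed.
COLUMN_RENAMES = {"X3P": "3PA"}


def clean_export(raw_text, renames=COLUMN_RENAMES):
    """Drop the lines above the 'Rk,School' header row and normalize column names."""
    lines = raw_text.splitlines()
    start = next(
        (i for i, line in enumerate(lines) if line.startswith("Rk,School")), None
    )
    if start is None:
        raise ValueError("no header row starting with 'Rk,School' was found")
    rows = list(csv.reader(io.StringIO("\n".join(lines[start:]))))
    header = [renames.get(name, name) for name in rows[0]]
    return header, rows[1:]


def check_same_columns(reference_header, header):
    """Return (missing, extra) column names of header relative to reference_header."""
    missing = [c for c in reference_header if c not in header]
    extra = [c for c in header if c not in reference_header]
    return missing, extra
```

Running check_same_columns on a previous year's header and the new one before DataPreprocessing.py is a cheap way to catch a renamed column early. If both returned lists are empty, the two files line up.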

What You Can Do

  • Try to modify MarchMadness.py to include more ML models
  • Modify DataPreprocessing.py to create different features to represent each game/team
  • Perform data visualizations to see which features are the most important
  • Decide what type of additional data preprocessing might be needed
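As one starting point for the feature idea above, the sketch below represents a game as the element-wise difference between two teams' stat vectors. This is a common choice, offered as an assumption rather than a description of what DataPreprocessing.py does. The function names are hypothetical and the numbers are made up for illustration.

```python
def game_features(team_a_stats, team_b_stats):
    """Represent a game as the element-wise difference of two team stat vectors."""
    if len(team_a_stats) != len(team_b_stats):
        raise ValueError("both teams need the same number of stats")
    return [a - b for a, b in zip(team_a_stats, team_b_stats)]


def training_rows(winner_stats, loser_stats):
    """Emit both orderings of one game so the labels stay balanced.

    Label 1 means the first team listed won; label 0 means it lost.
    """
    return [
        (game_features(winner_stats, loser_stats), 1),
        (game_features(loser_stats, winner_stats), 0),
    ]


# Made-up stats: [wins, points per game, rebounds per game]
rows = training_rows([30, 80, 40], [20, 70, 35])
```

Emitting both orderings keeps a model from learning that the first team listed always wins, which would happen if every row were built as winner minus loser.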

Getting Started

  1. Download and unzip this entire repository from GitHub, either interactively, or by entering the following in your Terminal.
    git clone https://github.com/adeshpande3/March-Madness-ML.git
  2. Navigate into the top directory of the repo on your machine
    cd March-Madness-ML
  3. Create a virtualenv and install the package dependencies. If you don't have pipenv, you can follow instructions here for how to install.
    pipenv install
  4. First create your xTrain and yTrain matrices by running
    pipenv run python DataPreprocessing.py
    This may take a while (I'm still trying to figure out ways to make this faster).
  5. Then run your machine learning model
    pipenv run python MarchMadness.py

Troubleshooting

  • If you're using Python 2, everything should be the same, except that you don't have to create a pipenv. Instead, you have to install the following libraries on your own: numpy, pandas, sklearn (the pip package is named scikit-learn). Other optional libraries are keras, tensorflow, and xgboost.
  • If you are using pipenv with Python 3.7 and you want to use TensorFlow, you might run into versioning issues like this one. The tl;dr is to use Python 3.6 instead of 3.7.
  • If you are getting errors with any TensorFlow, Keras, or XGBoost installation, keep in mind that those libraries aren't strictly necessary for running MarchMadness.py. They only help if you want to create neural network models (TensorFlow/Keras) or run gradient boosted models (XGBoost). If you are getting errors and you don't really want to use those models, you can go ahead and remove those import lines.
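An alternative to deleting the import lines is to check whether each optional library is installed and skip the related models when it is not. The following is a minimal sketch under that assumption, using only the standard library. The helper names and the OPTIONAL_LIBRARIES table are hypothetical and are not part of MarchMadness.py.

```python
import importlib.util

# Optional libraries named in this README and what they are used for.
OPTIONAL_LIBRARIES = {
    "tensorflow": "neural network models",
    "keras": "neural network models",
    "xgboost": "gradient boosted models",
}


def is_installed(module_name):
    """Return True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


def usable_optional_libraries():
    """Return the optional libraries that are installed, sorted by name."""
    return sorted(name for name in OPTIONAL_LIBRARIES if is_installed(name))
```

A script could then wrap each optional import in a check such as `if is_installed("xgboost"):`. Note that this only detects a missing package. An installed but broken package, such as the version mismatch described above, would still fail when it is actually imported.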
