march-madness-ml's Introduction

March-Madness-ML

Applying machine learning to March Madness. Check out my first repo here and my associated blog post. I've tried to make this repository extensible enough so that I can use it from year to year.

Overview

In this project, I hope to use machine learning to create a model that can predict the winner of a game between two teams. This way, I can try to predict the winner of the NCAA Basketball Tournament (and hopefully get a perfect bracket LOL). I've separated this project into a couple of different components. Since I like to do this every year, I wanted to keep this code general enough that it works from year to year; you'll just have to add new data for the current year.

  • Data: The Data folder contains different CSVs that show team stats, regular season game results, etc. It will contain data that I've scraped, data from Kaggle, and a folder that contains precomputed xTrain and yTrain matrices so that we don't have to keep recomputing the training set.
  • DataPreprocessing.py: Script where we create our training matrices.
  • MarchMadness.py: Script where we apply machine learning models to the training set. We can also create our Kaggle submissions here.

Requirements and Installation

  • python 3
  • pipenv for managing virtualenv and pip package dependencies.

What To Do Every March

  • Download the data files from Kaggle, which normally hosts a competition each year (look for the competition for the current year). They provide CSV files that show the results from games since 1985, information on conferences, tourney seed history, etc. It's important to download this data every year because Kaggle adds data from the most recently completed season, so you'll have a bit more training data. Download the files and replace the ones in here with the new versions.
  • We also want the advanced rating statistics from Basketball Reference. Go to https://www.sports-reference.com/cbb/seasons/2019-ratings.html, replace 2019 with whatever year you're looking at, and choose to get the table as a CSV (available in one of the dropdowns). Disregard the first line and start with the line that begins with "Rk,School..". Copy that over to a new text doc in Sublime (or any text editor), save it as a CSV, and then upload it to this folder.
  • We also want the regular season statistics from Basketball Reference. Go to https://www.sports-reference.com/cbb/seasons/2019-school-stats.html, replace 2019 with whatever year you're looking at, and choose to get the table as a CSV (available in one of the dropdowns). Disregard the first line and start with the line that begins with "Rk,School..". Copy that over to a new text doc in Sublime (or any text editor), save it as a CSV, and then upload it to this folder.
    • For both of the above steps, make sure that the column names are the same from year to year! In 2019, Basketball Reference made some small changes to the column names (X3P to 3PA, for example).
  • Run DataPreprocessing.py in order to get the most up to date training matrices.
  • Run MarchMadness.py.
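
The manual cleanup of the Basketball Reference exports described above could also be scripted. The following is only a rough sketch using the standard library: the function names and the COLUMN_RENAMES mapping are hypothetical (the only rename the steps above mention is X3P to 3PA), and none of this is taken from DataPreprocessing.py.

```python
import csv
import io

# Hypothetical mapping from older column names to newer ones. Only the
# X3P -> 3PA change is mentioned in the steps above; extend as needed.
COLUMN_RENAMES = {"X3P": "3PA"}


def clean_export(raw_text, renames=COLUMN_RENAMES):
    """Drop everything before the 'Rk,School' header and normalize column names."""
    lines = raw_text.splitlines()
    # The export puts an extra line above the real header row, so find it first.
    start = next(i for i, line in enumerate(lines) if line.startswith("Rk,School"))
    reader = csv.reader(io.StringIO("\n".join(lines[start:])))
    header = [renames.get(name, name) for name in next(reader)]
    rows = [row for row in reader if row]  # skip blank lines
    return header, rows


def header_mismatch(header_a, header_b):
    """Return the column names that appear in only one of the two headers."""
    return sorted(set(header_a) ^ set(header_b))
```

Calling header_mismatch on the headers of two different years is a quick way to spot a renamed column before running DataPreprocessing.py.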

What You Can Do

  • Try to modify MarchMadness.py to include more ML models
  • Modify DataPreprocessing.py to create different features to represent each game/team
  • Perform data visualizations to see which features are the most important
  • Decide what type of additional data preprocessing might be needed
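
As a starting point for experimenting with features, here is one common way to represent a game: take the difference between the two teams' stat vectors and label it by who won. This is a sketch under assumptions, not necessarily what DataPreprocessing.py does; the names game_features and training_rows, and the team_vectors dictionary, are made up for illustration.

```python
def game_features(team_a_stats, team_b_stats):
    """Represent a game as the element-wise difference of two stat vectors."""
    return [a - b for a, b in zip(team_a_stats, team_b_stats)]


def training_rows(team_vectors, games):
    """Build feature rows and labels from (winner, loser) game records.

    Each game is added twice, once from each team's point of view, so that
    the labels are balanced between 1 (first team won) and 0 (first team lost).
    """
    x_train, y_train = [], []
    for winner, loser in games:
        x_train.append(game_features(team_vectors[winner], team_vectors[loser]))
        y_train.append(1)
        x_train.append(game_features(team_vectors[loser], team_vectors[winner]))
        y_train.append(0)
    return x_train, y_train
```

Adding a new feature then just means adding another entry to each team's vector.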

Getting Started

  1. Download and unzip this entire repository from GitHub, either interactively, or by entering the following in your Terminal.
    git clone https://github.com/adeshpande3/March-Madness-ML.git
  2. Navigate into the top directory of the repo on your machine
    cd March-Madness-ML
  3. Create a virtualenv and install the package dependencies. If you don't have pipenv, you can follow instructions here for how to install.
    pipenv install
  4. First create your xTrain and yTrain matrices by running
    pipenv run python DataPreprocessing.py
    This may take a while (Still trying to figure out ways to make this faster).
  5. Then run your machine learning model
    pipenv run python MarchMadness.py

Troubleshooting

  • If you're using Python 2, then everything should be the same except you don't have to create a pipenv, but you would have to install the following libraries on your own: numpy, pandas, sklearn. Other optional libraries are keras, tensorflow, and xgboost.
  • If you are using the pipenv with Python 3.7 approach and you want to use Tensorflow, you might run into issues with versioning like this one. The tl;dr is to use Python 3.6 instead of 3.7.
  • If you are getting errors with any Tensorflow, Keras, or Xgboost installation, keep in mind that those aren't strictly necessary for running MarchMadness.py. They are only helpful if you want to create neural network models (Tensorflow/Keras) or run Gradient Boosted models (Xgboost). If you are getting errors and you don't really want to use those models, you can go ahead and remove those import lines.
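
Instead of deleting the import lines by hand, the optional libraries could be imported defensively. This is a minimal sketch, assuming you are willing to edit the top of MarchMadness.py yourself; optional_import and available_models are hypothetical helper names, not part of this repo.

```python
import importlib


def optional_import(name):
    """Return the imported module, or None if it is missing or fails to import."""
    try:
        return importlib.import_module(name)
    except ImportError:  # also covers ModuleNotFoundError
        return None


def available_models(library_names):
    """Return the subset of library names that imported cleanly."""
    return [name for name in library_names if optional_import(name) is not None]
```

For example, available_models(["tensorflow", "keras", "xgboost"]) would tell you which of the optional model families can be offered on the current machine. Note that this only catches import errors; a hard crash inside a library cannot be caught this way.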

march-madness-ml's People

Contributors

adeshpande3, breadbored, dependabot[bot], jacklabarba, rshtirmer

march-madness-ml's Issues

Free Throws, Blocks and Personal Fouls

I noticed that free throws, blocks and personal fouls are not gathered by getSeasonData().
I'm not 100% familiar with the algorithm yet, but I figured I'd add them to the list and see what happens. I am curious, though, why these were omitted from the start, and I figured you guys might have a reason.

On another note, many of the missing columns like TOV, Opp., ORB, MP and others have been added to sports-reference. I'm working on importing the data into the CSVs and I am seeing a slight accuracy boost.

2022 Revamp & 2020 issues

Hey, I'm new to ML, trying to get my feet wet here. Can we get this updated to at least skip 2020 (no tournament)? The program won't run without 2020 data.
I have formatted the College Basketball Reference data for 2021 and 2022 and would love to share if we can get this to work again :)

This project is awesome! I'd love to help make it better
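
One way the requested skip could look, as a sketch only: build the list of seasons to process and leave out any year without a tournament. The names SEASONS_WITHOUT_TOURNEY and training_seasons are hypothetical, and how the scripts in this repo actually loop over years is not shown here.

```python
# Seasons with no NCAA tournament; 2020 comes from the issue above.
SEASONS_WITHOUT_TOURNEY = {2020}


def training_seasons(first_year, last_year, skip=SEASONS_WITHOUT_TOURNEY):
    """List the seasons to process, leaving out the skipped years."""
    return [year for year in range(first_year, last_year + 1) if year not in skip]
```

With this, training_seasons(2019, 2022) yields 2019, 2021 and 2022.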

Incorrect Shape / Data Processing Issue

After running through DataPreprocessing.py with the input "2019", I receive the following output:

('Finished year:', 2019)    
('Shape of xTrain:', (0, 17))  
('Shape of yTrain:', (0,))

Expectedly, running MarchMadness.py on this data set results in the following:
ValueError: Found array with 0 sample(s) (shape=(0, 17)) while a minimum of 1 is required.

Best,
Ryan

Python Version

I'm completely new to pretty much everything in your article but I decided to give it a go. I downloaded Python 3.7.2 for Windows and then followed the instructions on the ReadMe. Started running into issues on the MarchMadness.py step with "ModuleNotFoundError: No module named 'tensorflow'". After some digging, it appears that the stable version of TensorFlow is not compatible with Python 3.7. I did see that I could use the TF nightlies to use Python 3.7 but given my level of experience, that seemed like a rabbit hole I didn't want to go down. So I uninstalled 3.7 and installed 3.6. Then I ran into an issue with the pipfile specifying Python 3.7. I changed that to 3.6 and everything is working out.

Would be great if the ReadMe could be updated to clarify the Python 3 installation requirement to avoid any issues like I did. Of course, that would mean that some other complete newbie like myself who is actually interested in March Madness, machine learning, on Windows, and who didn't already have Python installed...

And yes, I'm sure that I am supposed to update the ReadMe and issue a Pull request myself, but I haven't done anything with git since 2013 and even then it was with a lot of hand holding.

The good news is that I am successfully creating a new training set! Thanks for an informative blog post and creating the project here.

Error during traceback

Traceback (most recent call last):
  File "DataPreprocessing.py", line 11, in <module>
    import pandas as pd
ImportError: No module named pandas
Help? Thanks. Look forward to using this.

replit

can you please run this on replit?

Illegal Instruction 4 when running MarchMadness.py

I followed your instructions to get everything setup. I was able to run the DataPreprocessing.py fine, but when I try and run the MarchMadness script, I'm getting an execution error. Any idea what the issue might be? I'm not real familiar with debugging python.

Shape of xTrain: (131372, 17)
Shape of yTrain: (131372,)
MacBook-Pro:March-Madness-ML$ pipenv run python MarchMadness.py 
Using TensorFlow backend.
Illegal instruction: 4

All predictions are identical (for me). That seems unlikely.

I am running the version you posted just recently, with the pipenv stuff and Python 3.7.2. When I run MarchMadness.py, the sample results for the East bracket always choose team2 as the winner, and the probabilities for all games are the same. Is this expected behavior?

Here is my run:
kcason@ubuntu:~/Desktop/newer/March-Madness-ML[kcason@ubuntu March-Madness-ML]$ pipenv run python MarchMadness.py
Using TensorFlow backend.
Shape of xTrain: (126047, 17)
Shape of yTrain: (126047,)
What year are these predictions for?
2019
Starting run #0:
Finished run #0:
Accuracy = 0.7552995684183803
Time taken: 0:00:30.114437

Starting run #1:
Finished run #1:
Accuracy = 0.7573940086316324
Time taken: 0:00:31.292216

Starting run #2:
Finished run #2:
Accuracy = 0.7557438436151307
Time taken: 0:00:29.775752

Starting run #3:
Finished run #3:
Accuracy = 0.7544427519675044
Time taken: 0:00:31.763139

Starting run #4:
Finished run #4:
Accuracy = 0.7525069814673775
Time taken: 0:00:29.563869

The average accuracy is 0.755077430820005

Loaded the team vectors
/home/kcason/.local/share/virtualenvs/March-Madness-ML-XxyoxCtg/lib/python3.7/site-packages/sklearn/linear_model/logistic.py:433: FutureWarning: Default solver will be changed to 'lbfgs' in 0.22. Specify a solver to silence this warning.
FutureWarning)
Loaded the team vectors
/home/kcason/.local/share/virtualenvs/March-Madness-ML-XxyoxCtg/lib/python3.7/site-packages/sklearn/linear_model/logistic.py:433: FutureWarning: Default solver will be changed to 'lbfgs' in 0.22. Specify a solver to silence this warning.
FutureWarning)

Probability that NC Central wins over Duke: 0.5168695665038832
Probability that UCF wins over VA Commonwealth: 0.5168695665038832
Probability that Liberty wins over Mississippi St: 0.5168695665038832
Probability that St Louis wins over Virginia Tech: 0.5168695665038832
Probability that Belmont wins over Maryland: 0.5168695665038832
Probability that Yale wins over LSU: 0.5168695665038832
Probability that Minnesota wins over Louisville: 0.5168695665038832
Probability that Bradley wins over Michigan St: 0.5168695665038832
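
One possible explanation, not confirmed anywhere in this thread: if every matchup reaches the model with the same feature vector (all zeros, for instance, because the team stats were not found), a logistic model returns the same probability for every game. The sketch below illustrates that effect and a quick check for it; the function names, weights and bias are made up for illustration.

```python
import math


def win_probability(weights, bias, features):
    """Logistic-regression style probability for one matchup."""
    score = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-score))


def looks_degenerate(feature_rows):
    """True when every matchup has exactly the same feature vector."""
    return len({tuple(row) for row in feature_rows}) <= 1
```

If looks_degenerate returns True for the feature rows built for the bracket, the problem is likely in how the team vectors are looked up rather than in the model itself.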

MarchMadness.py error

Traceback (most recent call last):
  File "MarchMadness.py", line 22, in <module>
    from keras.utils import np_utils
  File "C:\Users\Nick Jwaida\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\keras\__init__.py", line 21, in <module>
    from keras import models
  File "C:\Users\Nick Jwaida\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\keras\models\__init__.py", line 18, in <module>
    from keras.engine.functional import Functional
  File "C:\Users\Nick Jwaida\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\keras\engine\functional.py", line 24, in <module>
    import tensorflow.compat.v2 as tf
  File "C:\Users\Nick Jwaida\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\tensorflow\__init__.py", line 37, in <module>
    from tensorflow.python.tools import module_util as _module_util
ModuleNotFoundError: No module named 'tensorflow.python'
