
deving789 / nba_final-project


This project uses Python, R, and SQL with the 2014-15 NBA season data set. It imports the data set, merges it with other files for cleaning and processing, then feeds the result into a machine learning algorithm.

Home Page: https://deving789.github.io/NBA_Final-Project/

Jupyter Notebook 79.14% R 0.21% CSS 2.44% JavaScript 11.13% HTML 7.08%
python pandas html javascript nba-analytics sql shot-logs players-stats correlations

nba_final-project's Introduction

NBA_Final-Project

Project Goal

Our goal is to determine whether there is any correlation between our selected features (SHOT_DIST, CLOSE_DEF_DIST, Height(CD), ...) and whether the player makes or misses the basket (SHOT_RESULT). If there is a correlation, we would like to see which features from data outside our current set, such as advanced player metrics, also correlate strongly with the SHOT_RESULT output. We selected this topic because we want to see what outcomes the data we have chosen can explain. Furthermore, we believe this data can help predict how certain players will perform based on the spacing they create from the nearest defender.


Questions we want to answer

  • Is there a strong correlation between defender height and a shot being missed?
  • Is there a strong correlation between a player making a basket and the closest defender being more than 6 feet away from the shooter?
  • Do star players shoot a better percentage when they are on the home team?
  • Do we need to add more data to get a high accuracy score when using a machine learning model?

Group communication protocols

  • Throughout this project the group has stayed in constant contact through the Slack app.
  • This includes sharing information that we find online, sharing code, and arranging meetings at least once a week.

Team's checklist/tracker

Team Responsibilities
Devin - project leader
Brian - contributor (circle)
Larysa - contributor (triangle)


Getting started

We began the project by importing shots_log.csv and players_stats.csv into our Jupyter notebook (project_x). We noticed that the MATCHUP column would be tough to sort because it was not in the correct format. To fix this we used two methods to clean up the column: .str.split and .to_datetime. Once the data was in the correct format we were able to complete our first merge between the two CSVs.
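The MATCHUP cleanup can be sketched roughly as follows. This is a minimal illustration rather than the notebook's actual code: the toy rows, the "MAR 04, 2015 - CHA @ BKN" layout of MATCHUP, and the new column names GAME_DATE and TEAMS are all assumptions.

```python
import pandas as pd

# Toy rows standing in for shots_log.csv; the MATCHUP layout is an assumption.
shots = pd.DataFrame({
    "MATCHUP": ["MAR 04, 2015 - CHA @ BKN", "FEB 25, 2015 - CHA vs. CHI"],
    "SHOT_RESULT": ["made", "missed"],
})

# Split once on " - " into the date text (column 0) and the teams text (column 1).
parts = shots["MATCHUP"].str.split(" - ", n=1, expand=True)

# Title-case the month ("MAR" -> "Mar") so it matches the %b directive,
# then parse the text into real datetimes that sort correctly.
shots["GAME_DATE"] = pd.to_datetime(parts[0].str.title(), format="%b %d, %Y")
shots["TEAMS"] = parts[1]  # hypothetical column name

print(shots[["GAME_DATE", "TEAMS", "SHOT_RESULT"]].sort_values("GAME_DATE"))
```

With the date stored as a datetime column, sorting and merging by game date no longer depend on string ordering.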


Exploratory Data

For the data portion of this project we used SQL, R, and Python. Our project_x file shows the merge of the shots_log and players_stats data sets. Why did we do this? The shots log data set only shows the height and weight of the shooter, which is not enough for the reader to see the whole story. After the merge we can see all of the physical information about the defender (height, weight, etc.).
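A merge of this kind might look like the sketch below. The player names and measurements are made up, and the column names (CLOSEST_DEFENDER, Name, Height, Weight), the "Last, First" defender format, and the helper `to_key` are assumptions for illustration, not taken from the notebook.

```python
import pandas as pd

# Miniature stand-ins for the two files; all values are invented.
shots = pd.DataFrame({
    "player_name": ["shooter one", "shooter two"],
    "CLOSEST_DEFENDER": ["Doe, John", "Roe, Rick"],
    "SHOT_RESULT": ["made", "missed"],
})
players = pd.DataFrame({
    "Name": ["John Doe", "Rick Roe"],
    "Height": [190.0, 200.0],
    "Weight": [85.0, 100.0],
})

def to_key(name: str) -> str:
    """Hypothetical helper: turn 'Last, First' or 'First Last' into 'first last'."""
    if "," in name:
        last, first = name.split(",", 1)
        name = f"{first.strip()} {last.strip()}"
    return name.lower()

shots["defender_key"] = shots["CLOSEST_DEFENDER"].map(to_key)
players["defender_key"] = players["Name"].map(to_key)

# Left merge keeps every shot, even when a defender has no stats row.
defender_stats = players.rename(
    columns={"Height": "DEF_HEIGHT", "Weight": "DEF_WEIGHT"}
).drop(columns="Name")
merged = shots.merge(defender_stats, on="defender_key", how="left")

print(merged[["CLOSEST_DEFENDER", "DEF_HEIGHT", "DEF_WEIGHT", "SHOT_RESULT"]])
```

A left merge is used here so that shots are never dropped silently; unmatched defenders simply show up with missing height and weight.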

The main Python library we used was pandas, to import, clean, and merge the data. With the merged data we can search for correlations to see which features had an impact on the SHOT_RESULT column. After our merge we decided to look at team statistics such as PACE, offensive efficiency, and many other metrics.
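One way to run such a correlation search with pandas is sketched here. The rows are invented, and the encoding of SHOT_RESULT as "made"/"missed" plus the helper column SHOT_MADE are assumptions.

```python
import pandas as pd

# Invented rows standing in for the merged data set.
merged = pd.DataFrame({
    "SHOT_DIST": [2.1, 24.5, 15.0, 3.3, 26.0, 18.2],
    "CLOSE_DEF_DIST": [1.0, 6.5, 3.2, 2.0, 4.1, 5.5],
    "SHOT_RESULT": ["made", "missed", "missed", "made", "missed", "made"],
})

# Encode the text outcome as 1 (made) / 0 (missed) so it can be correlated.
merged["SHOT_MADE"] = (merged["SHOT_RESULT"] == "made").astype(int)

# Pearson correlation of each numeric feature with the encoded outcome.
corr = (
    merged[["SHOT_DIST", "CLOSE_DEF_DIST", "SHOT_MADE"]]
    .corr()["SHOT_MADE"]
    .drop("SHOT_MADE")
)
print(corr.sort_values())
```

Because the outcome is binary, these values are point-biserial correlations; they give a quick ranking of features but do not capture non-linear effects.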


Data Analysis & Machine Learning


Per our observations, there is no real correlation between defender height and a shot being made. What we do find is a good correlation between being the home team and having a higher shooting percentage.
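A home versus away comparison of shooting percentage can be computed with a simple group-by, as in this sketch. The rows are invented and are not the project's results; the LOCATION column with "H"/"A" values and the helper column SHOT_MADE are assumptions.

```python
import pandas as pd

# Invented rows; LOCATION is assumed to hold "H" (home) or "A" (away).
shots = pd.DataFrame({
    "LOCATION": ["H", "H", "H", "H", "A", "A", "A", "A"],
    "SHOT_RESULT": ["made", "made", "missed", "made",
                    "missed", "made", "missed", "missed"],
})

shots["SHOT_MADE"] = (shots["SHOT_RESULT"] == "made").astype(int)

# The mean of a 0/1 column is the field goal percentage for each group.
fg_pct = shots.groupby("LOCATION")["SHOT_MADE"].mean()
print(fg_pct)
```

The same pattern works for the other questions, for example grouping on a boolean column that flags shots with the closest defender more than 6 feet away.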

Resampling method -- first try with merged data sets shots_log and players_stats

Initial resampling

For the machine learning part of the project we used the imbalanced-learn library with a sampling method, the random oversampler (OverSampler). We chose oversampling because we figured it would be a good way to start; this may change once we complete the project. The benefit of the random oversampler is that it rebalances the class distribution in an imbalanced set. Before beginning we split the data into train and test sets, with the SHOT_RESULT column as our y variable.

Easy Ensemble Classifier with Forest Classifier

For the second try, we used this method to test our data set with machine learning. First we split our data into train and test sets before choosing our balanced random forest classifier model. Although this model is typically used for an imbalanced target variable, we thought it could potentially increase our accuracy score, so we decided to try it. Unfortunately the accuracy score did not increase; the only features with a significant correlation to our SHOT_RESULT column were CLOSE_DEF_DIST and 3 point attempts.

For more information click below


Database

The team connected pandas with SQL and created a database called NBA_DB that can store the working data.
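The pandas-to-SQL round trip can be illustrated with the sketch below. The database engine behind NBA_DB is not stated here, so this example swaps in an in-memory SQLite database from the standard library; the table name `shot_logs` and the rows are also assumptions.

```python
import sqlite3

import pandas as pd

# Invented rows standing in for the cleaned working data.
shots = pd.DataFrame({
    "SHOT_DIST": [2.0, 4.0, 20.0, 24.0, 22.0],
    "SHOT_RESULT": ["made", "made", "missed", "missed", "missed"],
})

# In-memory SQLite stands in for NBA_DB (assumption, not the project's engine).
conn = sqlite3.connect(":memory:")

# Write the DataFrame to a table, replacing it if it already exists.
shots.to_sql("shot_logs", conn, index=False, if_exists="replace")

# Read an aggregate back into pandas with plain SQL.
result = pd.read_sql_query(
    """
    SELECT SHOT_RESULT,
           COUNT(*)       AS attempts,
           AVG(SHOT_DIST) AS avg_dist
    FROM shot_logs
    GROUP BY SHOT_RESULT
    ORDER BY SHOT_RESULT
    """,
    conn,
)
conn.close()

print(result)
```

For a server database the same `to_sql` and `read_sql_query` calls would typically take a SQLAlchemy engine instead of the SQLite connection.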

More details


Project dashboard

Using HTML we created a website for the end user to look at. On this page you can see the names of our group members, the technologies we used, a carousel of images (including some from our project), and a section where you can look up any player or team from the 2014-15 season. The search section also lets you download your search results as JSON or Excel. To see our website please scroll down to the bottom of this readme file.


Conclusion

The result of a player's shot is extremely tough to predict: throughout NBA history the average field goal percentage has always been below 50%. Even if a player has a wide open bucket it is not a guarantee that they will make it. Our conclusion from the 2014-15 season is that to get its best results a team needs to create as many fast break points as it can and take as many wide open 3 point looks as possible. The Warriors broke the mold of the average superstar team by running, spreading the floor, and shooting a high volume of 3 pointers to maximize their number of possessions and scoring efficiency, producing points that led to more wins and a path to a championship season.


Resources

Dataset with 2014-15 NBA shot log data and player stats.

Data Source: Kaggle


Presentation

Take a look at our website: https://deving789.github.io/NBA_Final-Project/

nba_final-project's People

Contributors

ahnbr, deving789, jojobear2020
