
This project forked from eddwebster/football_analytics


⚽📊 A collection of football analytics projects, data, and analysis by Edd Webster (@eddwebster), with links to a curated list of publicly available resources published by the football analytics community.

Jupyter Notebook 99.20% Python 0.80%


Football Analytics

A public space for football analytics projects by Edd Webster, including a curated list of publicly available resources published by the football analytics community.


Edd Webster Football Analytics

-----------------------------------------------------

👋 About This Repository and Author

Edd Webster

Please note, all the code and analysis produced in this repository is mine and/or credited to the publicly produced code, data, and/or libraries used, and is in no way related to the work and analysis I produce for my employers.

I recently rewrote this README to include links not only to my own work, but also to include a concise list of learning resources, data sources, libraries, papers, blogs, podcasts, etc., created by all those that have made contributions to the football analytics community. This will be a constant work in progress so if you can think of any resources that I've missed, or you yourself have created something that you believe should be added and is currently not available, please feel free to create a pull request or send me a message.

Credits to the Soccer Analytics Handbook by Devin Pleuler, Awesome Soccer Analytics by Matias Mascioto, and Jan Van Haaren's Soccer Analytics 2020 Review, which were all used to plug gaps in the list once it was published. Credit also to Matias Singers for his awesome-readme repository used to restyle this README.

If you like the repo, please feel free to give it a ⭐ (top right). Cheers!

For more information about this repository and the author, I am available through all the following channels:

Personal website, email, Twitter, LinkedIn, About.me, GitHub, HackerRank, and Tableau.

-----------------------------------------------------

📖 Table of Contents

Table of Contents
  1. ➤ About This Repository and Author
  2. ➤ Table of Contents
  3. ➤ Prerequisites
  4. ➤ Repository Structure
  5. ➤ Notebooks
  6. ➤ Data Visualisation and Tableau
  7. ➤ Data Sources
  8. ➤ Resources
  9. ➤ Contributing
  10. ➤ Acknowledgements

-----------------------------------------------------

🍴 Prerequisites

Made with Python
Made with Jupyter

The only prerequisites for using this GitHub repo are a computer, an internet connection, and the desire to learn more about football analytics.

The notebooks in this repository use open-source Python libraries that are among the most commonly used in Data Science. Most of these libraries can be obtained by downloading and installing Anaconda. Step-by-step guides to do this can be found for Windows here and Mac here, as well as in the Anaconda documentation itself here.
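As a quick check before opening the notebooks, a minimal sketch like the following reports which libraries are missing from the current environment. The library names are an assumption for illustration only (this README does not name them at this point), so adjust the list to whatever a given notebook imports.

```python
from importlib.util import find_spec

# Hypothetical list: common data-science packages chosen purely for
# illustration; the README does not specify the exact libraries here.
ASSUMED_LIBRARIES = ["numpy", "pandas", "matplotlib", "sklearn"]


def missing_libraries(names):
    """Return the names that cannot be imported in the current environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_libraries(ASSUMED_LIBRARIES)
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All assumed libraries are available.")
```

Anything reported as missing can usually be installed through Anaconda or pip.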

-----------------------------------------------------

🌡 Repository Structure

The contents of this GitHub repository are organised as follows:

football analytics github repository
.
│
├── dashboards
│
├── data
│
├── documentation
│
├── gif
│
├── img
│
├── notebooks
│   ├── 1_data_scraping
│   │
│   ├── 2_data_parsing
│   │
│   ├── 3_data_engineering
│   │
│   ├── 4_machine_learning
│   │
│   ├── 5_data_analysis_and_projects
│   │
│   ├── 6_data_visualisation
│   │
│
├── research
│
├── scripts
│
├── spreadsheets
│
├── video
│
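To see what each stage folder contains in a local clone, a small sketch such as the one below can help. It assumes only the `notebooks` folder layout shown above; the function name `notebooks_by_stage` is hypothetical and is not part of this repository.

```python
from pathlib import Path


def notebooks_by_stage(repo_root):
    """Map each sub-folder of `notebooks/` to the sorted .ipynb files inside it."""
    notebooks_dir = Path(repo_root) / "notebooks"
    stages = {}
    if not notebooks_dir.is_dir():
        return stages
    for stage_dir in sorted(p for p in notebooks_dir.iterdir() if p.is_dir()):
        stages[stage_dir.name] = sorted(
            str(nb.relative_to(stage_dir))
            for nb in stage_dir.rglob("*.ipynb")
            # Skip Jupyter's automatic checkpoint copies.
            if ".ipynb_checkpoints" not in nb.parts
        )
    return stages


if __name__ == "__main__":
    # Assumes the script is run from the root of a local clone.
    for stage, notebooks in notebooks_by_stage(".").items():
        print(f"{stage}: {len(notebooks)} notebook(s)")
```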

-----------------------------------------------------

📔 Notebooks

Nearly all code in this repository is in Jupyter notebooks, organised in the following workflow:

  1. Webscraping;
  2. Data Parsing;
  3. Data Engineering;
  4. Machine Learning;
  5. Data Analysis - projects include working with Tracking data, constructing VAEP models (as introduced by SciSports), building xG models using Logistic Regression, Random Forests and Gradient Boosted Decision Tree algorithms such as XGBoost and CatBoost, and analysing player similarity using PCA and Factor Analysis (TBA);
  6. Data Visualisation - examples of how to create some of the most common visualisations using Python and Tableau.
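To give a flavour of the xG modelling mentioned in step 5, here is a minimal sketch of a logistic regression fitted by gradient descent using only the standard library. It is not the code from the notebooks: the shots are synthetic, the single feature (distance to goal) and the "true" scoring curve are assumptions made purely for illustration, and the function names are hypothetical.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_xg(weights, x):
    """Probability that a shot with features `x` is scored (bias is weights[0])."""
    z = weights[0] + sum(w * xj for w, xj in zip(weights[1:], x))
    return sigmoid(z)


def fit_logistic(features, labels, lr=0.1, epochs=300):
    """Fit weights (bias first) by batch gradient descent on the log-loss."""
    n_features = len(features[0])
    weights = [0.0] * (n_features + 1)
    n = len(features)
    for _ in range(epochs):
        grad = [0.0] * (n_features + 1)
        for x, y in zip(features, labels):
            error = predict_xg(weights, x) - y
            grad[0] += error
            for j, xj in enumerate(x):
                grad[j + 1] += error * xj
        weights = [w - lr * g / n for w, g in zip(weights, grad)]
    return weights


def synthetic_shots(n, seed=0):
    """Made-up shots with one feature: distance to goal in units of 10 m."""
    rng = random.Random(seed)
    shots, goals = [], []
    for _ in range(n):
        distance = rng.uniform(0.2, 3.5)
        # Assumed ground truth, for illustration only: closer shots score more.
        p_goal = sigmoid(1.0 - 1.5 * distance)
        shots.append([distance])
        goals.append(1 if rng.random() < p_goal else 0)
    return shots, goals


if __name__ == "__main__":
    shots, goals = synthetic_shots(1000)
    weights = fit_logistic(shots, goals)
    for metres in (5, 15, 30):
        print(f"{metres} m -> xG {predict_xg(weights, [metres / 10]):.2f}")
```

A real model would be trained on event data with more features (for example shot angle and body part), typically with a library such as scikit-learn, XGBoost or CatBoost as named above.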

I am in the process of giving this a quick tidy up, but the notebooks are clearly labeled and include a lot of useful code and analysis.

-----------------------------------------------------

📊 Data Visualisation and Tableau Dashboards

For Tableau dashboards produced using the data engineered in the notebooks in this repository, please see my Tableau Public profile: public.tableau.com/profile/edd.webster.

Example Tableau dashboards:

-----------------------------------------------------

💾 Data

The following data sources have been used in this repository. Due to GitHub's 100 MB file size limit, all engineered datasets prepared in this repository have been exported and made publicly available to view and download in Google Drive. Please see the following [link]. However, the code in this repository should enable you to scrape, parse, and engineer the datasets into the format in which I have analysed and visualised them.

Data sources featured in this repository include:

-----------------------------------------------------

📑 Resources

📑 Getting Started with Football Analytics

Good resources for those new to the use of data in football:

🧑‍🎓 Tutorials

Python

R

Tableau

For a YouTube playlist of Tableau-football videos and tutorials that I have collated from various sources including the Tableau Football User Group, Rob Carroll, and Tom Goodall, see the following [link].

Tableau

Excel

PowerPoint

💾 Data

ℹ️ Data Sources

Publicly available data sources and datasets relating to football, including Tracking data, Event data, aggregated player performance data, detailed match statistics, injury records, transfer values, and more.

To learn more about the different types of data available, such as Event and Tracking data, please see Devin Pleuler's soccer_analytics_handbook.

📄 Documentation

All documentation is saved locally in the documentation subfolder, including:

Data Companies

Data Providers
Tracking
Video / Performance Analysis

🏛️ Libraries

Python

  • codeball - data driven tactical and video analysis of soccer games;
  • Football Packing - a Python package to calculate packing rate for a given pass in football by Samira Kumar. This is a variation of the metric created by Impect;
  • kloppy - a Python package providing (de)serializers for soccer tracking and event data, standardized data models, filters, and transformers designed to make working with different tracking and event data a breeze. See the YouTube tutorial [link];
  • matplotsoccer - a Python library for visualising soccer event data by Tom Decroos;
  • mplsoccer - a Python library for drawing soccer/football pitches in Matplotlib and loading StatsBomb open-data by Andrew Rowlinson;
  • nayra - an API that allows you to track soccer players from camera inputs, and evaluate them with an Expected Discounted Goal (EDG) Agent. See the Evaluating Soccer Player paper by Paul Garnier and Théophane Gregoir;
  • northpitch - a Python football plotting library that sits on top of Matplotlib by Devin Pleuler;
  • PCA_Player_Finder by Parth Athale;
  • PySport including PySport Soccer - collection of open-source sport packages including many of those mentioned in this section, by Koen Vossen;
  • PyWaffle - an open source, MIT-licensed Python package for plotting waffle charts by Peter McKeever;
  • ScraperFC - a Python package to scrape data from FBRef, Understat and FiveThirtyEight by Owen Seymour;
  • Scrape-FBref-data - a Python library to scrape StatsBomb data via FBref by Parth Athale, which in turn was updated from Christopher Martin's repository;
  • statsbombapi - a Python API wrapper and dataclasses for StatsBomb data;
  • statsbombpy - a Python library written by Francisco Goitia to access StatsBomb data;
  • statsbomb-parser - Python library to convert StatsBomb's JSON data into easy-to-use CSV format;
  • socceraction - a Python library for valuing the individual actions performed by soccer players. Includes an Expected Threat (xT) implementation by Tom Decroos et al.;
  • soccermix - a soft clustering technique based on mixture models that decomposes event stream data into a number of prototypical actions of a specific type, location, and direction by Tom Decroos and ML-KULeuven;
  • soccer_xg - a Python package for training and analyzing expected goals (xG) models in football;
  • soccerplots - a Python package that can be used for making visualizations for football analytics by Anmol Durgapal;
  • sync.soccer - a Python package to synchronise football datasets, so that an event in one dataset is matched to the corresponding event or snapshot in the other by Marek Kwiatkowski. This repository contains an implementation that aligns Opta's (now STATS Perform's) F24 feeds to ChyronHego's Tracab files. More formats may be added in the future. See the following blog post for methodology [link];
  • tmscrape - a Python TransferMarkt webscraper by danzn1;
  • Tyrone Mings - a Python TransferMarkt webscraper by FCrSTATS;
  • understat - a Python webscraper by Amos Bastian.
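To make the packing rate idea from the list above concrete, here is a deliberately simplified sketch that counts the opponents a forward pass travels beyond. It is not the API of the Football Packing package and not Impect's exact definition: it judges "bypassed" on the x coordinate alone, and the `Position` class and `packing_rate` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Position:
    x: float  # metres along the pitch; the attacking team plays towards larger x
    y: float  # metres across the pitch (unused in this simplified version)


def packing_rate(passer, receiver, opponents):
    """Count opponents ahead of the passer but behind the receiver along x.

    Simplification: only the x coordinate decides whether an opponent
    has been bypassed by the pass.
    """
    if receiver.x <= passer.x:
        return 0  # backward or square pass: nobody is bypassed in this sketch
    return sum(1 for opp in opponents if passer.x < opp.x < receiver.x)


if __name__ == "__main__":
    passer = Position(30.0, 34.0)
    receiver = Position(60.0, 20.0)
    opponents = [Position(40.0, 30.0), Position(50.0, 25.0),
                 Position(70.0, 34.0), Position(20.0, 40.0)]
    print("Opponents bypassed:", packing_rate(passer, receiver, opponents))
```

A fuller implementation would also consider lateral position and each defender's position relative to the goal rather than the x axis alone.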

R

  • ggsoccer - a soccer visualisation library in R from Ben Torvaney;
  • soccerAnimate - an R package to create 2D animations of soccer tracking data;
  • soccermatics - an R package for the visualisation and analysis of soccer tracking and event data by Joe Gallagher;
  • worldfootballR - an R package to allow users to extract various world football results and player statistics data from FBref and valuations and transfer data from TransferMarkt.com by Jason Zivkovic (see guide on how to use this package [link]); and
  • understatr - an R package to scrape data from Understat.

GitHub Repositories

Python

R

Apps

📊 Data Visualisation Resources and Tools

Resources to aid data visualisation:

✒️ Written Pieces

Blogs

Many of these blog posts are recommended in Sam Gregory's Best Football Analytics Pieces and Tom Worville's “What’s the best Football Analytics piece you’ve ever read?”, both now a few years old. This section is very subjective, so apologies if I've missed anything obvious.

Blogs and Data Analytics Websites

📃 Papers

The following Shiny App from Lars Maurath is a great tool for looking up publications [link].

2021
2020
2019

2018

2017
2016
2015
2014
2011
2002
1997
1971

Newsletters

News Articles

📚 Books

The following use Amazon UK links where available.

Magazines

📼 Video

YouTube Playlists

Custom Playlists Curated by Myself

The following is a series of playlists that I originally collated for my own viewing, but they may be useful to you:

Public Playlists

Playlists created by others

YouTube Channels

Video Analysis

Webinars and Lectures

Ted Talks

Documentaries

Match Highlights

Other

🔊 Podcasts

Below I've tried to include both dedicated Sports/Football Analytics podcasts and notable episodes of other podcasts that have analytical content or interviews. Spotify and YouTube links are used where available. All episodes mentioned below that are available on Spotify can be found in the following playlist (updated periodically): [link].

Football Analytics Podcasts

Notable Episodes (including non-football-data-specific podcasts)

👨‍💻 Notable Figures and Twitter Accounts

Career Advice

🗓️ Events and Conferences

Competitions

The following includes non-football competitions.

Courses

💼 Jobs

Discord/Slack groups

🔑 Key Concepts

This section focuses on some of the key topics in football analytics. Most of the following resources feature above but are reorganised here by topic. This section is still very much a work in progress and may be missing resources mentioned above.

History of Football Analytics

Expected Goals (xG) Modeling

Videos

For a playlist of Expected Goals related videos available on YouTube, see the following playlist I have created [link].

Webinars and Lectures
Tutorials
Notable Models
Written Pieces

For a list of Expected Goals literature collated by Keith Lyons, see the following [link].

Libraries
GitHub Repositories
Podcasts
Tweets

Tracking Data

Pitch Control Modeling

Tutorials

Pitch Control modelling and Valuing Actions tutorials by Laurie Shaw as part of his Metrica Sports Tracking data series for Friends of Tracking. See the following for code [link];

GitHub Repositories
Written Pieces
Video
Podcasts

Possession Value (PV) Frameworks

General
Expected Threat (xT)
Valuing Actions by Estimating Probabilities (VAEP)
Goals Added (g+)

Player Similarity Analysis

Reinforcement Learning for Football Simulation

Set Pieces

Section created after seeing the following tweets and threads by Ashwin Raman ([link]) and Stuart Reid ([link])

❔ Miscellaneous

-----------------------------------------------------

Contributing

This GitHub repository and resources list will be a constant work in progress, so if you can think of any resources that I've missed, feel free to create a pull request or send me a message by email or at @eddwebster.

-----------------------------------------------------

Acknowledgements
