floodlight-sports / floodlight

Python package for streamlined analysis of sports data.

Home Page: https://floodlight.readthedocs.io/en/latest/index.html

License: MIT License

Python 100.00%

python sports-analytics sports-stats

floodlight's Introduction

floodlight

A high-level, data-driven sports analytics framework

floodlight is a Python package for streamlined analysis of sports data. It is designed with a clear focus on scientific computing and built upon popular libraries such as numpy or pandas.

Load, integrate, and process tracking and event data, codes and other match-related information from major data providers. This package provides a set of standardized data objects to structure and handle sports data, together with a suite of common processing operations such as transforms or data manipulation methods.

All implementations run completely provider- and sports-independent, while maintaining a maximum of flexibility to incorporate as many data flavours as possible. A high-level interface allows easy access to all standard routines, so that you can stop worrying about data wrangling and start focussing on the analysis instead!

Quick Demo
Features
Installation
Documentation
How to contribute

Quick Demo

floodlight simplifies sports data loading, processing and advanced performance analyses. Check out the example below, where querying a public data sample, filtering the data and computing the expended metabolic work of the active home team players is done in a few lines of code:

>>> from floodlight.io.datasets import EIGDDataset
>>> from floodlight.transforms.filter import butterworth_lowpass
>>> from floodlight.models.kinetics import MetabolicPowerModel

>>> dataset = EIGDDataset()
>>> home_team_data, away_team_data, ball_data = dataset.get()

>>> home_team_data = butterworth_lowpass(home_team_data)

>>> model = MetabolicPowerModel()
>>> model.fit(home_team_data)
>>> metabolic_power = model.cumulative_metabolic_power()

>>> print(metabolic_power[-1, 0:7])

[1669.18781115 1536.22481121 1461.03243489 1488.61249785  773.09264071
 1645.01702421  746.94057676]

To find out more, see the full set of features below or get started quickly with one of our many tutorials from the official documentation!

Features

We provide core data structures for team sports data, parsing functionality for major data providers, access points to public data sets, data filtering, plotting routines and many computational models from the literature. The feature set is constantly expanding, and if you want to add more just open an issue!

Data-level Objects

Tracking data
Event data
Pitch information
Teamsheets with player information (new)
Codes such as ball possession information
Properties such as distances or advanced computations

Parser

Tracab/ChyronHego: Tracking data, Teamsheets, Codes
DFL/STS: Tracking data, Event data, Teamsheets, Codes
Kinexon: Tracking data
Opta: Event data (F24 feeds)
Second Spectrum: Tracking data, Event data (new)
Sportradar: Event data (new)
StatsPerform: Tracking data, Event data (with URL access)
StatsBomb: Event data

Datasets

EIGD-H (Handball tracking data)
StatsBomb OpenData (Football event data)

Manipulation and Plotting

Spatial transformations for all data structures
Lowpass-filter tracking data
Slicing, selection and sequencing methods
Plot pitches, player positions and model overlays

Models and Metrics

Approximate Entropy
Centroids
Distances, Velocities & Accelerations
Metabolic Power & Equivalent Distances
Voronoi Space Control (new)

Installation

The package can be installed easily via pip:

pip install floodlight

Documentation

You can find all documentation at https://floodlight.readthedocs.io/en/latest/index.html.

Contributing

Check out Contributing.md for a quick rundown of what you need to know to get started. We also provide an extended, beginner-friendly guide on how to start contributing in our documentation.

Citing

If you've used floodlight in your scientific work, please cite the corresponding paper.

@article{Raabe2022,
    doi = {10.21105/joss.04588},
    url = {https://doi.org/10.21105/joss.04588},
    year = {2022},
    publisher = {The Open Journal},
    volume = {7},
    number = {76},
    pages = {4588},
    author = {Dominik Raabe and Henrik Biermann and Manuel Bassek and Martin Wohlan and Rumena Komitova
              and Robert Rein and Tobias Kuppens Groot and Daniel Memmert},
    title = {floodlight - A high-level, data-driven sports analytics framework},
    journal = {Journal of Open Source Software}
}

Why

Why do we need another package that introduces its own data structures and ways of dealing with certain problems? And what's the purpose of trying to integrate all different data sources and fit them into a single framework? Especially since there already exist packages that aim to solve certain parts of that pipeline?

Our answer is - although we love those packages out there - that we did not find a solution that did fit our needs. Available packages are either tightly connected to a certain data format/provider, adapt to the subtleties of a particular sport, or solve one particular problem. This still left us with the essential problem of adapting to different interfaces.

We felt that as long as there is no underlying, high-level framework, each and every use case again and again needs its own implementation. At last, we found ourselves refactoring the same code - and there are certain data processing or plotting routines that are required in almost every project - over and over again just to fit the particular data structures we're dealing with at that time.

About

This project has been kindly supported by the Institute of Exercise Training and Sport Informatics at the German Sport University Cologne under supervision of Prof. Daniel Memmert.

floodlight's People

martinwohlan manuba95 tkgroot hbiermann95 rkomitova robertreingit justus-git sandralexplore alek050 alexbanning mad4ms purbon 8funtik8 vellaro

floodlight's Issues

JOSS review: Comment on low test coverage for datasets.py?

Only 20% of the lines in this file seem to be covered by a test. Can the authors comment on this? There is likely a reason for this and I don't want to force the authors to write pedantic test cases. So, an explanation would suffice.

openjournals/joss-reviews#4588

First Release - First Issue! What's next...

Hello Everybody!

It's time to celebrate a new year and, a little belated, our very first release! 🎉 🎆

A big 🎈THANK YOU🎈 to all contributors who made this first fully functional release possible in just two months! I'm incredibly excited to see this package finally being released. It's been four years of a seemingly endless conceptualize-implement-refactor cycle for me, and I'm more than grateful for all your help in realizing this project!

At this point, we've got all the essentials ready, so the next steps will be to gradually add features and functionality to our package. With this first issue, I just want to give a quick overview of the topics (and a few bugs) we've come across while preparing the first release, and a comprehensive overview of the potential next steps.

Please note that I'm squashing everything into a single issue, which is not the typical way to go. But instead of opening a hundred single issues, I'd like to keep this one for now as a primary feed for discussing the next release. It's really just for convenience and acknowledges the rapid development phase we are in at this stage. I will also include a more detailed long-term vision and project outline soon.

In this sense, this is not the right thread to collect long-term feature ideas (which would probably explode at this point), but rather a collection of useful functionality that can be included right away on top of the current release.

However, feature requests are highly welcome, please just open a separate issue! The same goes for bug fixes or documentation additions/corrections - we have issue templates ready for all of these.

We still need to include a few basic processing routines before we can start jumping into really fancy analytics, but I'm optimistic we can include the first (semantic) processing routines soon! If you are interested in implementing/fixing one of the points, just leave a comment!

Fixes

A list of known bugs and errors that should be fixed:

'tracab' pitch template is in m, but data is in cm
codes returned by DFL parser should be np.arrays, not lists
event 'minutes' and 'seconds' columns returned by Opta parser should be relative to segments
update python version in pyproject.toml to match latest version on runner

Improvements

A list of things that should be improved for the existing code:

#70
Include missing tests for
- Events.column_values_in_range
- all Events' properties

Potential Features for Upcoming Release

A list of features that are of interest for the next upcoming release.

Core-Module

XY

def permute(self, i: int, j: int): Permute data (i.e. switch columns) of the players with xIDs i and j.
def estimate_playing_direction(self, pitch): Estimate XY.direction property from data and pitch.
merge columns
missing data checks
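The proposed permute operation can be sketched on a plain numpy array. This is only an illustration, not the package's implementation: it assumes tracking data is stored as an array of shape (frames, 2 * players) with alternating x and y columns, and the helper name permute_players is hypothetical.

```python
import numpy as np


def permute_players(xy: np.ndarray, i: int, j: int) -> np.ndarray:
    """Return a copy of xy with the (x, y) column pairs of players i and j swapped.

    Assumes columns are ordered x0, y0, x1, y1, ... (an assumption for this sketch).
    """
    out = xy.copy()
    cols_i = [2 * i, 2 * i + 1]
    cols_j = [2 * j, 2 * j + 1]
    # Write player j's columns into player i's slots and vice versa.
    out[:, cols_i + cols_j] = xy[:, cols_j + cols_i]
    return out


# Two frames, three players: columns are x0, y0, x1, y1, x2, y2.
xy = np.array([[0.0, 1.0, 2.0, 3.0, 4.0, 5.0],
               [6.0, 7.0, 8.0, 9.0, 10.0, 11.0]])
swapped = permute_players(xy, 0, 2)
```

Returning a copy keeps the input untouched; an in-place variant would be just as plausible for a method on a data object.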

Events

Spatial transforms mimicking the ones for XY objects:
- def translate(self, shift: Tuple[Numeric, Numeric])
- def scale(self, factor: float, axis: int = None)
- def reflect(self, axis: int)
- def rotate(self, alpha: float)
Slicing method mimicking the one of the XY class, although based on the gameclock or frameclock column.
def estimate_playing_direction(self, pitch): Estimate Events.direction property from data and pitch.
def get_event_stream(self, fade: int=0) -> Code: Generate a (continuous) Code object from a list of events; it may make more sense to make this a property.
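The four spatial transforms listed above can be sketched as free functions on an (n, 2) array of event locations. This is a rough illustration under stated assumptions, not the package's API: rotation is taken to be counter-clockwise in degrees around the origin, and reflect(points, axis) is taken to negate the coordinate in column axis. Both conventions are assumptions of this sketch.

```python
import numpy as np


def translate(points, shift):
    """Shift all points by (dx, dy)."""
    return points + np.asarray(shift, dtype=float)


def scale(points, factor, axis=None):
    """Scale both coordinates, or only the column given by axis (0 = x, 1 = y)."""
    out = points.astype(float)  # astype returns a copy, the input stays untouched
    if axis is None:
        out *= factor
    else:
        out[:, axis] *= factor
    return out


def reflect(points, axis):
    """Negate the coordinate in column axis (convention assumed for this sketch)."""
    return scale(points, -1.0, axis=axis)


def rotate(points, alpha):
    """Rotate counter-clockwise around the origin by alpha degrees (assumed unit)."""
    theta = np.deg2rad(alpha)
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta), np.cos(theta)]])
    return points @ rot.T


events_xy = np.array([[1.0, 0.0], [0.0, 2.0]])
moved = translate(events_xy, (10.0, -5.0))
turned = rotate(events_xy, 90.0)
mirrored = reflect(events_xy, axis=0)
```

On an Events object the same arithmetic would presumably be applied to the location columns of the underlying table, leaving all other columns unchanged.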

Code

Implement wrapper/dunders for numpy's logical indexing, i.e. to enable code[code == 'A']
Find sequences, i.e., compute a list of start- and end-frames for sequences of consecutive frames with the same token:
['A', 'A', 'H', 'H', 'H', 'H', 'A', 'A', 'A'] -> [(0, 2, 'A'), (2, 6, 'H'), (6, 9, 'A')]
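The sequence search described above can be sketched with the standard library. The function name find_sequences is hypothetical, and the end index is exclusive so that the result matches the example mapping given above.

```python
from itertools import groupby


def find_sequences(code):
    """Return (start, end, token) for each run of identical consecutive tokens.

    start is inclusive and end is exclusive, i.e. code[start:end] is the run.
    """
    sequences = []
    start = 0
    for token, run in groupby(code):
        length = sum(1 for _ in run)
        sequences.append((start, start + length, token))
        start += length
    return sequences


code = ['A', 'A', 'H', 'H', 'H', 'H', 'A', 'A', 'A']
sequences = find_sequences(code)
print(sequences)  # [(0, 2, 'A'), (2, 6, 'H'), (6, 9, 'A')]
```

Half-open intervals make the result directly usable for slicing, which fits the slicing methods discussed for the other core objects.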

Pitch

Property methods for sport-specific coordinates of pitch markings or goal positions

New classes

Add core classes and basic functionality:
- Property
- Teamsheet

Plotting

Basic plot(...) methods for major core objects. Plots on a given axes or creates one if not supplied. An initial kick-off could incorporate:
- pitch: Plot a (background) playing surface for a given sport @MartinWohlan
- xy: Plot position data for a time frame, or trajectories for a time period @MartinWohlan
- events: Plot event data (locations) for a time period or event type (summary view)
- code: Plot temporal summary

IO-Module

New parser

Second Spectrum position data
StatsPerform general format
Opta F7/F9 feeds

Utils-Module

Refactor and test def get_and_convert from Opta parser as utility function

New Modules

New modules to be added soon and attached functionality:

Documentation

Possible upgrades to our docs for the next release:

Contributing Manual

Extend the Contributing Manual for the following subchapters:
- How to do a Pull Request
- How to submit an Issue
- How to document your Code with numpy docstrings and Sphinx

Tutorials

Add basic tutorials (exceeding the getting started chapter) for data handling using the package

Compendium

Add examples to the existing chapters for clarity:
- Handling different Clocks
- Which data / raw data files go into which core objects

Last but not least: I really hope our joint interest in sports data can help us build a small, welcoming, constructive and inspiring community around this project! So with opening the Issue section, I'd like to encourage everyone to use Issues and Pull Requests for contributions as well as general communication! 🤙

JOSS review: Need more documentation on poetry. Also, deprecate pip install instructions

I see that the developers have used poetry as the means of inviting development. However, I think there need to be step-by-step instructions on how to use poetry for contributing. Forgive my ignorance, but I could not follow the current instructions to get a version of this package set up on my machine. I would recommend that the developers add a setup.py file in addition to poetry for this reason.

openjournals/joss-reviews#4588

[BUG] `poetry install` command raises error for `h5py`

Checklist

I've updated to the latest version of floodlight
I've checked if a similar issue exists

Describe the bug
When cloning the floodlight code locally and running poetry install, the installation of packages fails (see screenshots below).

To Reproduce
Open a new project, clone the floodlight code, and run poetry install with poetry version 1.3.2 and Python version 3.10.10.

Expected behavior
A CalledProcessError is raised when running poetry install. h5py can not be installed. See screenshots below for terminal output.

Screenshots

Platform (please complete the following information):

OS: macOS (Ventura 13.1)
Python Version 3.10.10 (see screenshots)

Additional context
Possible solution: After running poetry update, updating the package dependencies, everything works fine. This would also make it possible to run the package using Python 3.11 (latest stable version).

JOSS review: Consider switching documentation to another branch, something like gh-pages

I see the docs are hosted on each branch. This would create problems for future development. E.g., every time you wanted to update the documentation, you would have to re-merge into master etc. I would recommend you store the docs on a separate branch. That way, when you just need to update the documentation, you do it on that branch and don't touch master at all. You can also link the new branch for documentation in your settings menu. See example below.

openjournals/joss-reviews#4588

[DOCS] Tracab Data Documentation

Hi!

The Tracab functions introduce some variables without proper documentation, as listed below:

Ball and player speed, without mentioning their units or how they are calculated from the x, y, and z values.
The meaning of the system_id variable.
SetHome, SetAway, and other values that are ignored.
There are values other than 0 and 1 for team, which are not explained.

Can you clarify them?

JOSS review: Descriptions, metadata for datasets

Could you add a few words of metadata about the StatsBomb and EIGD datasets? Are there specific versions of these datasets that floodlight is compatible with? If something changes in these data, will it affect functionality in floodlight? Also, are these the only datasets that floodlight will work with? If so, would it be useful to add a data processing function that will process input data from any source for floodlight?

openjournals/joss-reviews#4588

JOSS review: Recommend adding coverage to toml file

I added this to check test coverage. I would recommend adding this since it will be useful as the package gets more users and contributors.

openjournals/joss-reviews#4588

JOSS review: No setup.py?

This project does not have a setup.py. Effectively, the developers are suggesting installation via pip only. Does this mean no public user can contribute to the code? Since this is an open source software package, I think it should include a setup.py so that users have the option of a developer version of the code in which they can make changes. I would insist on this since it is a requirement for an open source software package. Corresponding installation instructions would also have to be added.

openjournals/joss-reviews#4588

[FEAT] Update StatsBomb Parser

Checklist

I've updated to the latest version of floodlight
I've checked if a similar issue exists

Issues in old version

The StatsBomb parser splits the events by team to meet floodlight's standards for data classes. To do so, it uses the tID of the team currently in possession. Therefore, defensive actions like tackles are also added to the event class of the offensive team. Since StatsBomb provides a tID for each event, using this value seems more sensible to me.
The parser requires a locally stored file to include the match_id. While this seems to be a good solution when accessing data through the StatsBomb API interface (with automatic FileIO), it might lead to problems with local files. This should be re-evaluated and at least documented more clearly.

Suggestions for new version
The official StatsBomb account has released a Python package that reads both the open data and the protected data from the API. Yet, the returned objects are comparably less concise than floodlight Events objects. Thus, it could make sense to integrate the functionality of that package into the StatsBomb parser pipeline.

https://github.com/statsbomb/statsbombpy

JOSS review: Add explicit link to documentation here (See attached image)

The authors have put together an impressive set of documentation. Why not link this in the About section? I think users would look there for the documentation anyway. This should be a simple fix.

openjournals/joss-reviews#4588

JOSS review: What is the license associated with this? Consider adding a BSD 2-Clause license file indicating this is open source

I could not find the license file. Please add it to the repository. I would recommend a BSD2 license. I can provide an example of this if required.

openjournals/joss-reviews#4588

[FEAT] Add a pressure model

Checklist

I believe the feature fits the scope of the project

Is your feature request related to a problem? Please describe.
There is not really a problem, the package would just be more complete when a pressure model is added.

Describe the solution you'd like
I want a pressure model to be added to the package. In 2017, Andrienko et al. published a paper with a pressure model for soccer-specific purposes. The model estimates the pressure on a player based on the location of the defenders relative to the attacker and the distance between the defenders and the player. The parameters described in the paper are optimized for field soccer situations, but can of course be adapted to fit other field sports.
Later, Herold et al. (2022) updated the model of Andrienko et al., arguing that the pressure parameters should change based on the location on the pitch: pressure is location dependent. A new model could be added to the floodlight package that calculates the pressure on a specific player during a specified time period based on tracking data.

sources:

Andrienko, G., Andrienko, N., Budziak, G., Dykes, J., Fuchs, G., Von Landesberger, T. and Weber, H. (2017). Visual Analysis of Pressure in Football. Data Mining and Knowledge Discovery, 31(6), pp. 1793-1839. doi: 10.1007/s10618-017-0513-2
Herold, M., Hecksteden, A., Radke, D., Goes, F., Nopp, S., Meyer, T. and Kempe, M. (2022). Off-ball behavior in association football: A data-driven model to measure changes in individual defensive pressure. Journal of Sports Sciences. doi: 10.1080/02640414.2022.2081405

Describe alternatives you've considered
An alternative could be to calculate the pressure on all players of both teams for the whole match. But I would argue this is not really computationally efficient, since pressure is mainly interesting during on-ball events (take-ons or 1-vs-1 actions). Calculating pressure for all players and teams would probably take too long without adding any value. However, if we can make the code computationally efficient and fast enough, it might be easier to use, since some of the other models also calculate everything for all players during the whole match (Kinematics model), if I'm not mistaken.

Additional context

JOSS review: Some inconsistency between the pip version, the GitHub version and the documented version of floodlight

datasets.py does not show up in the pip-installed version of this package (see screenshot below), while something called sampledata shows up. Are the pip version and the GitHub version different somehow? Adding a setup.py to the GitHub repository would have solved this. Please let me know if I'm missing something.

openjournals/joss-reviews#4588

[Feature request]: summary() or info() function for teamsheets

For example:

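from floodlight.io import dfl

# fpath and fpath_info point to the DFL position data and match information files.
(
    xy_objects,
    possession_objects,
    ballstatus_objects,
    teamsheets,
    pitch,
) = dfl.read_position_data_xml(fpath, fpath_info)
# Requested method (does not exist yet):
teamsheets['Home'].summary()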

Providing a short descriptive summary of the game.
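What such a summary could return is open. As a rough sketch only, assuming the teamsheet is backed by a pandas DataFrame with "player", "jID" and "position" columns (the column names, the helper name summarize_teamsheet and the chosen fields are all assumptions of this example, not an existing floodlight API):

```python
import pandas as pd


def summarize_teamsheet(teamsheet: pd.DataFrame) -> dict:
    """Build a small descriptive summary from a teamsheet-like table (hypothetical helper)."""
    return {
        "n_players": len(teamsheet),
        # Number of listed players per position label.
        "positions": teamsheet["position"].value_counts().to_dict(),
        "jersey_numbers": sorted(teamsheet["jID"].tolist()),
    }


# Made-up example data, for illustration only.
home = pd.DataFrame({
    "player": ["Player A", "Player B", "Player C"],
    "jID": [7, 1, 10],
    "position": ["FW", "GK", "FW"],
})
summary = summarize_teamsheet(home)
```

Returning a plain dictionary keeps the result easy to print or test; a formatted string in the style of pandas' DataFrame.info() would be another option.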
