floodlight's Introduction

floodlight

A high-level, data-driven sports analytics framework

floodlight is a Python package for streamlined analysis of sports data. It is designed with a clear focus on scientific computing and built upon popular libraries such as numpy or pandas.

Load, integrate, and process tracking and event data, codes and other match-related information from major data providers. This package provides a set of standardized data objects to structure and handle sports data, together with a suite of common processing operations such as transforms or data manipulation methods.

All implementations run completely provider- and sports-independent, while maintaining a maximum of flexibility to incorporate as many data flavours as possible. A high-level interface allows easy access to all standard routines, so that you can stop worrying about data wrangling and start focussing on the analysis instead!



Quick Demo

floodlight simplifies sports data loading, processing and advanced performance analyses. Check out the example below, where querying a public data sample, filtering the data and computing the expended metabolic work of the active home team players is done in a few lines of code:

>>> from floodlight.io.datasets import EIGDDataset
>>> from floodlight.transforms.filter import butterworth_lowpass
>>> from floodlight.models.kinetics import MetabolicPowerModel

>>> dataset = EIGDDataset()
>>> home_team_data, away_team_data, ball_data = dataset.get()

>>> home_team_data = butterworth_lowpass(home_team_data)

>>> model = MetabolicPowerModel()
>>> model.fit(home_team_data)
>>> metabolic_power = model.cumulative_metabolic_power()

>>> print(metabolic_power[-1, 0:7])

[1669.18781115 1536.22481121 1461.03243489 1488.61249785  773.09264071
 1645.01702421  746.94057676]

To find out more, see the full set of features below or get started quickly with one of our many tutorials from the official documentation!

Features

We provide core data structures for team sports data, parsing functionality for major data providers, access points to public data sets, data filtering, plotting routines and many computational models from the literature. The feature set is constantly expanding, and if you want to add more just open an issue!

Data-level Objects

  • Tracking data
  • Event data
  • Pitch information
  • Teamsheets with player information (new)
  • Codes such as ball possession information
  • Properties such as distances or advanced computations

Parser

  • Tracab/ChyronHego: Tracking data, Teamsheets, Codes
  • DFL/STS: Tracking data, Event data, Teamsheets, Codes
  • Kinexon: Tracking data
  • Opta: Event data (F24 feeds)
  • Second Spectrum: Tracking data, Event data (new)
  • Sportradar: Event data (new)
  • StatsPerform: Tracking data, Event data (with URL access)
  • StatsBomb: Event data

Datasets

  • EIGD-H (Handball tracking data)
  • StatsBomb OpenData (Football event data)

Manipulation and Plotting

  • Spatial transformations for all data structures
  • Lowpass-filter tracking data
  • Slicing, selection and sequencing methods
  • Plot pitches, player positions and model overlays

Models and Metrics

  • Approximate Entropy
  • Centroids
  • Distances, Velocities & Accelerations
  • Metabolic Power & Equivalent Distances
  • Voronoi Space Control (new)

Installation

The package can be installed easily via pip:

pip install floodlight

Documentation

You can find all documentation here.

Contributing

Check out Contributing.md for a quick rundown of what you need to know to get started. We also provide an extended, beginner-friendly guide on how to start contributing in our documentation.

Citing

If you've used floodlight in your scientific work, please cite the corresponding paper.

@article{Raabe2022,
    doi = {10.21105/joss.04588},
    url = {https://doi.org/10.21105/joss.04588},
    year = {2022},
    publisher = {The Open Journal},
    volume = {7},
    number = {76},
    pages = {4588},
    author = {Dominik Raabe and Henrik Biermann and Manuel Bassek and Martin Wohlan and Rumena Komitova
              and Robert Rein and Tobias Kuppens Groot and Daniel Memmert},
    title = {floodlight - A high-level, data-driven sports analytics framework},
    journal = {Journal of Open Source Software}
}

Why

Why do we need another package that introduces its own data structures and ways of dealing with certain problems? And what's the purpose of trying to integrate all different data sources and fit them into a single framework? Especially since there already exist packages that aim to solve certain parts of that pipeline?

Our answer is - although we love those packages out there - that we did not find a solution that did fit our needs. Available packages are either tightly connected to a certain data format/provider, adapt to the subtleties of a particular sport, or solve one particular problem. This still left us with the essential problem of adapting to different interfaces.

We felt that as long as there is no underlying, high-level framework, each and every use case again and again needs its own implementation. At last, we found ourselves refactoring the same code - and there are certain data processing or plotting routines that are required in almost every project - over and over again just to fit the particular data structures we're dealing with at that time.

About

This project has been kindly supported by the Institute of Exercise Training and Sport Informatics at the German Sport University Cologne under supervision of Prof. Daniel Memmert.

floodlight's People

Contributors

draabe, hbiermann95, manuba95, martinwohlan, rkomitova, robertreingit, tkgroot

floodlight's Issues

First Release - First Issue! What's next...

Hello Everybody!

It's time to celebrate a new year and, a little belated, our very first release! 🎉 🎆

A big 🎈THANK YOU🎈 to all contributors who made this first fully functional release possible in just two months! I'm incredibly excited to see this package finally being released. It's been four years of a seemingly endless conceptualize-implement-refactor cycle for me, and I'm more than grateful for all your help in realizing this project!

At this point, we've got all the essentials ready, so the next steps will be to gradually add features and functionality to our package. With this first issue, I just want to give a quick overview of the topics (and a few bugs) we've come across while preparing the first release, and a comprehensive overview of the potential next steps.

Please note that I'm kind of squashing everything into a single issue, which is not the typical way to go. But instead of opening a hundred single issues, I'd like to keep this one for now as a primary feed for discussing the next release. It's really just for convenience and acknowledging the rapid development phase we are in at this stage. I will also include a more detailed long-term vision and project outline soon.

In this sense, this is not the right thread to collect long-term feature ideas (which would probably explode at this point), but rather a collection of useful functionality that can be included right away on top of the current release.

However, feature requests are highly welcome, please just open a separate issue! Same goes for bug fixes or documentation additions/corrections - we have issue templates ready for all of these.

It's still a few basic processing routines we need to include before we can start jumping into really fancy analytics, but I'm optimistic we can include the first (semantic) processing routines soon! If you are interested in implementing/fixing one of the points, just leave a comment!

Fixes

A list of known bugs and errors that should be fixed:

  • 'tracab' pitch template is in m, but data is in cm
  • codes returned by DFL parser should be np.arrays, not lists
  • event 'minutes' and 'seconds' columns returned by Opta parser should be relative to segments
  • update python version in pyproject.toml to match latest version on runner

Improvements

A list of things that should be improved for the existing code:

  • #70
  • Include missing tests for
    • Events.column_values_in_range
    • all Events' properties

Potential Features for Upcoming Release

A list of features that are of interest for the next upcoming release.

Core-Module

XY

  • def permute(self, i: int, j: int): Permute data (i.e. switch columns) of player with xIDs i and j.
  • def estimate_playing_direction(self, pitch): Estimate XY.direction property from data and pitch.
  • merge columns
  • missing data checks
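
As a rough illustration of the proposed permute method, the sketch below swaps the coordinate columns of two players in a plain numpy array. It assumes one row per frame and interleaved columns (x0, y0, x1, y1, ...); permute_columns is a hypothetical standalone helper, not part of the floodlight API.

```python
import numpy as np


def permute_columns(xy: np.ndarray, i: int, j: int) -> np.ndarray:
    """Return a copy of xy with the x/y columns of players i and j swapped.

    Assumes shape (T, 2N) with columns ordered x0, y0, x1, y1, ...
    """
    n_players = xy.shape[1] // 2
    if not (0 <= i < n_players and 0 <= j < n_players):
        raise IndexError("player index out of range")
    permuted = xy.copy()
    # Read from the untouched input so both pairs are swapped in one step
    permuted[:, [2 * i, 2 * i + 1, 2 * j, 2 * j + 1]] = xy[
        :, [2 * j, 2 * j + 1, 2 * i, 2 * i + 1]
    ]
    return permuted


# Two frames, three players
xy = np.arange(12, dtype=float).reshape(2, 6)
swapped = permute_columns(xy, 0, 2)
```

Returning a copy keeps the input untouched; an in-place variant on the XY object itself would be equally plausible.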

Events

  • Spatial transforms mimicking the ones for XY objects:
    • def translate(self, shift: Tuple[Numeric, Numeric])
    • def scale(self, factor: float, axis: int = None)
    • def reflect(self, axis: int)
    • def rotate(self, alpha: float)
  • Slicing method mimicking the one of the XY class, although based on gameclock or frameclock column.
  • def estimate_playing_direction(self, pitch): Estimate Events.direction property from data and pitch.
  • def get_event_stream(self, fade: int=0) -> Code: Generate a Code (continuous) object from a list of events, maybe makes more sense to make this a property.
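
The four spatial transforms listed above could look roughly like the following when applied to an (N, 2) array of event locations. This is only a sketch: the function names follow the signatures in the list, but the conventions (axis 0 = x, axis 1 = y, angles in degrees, counter-clockwise rotation about the origin, reflection about a coordinate axis) are assumptions, and how an Events object stores its locations is left out entirely.

```python
import numpy as np


def translate(points, shift):
    """Shift all (x, y) locations by the vector shift = (dx, dy)."""
    return points + np.asarray(shift, dtype=float)


def scale(points, factor, axis=None):
    """Scale both coordinates, or only one column (0 = x, 1 = y)."""
    scaled = points.astype(float)  # astype returns a copy
    if axis is None:
        scaled *= factor
    else:
        scaled[:, axis] *= factor
    return scaled


def reflect(points, axis):
    """Mirror about a coordinate axis: axis=0 (x-axis) flips the sign of y."""
    return scale(points, -1.0, axis=1 - axis)


def rotate(points, alpha):
    """Rotate counter-clockwise about the origin by alpha degrees."""
    theta = np.radians(alpha)
    rotation = np.array(
        [[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]]
    )
    return points @ rotation.T


locations = np.array([[1.0, 0.0], [0.0, 2.0]])
moved = translate(locations, (10, -5))
turned = rotate(locations, 90)
```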

Code

  • Implement wrapper/dunders for numpy's logical indexing, i.e. to enable code[code == 'A']
  • Find sequences, i.e., compute a list of start- and end-frames for sequences of consecutive frames with the same token:
    ['A', 'A', 'H', 'H', 'H', 'H', 'A', 'A', 'A'] -> [(0, 2, 'A'), (2, 6, 'H'), (6, 9, 'A')]
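
The sequence search described above can be sketched in a few lines on a plain list of tokens. find_sequences is a hypothetical helper name; start frames are inclusive and end frames exclusive, as in the example mapping.

```python
from itertools import groupby


def find_sequences(tokens):
    """Return (start, end, token) for each run of identical tokens.

    start is inclusive and end is exclusive, matching Python slicing.
    """
    sequences = []
    start = 0
    for token, run in groupby(tokens):
        length = sum(1 for _ in run)
        sequences.append((start, start + length, token))
        start += length
    return sequences


code = ['A', 'A', 'H', 'H', 'H', 'H', 'A', 'A', 'A']
print(find_sequences(code))
# [(0, 2, 'A'), (2, 6, 'H'), (6, 9, 'A')]
```

With exclusive end frames, code[start:end] returns exactly the frames of one sequence.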

Pitch

  • Property methods for sport-specific coordinates of pitch markings or goal positions

New classes

  • Add core classes and basic functionality:
    • Property
    • Teamsheet

Plotting

  • Basic plot(...) methods for major core objects. Plots on a given axes or creates one if not supplied. An initial kick-off could incorporate:
    • pitch: Plot a (background) playing surface for a given sport @MartinWohlan
    • xy: Plot position data for a time frame, or trajectories for a time period @MartinWohlan
    • events: Plot event data (locations) for a time period or event type (summary view)
    • code: Plot temporal summary

IO-Module

New parser

  • SecondSpectrum Position Data
  • Statsperform general format
  • Opta f7/f9 feeds

Utils-Module

  • Refactor and test def get_and_convert from Opta parser as utility function

New Modules

New modules to be added soon and attached functionality:

  • floodlight.transforms: This module includes (mathematical) transformations performed on more than one object. These include spatial and temporal transformations and generally consists of functions that perform operations of the form
    Tuple[data-level core objects] -> Tuple[data-level core objects].

    • Temporal transformations
      • Downsampling: Downsample framerate-bound core objects. This may involve data interpolation and is rather non-trivial to implement.
      • Upsampling: Upsample framerate-bound core objects. This may involve data imputation and is rather non-trivial to implement.
      • Synchronization: Given multiple objects, re-sample them cleverly to the same framerate.
    • Spatial transformations
      • Extend core object methods such as scaling, rotating, reflecting to the multi-object case that also adjusts the coordinate system coded in the Pitch object.
      • Synchronization: Given multiple objects, transform them cleverly onto the same pitch.
  • floodlight.models: This module is intended to collect algorithms that process sports data and can be roughly described as model building on a frame-by-frame level (in contrast, e.g., to aggregated performance metric calculation). These operations can roughly be described as functions that perform operations of the form
    Tuple[data-level core objects] -> Union[Code-objects, Property-Objects].

    An example would be one of the many space-control models that calculates, for each frame and player, the amount of space he or she controls. There are countless models we could implement, but for starters I would suggest the following three:

    • Physiological Models: May include basic kinematics such as speed, acceleration, distance covered, number of sprints, speed bands, load metrics, ... . @manuba95
    • "Basic" Geometric Models: These include rather straightforward computations regarding player-to-player or player-to-ball distances, convex hull, centroids, stretch, ... .
    • Space Control Models: These models can be mostly found in football. Of the many available, I would start with the most basic Voronoi model before anything else.
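
To make the "data-level object in, frame-wise property out" idea more concrete, here is a minimal sketch of one of the basic geometric models mentioned above: the team centroid and each player's distance to it. It works on a plain numpy array and assumes one row per frame with interleaved columns (x0, y0, x1, y1, ...) and NaN for missing players; the function names are illustrative and not taken from the package.

```python
import numpy as np


def centroid(xy):
    """Per-frame team centroid for xy of shape (T, 2N); NaNs are ignored."""
    x, y = xy[:, 0::2], xy[:, 1::2]
    return np.column_stack((np.nanmean(x, axis=1), np.nanmean(y, axis=1)))


def distance_to_centroid(xy):
    """Per-frame distance of every player to the centroid, shape (T, N)."""
    center = centroid(xy)
    dx = xy[:, 0::2] - center[:, [0]]
    dy = xy[:, 1::2] - center[:, [1]]
    return np.hypot(dx, dy)


# Two frames, three players; the third player is missing in frame two
xy = np.array(
    [
        [0.0, 0.0, 6.0, 0.0, 0.0, 6.0],
        [0.0, 0.0, 4.0, 0.0, np.nan, np.nan],
    ]
)
centers = centroid(xy)
distances = distance_to_centroid(xy)
```

Both results keep one row per frame, so they line up with the tracking data they were computed from.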

Documentation

Possible upgrades to our docs for the next release:

Contributing Manual

  • Extend the Contributing Manual for the following subchapters:
    • How to do a Pull Request
    • How to submit an Issue
    • How to document your Code with numpy docstrings and Sphinx

Tutorials

  • Add basic tutorials (exceeding the getting started chapter) for data handling using the package

Compendium

  • Add examples to the existing chapters for clarity:
    • Handling different Clocks
    • Which data / raw data files go into which core objects

Last but not least: I really hope our joint interest in sports data can help us build a small welcoming, constructive and inspiring community around this project! So with opening the Issue section, I'd like to encourage everyone to use Issues and Pull Requests for contributions as well as general communication! 🤙

JOSS review: Need more documentation on poetry. Also, deprecate pip install instructions

I see that the developers have used poetry as the means of inviting development. However, I think there need to be step by step instructions on how to use poetry for contribution. Forgive my ignorance, but I could not follow the current instructions to get a version of this package set up on my machine. I would recommend that the developers add the setup.py file in addition to poetry for this reason.

openjournals/joss-reviews#4588

[BUG] `Poetry install` command raises error for `h5py`

Checklist

  • I've updated to the latest version of floodlight
  • I've checked if a similar issue exists

Describe the bug
When cloning the floodlight code locally and running poetry install, the installation of packages fails (see screenshots below).

To Reproduce
Open a new project, clone the floodlight code, run poetry install with poetry version 1.3.2 and python version 3.10.10

Expected behavior
A CalledProcessError is raised when running poetry install. h5py cannot be installed. See screenshots below for terminal output.

Screenshots
Screenshot 2023-02-17 at 15 31 10
Screenshot 2023-02-17 at 15 31 34
Screenshot 2023-02-17 at 15 31 59
Screenshot 2023-02-17 at 15 32 32
Screenshot 2023-02-17 at 15 32 52

Platform (please complete the following information):

  • OS: macOS (Ventura 13.1)
  • Python Version 3.10.10 (see screenshots)

Additional context
Possible solution: After running poetry update, updating the package dependencies, everything works fine. This would also make it possible to run the package using Python 3.11 (latest stable version).

JOSS review: Consider switching documentation to another branch, something like gh-pages

I see the docs are hosted on each branch. This would create problems for future development. E.g., every time you wanted to update the documentation, you would have to re-merge into master. I would recommend you store the docs on a separate branch. That way, when you just need to update the documentation, you do it on that branch and don't touch master at all. You can also link the new branch for documentation in your settings menu. See example below,

image

openjournals/joss-reviews#4588

[DOCS] Tracab Data Documentation

Hi!

The Tracab functions introduce some variables without proper documentation, as listed below:

  • Ball and player speed without mentioning their units and how they are calculated from x,y, and z values.
  • The meaning of the system_id variable.
  • SetHome, SetAway, and other values that are ignored!
  • There are other values than 0 and 1 for team, which are not explained.

Can you clarify them?

JOSS review: Descriptions, metadata for datasets

Could you add a few words of metadata about the StatsBomb, EIGD dataset? Are there specific versions of these datasets that floodlight is compatible with? If something changes in these data will it affect functionality in floodlight? Also, are these the only datasets that floodlight will work with? If so would it be useful to add a data processing function that will process input data from any source for floodlight?

openjournals/joss-reviews#4588

JOSS review: No setup.py?

This project does not have a setup.py. Effectively, the developers are suggesting installation via pip only. Does this mean no public user can contribute to the code? Since this is an open source software package, I think this should include a setup.py so that users have the option to have a developer version of the code where they can make changes. I would insist on this since this is a requirement for an open source software package. Corresponding installation instructions would also have to be added.

openjournals/joss-reviews#4588

[FEAT] Update StatsBomb Parser

Checklist

  • I've updated to the latest version of floodlight
  • I've checked if a similar issue exists

Issues in old version

  • StatsBomb parser splits the events by teams to meet floodlight standards for data classes. For doing so, it uses the tID of the team currently in possession. Therefore, defensive actions like tackles are also added to the event class of the offensive team. Since StatsBomb provides a tID for the acting team, using this value seems more sensible to me.

  • The parser requires a locally stored file to include the match_id. While this seems to be a good solution when accessing data through the StatsBomb API interface (with automatic FileIO), it might lead to problems with local files. This should be re-evaluated and at least documented more clearly.
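
The difference between the two splitting strategies from the first point can be shown on a few toy records. The sketch assumes that each event carries nested "team" and "possession_team" entries with an "id", as in the StatsBomb open-data event files; the records themselves and the helper split_by_team are made up for illustration and do not reflect the parser's actual code.

```python
from collections import defaultdict


def split_by_team(events, key="team"):
    """Group event type names by the team id stored under `key`."""
    grouped = defaultdict(list)
    for event in events:
        grouped[event[key]["id"]].append(event["type"]["name"])
    return dict(grouped)


# Hypothetical toy records: team 1 is in possession, team 2 makes a tackle
events = [
    {"type": {"name": "Pass"}, "team": {"id": 1}, "possession_team": {"id": 1}},
    {"type": {"name": "Duel"}, "team": {"id": 2}, "possession_team": {"id": 1}},
    {"type": {"name": "Carry"}, "team": {"id": 1}, "possession_team": {"id": 1}},
]

by_possession = split_by_team(events, key="possession_team")
by_actor = split_by_team(events, key="team")
```

Splitting by possession files the duel under team 1, whereas splitting by the acting team assigns it to team 2.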

Suggestions for new version
There is a Python package released by the official StatsBomb account that reads both the open data and protected data from the API. Yet, the returned objects are comparably less concise than floodlight Events objects. Thus, it could make sense to integrate the functionality of that package into the StatsBomb parser pipeline.

https://github.com/statsbomb/statsbombpy

[FEAT] Add a pressure model

Checklist

  • I believe the feature fits the scope of the project

Is your feature request related to a problem? Please describe.
There is not really a problem, the package would just be more complete when a pressure model is added.

Describe the solution you'd like
I want a pressure model to be added to the package. Andrienko et al. (2017) published a paper with a pressure model for soccer-specific purposes. The model makes an estimate of the pressure on a player based on the location of the defenders relative to the attacker and the distance between the defenders and the player. The parameters described in the paper are optimized for field soccer situations, but can of course be adapted to fit other field sports.
Later, Herold et al. (2022) updated the model of Andrienko et al., arguing that the pressure parameters should change based on the location on the pitch: pressure is location dependent. A new model could be added to the floodlight package that calculates the pressure on a specific player during a specified time period based on tracking data.

sources:

  • Andrienko, G., Andrienko, N., Budziak, G., Dykes, J., Fuchs, G., Von
    Landesberger, T. and Weber, H. (2017). Visual Analysis of Pressure in Football. Data Mining
    and Knowledge Discovery, 31(6), pp. 1793-1839. doi: 10.1007/s10618-017-0513-2
  • Mat Herold, A. Hecksteden, D. Radke, F. Goes, S. Nopp, T. Meyer & M.
    Kempe (2022): Off-ball behavior in association football: A data-driven model to measure changes in
    individual defensive pressure, Journal of Sports Sciences, DOI: 10.1080/02640414.2022.2081405
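
To give a feeling for what such a model computes, here is a deliberately simplified sketch. It is not the formula from either paper: it only captures the idea that pressure grows as a defender gets closer and that the pressure zone reaches further on the goal side of the attacker. The zone shape, the parameter values front and back, and the function name pressure are all illustrative assumptions.

```python
import math


def pressure(attacker, defender, goal, front=9.0, back=3.0):
    """Pressure in percent exerted by one defender on the attacker.

    The pressure zone reaches `front` metres towards the goal and `back`
    metres away from it; both values are illustrative placeholders.
    """
    to_goal = (goal[0] - attacker[0], goal[1] - attacker[1])
    to_def = (defender[0] - attacker[0], defender[1] - attacker[1])
    dist = math.hypot(*to_def)
    goal_dist = math.hypot(*to_goal)
    if dist == 0.0:
        return 100.0
    if goal_dist == 0.0:
        cos_theta = 1.0  # attacker stands on the goal point: use the front range
    else:
        dot = to_goal[0] * to_def[0] + to_goal[1] * to_def[1]
        cos_theta = dot / (goal_dist * dist)
    # Zone radius shrinks smoothly from `front` (goal side) to `back`
    zone = back + (front - back) * (1.0 + cos_theta) / 2.0
    return max(0.0, 1.0 - dist / zone) * 100.0


attacker, goal = (0.0, 0.0), (50.0, 0.0)
defenders = [(3.0, 0.0), (0.0, 3.0), (-3.0, 0.0)]
per_defender = [pressure(attacker, d, goal) for d in defenders]
total = sum(per_defender)
```

In this toy setup a defender three metres away on the goal side exerts far more pressure than one at the same distance behind the attacker, and the total pressure is simply the sum over all defenders.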

Describe alternatives you've considered
An alternative could be to calculate the pressure of all players of both teams for the whole match. But I would argue this is not really computationally efficient, since pressure is mainly interesting during on-ball events (take-ons or 1-vs-1 actions). Calculating pressure for all players and teams would probably take too long without any added value. However, if we can make the code computationally efficient and fast enough, it might be easier to use, since some of the other models also calculate everything for all players during the whole match (Kinematics model), if I'm not mistaken.

Additional context
