fpl-data's Introduction

Python 3.8

AWS lambda function for calculating FPL data statistics

The purpose of this project is to provide an AWS Lambda function that:

  1. retrieves data from the FPL API
  2. calculates various statistics, including expected points for each game week, using the prep-data.ipynb Jupyter notebook
  3. makes the prepared data sets available to data analysers such as the FPL Advisor. The data sets are published in the public fpl.177arc.net S3 bucket.

The Lambda function runs in AWS on an hourly schedule during the day and continuously updates the data.
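The three steps above can be sketched as a Lambda handler that wires together a fetch step, a preparation step and a publish step. This is only an illustration: `make_handler`, `fetch`, `prepare` and `publish` are hypothetical names, and the project itself runs the preparation through the prep-data.ipynb notebook rather than a plain function. The only part taken from AWS is the `(event, context)` handler signature.

```python
def make_handler(fetch, prepare, publish):
    """Build a Lambda-style handler from three injected steps (all hypothetical)."""

    def handler(event, context):
        raw = fetch()                      # step 1: retrieve data from the FPL API
        data_sets = prepare(raw)           # step 2: calculate statistics, returns {file name: content}
        for name, content in data_sets.items():
            publish(name, content)         # step 3: make each data set available, e.g. in S3
        return {"published": sorted(data_sets)}

    return handler
```

Injecting the three steps keeps the handler testable without network access or AWS credentials.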

Important data points

The following data points are worth highlighting:

Expected points calculation methodology

The fundamental idea is that the best evidence for a player's ability to generate points is the player's record over a sliding window of past fixtures, taking into account the difficulty of the opposing teams.

The expected points for each player and game week are calculated by averaging the points the player earned for each event type (e.g. goals scored, goals conceded, clean sheets, yellow cards) separately over a sliding window of past fixtures (currently 12). These averages are then adjusted for the strength of the upcoming opposing team relative to the strength of the opposing teams that the player has faced so far.
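A minimal sketch of this idea follows. The window size of 12 comes from the text. Everything else is an assumption made for illustration: the function name, the shape of `history`, and in particular the adjustment, which here is a simple ratio of the average past opponent strength to the next opponent's strength. The notebook may adjust each event type differently.

```python
from statistics import mean

WINDOW = 12  # sliding window size stated in the text


def expected_points(history, next_opponent_strength, window=WINDOW):
    """Return adjusted average points per event type.

    history: list of (points_by_event, opponent_strength) tuples, oldest first.
    """
    recent = history[-window:]
    if not recent:
        return {}
    avg_past_strength = mean(strength for _, strength in recent)
    # Assumed adjustment: a stronger-than-usual opponent scales the averages down.
    factor = avg_past_strength / next_opponent_strength
    events = {event for points, _ in recent for event in points}
    return {
        event: mean(points.get(event, 0.0) for points, _ in recent) * factor
        for event in events
    }
```

Summing the returned values gives a single expected points figure for the fixture.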

Features

  • Estimated team strength and expected goals based on a rolling window of fixtures from this season and the last
  • Estimated player points for each game week based on opposing team strength and points earned over a short rolling window of past fixtures
  • Where past data is not available, it is patched using manually curated data or, if that is not available either, using machine learning
  • Data completeness indicator for each player and game week combination
  • Estimated points for each past player and game week combination to support back testing
  • Correctly handles game weeks in which a team does not play as well as double game weeks

For more details, see the prep-data.ipynb Jupyter notebook.
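The handling of blank and double game weeks can be illustrated by summing per-fixture expected points within each game week. This is a sketch with hypothetical names, not the notebook's code: a game week without a fixture yields zero, and a double game week yields the sum of both fixtures.

```python
from collections import defaultdict


def points_per_game_week(fixture_eps, game_weeks):
    """fixture_eps: iterable of (game_week, expected_points) for one player's fixtures."""
    totals = defaultdict(float)
    for game_week, eps in fixture_eps:
        totals[game_week] += eps  # double game weeks accumulate
    # Game weeks without any fixture (blank game weeks) get 0.0.
    return {gw: totals.get(gw, 0.0) for gw in game_weeks}
```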

Limitations

  • Although player availability data is used, the textual news information is not interpreted to project a return date as part of the longer-term expected points calculation. In short, player availability is reliable for the upcoming game week but not thereafter.
  • Only data points from the FPL API are used; no alternative data is incorporated.

List of data sets and data dictionaries

  • player_gw_next_eps_ext.csv (~120,000 data points, data dictionary): Contains a row for each player in the current season with expected points for the next game week up to the last one. The data is indexed by the player code, which is unique across seasons.
  • players_gw_team_eps_ext.csv (~7,000,000 data points, data dictionary): Contains a row for each player and game week combination for the current and last season with the expected points for past and upcoming game weeks. The data is indexed by the player code, the season and the game week number.
  • team_fixture_stats_ext.csv (~100,000 data points, data dictionary): Contains a row for each fixture with the corresponding team info. It has stats for each fixture that are possible indicators of the outcome. These stats are eventually used in the calculation of the expected points. The data is indexed by the fixture code, which is unique across seasons.
  • players_history_ext.csv (~70,000 data points, data dictionary): Contains a row for each player and fixture combination for the current and the last season with most attributes published by this FPL API endpoint: . The data is indexed by the player code and the fixture code, both of which are unique across seasons.
  • fixtures_ext.csv (~12,000 data points, data dictionary): Contains a row for each fixture in the current and the last season with most attributes published by this FPL API endpoint: . The data is indexed by the fixture code that is unique across different seasons.
  • player_teams.csv (~36,000 data points, data dictionary): Contains a row for each player in the current season with the corresponding team info. The data is indexed by the player code, which is unique across seasons.
  • teams.csv (120 data points, data dictionary): Contains a row for each team playing in the current season with most attributes published by this FPL API endpoint: . The data is indexed by the team code that is unique across different seasons.
  • players_ext.csv (~42,000 data points, data dictionary): Contains a row for each player in the current and last season with most of the attributes published by this FPL API endpoint: . The data is indexed by the player code that is unique across seasons.
  • gws.csv (646 data points, data dictionary): Contains a row for each game week of the current season with most of the game week attributes published by this FPL API endpoint: . The data is indexed by the game week ID.
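Because each data set is a CSV file with a documented index, a consumer can key the rows by those index columns. The sketch below uses only the standard library and an inline sample. The column names ("Player Code", "Season", "Game Week", "Expected Points") and the sample values are assumptions for illustration; the data dictionaries define the real ones.

```python
import csv
import io


def index_rows(csv_text, key_columns):
    """Return a dict mapping the tuple of key column values to each row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {tuple(row[col] for col in key_columns): row for row in reader}


# Hypothetical sample in the shape of players_gw_team_eps_ext.csv.
sample = """Player Code,Season,Game Week,Expected Points
1001,2020-21,1,4.2
1001,2020-21,2,3.1
"""

rows = index_rows(sample, ["Player Code", "Season", "Game Week"])
print(rows[("1001", "2020-21", "2")]["Expected Points"])  # prints 3.1
```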

fpl-data's Issues

News columns contain nan as string

Currently, the 'News' and 'News and Date' columns contain the string nan instead of '' if the value is empty. This applies to the players_gw_team_eps_ext and player_gw_next_eps_ext data sets.
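One possible fix is to normalise missing values before they are written out as text. The helper below is a hypothetical sketch, not the project's code. It treats `None`, a float NaN and the literal string `nan` as empty.

```python
import math


def clean_text(value):
    """Return '' for missing values instead of the string 'nan'."""
    if value is None:
        return ""
    if isinstance(value, float) and math.isnan(value):
        return ""
    text = str(value)
    return "" if text == "nan" else text
```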

Disable calculation method test

Currently, the data is not generated when the relative calculation method yields worse results than the simple method. We should disable this check since there can be valid circumstances for this to happen, especially at the beginning of a new season.

Add columns for non-cumulative future expected points

Currently the player_gw_next_eps_ext data set only includes cumulative future expected points columns, e.g. "Expected Points Next 8 GWs" which contains the expected points over the next eight game weeks. However, sometimes it would be useful to get the expected points for a specific future game week.

We should add non-cumulative expected points for all future game weeks.
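If the cumulative columns are available for every horizon, the per-game-week values are the differences between consecutive cumulative values. A small sketch, with a hypothetical function name and made-up numbers:

```python
def per_game_week(cumulative):
    """cumulative[i] is the expected points over the next i + 1 game weeks."""
    previous = 0.0
    result = []
    for total in cumulative:
        result.append(total - previous)  # points attributable to this game week alone
        previous = total
    return result


print(per_game_week([4.0, 7.5, 9.5]))  # prints [4.0, 3.5, 2.0]
```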

Add data testing framework

Currently, the main mechanism for assuring the data is the built-in back testing. However, this does not protect against many potential problems and is not a full replacement for regression testing.

Therefore, we need to create a framework for testing outputs based on fixed inputs, with the ability to add sense checks when validating real data.
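As an illustration of what a sense check could look like, the sketch below flags duplicate index keys and implausible expected points. The field names and the bound of 30 points per game week are assumptions chosen for the example, not values from the project.

```python
import math


def sense_check(rows, max_points=30.0):
    """Return a list of human-readable problems found in the rows."""
    problems = []
    seen = set()
    for row in rows:
        key = (row["player_code"], row["game_week"])
        if key in seen:
            problems.append(f"duplicate row for {key}")
        seen.add(key)
        eps = row["expected_points"]
        if math.isnan(eps) or not -max_points <= eps <= max_points:
            problems.append(f"implausible expected points for {key}: {eps}")
    return problems
```

A regression test on fixed inputs could then assert that `sense_check` returns an empty list for a known-good output file.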

Remaining game weeks do not include last game week

Currently, columns that contain projections for the next game weeks do not include the last game week. For example, if five game weeks are remaining, the data sets only include columns for Next 1 GWs to Next 4 GWs (instead of Next 5 GWs). We need to fix this.
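The cause of the bug is not stated in the issue, but the expected behaviour is easy to pin down: with n game weeks remaining, there should be n columns. A sketch with a hypothetical helper, showing the inclusive upper bound:

```python
def next_gw_columns(remaining):
    """Column names for projections over the next 1..remaining game weeks."""
    # range() excludes its stop value, so stop at remaining + 1 to include the last game week.
    return [f"Next {n} GWs" for n in range(1, remaining + 1)]
```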

Support fixture predictions

Currently, only definitively scheduled fixtures are available. To improve the usefulness of the data for the purpose of planning, we need to take the likelihood of a fixture happening in a particular game week into account. This data can come, for example, from Ben Crellin https://twitter.com/BenCrellin
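One way to take the likelihood into account is to spread a fixture's expected points across its candidate game weeks in proportion to the predicted probabilities. This is a sketch of that idea only; the function name, the probabilities and the game week numbers are hypothetical.

```python
def expected_points_by_gw(fixture_eps, gw_probabilities):
    """Weight a fixture's expected points by the probability it is played in each game week.

    gw_probabilities: {game_week: probability}, expected to sum to at most 1.
    """
    return {gw: prob * fixture_eps for gw, prob in gw_probabilities.items()}


print(expected_points_by_gw(4.0, {28: 0.75, 29: 0.25}))  # prints {28: 3.0, 29: 1.0}
```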
