AWS Lambda function for calculating FPL data statistics

The purpose of this project is to provide an AWS Lambda function that:

retrieves data from the FPL API
calculates various statistics, including expected points for each game week, using the prep-data.ipynb Jupyter notebook
makes the prepared data sets available to data analysers such as the FPL Advisor. The data sets are published in the public fpl.177arc.net S3 bucket.

The Lambda function runs in AWS on an hourly schedule during the day and continuously updates the data.

Important data points

The following data points are worth highlighting:

Expected Points Next GW in player_gw_next_eps_ext.csv: Points that each player is expected to earn in the upcoming game week.
Expected Points in players_gw_team_eps_ext.csv: Points that the player is expected to earn for each game week.
Expected Goals For in players_gw_team_eps_ext.csv: Goals that the team of the player is expected to score for each game week.
Expected Goals Against in players_gw_team_eps_ext.csv: Goals that the team of the player is expected to concede for each game week.
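
As an illustration of how the Expected Points Next GW data point might be consumed, the following minimal sketch ranks players by that value. It assumes that the CSV has a column literally named Expected Points Next GW and a hypothetical Name column; the data dictionary should be checked for the actual column names. The inline sample rows are made up for the example.

```python
import csv
import io


def top_players(csv_file, column="Expected Points Next GW", n=3):
    """Return the n rows with the highest value in the given column.

    The column name is an assumption; rows with an empty value are skipped.
    """
    rows = [row for row in csv.DictReader(csv_file) if row.get(column)]
    rows.sort(key=lambda row: float(row[column]), reverse=True)
    return rows[:n]


# Made-up sample data standing in for player_gw_next_eps_ext.csv.
sample = io.StringIO(
    "Name,Expected Points Next GW\n"
    "Player A,1.5\n"
    "Player B,6.0\n"
    "Player C,3.2\n"
)

for row in top_players(sample, n=2):
    print(row["Name"], row["Expected Points Next GW"])
```

With a downloaded copy of the real file, the same function could be called with an open file handle instead of the in-memory sample.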

Expected points calculation methodology

The fundamental idea is that the best evidence of a player's ability to generate points is their performance over a sliding window of past fixtures, taking into account the difficulty of the opposing teams.

The expected points for each game week for each player are calculated by averaging the points earned by the player for every event type (e.g. goals scored, goals conceded, clean sheets, yellow cards, etc.) separately over a sliding window of past fixtures (currently 12). These averages are then adjusted based on the strength of the upcoming opposing team relative to the strength of the opposing teams that the player has faced so far.
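
The idea can be sketched roughly as follows. This is not the code from prep-data.ipynb: the function names, the numeric strength scale and, in particular, the simple ratio used for the adjustment are assumptions made for illustration. The notebook's actual adjustment may differ, and for event types that cost points (such as goals conceded) a stronger opponent would plausibly increase rather than decrease the magnitude.

```python
def adjusted_average(points, opp_strengths, next_opp_strength, window=12):
    """Average points for one event type over the last `window` fixtures,
    scaled by past opponent strength relative to the next opponent.

    Assumption: strengths are positive numbers where higher means stronger,
    and the adjustment is a plain ratio (hypothetical, for illustration).
    """
    if next_opp_strength <= 0:
        raise ValueError("next_opp_strength must be positive")
    recent_points = points[-window:]
    recent_strengths = opp_strengths[-window:]
    if not recent_points:
        return 0.0
    avg_points = sum(recent_points) / len(recent_points)
    avg_strength = sum(recent_strengths) / len(recent_strengths)
    # A stronger upcoming opponent than the past average scales the estimate down.
    return avg_points * avg_strength / next_opp_strength


def expected_points(points_by_event, opp_strengths, next_opp_strength, window=12):
    """Sum the adjusted averages of all event types for one player."""
    return sum(
        adjusted_average(points, opp_strengths, next_opp_strength, window)
        for points in points_by_event.values()
    )


# Made-up history: points per fixture for two event types.
history = {"goals_scored": [4, 0, 4, 0], "yellow_cards": [0, -1, 0, 0]}
past_opponents = [1.0, 1.2, 0.8, 1.0]
print(expected_points(history, past_opponents, next_opp_strength=1.25))
```

Keeping each event type separate before summing mirrors the description above and would allow a different adjustment per event type.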

Features

Team estimated strength and expected goals based on a rolling window of fixtures from the current and the last season
Player estimated points for each game week based on opposing team strength and points earned over a short rolling window of past fixtures
Where past data is not available, estimates for past data are patched either using manually curated data or, if not available, using machine learning
Data completeness indicator for each player and game week combination
Estimated points for each past player and game week combination to support back testing
Correctly handles game weeks in which a team does not play as well as double game weeks

For more details, see the prep-data.ipynb Jupyter notebook.

Limitations

Although player availability data is used, the textual news information is not interpreted to project a return date as part of the longer-term expected points calculation. In short, player availability is reliable for the upcoming game week but not thereafter.
Only data points from the FPL API are used; no alternative data sources are incorporated.

List of data sets and data dictionaries

player_gw_next_eps_ext.csv (~120,000 data points, data dictionary): Contains a row for each player in the current season with expected points for the next game week up to the last one. The data is indexed by the player code, which is unique across seasons.
players_gw_team_eps_ext.csv (~7,000,000 data points, data dictionary): Contains a row for each player and game week combination for the current and last season with the expected points for past and upcoming game weeks. The data is indexed by the player code, the season and the game week number.
team_fixture_stats_ext.csv (~100,000 data points, data dictionary): Contains a row for each fixture with the corresponding team info. It has stats for each fixture that are possible indicators of the outcome. These stats are eventually used in the calculation of the expected points. The data is indexed by the fixture code, which is unique across different seasons.
players_history_ext.csv (~70,000 data points, data dictionary): Contains a row for each player fixture combination for the current and the last season with most attributes published by this FPL API endpoint: . The data is indexed by the player code and the fixture code, both of which are unique across seasons.
fixtures_ext.csv (~12,000 data points, data dictionary): Contains a row for each fixture in the current and the last season with most attributes published by this FPL API endpoint: . The data is indexed by the fixture code that is unique across different seasons.
player_teams.csv (~36,000 data points, data dictionary): Contains a row for each player in the current season with the corresponding team info. The data is indexed by the player code, which is unique across seasons.
teams.csv (120 data points, data dictionary): Contains a row for each team playing in the current season with most attributes published by this FPL API endpoint: . The data is indexed by the team code that is unique across different seasons.
players_ext.csv (~42,000 data points, data dictionary): Contains a row for each player in the current and last season with most of the attributes published by this FPL API endpoint: . The data is indexed by the player code that is unique across seasons.
gws.csv (646 data points, data dictionary): Contains a row for each game week of the current season with most of the game week attributes published by this FPL API endpoint: . The data is indexed by the game week ID.
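
Several of the data sets above are indexed by a combination of columns, for example the player code, the season and the game week number in players_gw_team_eps_ext.csv. A minimal sketch of looking up rows by such a composite key is shown below. The column names Player Code, Season, Game Week and Expected Points are assumptions, as is the sample data; the data dictionaries list the real names.

```python
import csv
import io


def index_rows(csv_file, key_columns):
    """Index CSV rows by a tuple built from the given key columns.

    If two rows share the same key, the later row replaces the earlier one.
    """
    indexed = {}
    for row in csv.DictReader(csv_file):
        key = tuple(row[column] for column in key_columns)
        indexed[key] = row
    return indexed


# Made-up sample standing in for players_gw_team_eps_ext.csv.
sample = io.StringIO(
    "Player Code,Season,Game Week,Expected Points\n"
    "1001,2019,1,4.2\n"
    "1001,2019,2,3.1\n"
)

rows = index_rows(sample, ["Player Code", "Season", "Game Week"])
print(rows[("1001", "2019", "2")]["Expected Points"])
```

Note that csv.DictReader returns every value as a string, so key parts and numeric fields need explicit conversion where numbers are required.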
