alan-turing-institute / airsenal
Machine learning Fantasy Premier League team
License: MIT License
At the moment, strategies involving 2 transfers are explored using a random process. @nbarlowATI reckons that the search space is small enough that we can use brute force instead.
A cool potential feature would be a webpage where people can input their teams and get recommendations for the next week's transfers.
First Stan model of attacking and defending (alphas and betas) per team.
Train using historical data.
This variable is intended to allow the optimizer to retrieve a consistent set of predicted points from the DB.
Currently a lot of places have a default value of this argument - some risk that we may not be using the predictions that we think we are in all places.
Suggested change is to have fill_predictedscore_table generate a UUID (or timestamp) to put into this field in the DB - same value for all rows in a given run of this script.
The fill_transfersuggestion_table will query the predicted_score table, and look at the last row, and then use this value for the optimization.
Remove all default values of this argument from function definitions, to flush out cases where we are calling a function without explicitly specifying it.
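A minimal sketch of the suggested change, using an in-memory sqlite database. The table layout, column names and the helper names fill_predictions and latest_tag are assumptions made for illustration, not the actual AIrsenal schema or API.

```python
import sqlite3
import uuid


def fill_predictions(conn, predictions):
    """Store one run's predictions, all sharing a freshly generated tag."""
    tag = str(uuid.uuid4())  # same value for every row written in this run
    conn.executemany(
        "INSERT INTO predicted_score (player_id, gameweek, points, tag) "
        "VALUES (?, ?, ?, ?)",
        [(pid, gw, pts, tag) for pid, gw, pts in predictions],
    )
    conn.commit()
    return tag


def latest_tag(conn):
    """Return the tag on the most recently inserted row, or None if empty."""
    row = conn.execute(
        "SELECT tag FROM predicted_score ORDER BY id DESC LIMIT 1"
    ).fetchone()
    return row[0] if row else None


# Hypothetical schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE predicted_score ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "player_id INTEGER, gameweek INTEGER, points REAL, tag TEXT)"
)
first_tag = fill_predictions(conn, [(1, 2, 5.19), (1, 3, 0.0)])
second_tag = fill_predictions(conn, [(1, 2, 4.80), (1, 3, 3.10)])
```

The optimizer would then call something like latest_tag once and pass the result explicitly to every function that reads predictions, with no default value anywhere.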
Use multiprocess pool to run iterations over different transfer strategies in parallel. Need to think about how to recombine at the end to find overall maximum.
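One possible shape for this, as a rough sketch: each worker returns a (score, strategy) pair, and the recombination step is just a max over whatever comes back. The function evaluate_strategy and its scoring are made-up placeholders, not the real optimizer.

```python
from multiprocessing import Pool


def evaluate_strategy(strategy):
    """Hypothetical stand-in for the real evaluation of one strategy.

    A strategy here is a tuple of transfers per gameweek. The scoring is
    invented: 3 points gained per transfer, and a 4-point hit for every
    transfer beyond one per gameweek (a simplification).
    """
    gain = sum(3.0 * n for n in strategy)
    hits = sum(4.0 * max(0, n - 1) for n in strategy)
    return gain - hits, strategy


def pick_best(results):
    """Recombine: keep the (score, strategy) pair with the highest score."""
    return max(results, key=lambda result: result[0])


def find_best_parallel(strategies, processes=2):
    """Evaluate strategies in a process pool and return the overall best."""
    with Pool(processes=processes) as pool:
        # imap_unordered yields results as workers finish; order is
        # irrelevant because pick_best only needs the maximum.
        return pick_best(pool.imap_unordered(evaluate_strategy, strategies))
```

Calling find_best_parallel needs to happen under an if __name__ == "__main__": guard on platforms that spawn rather than fork worker processes.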
We don't have a setup.py and so on.
See branch: fix/fpl_api_urls_2019
framework/data_fetcher.py was giving me errors (empty data) as some of the FPL API URLs have changed. Plus, the trailing forward slash seems to be important, e.g. https://fantasy.premierleague.com/api/bootstrap-static/ returns data but https://fantasy.premierleague.com/api/bootstrap-static doesn't.
This set of URLs seems to work (where {} are filled by python .format calls, usually with a team, player or league id):
FPL_SUMMARY_API_URL = "https://fantasy.premierleague.com/api/bootstrap-static/"
FPL_DETAIL_URL = "https://fantasy.premierleague.com/api/element-summary/{}/"
FPL_HISTORY_URL = "https://fantasy.premierleague.com/api/entry/{}/history/"
FPL_TEAM_URL = "https://fantasy.premierleague.com/api/entry/{}/event/{}/picks/"
FPL_TEAM_TRANSFER_URL = "https://fantasy.premierleague.com/api/entry/{}/transfers/"
FPL_LEAGUE_URL = "https://fantasy.premierleague.com/api/leagues-classic/{}/standings/".format(self.FPL_LEAGUE_ID)
FPL_FIXTURE_URL = "https://fantasy.premierleague.com/api/fixtures/"
However, getting league standings now needs authentication - hint here: https://www.reddit.com/r/FantasyPL/comments/c64rrx/fpl_api_url_has_been_changed/ewj4ofd/
I've updated the URLs and implemented the authentication for league standings (means FPL login and password are now needed as env variables for that functionality).
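A small sketch of how the {} templates above get filled, with a guard for the trailing slash. The helper build_url and the ids used are hypothetical, and no request is actually made here.

```python
FPL_DETAIL_URL = "https://fantasy.premierleague.com/api/element-summary/{}/"
FPL_TEAM_URL = "https://fantasy.premierleague.com/api/entry/{}/event/{}/picks/"


def build_url(template, *ids):
    """Fill the {} placeholders and make sure the trailing slash is present."""
    url = template.format(*ids)
    return url if url.endswith("/") else url + "/"


detail_url = build_url(FPL_DETAIL_URL, 123)  # 123 is an arbitrary player id
picks_url = build_url(FPL_TEAM_URL, 456, 2)  # arbitrary team id, gameweek 2
```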
Players that didn't play in gameweek 1 (it seems - might be another reason) appear to be getting an expected points total for gameweek 2 that assumes they will play, followed by expected points of 0 for gameweeks 3 and 4.
For example Lacazette, who didn't play vs. Newcastle in gameweek 1:
Getting points prediction for player Alexandre Lacazette
gameweek: 2 vs BUR home? True
Expected points: 5.19
gameweek: 3 vs LIV home? False
Expected points: 0.00
gameweek: 4 vs TOT home? True
Expected points: 0.00
In addition to setup_airsenal_db, we should have executables to:
It was more painful than it should have been to get the 18/19 season data ready for the start of the 19/20 season.
For next season we should have scripts ready that output the contents of the DB to the player_summary_YYYY.json, player_detail_YYYY.json and results_YYYY_with_gw.csv files, in the expected formats.
Factor result into expected minutes prediction.
In optimization, after playing free hit for next gw and making 14 transfers, get crash when printing the optimum team:
Traceback (most recent call last):
File "/Users/nbarlow/anaconda3/bin/run_airsenal_optimization", line 10, in <module>
sys.exit(main())
File "/Users/nbarlow/anaconda3/lib/python3.6/site-packages/airsenal/scripts/fill_transfersuggestion_table.py", line 279, in main
print_team_for_next_gw(best_strategy)
File "/Users/nbarlow/anaconda3/lib/python3.6/site-packages/airsenal/scripts/fill_transfersuggestion_table.py", line 145, in print_team_for_next_gw
expected_points = t.get_expected_points(next_gw,tag)
File "/Users/nbarlow/anaconda3/lib/python3.6/site-packages/airsenal/framework/team.py", line 259, in get_expected_points
raise RuntimeError("Team is incomplete")
RuntimeError: Team is incomplete
why is the team incomplete?
When running predictions, without remaking the database from scratch (i.e. running update_airsenal_db after the previous round of fixtures), get a crash when filling the dataframe used for generating predictions.
Traceback:
File "/Users/nbarlow/anaconda3/lib/python3.6/site-packages/airsenal/scripts/fill_predictedscore_table.py", line 39, in make_predictedscore_table
prediction_dict = calc_all_predicted_points(gw_range, season, tag, session)
File "/Users/nbarlow/anaconda3/lib/python3.6/site-packages/airsenal/scripts/fill_predictedscore_table.py", line 23, in calc_all_predicted_points
model_team, df_player = get_fitted_models(session)
File "/Users/nbarlow/anaconda3/lib/python3.6/site-packages/airsenal/framework/prediction_utils.py", line 221, in get_fitted_models
df_team = get_result_df(session)
File "/Users/nbarlow/anaconda3/lib/python3.6/site-packages/airsenal/framework/bpl_interface.py", line 29, in get_result_df
for s in session.query(Result).all()
File "/Users/nbarlow/anaconda3/lib/python3.6/site-packages/airsenal/framework/bpl_interface.py", line 29, in <listcomp>
for s in session.query(Result).all()
AttributeError: 'NoneType' object has no attribute 'date'
Essentially, after doing update_airsenal_database we are left with some "Result"s that do not have "Fixture"s.
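To find the offending rows, a LEFT JOIN from results to fixtures should be enough. The sketch below uses plain sqlite3 with a simplified, assumed schema (the real tables and column names may differ).

```python
import sqlite3

# Simplified, hypothetical schema with one good result and two orphans.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE fixture (fixture_id INTEGER PRIMARY KEY, date TEXT);
    CREATE TABLE result (
        result_id INTEGER PRIMARY KEY,
        fixture_id INTEGER,
        home_score INTEGER,
        away_score INTEGER
    );
    INSERT INTO fixture VALUES (1, '2019-08-09');
    INSERT INTO result VALUES (10, 1, 4, 1);
    INSERT INTO result VALUES (11, 99, 0, 5);   -- fixture 99 does not exist
    INSERT INTO result VALUES (12, NULL, 2, 2); -- no fixture id at all
    """
)


def orphan_result_ids(conn):
    """Return ids of results that have no matching fixture row."""
    rows = conn.execute(
        "SELECT r.result_id FROM result r "
        "LEFT JOIN fixture f ON r.fixture_id = f.fixture_id "
        "WHERE f.fixture_id IS NULL ORDER BY r.result_id"
    ).fetchall()
    return [row[0] for row in rows]
```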
i.e. effectively "joins" - avoid duplicating information e.g. between matches and fixtures, and speed up search for player score - no need to explicitly search by match_id.
When running optimization, where best outcome was from playing free hit in the next gw, got the following output:
=========== Gameweek 37 ================
Cards played: F
Players in: Players out:
----------- ------------
Sergio Agüero Asmir Begovic
Raúl Jiménez Trent Alexander-Arnold
Paul Pogba Marcos Alonso
Jan Bednarek César Azpilicueta
Ryan Bertrand Aaron Wan-Bissaka
Heung-Min Son Heung-Min Son
Chris Smalling Mohamed Salah
Lucas Rodrigues Moura da Silva Lucas Rodrigues Moura da Silva
Aaron Wan-Bissaka Lys Mousset
Jason Steele Roberto Firmino
Andre Gray Joshua King
Eden Hazard Michel Vorm
Aymeric Laporte David Silva
Ilkay Gündogan Cesc Fàbregas
David de Gea Martin Kelly
However, the "players out" are not the players in the current team! They must be from a previous iteration of the make_new_team random process.
Could e.g. select a fixture from drop-down, and view predicted score probabilities, select one or more players and view predicted points over next few fixtures, and/or scores over previous fixtures.
For example, plot alpha vs beta for the team level model, as was done for the 2017/18 season, and p(score) vs p(assist) for FWD, MID, DEF.
Potentially produce a dashboard, where one can see history of a chosen player or team.
For start of 2019/20 season we had a hard-to-debug crash when initializing the player-level Stan model, which was traced back to an inconsistency in the player and match data for 2018/19 (for one playerscore row, nTeamGoals - numGoals - numAssists was -1) - this was eventually traced back to an incorrect gameweek in results_1819_with_gw.csv.
But we should be able to debug this much faster - we should check for things like this in the DB after every upload.
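As a sketch of the kind of check meant here: a player's goals plus assists should never exceed the goals their team scored in that match. The row layout (dicts with team_goals, goals, assists) is an assumption for illustration, not the real playerscore table.

```python
def find_inconsistent_rows(player_scores):
    """Return rows where goals + assists exceed the team's goals.

    Each row is assumed to be a dict with integer 'team_goals', 'goals'
    and 'assists' entries (hypothetical field names).
    """
    bad = []
    for row in player_scores:
        not_involved = row["team_goals"] - row["goals"] - row["assists"]
        if not_involved < 0:
            bad.append(row)
    return bad


rows = [
    {"player_id": 1, "team_goals": 2, "goals": 1, "assists": 1},
    {"player_id": 2, "team_goals": 1, "goals": 1, "assists": 1},  # inconsistent
    {"player_id": 3, "team_goals": 0, "goals": 0, "assists": 0},
]
inconsistent = find_inconsistent_rows(rows)
```

Running something like this after every upload would have pointed straight at the bad row instead of surfacing as a crash inside Stan.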
From first look will involve:
Cron job or AWS lambda function should fill match and playerscore tables after each gameweek.
e.g. by default turn off the "Cannot afford player xxx" and "Best formation is yyy" messages.
schema:
player_id, gameweek, bought_or_sold
we can then write a function that combines this information with the week-by-week player prices, and gives us an accurate budget, taking into account player price changes.
(If a player goes up in value while we own them, we only get half the price increase when we sell, so it's not sufficient just to use current player prices to get our budget).
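A sketch of the budget calculation, with prices held as integers in tenths of a million. It assumes the half-share of a price rise is rounded down to the nearest 0.1m and that a price fall is borne in full; the function names are hypothetical.

```python
def selling_price(purchase_price, current_price):
    """Selling price of a player, in tenths of a million.

    Assumption: we keep half of any price rise, rounded down, and take
    the whole of any price fall.
    """
    if current_price <= purchase_price:
        return current_price
    return purchase_price + (current_price - purchase_price) // 2


def available_budget(bank, squad):
    """Money in the bank plus the selling price of every owned player.

    squad is a list of (purchase_price, current_price) pairs.
    """
    return bank + sum(selling_price(bought, now) for bought, now in squad)
```

For example, a player bought at 10.0 and now worth 10.3 would sell for 10.1 under these assumptions, which is why current prices alone overstate the budget.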
Can at the very least do GK, DEF, MID, FWD in different threads. Just need to make sure we have the same "tag" for all of them.
run after prediction-calculating lambda - run script to try different transfer strategies and update transfer_suggestions table.
This is the most compute-intensive part, may want to see if we can scale out.
When I try to run the notebook sandbox/modelling_test.ipynb, I get an error at the line
from framework.utils import *
that is as follows:
NameError Traceback (most recent call last)
<ipython-input-2-d0dbd8c32518> in <module>()
9 import seaborn as sns
10
---> 11 from framework.utils import *
12
13 np.random.seed(42)
~/Projects/AIrsenal/framework/utils.py in <module>()
10 from .mappings import alternative_team_names, alternative_player_names
11
---> 12 from .data_fetcher import FPLDataFetcher, MatchDataFetcher
13 from .schema import (
14 Base,
~/Projects/AIrsenal/framework/data_fetcher.py in <module>()
47 FPL_TEAM_URL = "https://fantasy.premierleague.com/drf/entry/{}/event/{}/picks"
48 FPL_LEAGUE_URL = "https://fantasy.premierleague.com/drf/leagues-classic-standings/{}?phase=1&le-page=1&ls-page=1".format(
---> 49 LEAGUE_ID
50 )
51 DATA_DIR = "./data"
NameError: name 'LEAGUE_ID' is not defined
The team-level model produces probabilities for the scoreline between different teams. To get expected defensive points, we just need to compute the probability of a clean sheet using the team-level model and then multiply this by the number of points a given player will receive for a clean sheet.
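A sketch of the defensive calculation, assuming the team-level model can be summarised as a dict mapping (goals_for, goals_against) to a probability, and assuming the usual FPL clean-sheet points of 4 for GK and DEF, 1 for MID and 0 for FWD. It ignores the requirement to play 60+ minutes.

```python
# Assumed clean-sheet points per position.
CLEAN_SHEET_POINTS = {"GK": 4, "DEF": 4, "MID": 1, "FWD": 0}


def prob_clean_sheet(score_probs):
    """Sum the probability of every scoreline where we concede nothing."""
    return sum(p for (_, conceded), p in score_probs.items() if conceded == 0)


def expected_clean_sheet_points(score_probs, position):
    """Probability of a clean sheet times the points it is worth."""
    return prob_clean_sheet(score_probs) * CLEAN_SHEET_POINTS[position]


# Made-up scoreline probabilities, truncated to 0 or 1 goals per side.
toy_score_probs = {(0, 0): 0.1, (1, 0): 0.2, (0, 1): 0.3, (1, 1): 0.4}
```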
For attacking points, a simple approach can be as follows. We learn three numbers per player from historical data:
Pr(score), Pr(assist) and Pr(not involved)
where these are the probabilities of the three possible outcomes for an individual player given that their team has scored a goal. The distribution of n_score, n_assist and n_not_involved (for a given player) is then multinomial given the total number of goals scored by the team. We can use this to compute
Pr(attacking points | goals scored by team)
so that
Pr(attacking points) = sum_{goals scored by team} Pr(attacking points | goals scored by team) * Pr(goals scored by team)
where Pr(goals scored by team) is computed using the team-level model. Using this distribution, we can then compute the expected number of attacking points.
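The sum above can be written out directly by enumerating the multinomial outcomes. This is a sketch only: p_team_goals stands in for the team-level model output (a list where entry g is the probability of the team scoring g goals), and the points per goal and per assist are passed in as assumptions (e.g. 4 per goal for a FWD, 3 per assist).

```python
from math import comb


def multinomial_pmf(n_score, n_assist, n_none, p_score, p_assist, p_none):
    """Probability of this split of the team's goals for one player."""
    n = n_score + n_assist + n_none
    coeff = comb(n, n_score) * comb(n - n_score, n_assist)
    return coeff * p_score**n_score * p_assist**n_assist * p_none**n_none


def expected_attacking_points(p_team_goals, p_score, p_assist,
                              pts_goal, pts_assist=3):
    """Expected attacking points, summing over goals scored by the team."""
    p_none = 1.0 - p_score - p_assist
    total = 0.0
    for n_goals, p_goals in enumerate(p_team_goals):
        for n_score in range(n_goals + 1):
            for n_assist in range(n_goals - n_score + 1):
                n_none = n_goals - n_score - n_assist
                prob = multinomial_pmf(
                    n_score, n_assist, n_none, p_score, p_assist, p_none
                )
                points = n_score * pts_goal + n_assist * pts_assist
                total += p_goals * prob * points
    return total


# Made-up inputs: team scores 0, 1 or 2 goals with these probabilities.
example = expected_attacking_points([0.3, 0.4, 0.3], 0.3, 0.2, pts_goal=4)
```

If only the expectation is needed, linearity gives the same number as E[team goals] * (Pr(score) * pts_goal + Pr(assist) * pts_assist); the enumeration is useful when the full distribution of points is wanted.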
The database is an sqlite file stored in an S3 bucket.
Can run lambda on a cron-type daily schedule to see if any new matches have been played, and if so, update the db.
Currently ignoring things like a player having been subbed off before a goal was scored, etc.
Given a starting team, should query the player_prediction table, and look N gameweeks ahead and choose best substitutions, with constraint of no more than one 4-point-hit per gameweek.
When attempting to install bpl the following error occurs (the output is captured by using the pip install -vvv .. option)
*** Error compiling '/tmp/pip-install-22y8ywmq/pystan/pystan/stan/lib/stan_math/lib/boost_1.69.0/status/boost_check_library.py'...
File "/tmp/pip-install-22y8ywmq/pystan/pystan/stan/lib/stan_math/lib/boost_1.69.0/status/boost_check_library.py", line 166
print ">>> cwd: %s"%(os.getcwd())
^
SyntaxError: invalid syntax
System Information:
GNU/Linux 3.10.0-693.5.2.el7.x86_64
2.18.0.0
7.1.0
Does this occur for anyone else?
This seems to be an unresolved issue at https://github.com/stan-dev/pystan/issues/584 relating to PyStan.
As part of the effort to make AIrsenal easy to run on both past seasons and the current season, it would be good to have a simple table listing which teams were in the Premier League for each season, i.e. just two columns:
name, season
"ARS", "1516"
"AVL", "1516"
...
"ARS","1920"
...
If we get sqlite db onto an S3 bucket, lambda to auto-fill scores, lambda to run predictions, should be easy to get Alexa skill to report on latest status.
For some details about our team, need to find a way to authenticate with FPL API.
This feature doesn't seem to be working currently.
This is hard. Some thoughts:
Open to ideas about this - we need a p(T) for the model to work (to get 60+ mins point, but also to estimate contribution to goals etc).
sqlalchemy.exc.IntegrityError: (sqlite3.IntegrityError) UNIQUE constraint failed: player.player_id
[SQL: INSERT INTO player (player_id, name) VALUES (?, ?)]
[parameters: ((1, 'Shkodran Mustafi'), (2, 'Héctor Bellerín'), (3, 'Sead Kolasinac'), (4, 'Ainsley Maitland-Niles'), (5, 'Sokratis Papastathopoulos'), (6, 'Nacho Monreal'), (7, 'Laurent Koscielny'), (8, 'Konstantinos Mavropanos') ... displaying 10 of 529 total bound parameter sets ... (487, 'Patrick Cutrone'), (528, 'Pedro Lomba Neto'))]
(Background on this error at: http://sqlalche.me/e/gkpj)
We don't have the exact value of players when we purchased them, just the price at the end of the gameweek when we bought them.
Run daily - if we are less than 24 hours before a gameweek deadline, run the script to fill the player_predictions table in the database.
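The daily check could be as small as the sketch below. The function name and the idea of passing in the next deadline as a datetime are assumptions; where the deadline comes from is left open.

```python
from datetime import datetime, timedelta


def should_run_predictions(now, next_deadline, window=timedelta(hours=24)):
    """True if the deadline is still ahead of us but less than `window` away."""
    return timedelta(0) <= next_deadline - now < window


# Arbitrary example deadline.
deadline = datetime(2019, 8, 17, 11, 30)
```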
Find way of predicting bonus points
Currently we treat the last N games as i.i.d. draws from the pdf P(T). This is ok, but it will not be able to pick up that players are likely to play fewer minutes per game over e.g. the Christmas period. @nbarlowATI has suggested using previous seasons' data to investigate this effect.
In two separate threads (i.e. evaluation of two strategies: "F0000" and "F0001"), got crash:
File "/Users/nbarlow/anaconda3/lib/python3.6/site-packages/airsenal/framework/team.py", line 237, in apply_formation
if index < formation[i]:
TypeError: 'NoneType' object is not subscriptable
Currently every "strategy" is evaluated independently (in order to parallelize).
However, this is very wasteful - all strategies that begin with 2 transfers in the next gw will independently calculate the best transfers (which for 0,1,2 transfers is deterministic).
Would be better to cache the results as we go along, so we don't repeat, and instead have more of a tree structure.
Could do this by checking for the presence of JSON files in the /tmp/ directory with a certain identifier unique to this optimization run, and reading the players in/out from there.
Not clear then how this would work with greater parallelization e.g. on AWS at a later date.
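A sketch of the tree idea within a single process: results are cached per strategy prefix, so two strategies that start the same way only pay once for the shared part. Strategies are represented here as tuples of transfers per gameweek, and toy_step is a made-up stand-in for the real "find best transfers for this gameweek" step. Only deterministic steps (0, 1 or 2 transfers) are safe to cache like this, and an in-memory dict is not shared between processes, which is where the JSON files would come in.

```python
def make_evaluator(step, initial_state):
    """Build an evaluate(strategy) function that caches by strategy prefix.

    step(state, n_transfers) must return (new_state, points) and be
    deterministic for caching to be valid.
    """
    cache = {(): (initial_state, 0.0)}

    def evaluate(strategy):
        # Find the longest prefix that has already been evaluated.
        k = len(strategy)
        while strategy[:k] not in cache:
            k -= 1
        state, total = cache[strategy[:k]]
        # Only evaluate the gameweeks beyond that prefix.
        for i in range(k, len(strategy)):
            state, points = step(state, strategy[i])
            total += points
            cache[strategy[: i + 1]] = (state, total)
        return total

    return evaluate


step_calls = []  # records every call so the saving is visible


def toy_step(state, n_transfers):
    """Hypothetical step: invented state update and points."""
    step_calls.append(n_transfers)
    return state + n_transfers, 2.0 * n_transfers


evaluate = make_evaluator(toy_step, initial_state=0)
first = evaluate((2, 0, 1))
second = evaluate((2, 0, 0))  # reuses the cached (2, 0) prefix
```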
Simple python code that can pick a team obeying all constraints (price, correct numbers of defenders, midfielders, forwards, no more than three players per team), e.g. optimizing using total expected points.
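As a building block for this, here is a sketch of the constraint check alone, which any picker (greedy, random or exhaustive) could call on candidate squads. It assumes a 15-player squad of 2 GK, 5 DEF, 5 MID and 3 FWD, a budget of 100.0m held as 1000 tenths of a million, and players represented as dicts with hypothetical keys.

```python
from collections import Counter

# Assumed squad rules.
SQUAD_SHAPE = {"GK": 2, "DEF": 5, "MID": 5, "FWD": 3}
MAX_PER_TEAM = 3


def is_valid_squad(squad, budget=1000):
    """Check position counts, the per-team limit and the total price.

    squad is a list of dicts with 'position', 'team' and 'price' keys,
    price being in tenths of a million.
    """
    positions = Counter(player["position"] for player in squad)
    teams = Counter(player["team"] for player in squad)
    return (
        dict(positions) == SQUAD_SHAPE
        and max(teams.values(), default=0) <= MAX_PER_TEAM
        and sum(player["price"] for player in squad) <= budget
    )


# Made-up squad: 15 players spread evenly over 5 teams, 6.0m each.
example_positions = ["GK"] * 2 + ["DEF"] * 5 + ["MID"] * 5 + ["FWD"] * 3
example_squad = [
    {"position": pos, "team": "T{}".format(i % 5), "price": 60}
    for i, pos in enumerate(example_positions)
]
```

The optimization itself would then be a search over valid squads, maximizing total expected points.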
Some tests appear to depend on the database existing. It is not created upon install and so this causes them to fail on travis.
Currently there isn't one. From our brief discussion about it, we could:
At each gameweek, consider the "wildcard strategy" where the entire team is changed. The steps would be something like:
This is obviously going to dominate the computational cost (need to forecast for many more weeks than before, plus running the optimizer many times where unlimited transfers are allowed). A simpler strategy could be a deterministic rule - the pre-xmas wildcard is played at the very last moment, and the post-xmas wildcard is played before the first double gameweek, unless the number of injured / banned players in the squad exceeds some threshold, in which case the wildcard is played straight away.
This repo is open so travis is free.