
airsenal's People

Contributors

a1fus, abelarm, anguswilliams91, biggins, callistusndemo, callummole, chahak13, chiefsan, craddm, crangelsmith, dependabot[bot], georgewhewell, helendduncan, hsteinmueller, iansealy, jack89roberts, jpkfin, keshabb, louiseabowler, lukehare, nbarlowati, oscartgiles, radka-j, rchan26, robwhickman, spool, sreyan-ghosh, tahmeed156, tallamjr, tdarnell


airsenal's Issues

Initial Stan model

First Stan model of attacking and defending (alphas and betas) per team.
Train using historical data.

Revisit use of "method" or "tag" to identify predictions and perform optimizations

This variable is intended to allow the optimizer to retrieve a consistent set of predicted points from the DB.
Currently many function definitions give this argument a default value, so there is some risk that not every call site is using the predictions we think it is.

Suggested change is to have fill_predictedscore_table generate a UUID (or timestamp) to put into this field in the DB - same value for all rows in a given run of this script.
The fill_transfersuggestion_table will query the predicted_score table, and look at the last row, and then use this value for the optimization.
Remove all default values of this argument from function definitions, to flush out cases where we are calling a function without explicitly specifying it.
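
A minimal sketch of the suggested change, using an in-memory SQLite database through the standard-library `sqlite3` module rather than the project's real schema. The table layout, the column names, and the helpers `fill_predictions`, `latest_tag` and `get_predicted_points` are hypothetical and only illustrate the idea: one UUID per run, looked up from the last row, and no default value for the tag argument.

```python
import sqlite3
import uuid


def fill_predictions(conn, predictions):
    """Insert one run's predictions, all sharing a freshly generated tag."""
    tag = str(uuid.uuid4())  # same value for every row written in this run
    conn.executemany(
        "INSERT INTO predicted_score (player_id, gameweek, points, tag) "
        "VALUES (?, ?, ?, ?)",
        [(pid, gw, pts, tag) for pid, gw, pts in predictions],
    )
    conn.commit()
    return tag


def latest_tag(conn):
    """Return the tag of the most recently inserted row (None if empty)."""
    row = conn.execute(
        "SELECT tag FROM predicted_score ORDER BY id DESC LIMIT 1"
    ).fetchone()
    return row[0] if row else None


def get_predicted_points(conn, tag):
    """The tag is a required argument: there is deliberately no default."""
    return conn.execute(
        "SELECT player_id, gameweek, points FROM predicted_score WHERE tag = ?",
        (tag,),
    ).fetchall()


# Hypothetical table layout, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE predicted_score ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "player_id INTEGER, gameweek INTEGER, points REAL, tag TEXT)"
)
first_tag = fill_predictions(conn, [(1, 2, 5.19), (2, 2, 3.0)])
second_tag = fill_predictions(conn, [(1, 2, 4.8)])
```

Because `get_predicted_points` has no default, a caller that forgets the tag fails immediately with a `TypeError` instead of silently reading a different set of predictions.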

Parallelize transfer optimization

Use multiprocess pool to run iterations over different transfer strategies in parallel. Need to think about how to recombine at the end to find overall maximum.
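
One possible shape for this, as a sketch only: evaluate each strategy independently with a map-like callable, then recombine with a single `max` over the returned scores. The scoring function `evaluate_strategy` is a toy stand-in (the real evaluation would query predicted points), and both function names are hypothetical.

```python
def evaluate_strategy(strategy):
    """Toy stand-in: score a strategy given as transfers per gameweek."""
    # Assumed toy scoring: each transfer gains 2 points, and every transfer
    # beyond the first in a gameweek costs a 4-point hit.
    gain = sum(2.0 * n for n in strategy)
    hits = sum(4.0 * max(0, n - 1) for n in strategy)
    return strategy, gain - hits


def find_best_strategy(strategies, map_fn=map):
    """Evaluate every strategy with map_fn, then reduce to the overall maximum."""
    results = list(map_fn(evaluate_strategy, strategies))
    return max(results, key=lambda result: result[1])
```

With the default `map_fn=map` this runs serially. To parallelize, the `map` method of a `multiprocessing.Pool` can be passed as `map_fn`; `evaluate_strategy` is defined at module level so that it can be pickled and sent to worker processes.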

FPL API URLs Have Changed

See branch: fix/fpl_api_urls_2019

framework/data_fetcher.py was giving me errors (empty data) as some of the FPL API URLs have changed. Plus trailing forward slash seems to be important, e.g. https://fantasy.premierleague.com/api/bootstrap-static/ returns data but https://fantasy.premierleague.com/api/bootstrap-static doesn't.

This set of URLs seems to work (where {} are filled by python .format calls, usually with a team, player or league id):

FPL_SUMMARY_API_URL = "https://fantasy.premierleague.com/api/bootstrap-static/"
FPL_DETAIL_URL = "https://fantasy.premierleague.com/api/element-summary/{}/"
FPL_HISTORY_URL = "https://fantasy.premierleague.com/api/entry/{}/history/"
FPL_TEAM_URL = "https://fantasy.premierleague.com/api/entry/{}/event/{}/picks/"
FPL_TEAM_TRANSFER_URL = "https://fantasy.premierleague.com/api/entry/{}/transfers/"
FPL_LEAGUE_URL = "https://fantasy.premierleague.com/api/leagues-classic/{}/standings/".format(self.FPL_LEAGUE_ID)
FPL_FIXTURE_URL = "https://fantasy.premierleague.com/api/fixtures/"

However, getting league standings now needs authentication - hint here: https://www.reddit.com/r/FantasyPL/comments/c64rrx/fpl_api_url_has_been_changed/ewj4ofd/

I've updated the URLs and implemented the authentication for league standings (means FPL login and password are now needed as env variables for that functionality).
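
Since the trailing slash matters, one way to avoid forgetting it is to build every URL through a single helper. This is a sketch, not the code on the branch: `fpl_url` and `FPL_API_BASE` are hypothetical names, and the IDs in the example are made up.

```python
FPL_API_BASE = "https://fantasy.premierleague.com/api"


def fpl_url(path, *ids):
    """Fill `{}` placeholders in `path` and guarantee one trailing slash."""
    filled = path.format(*ids).strip("/")
    return f"{FPL_API_BASE}/{filled}/"


summary_url = fpl_url("bootstrap-static")
# 1234 and 2 are made-up team and gameweek IDs.
picks_url = fpl_url("entry/{}/event/{}/picks", 1234, 2)
```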

Expected points for players that didn't play GW1

Players that didn't play in gameweek 1 (it seems - might be another reason) appear to be getting an expected points total for gameweek 2 that assumes they will play, followed by expected points of 0 for gameweeks 3 and 4.

For example Lacazette, who didn't play vs. Newcastle in gameweek 1:

Getting points prediction for player Alexandre Lacazette
gameweek: 2 vs BUR home? True
Expected points: 5.19
gameweek: 3 vs LIV home? False
Expected points: 0.00
gameweek: 4 vs TOT home? True
Expected points: 0.00

More entrypoints

In addition to setup_airsenal_db, we should have executables to:

  • update DB (new results+playerscores)
  • run predictions
  • run optimization
  • make plots of mini-league points etc.

Scripts to dump DB contents to CSV / JSON

It was more painful than it should have been to get the 18/19 season data ready for the start of the 19/20 season.
For next season we should have scripts ready that output the contents of the DB to the player_summary_YYYY.json, player_detail_YYYY.json and results_YYYY_with_gw.csv files, in the expected formats.

Crash when printing team after free hit

In optimization, after playing free hit for next gw and making 14 transfers, get crash when printing the optimum team:

Traceback (most recent call last):
  File "/Users/nbarlow/anaconda3/bin/run_airsenal_optimization", line 10, in <module>
    sys.exit(main())
  File "/Users/nbarlow/anaconda3/lib/python3.6/site-packages/airsenal/scripts/fill_transfersuggestion_table.py", line 279, in main
    print_team_for_next_gw(best_strategy)
  File "/Users/nbarlow/anaconda3/lib/python3.6/site-packages/airsenal/scripts/fill_transfersuggestion_table.py", line 145, in print_team_for_next_gw
    expected_points = t.get_expected_points(next_gw,tag)
  File "/Users/nbarlow/anaconda3/lib/python3.6/site-packages/airsenal/framework/team.py", line 259, in get_expected_points
    raise RuntimeError("Team is incomplete")
RuntimeError: Team is incomplete

Why is the team incomplete?

Crash when running predictions possibly due to rescheduled fixtures

When running predictions, without remaking the database from scratch (i.e. running update_airsenal_db after the previous round of fixtures), get a crash when filling the dataframe used for generating predictions.

Traceback:

File "/Users/nbarlow/anaconda3/lib/python3.6/site-packages/airsenal/scripts/fill_predictedscore_table.py", line 39, in make_predictedscore_table
    prediction_dict = calc_all_predicted_points(gw_range, season, tag,  session)
  File "/Users/nbarlow/anaconda3/lib/python3.6/site-packages/airsenal/scripts/fill_predictedscore_table.py", line 23, in calc_all_predicted_points
    model_team, df_player = get_fitted_models(session)
  File "/Users/nbarlow/anaconda3/lib/python3.6/site-packages/airsenal/framework/prediction_utils.py", line 221, in get_fitted_models
    df_team = get_result_df(session)
  File "/Users/nbarlow/anaconda3/lib/python3.6/site-packages/airsenal/framework/bpl_interface.py", line 29, in get_result_df
    for s in session.query(Result).all()
  File "/Users/nbarlow/anaconda3/lib/python3.6/site-packages/airsenal/framework/bpl_interface.py", line 29, in <listcomp>
    for s in session.query(Result).all()
AttributeError: 'NoneType' object has no attribute 'date'

Essentially, after doing update_airsenal_database we are left with some "Result"s that do not have "Fixture"s.
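
A defensive check along these lines would at least turn the crash into a readable report. The sketch below uses plain dataclasses instead of the project's SQLAlchemy models, so the `Fixture` and `Result` classes here and the helper `split_orphan_results` are simplified, hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Fixture:
    date: str
    home_team: str
    away_team: str


@dataclass
class Result:
    result_id: int
    home_score: int
    away_score: int
    fixture: Optional[Fixture]  # None reproduces the broken state


def split_orphan_results(results: List[Result]):
    """Separate results that have a fixture from those that do not."""
    linked = [r for r in results if r.fixture is not None]
    orphans = [r for r in results if r.fixture is None]
    return linked, orphans
```

Only the `linked` results would then be used to build the dataframe, while the `orphans` can be logged or repaired.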

Nonsensical "players out" when printing optimization output with free hit

When running optimization, where best outcome was from playing free hit in the next gw, got the following output:

=========== Gameweek 37 ================

Cards played:  F

Players in:			Players out:
-----------			------------
Sergio Agüero			Asmir Begovic
Raúl Jiménez			Trent Alexander-Arnold
Paul Pogba			Marcos Alonso
Jan Bednarek			César Azpilicueta
Ryan Bertrand			Aaron Wan-Bissaka
Heung-Min Son			Heung-Min Son
Chris Smalling			Mohamed Salah
Lucas Rodrigues Moura da Silva		Lucas Rodrigues Moura da Silva
Aaron Wan-Bissaka			Lys Mousset
Jason Steele			Roberto Firmino
Andre Gray			Joshua King
Eden Hazard			Michel Vorm
Aymeric Laporte			David Silva
Ilkay Gündogan			Cesc Fàbregas
David de Gea			Martin Kelly

However, the "players out" are not the players in the current team! They must be from a previous iteration of the make_new_team random process.

Scripts to perform sanity checks on input data

For start of 2019/20 season we had a hard-to-debug crash when initializing the player-level Stan model, which was traced back to an inconsistency in the player and match data for 2018/19 (for one playerscore row, nTeamGoals - numGoals - numAssists was -1) - this was eventually traced back to an incorrect gameweek in results_1819_with_gw.csv.

But we should be able to debug this much faster - we should check for things like this in the DB after every upload.
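
As an illustration, the specific inconsistency described above could be caught with a check like the following. The row format (dictionaries with `team_goals`, `goals` and `assists` keys) and the function name are assumptions for the sketch, not the real table layout.

```python
def find_inconsistent_scores(player_scores):
    """Return rows where a player's goals plus assists exceed the team's goals."""
    bad_rows = []
    for row in player_scores:
        # Goals the team scored without this player being involved.
        not_involved = row["team_goals"] - row["goals"] - row["assists"]
        if not_involved < 0:
            bad_rows.append(row)
    return bad_rows
```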

Enable predictions/ team selection to work on past seasons

From first look will involve:

  • add "season" column to "player", "fixture", "player_prediction", "transaction", and "transfer_suggestion" tables.
  • refactor more db-filling code out of scripts and into library code ("history_utils.py"?) with "season" as a potential argument (but default to "1819").
  • in db-filling code - if season is not "1819", use CSV rather than API inputs for player lists, fixtures, match results etc. - will need to use player-level data to know what matches were in what gameweeks.

add Transaction table, and fill with players bought and sold so far.

schema:
player_id, gameweek, bought_or_sold

We can then write a function that combines this information with the week-by-week player prices, and gives us an accurate budget, taking into account player price changes.

(If a player goes up in value while we own them, we only get half the price increase when we sell, so it's not sufficient just to use current player prices to get our budget).
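
A sketch of the price rule described above. Prices are integers in tenths of a million (75 means 7.5m). Two details are assumptions rather than something stated in this issue: the half share of a rise is rounded down, and a fall in price is passed on in full. The function names are hypothetical.

```python
def selling_price(purchase_price, current_price):
    """Selling price in tenths of a million, given purchase and current price."""
    if current_price <= purchase_price:
        # Assumption: a price drop is passed on in full.
        return current_price
    # Only half of the rise is kept (assumption: rounded down).
    return purchase_price + (current_price - purchase_price) // 2


def squad_selling_value(purchase_prices, current_prices):
    """Total selling value of a squad; both arguments map player_id -> price."""
    return sum(
        selling_price(bought_at, current_prices[player_id])
        for player_id, bought_at in purchase_prices.items()
    )
```

For example, a player bought at 7.5m who is now worth 7.8m would sell for 7.6m under these assumptions, so using the current price would overstate the budget by 0.2m.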

AWS - lambda to optimize transfer strategy

Run after the prediction-calculating lambda: a script tries different transfer strategies and updates the transfer_suggestions table.

This is the most compute-intensive part, may want to see if we can scale out.

Import bug

When I try to run the notebook sandbox/modelling_test.ipynb, I get an error at the line

from framework.utils import *

that is as follows:

NameError                                 Traceback (most recent call last)
<ipython-input-2-d0dbd8c32518> in <module>()
      9 import seaborn as sns
     10 
---> 11 from framework.utils import *
     12 
     13 np.random.seed(42)

~/Projects/AIrsenal/framework/utils.py in <module>()
     10 from .mappings import alternative_team_names, alternative_player_names
     11 
---> 12 from .data_fetcher import FPLDataFetcher, MatchDataFetcher
     13 from .schema import (
     14     Base,

~/Projects/AIrsenal/framework/data_fetcher.py in <module>()
     47 FPL_TEAM_URL = "https://fantasy.premierleague.com/drf/entry/{}/event/{}/picks"
     48 FPL_LEAGUE_URL = "https://fantasy.premierleague.com/drf/leagues-classic-standings/{}?phase=1&le-page=1&ls-page=1".format(
---> 49     LEAGUE_ID
     50 )
     51 DATA_DIR = "./data"

NameError: name 'LEAGUE_ID' is not defined

Implement player-level forecasts

The team-level model produces probabilities for the scoreline between different teams. To get expected defensive points, we just need to compute the probability of a clean sheet using the team-level model and then multiply this by the number of points a given player will receive for a clean sheet.
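
The defensive calculation can be sketched as follows, assuming the team-level model's output is available as a dictionary mapping `(home_goals, away_goals)` to a probability. That representation and the function names are hypothetical; the number of points per clean sheet is passed in by the caller because it depends on the player's position.

```python
def clean_sheet_probability(scoreline_probs, team_is_home):
    """Sum the probability of every scoreline in which the team concedes zero."""
    conceded_index = 1 if team_is_home else 0  # index of the opponent's goals
    return sum(
        prob
        for score, prob in scoreline_probs.items()
        if score[conceded_index] == 0
    )


def expected_clean_sheet_points(scoreline_probs, team_is_home, points_per_clean_sheet):
    """Expected defensive points = Pr(clean sheet) * points for a clean sheet."""
    return (
        clean_sheet_probability(scoreline_probs, team_is_home)
        * points_per_clean_sheet
    )
```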

For attacking points, a simple approach can be as follows. We learn three numbers per player from historical data:

Pr(score), Pr(assist) and Pr(not involved)

where these are the probabilities of the three possible outcomes for an individual player given that their team has scored a goal. The distribution of n_score, n_assist and n_not_involved (for a given player) is then multinomial given the total number of goals scored by the team. We can use this to compute

Pr(attacking points | goals scored by team)

so that

Pr(attacking points) = sum_{goals scored by team} Pr(attacking points | goals scored by team) * Pr(goals scored by team)

where Pr(goals scored by team) is computed using the team-level model. Using this distribution, we can then compute the expected number of attacking points.
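
The calculation above can be sketched in a few lines. This is an illustration under stated assumptions, not the project's implementation: `team_goal_probs` is a hypothetical dictionary mapping a number of team goals to its probability (in practice it would come from the team-level model, truncated at some maximum), and the default point values of 4 per goal and 3 per assist are placeholders that depend on the player's position.

```python
from math import factorial


def multinomial_pmf(counts, probs):
    """Probability of observing `counts` under a multinomial with `probs`."""
    coefficient = factorial(sum(counts))
    for c in counts:
        coefficient //= factorial(c)
    prob = float(coefficient)
    for c, p in zip(counts, probs):
        prob *= p ** c
    return prob


def expected_attacking_points(p_score, p_assist, team_goal_probs,
                              pts_goal=4, pts_assist=3):
    """Sum points * Pr(outcome | team goals) * Pr(team goals) over all outcomes."""
    p_none = 1.0 - p_score - p_assist
    expected = 0.0
    for n_goals, p_goals in team_goal_probs.items():
        # Enumerate every split of the team's goals into
        # (scored, assisted, not involved) for this player.
        for n_score in range(n_goals + 1):
            for n_assist in range(n_goals - n_score + 1):
                n_none = n_goals - n_score - n_assist
                p_outcome = multinomial_pmf(
                    (n_score, n_assist, n_none), (p_score, p_assist, p_none)
                )
                points = pts_goal * n_score + pts_assist * n_assist
                expected += p_goals * p_outcome * points
    return expected
```

As a sanity check, linearity of expectation means the result should equal the expected number of team goals multiplied by `p_score * pts_goal + p_assist * pts_assist`. The explicit enumeration is still useful if the full distribution of attacking points is needed rather than just its mean.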

AWS - lambda to update database

Database is on sqlite file stored on S3 bucket.
Can run lambda on a cron-type daily schedule to see if any new matches have been played, and if so, update the db.

Write team optimization function

Given a starting team, should query the player_prediction table, and look N gameweeks ahead and choose best substitutions, with constraint of no more than one 4-point-hit per gameweek.

Unable to install on Linux

When attempting to install bpl the following error occurs (the output is captured by using pip install -vvv .. option)

    *** Error compiling '/tmp/pip-install-22y8ywmq/pystan/pystan/stan/lib/stan_math/lib/boost_1.69.0/status/boost_check_library.py'...
      File "/tmp/pip-install-22y8ywmq/pystan/pystan/stan/lib/stan_math/lib/boost_1.69.0/status/boost_check_library.py", line 166
        print ">>> cwd: %s"%(os.getcwd())
                          ^
    SyntaxError: invalid syntax

System Information:

  • Operating System: [GNU/Linux 3.10.0-693.5.2.el7.x86_64]
  • PyStan Version: [2.18.0.0]
  • GCC Version [7.1.0]

Does this occur for anyone else?

This seems to be an unresolved issue at https://github.com/stan-dev/pystan/issues/584 relating to PyStan

Create a "Teams" table in the DB.

As part of the effort to make AIrsenal easy to run on both past seasons and the current season, it would be good to have a simple table listing which teams were in the Premier League for each season, i.e. just two columns:
name, season
"ARS", "1516"
"AVL", "1516"
...
"ARS","1920"
...
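
A sketch of such a table using the standard-library `sqlite3` module and an in-memory database. The table name `team`, the composite primary key, and the helper `teams_in_season` are assumptions made for the example; only the two columns come from the proposal above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE team ("
    "name TEXT NOT NULL, season TEXT NOT NULL, "
    "PRIMARY KEY (name, season))"  # assumed: one row per team per season
)
conn.executemany(
    "INSERT INTO team (name, season) VALUES (?, ?)",
    [("ARS", "1516"), ("AVL", "1516"), ("ARS", "1920")],
)


def teams_in_season(conn, season):
    """Return the sorted list of team names for the given season string."""
    rows = conn.execute(
        "SELECT name FROM team WHERE season = ? ORDER BY name", (season,)
    ).fetchall()
    return [name for (name,) in rows]
```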

Alexa skill

Once the sqlite db is on an S3 bucket, with one lambda to auto-fill scores and another to run predictions, it should be easy to get an Alexa skill to report on the latest status.
For some details about our team, need to find a way to authenticate with FPL API.

Estimating time spent on the pitch

This is hard. Some thoughts:

  1. We obviously need to check if they are "red" on the FPL website. If so, we can check when they are likely to return (if players are suspended, there is metadata that says when their ban is over).
  2. To forecast how long they will be on the pitch, we should consider their recent history - how long to consider? And how to use that information?

Open to ideas about this - we need a p(T) for the model to work (to get 60+ mins point, but also to estimate contribution to goals etc).

Lots more tests

  • Dummy match and player models with e.g. 100% prob of 1-0 win, 50% prob of assist | team-goal etc. etc. and ensure that attacking and defending points are calculated correctly.
  • Dummy predicted points for players for single gameweek (e.g. 11 players get points 1-11, 4 players get zero) - check that subs and captain optimization works as expected.
  • Dummy predicted points for a few gameweeks ahead - check that transfer strategy optimization works as expected.

Getting "UNIQUE constraint failed: player.player_id" while running setup_airsenal_database

sqlalchemy.exc.IntegrityError: (sqlite3.IntegrityError) UNIQUE constraint failed: player.player_id
[SQL: INSERT INTO player (player_id, name) VALUES (?, ?)]
[parameters: ((1, 'Shkodran Mustafi'), (2, 'Héctor Bellerín'), (3, 'Sead Kolasinac'), (4, 'Ainsley Maitland-Niles'), (5, 'Sokratis Papastathopoulos'), (6, 'Nacho Monreal'), (7, 'Laurent Koscielny'), (8, 'Konstantinos Mavropanos') ... displaying 10 of 529 total bound parameter sets ... (487, 'Patrick Cutrone'), (528, 'Pedro Lomba Neto'))]
(Background on this error at: http://sqlalche.me/e/gkpj)

Purchase price of players

We don't have the exact value of players when we purchased them, just the price at the end of the gameweek when we bought them.

AWS - lambda to run predictions

Run daily - if we are less than 24 hours before a gameweek deadline, run the script to fill the player_predictions table in the database.
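
The timing check itself is small. In this sketch the function name `should_run_predictions` is hypothetical and the example deadline is a made-up timestamp; a real lambda would read the next deadline from the FPL data.

```python
from datetime import datetime, timedelta, timezone


def should_run_predictions(now, deadline, window=timedelta(hours=24)):
    """True if the deadline is still ahead and less than `window` away."""
    return timedelta(0) < deadline - now < window


# Made-up deadline, for illustration only.
example_deadline = datetime(2019, 8, 17, 10, 30, tzinfo=timezone.utc)
run_now = should_run_predictions(
    example_deadline - timedelta(hours=3), example_deadline
)
```

Using timezone-aware datetimes on both sides avoids comparing a local timestamp with a UTC deadline.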

Improve estimation of time spent on the pitch

Currently we treat the last N games as i.i.d. draws from the pdf P(T). This is ok, but it will not be able to pick up that players are likely to play fewer minutes per game over e.g. the Christmas period. @nbarlowATI has suggested using previous seasons' data to investigate this effect.
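
For reference, the current i.i.d. treatment amounts to using the empirical distribution of recent minutes. The sketch below is an assumption about its general shape, not the project's code: the function names and the default of `n_games=5` are hypothetical.

```python
def prob_at_least(minutes_history, threshold=60, n_games=5):
    """Empirical Pr(T >= threshold) from the last n_games appearances."""
    recent = minutes_history[-n_games:]
    if not recent:
        return 0.0
    return sum(1 for minutes in recent if minutes >= threshold) / len(recent)


def expected_minutes(minutes_history, n_games=5):
    """Empirical mean of T over the last n_games appearances."""
    recent = minutes_history[-n_games:]
    if not recent:
        return 0.0
    return sum(recent) / len(recent)
```

Because every recent game gets equal weight and the calendar is ignored, an estimate like this cannot anticipate a congested period such as Christmas, which is the limitation described above.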

Occasional crash in optimization with allow_free_hit

In two separate threads (i.e. evaluation of two strategies: "F0000" and "F0001"), got crash:

File "/Users/nbarlow/anaconda3/lib/python3.6/site-packages/airsenal/framework/team.py", line 237, in apply_formation
    if index < formation[i]:
TypeError: 'NoneType' object is not subscriptable

Caching partial results of optimization.

Currently every "strategy" is evaluated independently (in order to parallelize).
However, this is very wasteful - all strategies that begin with 2 transfers in the next gw will independently calculate the best transfers (which for 0,1,2 transfers is deterministic).
Would be better to cache the results as we go along, so we don't repeat, and instead have more of a tree structure.
Could do this by checking for the presence of JSON files in the /tmp/ directory with a certain identifier unique to this optimization run, and reading the players in/out from there.
It is not clear how this would then work with greater parallelization, e.g. on AWS at a later date.
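
A rough sketch of the JSON-file idea, keyed on a run identifier plus the strategy prefix. The class name `PrefixCache`, the file-naming scheme, and the shape of the cached result are all hypothetical, and the example uses a temporary directory rather than /tmp/ directly.

```python
import json
import tempfile
from pathlib import Path


class PrefixCache:
    """Cache results for a strategy prefix as JSON files in a directory."""

    def __init__(self, run_id, directory):
        self.run_id = run_id
        self.directory = Path(directory)

    def _path(self, prefix):
        return self.directory / f"airsenal_{self.run_id}_{prefix}.json"

    def get_or_compute(self, prefix, compute):
        """Return the cached result for `prefix`, computing it on a miss."""
        path = self._path(prefix)
        if path.exists():
            return json.loads(path.read_text())
        result = compute(prefix)
        path.write_text(json.dumps(result))
        return result


calls = []


def fake_best_transfers(prefix):
    """Stand-in for the expensive search; records how often it is called."""
    calls.append(prefix)
    return {"players_in": [10], "players_out": [20]}


with tempfile.TemporaryDirectory() as tmp_dir:
    cache = PrefixCache("run1", tmp_dir)
    first = cache.get_or_compute("2", fake_best_transfers)
    second = cache.get_or_compute("2", fake_best_transfers)  # read from disk
    other = cache.get_or_compute("1", fake_best_transfers)
```

This version does no locking, so two processes that miss on the same prefix at the same time would both compute it. That is wasteful rather than incorrect when the result is deterministic, but it would need more thought for the non-deterministic cases and for storage shared across machines.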

Write initial framework

Simple python code that can pick a team obeying all constraints (price, correct numbers of defenders, midfielders, forwards, no more than three players per team), e.g. optimizing using total expected points.
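
As a starting point, the constraints can be expressed as a validator that any candidate squad must pass. This sketch only checks a squad and does not optimize one. The squad shape of 2 goalkeepers, 5 defenders, 5 midfielders and 3 forwards and the budget of 100.0m (1000 in tenths of a million) are assumptions written into the constants, and the tuple layout and all names are hypothetical.

```python
from collections import Counter

# Assumed squad rules; prices are in tenths of a million.
REQUIRED_POSITIONS = {"GK": 2, "DEF": 5, "MID": 5, "FWD": 3}
MAX_PER_TEAM = 3


def squad_problems(squad, budget=1000):
    """Return a list of violated constraints (empty if the squad is valid).

    `squad` is a list of (name, position, team, price) tuples.
    """
    problems = []
    positions = Counter(pos for _, pos, _, _ in squad)
    for pos, required in REQUIRED_POSITIONS.items():
        if positions[pos] != required:
            problems.append(f"need {required} {pos}, have {positions[pos]}")
    teams = Counter(team for _, _, team, _ in squad)
    for team, count in teams.items():
        if count > MAX_PER_TEAM:
            problems.append(f"{count} players from {team}")
    total = sum(price for _, _, _, price in squad)
    if total > budget:
        problems.append(f"cost {total} exceeds budget {budget}")
    return problems


# Made-up squad: 15 placeholder players spread evenly over five teams.
_positions = ["GK"] * 2 + ["DEF"] * 5 + ["MID"] * 5 + ["FWD"] * 3
_teams = ["ARS", "LIV", "TOT", "BUR", "NEW"]
example_squad = [
    (f"player{i}", pos, _teams[i % 5], 60) for i, pos in enumerate(_positions)
]
```

An optimizer can then generate candidate squads however it likes and keep the valid one with the highest total expected points.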

Fix tests

Some tests appear to depend on the database existing. It is not created upon install and so this causes them to fail on travis.

Wildcard strategy

Currently there isn't one. From our brief discussion about it, we could:

At each gameweek, consider the "wildcard strategy" where the entire team is changed. The steps would be something like:

  1. Compute the best expected points 10 weeks into the future when the wildcard is played.
  2. For the remaining gameweeks until the current wildcard expires (until Christmas, or until the end of the season if after Jan 1st [check this is the right date]), repeat the same calculation.
  3. If playing the wildcard now gives the best expected points return, then play it.

This is obviously going to dominate the computational cost (need to forecast for many more weeks than before, plus running the optimizer many times where unlimited transfers are allowed). A simpler strategy could be a deterministic rule - the pre-xmas wildcard is played at the very last moment, and the post-xmas wildcard is played before the first double gameweek, unless the number of injured / banned players in the squad exceeds some threshold, then the wildcard is played.
