Mezzala

Models for estimating football (soccer) team-strength

Install

pip install mezzala

How to use

import mezzala

Fitting a Dixon-Coles team strength model:

First, we need to get some data:

import itertools
import json
import urllib.request


# Use 2016/17 Premier League data from the openfootball repo
url = 'https://raw.githubusercontent.com/openfootball/football.json/master/2016-17/en.1.json'


response = urllib.request.urlopen(url)
data_raw = json.loads(response.read())

# Reshape the data to just get the matches
data = list(itertools.chain(*[d['matches'] for d in data_raw['rounds']]))

data[0:3]
[{'date': '2016-08-13',
  'team1': 'Hull City AFC',
  'team2': 'Leicester City FC',
  'score': {'ft': [2, 1]}},
 {'date': '2016-08-13',
  'team1': 'Everton FC',
  'team2': 'Tottenham Hotspur FC',
  'score': {'ft': [1, 1]}},
 {'date': '2016-08-13',
  'team1': 'Crystal Palace FC',
  'team2': 'West Bromwich Albion FC',
  'score': {'ft': [0, 1]}}]

Fitting a model

To fit a model with mezzala, you need to create an "adapter". Adapters are used to connect a model to a data source.

Because our data is a list of dicts, we are going to use a KeyAdapter.

adapter = mezzala.KeyAdapter(       # `KeyAdapter` = datum['...']
    home_team='team1',
    away_team='team2',
    home_goals=['score', 'ft', 0],  # Get nested fields with lists of fields
    away_goals=['score', 'ft', 1],  # i.e. datum['score']['ft'][1]
)

# You'll never need to call the methods on an 
# adapter directly, but just to show that it 
# works as expected:
adapter.home_team(data[0])
'Hull City AFC'
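
To make the list-of-keys notation concrete, here is a minimal sketch of how a path such as ['score', 'ft', 0] could be resolved against a single match. The helper get_nested is hypothetical and is not part of mezzala; it only illustrates the lookup that the comments above describe.

```python
import functools
import operator


def get_nested(datum, path):
    """Follow `path` into `datum`; a bare key is treated as a one-element path."""
    if not isinstance(path, (list, tuple)):
        path = [path]
    return functools.reduce(operator.getitem, path, datum)


match = {
    'date': '2016-08-13',
    'team1': 'Hull City AFC',
    'team2': 'Leicester City FC',
    'score': {'ft': [2, 1]},
}

home_goals = get_nested(match, ['score', 'ft', 0])  # match['score']['ft'][0]
away_goals = get_nested(match, ['score', 'ft', 1])  # match['score']['ft'][1]
home_team = get_nested(match, 'team1')              # match['team1']
```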

Once we have an adapter for our specific data source, we can fit the model:

model = mezzala.DixonColes(adapter=adapter)
model.fit(data)
DixonColes(adapter=KeyAdapter(home_goals=['score', 'ft', 0], away_goals=['score', 'ft', 1], home_team='team1', away_team='team2'), blocks=[TeamStrength(), BaseRate(), HomeAdvantage()]), weight=UniformWeight()

Making predictions

By default, you only need to supply the home and away team to get predictions. This should be supplied in the same format as the training data.

DixonColes has two methods for making predictions:

  • predict_one - for predicting a single match
  • predict - for predicting multiple matches

match_to_predict = {
    'team1': 'Manchester City FC',
    'team2': 'Swansea City FC',
}

scorelines = model.predict_one(match_to_predict)

scorelines[0:5]
[ScorelinePrediction(home_goals=0, away_goals=0, probability=0.023625049697587167),
 ScorelinePrediction(home_goals=0, away_goals=1, probability=0.012682094432376022),
 ScorelinePrediction(home_goals=0, away_goals=2, probability=0.00623268833779594),
 ScorelinePrediction(home_goals=0, away_goals=3, probability=0.0016251514235046444),
 ScorelinePrediction(home_goals=0, away_goals=4, probability=0.00031781436109636405)]

Each of these methods returns predictions in the form of ScorelinePredictions.

  • predict_one returns a list of ScorelinePredictions
  • predict returns a list of ScorelinePredictions for each predicted match (i.e. a list of lists)

However, it can sometimes be more useful to have predictions in the form of match outcomes. Mezzala exposes the scorelines_to_outcomes function for this purpose:

mezzala.scorelines_to_outcomes(scorelines)
{Outcomes('Home win'): OutcomePrediction(outcome=Outcomes('Home win'), probability=0.8255103334702835),
 Outcomes('Draw'): OutcomePrediction(outcome=Outcomes('Draw'), probability=0.11615659853961693),
 Outcomes('Away win'): OutcomePrediction(outcome=Outcomes('Away win'), probability=0.058333067990098304)}
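
For intuition, the conversion amounts to summing scoreline probabilities by result. The sketch below is not mezzala's implementation; it uses a stand-in namedtuple (Scoreline) with the same three fields shown in the ScorelinePrediction output above, and made-up probabilities.

```python
import collections

# Stand-in for ScorelinePrediction so the sketch runs without mezzala installed
Scoreline = collections.namedtuple(
    'Scoreline', ['home_goals', 'away_goals', 'probability']
)


def outcome_probabilities(scorelines):
    """Sum scoreline probabilities into home win / draw / away win."""
    totals = {'Home win': 0.0, 'Draw': 0.0, 'Away win': 0.0}
    for s in scorelines:
        if s.home_goals > s.away_goals:
            totals['Home win'] += s.probability
        elif s.home_goals == s.away_goals:
            totals['Draw'] += s.probability
        else:
            totals['Away win'] += s.probability
    return totals


# Illustrative probabilities only, not model output
example = [
    Scoreline(0, 0, 0.25),
    Scoreline(1, 0, 0.5),
    Scoreline(0, 1, 0.125),
    Scoreline(1, 1, 0.125),
]
outcomes = outcome_probabilities(example)
```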

Extending the model

It's possible to fit more sophisticated models with mezzala, using weights and model blocks.

Weights

You can weight individual data points by passing a function (or any callable) as the weight argument of DixonColes:

mezzala.DixonColes(
    adapter=adapter,
    # By default, all data points are weighted equally,
    # which is equivalent to:
    weight=lambda x: 1
)
DixonColes(adapter=KeyAdapter(home_goals=['score', 'ft', 0], away_goals=['score', 'ft', 1], home_team='team1', away_team='team2'), blocks=[TeamStrength(), BaseRate(), HomeAdvantage()]), weight=<function <lambda> at 0x123067488>

Mezzala also provides an ExponentialWeight for the purpose of time-discounting:

mezzala.DixonColes(
    adapter=adapter,
    weight=mezzala.ExponentialWeight(
        epsilon=-0.0065,               # Decay rate
        key=lambda x: x['days_ago']
    )
)
DixonColes(adapter=KeyAdapter(home_goals=['score', 'ft', 0], away_goals=['score', 'ft', 1], home_team='team1', away_team='team2'), blocks=[TeamStrength(), BaseRate(), HomeAdvantage()]), weight=ExponentialWeight(epsilon=-0.0065, key=<function <lambda> at 0x122f938c8>)
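
Note that the openfootball records shown earlier carry a 'date' field but no 'days_ago', so that key has to be derived before a weight like this can be used. The sketch below shows one way to do so. The helpers add_days_ago and exponential_weight are hypothetical, and the formula exp(epsilon * days_ago) is an assumption about how the discount is applied; check it against mezzala's source before relying on it.

```python
import datetime
import math


def add_days_ago(matches, today):
    """Return copies of `matches` with a 'days_ago' field measured from `today`."""
    out = []
    for m in matches:
        played = datetime.date.fromisoformat(m['date'])
        out.append({**m, 'days_ago': (today - played).days})
    return out


def exponential_weight(days_ago, epsilon=-0.0065):
    """Assumed form of the discount: exp(epsilon * days_ago)."""
    return math.exp(epsilon * days_ago)


matches = [
    {'date': '2016-08-13',
     'team1': 'Hull City AFC',
     'team2': 'Leicester City FC',
     'score': {'ft': [2, 1]}},
]

# The reference date is arbitrary here; in practice it would be the prediction date
dated = add_days_ago(matches, today=datetime.date(2016, 9, 12))
weights = [exponential_weight(m['days_ago']) for m in dated]
```

With a negative epsilon, a match played today gets weight 1 and older matches get progressively smaller weights.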

Model blocks

Model "blocks" define the calculation and estimation of home and away goalscoring rates.

mezzala.DixonColes(
    adapter=adapter,
    # By default, team strength, home advantage and a base rate
    # are estimated:
    blocks=[
        mezzala.blocks.HomeAdvantage(),
        mezzala.blocks.TeamStrength(),
        mezzala.blocks.BaseRate(),      # Adds "average goalscoring rate" as a distinct parameter
    ]
)
DixonColes(adapter=KeyAdapter(home_goals=['score', 'ft', 0], away_goals=['score', 'ft', 1], home_team='team1', away_team='team2'), blocks=[TeamStrength(), HomeAdvantage(), BaseRate()]), weight=UniformWeight()

To add custom parameters (e.g. per-league home advantage), you need to add additional model blocks.
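
As a rough illustration of what the default blocks contribute to the goalscoring rates, the standard Dixon-Coles formulation combines a base rate, a home-advantage term and offence/defence strengths on the log scale. The sketch below uses hypothetical parameter names and values; it is not mezzala's internal representation or its block API.

```python
import math


def goal_rates(params, home_team, away_team):
    """Expected home and away goals under a log-linear team-strength model."""
    log_home = (
        params['base_rate']
        + params['home_advantage']
        + params['offence'][home_team]
        - params['defence'][away_team]
    )
    log_away = (
        params['base_rate']
        + params['offence'][away_team]
        - params['defence'][home_team]
    )
    return math.exp(log_home), math.exp(log_away)


# Illustrative values only, not fitted estimates
params = {
    'base_rate': 0.0,
    'home_advantage': 0.25,
    'offence': {'A': 0.5, 'B': 0.0},
    'defence': {'A': 0.25, 'B': 0.0},
}
home_rate, away_rate = goal_rates(params, 'A', 'B')
```

In this picture, a custom parameter such as a per-league home advantage would be one more additive term in log_home.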

mezzala's People

Contributors

torvaney


mezzala's Issues

Add constraint on rho parameter

As in the original paper, an upper and lower limit on rho should be added such that the probabilities are always positive.
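
For reference, the Dixon-Coles low-score adjustment multiplies the independent Poisson probabilities of the 0-0, 0-1, 1-0 and 1-1 scorelines by a factor tau that depends on rho, and the limits on rho keep those four factors non-negative. The sketch below computes the limits for a given pair of goalscoring rates. The function names are hypothetical, and this is not how mezzala currently handles rho.

```python
def rho_bounds(home_rate, away_rate):
    """Limits on rho that keep all four low-score adjustment factors non-negative."""
    lower = max(-1.0 / home_rate, -1.0 / away_rate)
    upper = min(1.0 / (home_rate * away_rate), 1.0)
    return lower, upper


def tau(home_goals, away_goals, home_rate, away_rate, rho):
    """Dixon-Coles adjustment factor for a single scoreline."""
    if home_goals == 0 and away_goals == 0:
        return 1.0 - home_rate * away_rate * rho
    if home_goals == 0 and away_goals == 1:
        return 1.0 + home_rate * rho
    if home_goals == 1 and away_goals == 0:
        return 1.0 + away_rate * rho
    if home_goals == 1 and away_goals == 1:
        return 1.0 - rho
    return 1.0  # all other scorelines are left unadjusted


# Illustrative rates only
lower, upper = rho_bounds(home_rate=2.0, away_rate=0.5)

# Outside the limits, a factor (and hence a "probability") turns negative
negative_factor = tau(0, 1, home_rate=2.0, away_rate=0.5, rho=-0.75)
```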

See also: #2

Using xG instead of goals

Since the Dixon-Coles model uses a Poisson distribution, I guess it is not possible (or at least not easy) to train the model on xG values instead of actual goals. I tried multiplying the xG values by 100 and rounding. That gave team strengths just fine (I guess), but simulating the games did not work correctly (the outcomes became 0).

I just wanted to find out whether it is possible to integrate xG into this somehow. I came across your blog post "Dixon Coles and xG: together at last", which suggests that xG simulations can be incorporated by "tricking" the model. I could not come up with a way to do this in Python, mainly because I did not understand what to pass as the weights parameter.

Any help would be appreciated, if you are still maintaining this.
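
One possible reading of that suggestion, sketched under stated assumptions: rather than feeding xG totals in as goals, turn each match's shots into a probability distribution over scorelines, emit one row per possible scoreline, and carry the scoreline's probability as that row's weight. The data layout ('home_shots', 'away_shots' as lists of per-shot xG) and the helpers goal_distribution and expand_match are hypothetical, and whether this matches the blog post's method exactly has not been verified.

```python
import itertools


def goal_distribution(shot_xgs):
    """Probability of scoring 0, 1, 2, ... goals from independent shots."""
    dist = [1.0]
    for xg in shot_xgs:
        nxt = [0.0] * (len(dist) + 1)
        for goals, p in enumerate(dist):
            nxt[goals] += p * (1.0 - xg)   # shot missed
            nxt[goals + 1] += p * xg       # shot scored
        dist = nxt
    return dist


def expand_match(match):
    """One row per possible scoreline, weighted by its probability."""
    home = goal_distribution(match['home_shots'])
    away = goal_distribution(match['away_shots'])
    rows = []
    for (hg, hp), (ag, ap) in itertools.product(enumerate(home), enumerate(away)):
        rows.append({
            'team1': match['team1'],
            'team2': match['team2'],
            'home_goals': hg,
            'away_goals': ag,
            'weight': hp * ap,
        })
    return rows


# Made-up shots for illustration
match = {'team1': 'A', 'team2': 'B', 'home_shots': [0.5, 0.5], 'away_shots': [0.25]}
rows = expand_match(match)
```

The expanded rows could then be fitted with a KeyAdapter pointing at 'home_goals' and 'away_goals', plus weight=lambda x: x['weight'], following the weight argument described in the README above.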

nan Issue for Some Predictions

Hey @Torvaney, thanks a lot for the great repo. I was using the Dixon-Coles model for some calculations but noticed that in some cases we can end up with 'nan' for some probabilities. I tried some debugging but was not able to find the cause. Here is code to reproduce it:

import mezzala
adapter = mezzala.KeyAdapter(
                home_team='home_team_name',
                away_team='away_team_name',
                home_goals='home_score',
                away_goals='away_score',
            )
# The first 4 weeks of the 2021-2022 Bundesliga season
previous_matches = [
    {'home_team_name': "Borussia M'Gladbach", 'away_team_name': 'Bayern München', 'home_score': 1, 'away_score': 1},
    {'home_team_name': 'Arminia Bielefeld', 'away_team_name': 'SC Freiburg', 'home_score': 0, 'away_score': 0},
    {'home_team_name': 'FC Augsburg', 'away_team_name': 'TSG Hoffenheim', 'home_score': 0, 'away_score': 4},
    {'home_team_name': 'Union Berlin', 'away_team_name': 'Bayer Leverkusen', 'home_score': 1, 'away_score': 1},
    {'home_team_name': 'VfB Stuttgart', 'away_team_name': 'Greuther Fürth', 'home_score': 5, 'away_score': 1},
    {'home_team_name': 'Wolfsburg', 'away_team_name': 'VfL Bochum', 'home_score': 1, 'away_score': 0},
    {'home_team_name': 'Borussia Dortmund', 'away_team_name': 'Eintracht Frankfurt', 'home_score': 5, 'away_score': 2},
    {'home_team_name': 'Mainz 05', 'away_team_name': 'RB Leipzig', 'home_score': 1, 'away_score': 0},
    {'home_team_name': '1. FC Köln', 'away_team_name': 'Hertha BSC', 'home_score': 3, 'away_score': 1},
    {'home_team_name': 'RB Leipzig', 'away_team_name': 'VfB Stuttgart', 'home_score': 4, 'away_score': 0},
    {'home_team_name': 'VfL Bochum', 'away_team_name': 'Mainz 05', 'home_score': 2, 'away_score': 0},
    {'home_team_name': 'Eintracht Frankfurt', 'away_team_name': 'FC Augsburg', 'home_score': 0, 'away_score': 0},
    {'home_team_name': 'SC Freiburg', 'away_team_name': 'Borussia Dortmund', 'home_score': 2, 'away_score': 1},
    {'home_team_name': 'Greuther Fürth', 'away_team_name': 'Arminia Bielefeld', 'home_score': 1, 'away_score': 1},
    {'home_team_name': 'Hertha BSC', 'away_team_name': 'Wolfsburg', 'home_score': 1, 'away_score': 2},
    {'home_team_name': 'Bayer Leverkusen', 'away_team_name': "Borussia M'Gladbach", 'home_score': 4, 'away_score': 0},
    {'home_team_name': 'TSG Hoffenheim', 'away_team_name': 'Union Berlin', 'home_score': 2, 'away_score': 2},
    {'home_team_name': 'Bayern München', 'away_team_name': '1. FC Köln', 'home_score': 3, 'away_score': 2},
    {'home_team_name': 'Borussia Dortmund', 'away_team_name': 'TSG Hoffenheim', 'home_score': 3, 'away_score': 2},
    {'home_team_name': 'Arminia Bielefeld', 'away_team_name': 'Eintracht Frankfurt', 'home_score': 1, 'away_score': 1},
    {'home_team_name': 'Augsburg', 'away_team_name': 'Bayer Leverkusen', 'home_score': 1, 'away_score': 4},
    {'home_team_name': '1. FC Köln', 'away_team_name': 'Bochum', 'home_score': 2, 'away_score': 1},
    {'home_team_name': 'Mainz 05', 'away_team_name': 'Greuther Fürth', 'home_score': 3, 'away_score': 0},
    {'home_team_name': 'VfB Stuttgart', 'away_team_name': 'Freiburg', 'home_score': 2, 'away_score': 3},
    {'home_team_name': 'Bayern München', 'away_team_name': 'Hertha BSC', 'home_score': 5, 'away_score': 0},
    {'home_team_name': 'Union Berlin', 'away_team_name': "Borussia M'Gladbach", 'home_score': 2, 'away_score': 1},
    {'home_team_name': 'Wolfsburg', 'away_team_name': 'RB Leipzig', 'home_score': 1, 'away_score': 0},
    {'home_team_name': 'Bayer Leverkusen', 'away_team_name': 'Borussia Dortmund', 'home_score': 3, 'away_score': 4},
    {'home_team_name': 'SC Freiburg', 'away_team_name': '1. FC Köln', 'home_score': 1, 'away_score': 1},
    {'home_team_name': 'Greuther Fürth', 'away_team_name': 'Wolfsburg', 'home_score': 0, 'away_score': 2},
    {'home_team_name': 'TSG Hoffenheim', 'away_team_name': 'Mainz 05', 'home_score': 0, 'away_score': 2},
    {'home_team_name': 'Union Berlin', 'away_team_name': 'FC Augsburg', 'home_score': 0, 'away_score': 0},
    {'home_team_name': 'RB Leipzig', 'away_team_name': 'Bayern München', 'home_score': 1, 'away_score': 4},
    {'home_team_name': 'Eintracht Frankfurt', 'away_team_name': 'VfB Stuttgart', 'home_score': 1, 'away_score': 1},
    {'home_team_name': 'VfL Bochum', 'away_team_name': 'Hertha BSC', 'home_score': 1, 'away_score': 3},
    {'home_team_name': "Borussia M'Gladbach", 'away_team_name': 'Arminia Bielefeld', 'home_score': 3, 'away_score': 1},
]

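# Fit the model on the matches above
model = mezzala.DixonColes(adapter=adapter)
model.fit(previous_matches)

match_to_predict = {'home_team_name': 'Wolfsburg', 'away_team_name': 'Eintracht Frankfurt'}
scorelines = model.predict_one(match_to_predict, 6)
print(scorelines)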

When we check scorelines, we see the following:

ScorelinePrediction(home_goals=0, away_goals=1, probability=nan)
