Thanks @Hughesy
The error means that the data you're trying to predict has a different set of teams to those used to fit the model. R uses factors to handle categorical variables, variables that have a fixed and known set of possible values. To make predictions, the teams' factor levels have to be the same in the prediction data as in the training data.
So, to get around this, you need to make sure that the home and away team columns in your unplayed games dataframe are both factors (as opposed to character vectors) and that their levels include all the teams from the training data.
One way to do this would be to use the `factor_teams` function on the whole dataset before splitting it into training and prediction subsets. Another way might be to use the `factor` function to turn your unplayed data's home and away team columns into factors with the right levels.
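A minimal base-R sketch of both approaches, using made-up toy fixtures rather than the real spreadsheets. It does not call regista, and it does not assume that `factor_teams` orders its levels the same way: here the levels are simply the sorted union of home and away team names.

```r
# Toy stand-ins for the played (training) and unplayed fixtures.
# Team names and scores are illustrative only.
played <- data.frame(
  home  = c("Beeston W", "Reading W", "Surbiton W"),
  away  = c("Reading W", "Surbiton W", "Beeston W"),
  hgoal = c(2, 0, 3),
  agoal = c(1, 1, 0),
  stringsAsFactors = FALSE
)
unplayed <- data.frame(
  home = "Surbiton W",
  away = "Reading W",
  stringsAsFactors = FALSE
)

# Option 1: stack all fixtures (unplayed ones get NA goals), build the
# factors once, then split back into training and prediction sets
all_games <- rbind(
  played,
  data.frame(unplayed, hgoal = NA, agoal = NA, stringsAsFactors = FALSE)
)
all_teams <- sort(unique(c(all_games$home, all_games$away)))
all_games$home <- factor(all_games$home, levels = all_teams)
all_games$away <- factor(all_games$away, levels = all_teams)
train      <- all_games[!is.na(all_games$hgoal), ]
predict_on <- all_games[is.na(all_games$hgoal), ]

# Option 2: convert the unplayed columns directly, reusing the training levels
team_levels <- sort(unique(c(played$home, played$away)))
unplayed$home <- factor(unplayed$home, levels = team_levels)
unplayed$away <- factor(unplayed$away, levels = team_levels)

# For contrast: without `levels =`, factor() only knows the teams present
# in that particular vector, so the levels no longer match the training data
naive <- factor("Surbiton W")

print(levels(predict_on$home))
print(levels(unplayed$home))
print(levels(naive))
```

In both options the prediction rows end up with every training team in their levels, even though only two teams actually appear in them.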
If you can post a reprex, I might be able to give more specific help for your use case.
Apologies for the delay in responding; I really appreciate your help!
For some reason, reprex is producing a bunch of errors, and I haven't had time this week to work out why, so unfortunately I can only provide a copy-paste of the code below.
Looking through the code below, I can see that after I read the unplayed games from Excel into a data frame, the home and away columns are both character. What would you suggest is the best approach? Also, I believe all the teams are present in the training data.
Thanks again!
> library(tidyverse)
-- Attaching packages ------------------------------- tidyverse 1.3.2 --
v ggplot2 3.4.1 v purrr 1.0.1
v tibble 3.1.8 v dplyr 1.1.0
v tidyr 1.3.0 v stringr 1.5.0
v readr 2.1.4 v forcats 1.0.0
-- Conflicts ---------------------------------- tidyverse_conflicts() --
x dplyr::filter() masks stats::filter()
x dplyr::lag() masks stats::lag()
> library(regista)
> library(readxl)
> data <-
+ read_xlsx("C:/Users/jake2/OneDrive/Documents/HockeyFixtures.xlsx") %>%
+ factor_teams(c("home", "away"))
> data
# A tibble: 72 x 106
Div Date Time home away hgoal agoal result HTHG
<chr> <dttm> <lgl> <fct> <fct> <dbl> <dbl> <chr> <lgl>
1 E0 2022-09-24 00:00:00 NA Beest~ Nott~ 2 2 D NA
2 E0 2022-09-24 00:00:00 NA East ~ Birm~ 2 1 H NA
3 E0 2022-09-24 00:00:00 NA Hamps~ Buck~ 5 0 H NA
4 E0 2022-09-24 00:00:00 NA Lough~ Surb~ 0 5 A NA
5 E0 2022-09-24 00:00:00 NA Wimbl~ Read~ 1 0 H NA
6 E0 2022-09-24 00:00:00 NA Clift~ Holc~ 2 0 H NA
7 E0 2022-10-01 00:00:00 NA Readi~ Clif~ 1 2 A NA
8 E0 2022-10-01 00:00:00 NA Holco~ Hamp~ 3 3 D NA
9 E0 2022-10-01 00:00:00 NA Surbi~ East~ 3 0 H NA
10 E0 2022-10-01 00:00:00 NA Wimbl~ Loug~ 2 2 D NA
# ... with 62 more rows, and 97 more variables: HTAG <lgl>, HTR <lgl>,
# Referee <lgl>, HS <lgl>, AS <lgl>, HST <lgl>, AST <lgl>, HF <lgl>,
# AF <lgl>, HC <lgl>, AC <lgl>, HY <lgl>, AY <lgl>, HR <lgl>,
# AR <lgl>, B365H <lgl>, B365D <lgl>, B365A <lgl>, BWH <lgl>,
# BWD <lgl>, BWA <lgl>, IWH <lgl>, IWD <lgl>, IWA <lgl>, PSH <lgl>,
# PSD <lgl>, PSA <lgl>, WHH <lgl>, WHD <lgl>, WHA <lgl>, VCH <lgl>,
# VCD <lgl>, VCA <lgl>, MaxH <lgl>, MaxD <lgl>, MaxA <lgl>, ...
# i Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names
> teams <- factor(levels(data$home), levels = levels(data$home))
> teams
[1] Beeston W East Grinstead W
[3] Hampstead & Westminster W Loughborough W
[5] Wimbledon W Clifton Robinsons W
[7] Reading W Holcombe W
[9] Surbiton W Nottm Forest W
[11] Buckingham W Birmingham W
12 Levels: Beeston W East Grinstead W ... Birmingham W
> unplayed_games <- read_xlsx("C:/Users/jake2/OneDrive/Documents/Unplayed_HockeyFixtures(Offline).xlsx")
> unplayed_games
# A tibble: 24 x 2
home away
<chr> <chr>
1 Beeston W Clifton Robinsons W
2 East Grinstead W Wimbledon W
3 East Grinstead W Clifton Robinsons W
4 East Grinstead W Surbiton W
5 Hampstead & Westminster W Beeston W
6 Hampstead & Westminster W East Grinstead W
7 Loughborough W Reading W
8 Loughborough W Holcombe W
9 Loughborough W Birmingham W
10 Wimbledon W Beeston W
# ... with 14 more rows
# i Use `print(n = ...)` to see more rows
> model <- dixoncoles(hgoal, agoal, home, away, data = data)
Warning messages:
1: In model.matrix.default(~values - 1, model.frame(~values - 1), contrasts = FALSE) :
non-list contrasts argument ignored
2: In model.matrix.default(~values - 1, model.frame(~values - 1), contrasts = FALSE) :
non-list contrasts argument ignored
3: In model.matrix.default(~values - 1, model.frame(~values - 1), contrasts = FALSE) :
non-list contrasts argument ignored
4: In model.matrix.default(~values - 1, model.frame(~values - 1), contrasts = FALSE) :
non-list contrasts argument ignored
>
> model
Dixon-Coles model with specification:
Home goals: hgoal ~ off(home) + def(away) + hfa + 0
Away goals: agoal ~ off(away) + def(home) + 0
Weights : 1
> team_parameters <-
+ tidy.dixoncoles(model) %>%
+ filter(parameter %in% c("off", "def")) %>%
+ mutate(value = exp(value)) %>%
+ spread(parameter, value)
> match_probabilities <-
+ regista::augment.dixoncoles(model, unplayed_games, type.predict = "outcomes") %>%
+ unnest() %>%
+ spread(outcome, prob) %>%
+ mutate(data = paste(home, away))
Error in predict.dixoncoles(x, newdata, type = type.predict) :
New data must have the same factor levels as the data used to fit.
See ?factor_teams
In addition: Warning messages:
1: In model.matrix.default(~values - 1, model.frame(~values - 1), contrasts = FALSE) :
non-list contrasts argument ignored
2: In model.matrix.default(~values - 1, model.frame(~values - 1), contrasts = FALSE) :
non-list contrasts argument ignored
3: In model.matrix.default(~values - 1, model.frame(~values - 1), contrasts = FALSE) :
non-list contrasts argument ignored
4: In model.matrix.default(~values - 1, model.frame(~values - 1), contrasts = FALSE) :
non-list contrasts argument ignored
Thanks. So, you see how in `unplayed_games`, `home` and `away` are character fields (`<chr>`) and not factors (`<fct>`, as in your played games, `data`)?
> unplayed_games
# A tibble: 24 x 2
home away
<chr> <chr>
1 Beeston W Clifton Robinsons W
2 East Grinstead W Wimbledon W
You need to make these factors with the same levels as those in your played games (`data`).
For example,
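```r
# Reuse the team levels from the played games (the data used to fit the model)
team_levels <- levels(data$home)
unplayed_games <- read_xlsx("C:/Users/jake2/OneDrive/Documents/Unplayed_HockeyFixtures(Offline).xlsx") %>%
  # Convert both team columns to factors that share those levels
  mutate(home = factor(home, levels = team_levels),
         away = factor(away, levels = team_levels))
```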
This might seem a little arcane, but there are reasons for doing things this way: the model uses the levels to match the teams to their parameter estimates.
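The idea of matching teams to parameters through their levels can be illustrated with a toy example. This is a base-R sketch of the general mechanism, not regista's internal code; the team list and the "attack" parameter values are made up.

```r
# Hypothetical attack parameters, one per team, stored in level order
team_levels <- c("Beeston W", "Reading W", "Surbiton W")
off <- c(0.1, -0.2, 0.5)

# Fixtures whose team column shares the training levels
home <- factor(c("Surbiton W", "Beeston W"), levels = team_levels)

# A factor's integer code is its position in the levels, so it indexes
# straight into the parameter vector
print(as.integer(home))        # 3 1
print(off[as.integer(home)])   # 0.5 0.1 (Surbiton, Beeston)

# Rebuilding the levels from this data alone shifts the codes: Surbiton W
# becomes code 2 and would silently pick up Reading W's parameter
home_bad <- factor(c("Surbiton W", "Beeston W"))
print(as.integer(home_bad))    # 2 1
```

This is why a mismatch is treated as an error rather than guessed at: the codes only mean the right teams when the levels are identical.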
So I have implemented it as described above, but I am now getting the error below when I run match_probabilities. Any ideas?
```
> match_probabilities <-
+   regista::augment.dixoncoles(model, unplayed_games, type.predict = "outcomes") %>%
+   unnest() %>%
+   spread(outcome, prob) %>%
+   mutate(data = paste(home, away))
Error in fn(out, elt, ...) :
  number of rows of matrices must match (see arg 2)
In addition: Warning messages:
1: In model.matrix.default(~values - 1, model.frame(~values - 1), contrasts = FALSE) :
  non-list contrasts argument ignored
2: In model.matrix.default(~values - 1, model.frame(~values - 1), contrasts = FALSE) :
  non-list contrasts argument ignored
```
Hmmm - I'm struggling to replicate this error with another dataset, I'm afraid. Do you have a link to the specific files you're using?
@Torvaney, see attached for both the played and unplayed fixtures. It's a really weird error; it will be interesting to see whether you can replicate it with my files.
Thanks,
UnplayedGames.xlsx
HockeyFixtures.xlsx
So, I've tried to reproduce your error, but I seem to be getting predictions. Can you try running it again in a fresh R session and, if it fails, paste the output of devtools::session_info()?
library(tidyverse)
library(regista)
library(readxl)
data <-
read_xlsx("~/Downloads/HockeyFixtures.xlsx") %>%
factor_teams(c("home", "away"))
team_levels <- levels(data$home)
unplayed_games <-
read_xlsx("~/Downloads/UnplayedGames.xlsx") %>%
mutate(home = factor(home, levels = team_levels),
away = factor(away, levels = team_levels))
model <- dixoncoles(hgoal, agoal, home, away, data = data)
#> Warning in model.matrix.default(~values - 1, model.frame(~values - 1),
#> contrasts = FALSE): non-list contrasts argument ignored
#> Warning in model.matrix.default(~values - 1, model.frame(~values - 1),
#> contrasts = FALSE): non-list contrasts argument ignored
#> Warning in model.matrix.default(~values - 1, model.frame(~values - 1),
#> contrasts = FALSE): non-list contrasts argument ignored
#> Warning in model.matrix.default(~values - 1, model.frame(~values - 1),
#> contrasts = FALSE): non-list contrasts argument ignored
team_parameters <-
tidy.dixoncoles(model) %>%
filter(parameter %in% c("off", "def")) %>%
mutate(value = exp(value)) %>%
spread(parameter, value)
match_probabilities <-
regista::augment.dixoncoles(model, unplayed_games, type.predict = "outcomes") %>%
unnest(cols = c(.outcomes)) %>%
spread(outcome, prob) %>%
mutate(data = paste(home, away))
#> Warning in model.matrix.default(~values - 1, model.frame(~values - 1),
#> contrasts = FALSE): non-list contrasts argument ignored
#> Warning in model.matrix.default(~values - 1, model.frame(~values - 1),
#> contrasts = FALSE): non-list contrasts argument ignored
#> Warning in model.matrix.default(~values - 1, model.frame(~values - 1),
#> contrasts = FALSE): non-list contrasts argument ignored
#> Warning in model.matrix.default(~values - 1, model.frame(~values - 1),
#> contrasts = FALSE): non-list contrasts argument ignored
match_probabilities
#> # A tibble: 19 × 6
#> home away away_…¹ draw home_…² data
#> <fct> <fct> <dbl> <dbl> <dbl> <chr>
#> 1 Beeston W Clifton Robinsons W 0.344 0.257 0.399 Bees…
#> 2 East Grinstead W Wimbledon W 0.397 0.275 0.328 East…
#> 3 East Grinstead W Clifton Robinsons W 0.139 0.167 0.694 East…
#> 4 East Grinstead W Surbiton W 0.800 0.134 0.0660 East…
#> 5 Hampstead & Westminster W East Grinstead W 0.333 0.207 0.460 Hamp…
#> 6 Loughborough W Holcombe W 0.330 0.228 0.441 Loug…
#> 7 Loughborough W Birmingham W 0.268 0.251 0.482 Loug…
#> 8 Wimbledon W Beeston W 0.129 0.238 0.633 Wimb…
#> 9 Wimbledon W Hampstead & Westminst… 0.314 0.286 0.399 Wimb…
#> 10 Clifton Robinsons W Surbiton W 0.907 0.0739 0.0196 Clif…
#> 11 Reading W Nottingham W 0.316 0.183 0.500 Read…
#> 12 Reading W Buckingham W 0.0908 0.152 0.757 Read…
#> 13 Holcombe W Nottingham W 0.384 0.170 0.446 Holc…
#> 14 Holcombe W Birmingham W 0.297 0.237 0.466 Holc…
#> 15 Surbiton W Beeston W 0.0150 0.0599 0.925 Surb…
#> 16 Surbiton W Hampstead & Westminst… 0.0657 0.142 0.792 Surb…
#> 17 Buckingham W Loughborough W 0.711 0.169 0.119 Buck…
#> 18 Buckingham W Nottingham W 0.784 0.109 0.107 Buck…
#> 19 Birmingham W Reading W 0.451 0.263 0.286 Birm…
#> # … with abbreviated variable names ¹away_win, ²home_win
Created on 2023-03-01 with reprex v2.0.2
Also - Surbiton must be some team! 😄
Genius! I tried the above and it works; however, something funky is going on when I run calculate_table(single_simulation). There is one round of games left in the season, yet when I run calculate_table(single_simulation), a number of simulated fixtures are not pulling through into the calculated table, as can be seen below by comparing the simulated table with the regular table.
I think this is a result of pulling in my own list of unplayed games, as I can't see anything else causing the issue. Have you seen anything like this before? The latest files are attached, if you don't mind taking a look.
```
> calculate_table(data)
# A tibble: 12 x 10
   team                       w  d  l gp gf ga  gd points position
 1 Surbiton W                13  0  1 14 51  7  44     39        1
 2 Wimbledon W                9  5  1 15 25 12  13     32        2
 3 Hampstead & Westminster W  8  4  3 15 40 18  22     28        3
 4 East Grinstead W           7  4  3 14 39 22  17     25        4
 5 Beeston W                  6  3  6 15 22 31  -9     21        5
 6 Nottingham W               5  3  7 15 37 46  -9     18        6
 7 Loughborough W             4  5  6 15 22 30  -8     17        7
 8 Clifton Robinsons W        4  4  7 15 22 28  -6     16        8
 9 Birmingham W               4  4  7 15 19 29 -10     16        9
10 Reading W                  4  3  8 15 20 28  -8     15       10
11 Holcombe W                 2  5  8 15 22 37 -15     11       11
12 Buckingham W               3  0 12 15 15 46 -31      9       12

> calculate_table(single_simulation)
# A tibble: 12 x 10
   team                       w  d  l gp gf ga  gd points position
 1 Surbiton W                14  1  1 16 60  9  51     43        1
 2 Wimbledon W               10  5  1 16 28 13  15     35        2
 3 Hampstead & Westminster W  7  5  3 15 37 19  18     26        3
 4 East Grinstead W           7  4  5 16 40 32   8     25        4
 5 Beeston W                  7  3  6 16 29 31  -2     24        5
 6 Nottingham W               6  3  6 15 43 46  -3     21        6
 7 Loughborough W             4  5  6 15 22 31  -9     17        7
 8 Birmingham W               4  5  7 16 19 29 -10     17        8
 9 Clifton Robinsons W        4  4  7 15 21 30  -9     16        9
10 Reading W                  4  3  8 15 20 28  -8     15       10
11 Holcombe W                 2  4  9 15 23 39 -16     10       11
12 Buckingham W               3  0 13 16 17 52 -35      9       12
```
HockeyFixtures.xlsx
UnplayedGames.xlsx
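One way to check whether every unplayed fixture is actually being read in and matched to a team is sketched below in base R with hypothetical team names ("Team A" and so on). It relies on a documented behaviour of `factor()`: values that are not listed in `levels` become `NA`. Whether that is what is happening with the attached files is not established here; the sketch only shows how to look for it.

```r
# Hypothetical team levels taken from the training data
team_levels <- c("Team A", "Team B", "Team C")

# Hypothetical unplayed fixtures read in as character; "Team X" stands for a
# name that does not appear in the training data (e.g. a spelling difference)
unplayed <- data.frame(
  home = c("Team A", "Team X", "Team C"),
  away = c("Team B", "Team C", "Team A"),
  stringsAsFactors = FALSE
)

# 1. Names in the unplayed fixtures that are not among the training levels
unknown <- setdiff(unique(c(unplayed$home, unplayed$away)), team_levels)
print(unknown)

# 2. After conversion, those names silently become NA
unplayed$home <- factor(unplayed$home, levels = team_levels)
unplayed$away <- factor(unplayed$away, levels = team_levels)
bad_rows <- which(is.na(unplayed$home) | is.na(unplayed$away))
print(unplayed[bad_rows, ])

# 3. Remaining fixtures per team (NA counted separately), to compare with
#    the number of games each team should have left
print(table(c(as.character(unplayed$home), as.character(unplayed$away)),
            useNA = "ifany"))
```

If step 1 prints anything, or the per-team counts in step 3 do not match the fixtures remaining, those rows cannot be matched to any team's parameters.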