
openengine's Introduction

OpenEngine

Football match engine

This should be the final attempt (of many) at creating an open source football match engine. The reason for so many previous attempts (you can see them by browsing this account) was indecisiveness and starting over with a fresh approach, rather than objective 'failure'. I have also been too ambitious about the potential outcome. Lately, I have decided to take it easy, without ever abandoning this project as a concept.

I will do my best to incorporate as many potential approaches as possible in an incremental fashion. Starting simple, without any particular expectations, should do the trick in making this project the "final" edition of something I decided to pursue as early as 2003. The most conceptually representative project in this repository is openfootie. I will be doing some archaeology from time to time for either code or 'ideas' reuse. Some of these projects might also be resurrected. But in all fairness, I should have a definitive edition just for purity's sake: an open source football match engine, "pluggable" into host applications, with the latter being a big feature/added bonus rather than the raison d'être.

The motto of this engine shall be this: don't fake output. There should be consistency in how the input (players as "agents", carrying the AI and relative attribute values, along with the team tactics) produces the output (whether that is a 3D representation of the football match or just a plain score). Making this input-output mapping as realistic as possible should be the ultimate goal. But whatever abstraction we apply, the underlying concept should be that of consistency.

Update

Revisiting what I wrote at the beginning of this project, I am a bit surprised to see that my opinions and impressions have not shifted much since then. Surprised, because this project has at least conceptually matured, and I now have a much clearer idea about the next steps and stages, along with its limitations. And with this comes a better appreciation of the challenges of this project (which were always known anyway). The difference lies in the chasm between the idea of an ideal football match engine and a 'good enough' prototype, with the latter being the basis for the former. It now seems that there is not only a chasm but also a conflict between the two, if not in implementation, then surely in terms of time and project management. For there are no shortcuts and no MVPs when you are scratching your own itch.

A little review of where this project is now, in comparison to how it started. The original legacy project openfootie (which itself had evolved from previous attempts) was based on the idea of simulating player actions and evaluating their outcomes. The problem with that approach (apart from the sketchy implementation) was that it lacked context in the implementation of each match event. The only parameter for 'deciding' on an action was the player's position, while the rest of the parameters were added as an afterthought. The evaluation of outcomes should also depend on context, and this too was added as an afterthought, especially where tactics influence outcomes.

What do I mean by 'afterthought'? The way a player makes decisions, and the way the match is reproduced in general, is data-driven. We generate different football matches by essentially reshuffling the data. Since we only use and need a very small sample, of even a single match half, to generate the data, being blindly faithful to that data would mean missing a lot of real-world scenarios or overemphasizing others. One solution would be a big enough sample; more easily and quickly, we can intervene in the engine logic and adjust it to include more real-world scenarios. As a side note, this is my latest addition to the project: by mapping goal attempts to expected goals (xG), I can make their outcomes vary, rather than use the ones fixed in the data itself.
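To illustrate the idea, here is a minimal Java sketch of re-sampling a goal attempt's outcome from an xG value instead of replaying the outcome recorded in the data. The class and method names are purely illustrative and not taken from this repository.

```java
import java.util.Random;

// Illustrative sketch: instead of replaying the outcome recorded in the data,
// re-sample whether a goal attempt scores from its expected-goals (xG) value.
public class GoalAttemptSampler {

    private final Random random = new Random();

    /**
     * @param xG probability of the attempt resulting in a goal, in [0, 1]
     * @return true if the simulated attempt is a goal
     */
    public boolean isGoal(double xG) {
        return random.nextDouble() < xG;
    }

    public static void main(String[] args) {
        GoalAttemptSampler sampler = new GoalAttemptSampler();
        // A shot recorded as a miss in the data may occasionally score here,
        // and vice versa, so outcomes vary instead of being fixed by the sample.
        System.out.println("Attempt with xG 0.35 scores: " + sampler.isGoal(0.35));
    }
}
```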

So, in this incarnation of the project, my goal has been to make it as data-driven as possible, including any context parameters that might be needed, leaving little room for (ad hoc) interventions in the match engine's logic. In this respect, context is not only the position on the pitch of the player in possession of the ball, but also the phase of the match and all other players' positions. A tracking-data-based engine, more or less.

You understand now how time-consuming that is. In the long run, codifying and interpreting the tracking data would be automated, but my approach is to start and continue small and manually, in order to fully appreciate the next steps and directions.

Another approach would be to still make an MVP (or many) based on what we have so far, making assumptions and creating plausible results. The project, I think, is now robust enough to allow for such deviations and interventions, while still remaining true to its conception.

In conclusion, the OpenEngine project is nothing more than an umbrella or 'meta' project for the different approaches one could take to creating a match engine, with the data-driven approach as the basis, pursued more consciously and intentionally than in the legacy openfootie project.

I may provide a few more updates with my thoughts as I continue with the next stages of this project. Now for a more technical review of its current stage: one thing I had got wrong, not only in the original project but also until recently, is the assumption that, roughly speaking, the level of detail of the input and the output should be the same. That is, I would apply either a top-down or a bottom-up approach, and it should be reflected in both the input and the output.

Examples: a top-down approach would be to have the teams as inputs (even as minimal as Elo rankings) and only the match score as output. A very extreme example, as technically this would not be an engine, but it could be elaborated to at least look like one (by producing more stats). On the other hand, we could provide much more information as input (regarding tactics and player attributes) and expect full tracking data as output, with the match able to be replayed visually.

However, input and output need not match in level of sophistication, and decoupling them gives us flexibility in applying different approaches to the further development of the engine. At this stage of the project, I use a 'bottom-up' input, which is the data itself (notice that I split the match data according to team possession, so it serves as each team's input, except that instead of providing tactics and 'attributes', I am providing raw match data), while the output is 'top-down', as we only see a sequence of events abstracted at the 'team level' (so much so that we could not, for example, extract reliable statistics from it). This very abstracted approach is very flexible, though, as we could go higher or lower level without the input being tightly bound to the output, or, for that matter, being concerned at this stage with what a 'complete' match engine would look like.
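To make the current stage a bit more concrete, here is a purely conceptual Java sketch of the 'bottom-up in, top-down out' idea: one team's input is a pool of observed state transitions taken from its own possession data, and the output is a reshuffled team-level sequence of phase states. The state names and transition pools below are placeholders, not the actual data or code of this repository.

```java
import java.util.List;
import java.util.Map;
import java.util.Random;

// Conceptual sketch: each team's input is its own slice of raw match data
// (split by possession), and the output is a team-level sequence of phase states.
public class TeamLevelSimulation {

    private static final Random RANDOM = new Random();

    // Observed outcomes per state, built from one team's possession data (placeholders).
    private static final Map<String, List<String>> TEAM_A_TRANSITIONS = Map.of(
            "Kick-off", List.of("Possession"),
            "Possession", List.of("Possession", "Corner Kick", "Goal Attempt", "Turnover"),
            "Corner Kick", List.of("Possession", "Turnover"),
            "Goal Attempt", List.of("Turnover"));

    public static void main(String[] args) {
        String state = "Kick-off";
        // Reshuffle the observed outcomes into a new team-level event sequence.
        for (int i = 0; i < 6 && !state.equals("Turnover"); i++) {
            System.out.println(state);
            List<String> outcomes = TEAM_A_TRANSITIONS.get(state);
            state = outcomes.get(RANDOM.nextInt(outcomes.size()));
        }
        System.out.println(state);
    }
}
```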

openengine's People

Contributors

atas76, dependabot[bot]


openengine's Issues

Get shot data (xG, etc.)

We need to achieve a minimal sampling of matches, with the goal (no pun intended) of having a valid sample of goal attempts fully covered, in order to build additional samples on top of that from summary data (#3).

Simplify DynamicTransition objects creation

There are quite a few fields in a DynamicTransition, especially in the constructor. Maybe we need some defaults (null values) for those fields whose value is going to be calculated dynamically. Another design decision is whether to create a copy constructor from the Statement object, in order to avoid having numerous parameters in the constructor.
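A rough sketch of both options (null defaults for dynamically calculated fields, and a copy-style constructor from a Statement). The field names and accessors below are placeholders, not the actual ones in this codebase.

```java
// Hypothetical sketch of the two options discussed above. The field names
// (initialState, endState, duration, probability) are placeholders.
public class DynamicTransition {

    private final String initialState;
    private final String endState;
    private Integer duration;     // null until calculated dynamically
    private Double probability;   // null until calculated dynamically

    // Option 1: only mandatory fields in the constructor; dynamic ones default to null.
    public DynamicTransition(String initialState, String endState) {
        this.initialState = initialState;
        this.endState = endState;
    }

    // Option 2: a copy-style constructor from a Statement, so callers don't pass
    // a long parameter list. Statement here is a stand-in for the real class.
    public DynamicTransition(Statement statement) {
        this(statement.getInitialState(), statement.getEndState());
    }

    public void setDuration(int duration) { this.duration = duration; }
    public void setProbability(double probability) { this.probability = probability; }
}

class Statement {
    private final String initialState;
    private final String endState;

    Statement(String initialState, String endState) {
        this.initialState = initialState;
        this.endState = endState;
    }

    String getInitialState() { return initialState; }
    String getEndState() { return endState; }
}
```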

Add team (or individual) ability as a factor in calculating outcomes

Probably, we start with goals/chances first and work our way deeper. Calculations based on individual player skills and match tactics will be handled later. Maybe we also go for a separation of skills, collectively per team, rather than a collective ability per player (realism over plausibility, again).

Get rid of System.outs and log stuff properly

Currently, I have some (commented-out) System.out calls for debugging. A more proper way to log match events (and not only for debugging) would be to output them to a default file. We also need to decide what format the logs should be in. Maybe, in the future, also create a file for outputting a match summary report, but for displaying the score the console is sufficient at the moment.
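A minimal sketch of what this could look like with java.util.logging, writing match events to a default file while keeping the score on the console. The file name, format, and event strings are placeholders.

```java
import java.io.IOException;
import java.util.logging.FileHandler;
import java.util.logging.Logger;
import java.util.logging.SimpleFormatter;

// Sketch: replace System.out debugging with a logger that writes to a file.
public class MatchLog {

    private static final Logger LOGGER = Logger.getLogger(MatchLog.class.getName());

    public static void main(String[] args) throws IOException {
        FileHandler handler = new FileHandler("match-events.log", true);
        handler.setFormatter(new SimpleFormatter());
        LOGGER.addHandler(handler);

        // Match events go to the log file...
        LOGGER.info("12' Corner kick -> Possession");
        // ...while the score can still be printed to the console.
        System.out.println("Final score: 1-0");
    }
}
```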

Add instructions on how to run this project

It'd be nice to have a brief introduction in the README.md explaining how to install & run this project. This will help any potential contributor dive into the code & mechanics and push this project even further.

I personally don't know how to compile it, otherwise I'd create a PR with the instructions.

Very interesting project

Hello, first of all, congratulations on this very interesting project (very rare in this field...).
I started looking at the code but I'm a little bit lost with all the acronyms (e.g. mv, ttl, mpn, pnw, etc.). Is there some sort of documentation to look at?
Thanks in advance.

Simulate home advantage

For competition matches, and in general, we need to add home advantage as a parameter in our calculations. For competition matches, this will also need to be applied according to the competition rules (e.g. for the FA Cup, it applies only in rounds prior to the semi-final).
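An illustrative sketch of one way this could work: a multiplier on an outcome probability, gated by a competition-rule check. The factor value, method names, and rule flag are assumptions, not the engine's actual design.

```java
// Sketch: apply home advantage as a multiplier on an outcome probability,
// only when the competition rules say it applies. All values are placeholders.
public class HomeAdvantage {

    private static final double HOME_FACTOR = 1.1; // placeholder value

    public static double adjust(double baseProbability, boolean isHomeTeam, boolean ruleApplies) {
        if (!isHomeTeam || !ruleApplies) {
            return baseProbability;
        }
        return Math.min(1.0, baseProbability * HOME_FACTOR);
    }

    public static void main(String[] args) {
        // e.g. FA Cup: home advantage only in rounds before the semi-final
        boolean beforeSemiFinal = true;
        System.out.println(adjust(0.5, true, beforeSemiFinal)); // adjusted probability
    }
}
```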

Test match engine output

Manual testing of match engine output and recording of bugs found. They will normally be resolved one by one, before proceeding to more testing.

Kick-off time

Take into account time from goal scoring to kick-off.

Distinguish between defensive and attacking boxes

Well, initially we had a simple 'B' box label for the defensive box, while we were more specific for the opposition (attacking) box. While this is a convenient approximation, it still loses information about the wings outside the boxes (as 'Bw' is used for both attack and defence). Also, we will eventually need to be specific, especially with the 'pitch control' approach. So, it is a good idea to bring back the 'AB' and 'DB' prefixes, for the 'attacking' and 'defensive' boxes (the opposition's and ours), respectively.

Remove 'isGoal' flag

Having a flag for goals (from successful attempts) is probably redundant and a proven source of bugs. For example, we opted to use an intermediate GOAL_ATTEMPT_OUTCOME state as the result of the dynamic calculation of goal attempt outcomes, instead of a flag. Rather than using a flag for goals - for consistency, among other things - we should just update the end state of the transition and, when counting goals, take into account only transitions with GOAL_ATTEMPT_OUTCOME as their initial state (which we have to do anyway; we just won't need to refer to a redundant flag).
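A small sketch of counting goals from transitions alone. GOAL_ATTEMPT_OUTCOME is the state named in this issue; the other state names and the Transition type are illustrative.

```java
import java.util.List;

// Sketch: count goals from transitions whose initial state is GOAL_ATTEMPT_OUTCOME
// and whose end state is a goal, instead of reading an isGoal flag.
public class GoalCounter {

    enum State { POSSESSION, GOAL_ATTEMPT_OUTCOME, GOAL, GOAL_KICK }

    record Transition(State initialState, State endState) {}

    static long countGoals(List<Transition> transitions) {
        return transitions.stream()
                .filter(t -> t.initialState() == State.GOAL_ATTEMPT_OUTCOME)
                .filter(t -> t.endState() == State.GOAL)
                .count();
    }

    public static void main(String[] args) {
        List<Transition> transitions = List.of(
                new Transition(State.POSSESSION, State.GOAL_ATTEMPT_OUTCOME),
                new Transition(State.GOAL_ATTEMPT_OUTCOME, State.GOAL),
                new Transition(State.GOAL_ATTEMPT_OUTCOME, State.GOAL_KICK));
        System.out.println("Goals: " + countGoals(transitions)); // prints 1
    }
}
```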

Nice work!

Do you have a road map for your project?

Implement proper tiebreaker

For simplicity, we currently implement knockout match tiebreakers with a 'coin toss' simulation. Proper tiebreakers (extra time + penalty shoot-out) are going to be implemented next for knockout competition matches.
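A very rough sketch of the intended shape of such a tiebreaker: extra time first, then a penalty shoot-out with sudden death. The probabilities and structure are crude placeholders, not the engine's calibrated values.

```java
import java.util.Random;

// Sketch of a knockout tiebreaker: extra time, then a penalty shoot-out.
public class Tiebreaker {

    private final Random random = new Random();

    /** @return 1 if the home side advances, 2 if the away side does */
    public int resolve() {
        // Extra time: crude placeholder, each side has some chance to win outright.
        double r = random.nextDouble();
        if (r < 0.2) return 1;
        if (r < 0.4) return 2;
        // Still level: penalty shoot-out, 5 kicks each, then sudden death.
        int home = 0, away = 0;
        for (int round = 1; round <= 5 || home == away; round++) {
            if (random.nextDouble() < 0.8) home++; // placeholder conversion rate
            if (random.nextDouble() < 0.8) away++;
        }
        return home > away ? 1 : 2;
    }

    public static void main(String[] args) {
        System.out.println("Winner: team " + new Tiebreaker().resolve());
    }
}
```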

Resources to create a sport management engine

Hey, my name is Mitchell and I am a BIG soccer fan. I've always dreamed of building my own Soccer Management Simulator (like Top Eleven and the like) but I don't really know where to start. How did you learn to create such a system? Do you mind sharing any resources you used to get to this point? Amazing work!

Write clear match commentary

A first step towards fixing the issues raised in #39 is to write clear commentary, which will help immensely at the initial debugging stages and with sanity checks, while the output itself will make more sense from a "user's" perspective.

Awarded penalties evaluation

Since penalties are converted roughly 80% of the time, we also need to intervene in their evaluation, as we do for goals, rather than rely on the very limited and consequently biased data.

Write tests for action chain mappings generation

We have to be careful to take into account the cases where the match flow "breaks". The simulator will fail at runtime if this happens, but it may be a good idea to have an explicit check for it, for when the data or the code gets more complicated.

Injury time (and general time management)

So far, injury time is not implemented, and we assume a fixed duration for each match. This is generally fine in terms of the simulation itself, except for the case where penalties are awarded at the end of the match (still not too important, because it is a simulation after all, but too striking to the eye to let pass). In any case, we need to look at how time is allocated to each phase, as there is a lot of implicit 'dead time' included in the phase periods, and this should also be taken into account in parallel with the specifics of injury time.

Take into account the current state's duration, rather than the next one's in sequence

In the very first implementation, we use the time of the next state in the sequence for calculating the match duration. While this is practical and it evens out in total, it would be more correct to use the current state's time, because the time an action takes to complete depends not only on the action itself, but also on its outcome. We need to treat the whole state - action - outcome - state sequence as a unified chain, which encapsulates all parameters of the transition between states, rather than split the initial state from its outcome and take into account only the duration of the latter.

Examples:

Let the initial state be a corner kick. How long the corner kick takes will depend on its action (how it is taken; not covered in this initial version) and its outcome. Currently, this duration is already predetermined by the previous state in the sequence, because it is convenient to pick the duration along with the outcome; that duration, however, is then disconnected from how the initial state actually plays out:

Corner Kick -> Corner Kick: 48 sec

If the corner kick results in another corner kick, the duration added will be the next corner kick's, even though the transition from one corner kick to the other should sensibly take much less time (and that assumed time would again be picked randomly from the previous state's transitions).

Let's say we have the transition as specified in the match data:

Possession -> Corner Kick: 12 sec -> Possession

The corner kick, taking 12 seconds, leads to a new possession, and those 12 seconds are added as the initial transition's duration. However, since we don't want to overfit the simulation, we will pick another outcome, which, in the current implementation, 'borrows' its duration from the next state, while the actual durations stay hidden from the implementation (the 12 seconds are added as if the corner kick's outcome were a possession, as in the data, but the simulation changes the outcome without accounting for the change in duration, since the duration was predetermined when the first transition was selected from the previous state).

While reshuffling the outcomes is the main idea of this simulation, it would be more correct, in the duration calculations, to pair each duration with the initial state rather than with its outcome while the outcomes are being determined.
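A small illustrative sketch of that pairing: each transition carries the duration of its own initial state, so the clock is unaffected when the simulation reshuffles the outcome. The class, state names, and numbers echo the corner-kick example above but are otherwise placeholders.

```java
import java.util.List;

// Sketch: pair a transition's duration with its initial state rather than with
// the outcome picked next.
public class DurationPairing {

    record Transition(String initialState, String endState, int durationSeconds) {}

    // The clock advances by each transition's own duration, i.e. the time its
    // initial state took to play out, regardless of which outcome is reshuffled in.
    static int elapsedSeconds(List<Transition> transitions) {
        return transitions.stream().mapToInt(Transition::durationSeconds).sum();
    }

    public static void main(String[] args) {
        List<Transition> sequence = List.of(
                // The 12 seconds stay attached to the corner kick itself, so even if
                // the simulation swaps its outcome for something other than Possession,
                // the elapsed time does not change.
                new Transition("Corner Kick", "Possession", 12),
                new Transition("Possession", "Goal Attempt", 35));
        System.out.println("Elapsed: " + elapsedSeconds(sequence) + " sec");
    }
}
```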

Rewrite MPN code

I am not happy with the MPN processing code as it is. It is already growing into a maintenance nightmare, where the logic is about reading my own mind and putting the right values in the "right" places, which seem arbitrary and unintuitive at first glance.

A first step towards improving it is to do some housekeeping, like applying common best practices, but I think a more drastic approach is required in the not-so-short term (something along the lines of a partial rewrite, while the codebase is still relatively small). One idea would be to separate the construction of the 'flow chain' of events from their actual processing, so that 'dynamic' interventions are integrated more naturally with the data, and the processing and presentation logic are not mixed with the 'decision making' logic and (chance) outcome handling.

Still, cutting my teeth at this point is welcome, in order to identify as many as possible of the potential issues to be addressed.

UPDATE

It is very difficult to debug and reproduce bugs with enough confidence. I need to rewrite the code, without conceptually altering its logic, but making the implementation changes mentioned above.

Pass probability calculation

Currently the pass probability depends on a baseline probability (allowing for mistakes) and on the distance (vertical and horizontal). We are not concerned at this stage with how tight the marking is; we take it to be the same on average over the whole match duration and the whole pitch. If, according to the tactics, there is a marker, we calculate the pass probability; otherwise we assume the pass succeeds 100% of the time. We would also like to allow for 'half marking', when a player is loosely marked according to the tactics (say, the winger in a 4-4-2 marked by a midfielder of a 4-3-3 defending team). In this case, we would like to factor in how tightly a player could be marked, not according to the specific match instance but according to the tactical layout, by attenuating the distance factors.
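A sketch of that calculation as described: a baseline attenuated by vertical and horizontal distance, applied only when a marker exists, with a further attenuation factor for 'half' marking. All coefficients are placeholders, not the engine's actual values.

```java
// Sketch of the pass probability described above. Coefficients are placeholders.
public class PassProbability {

    private static final double BASELINE = 0.95;        // allows for unforced mistakes
    private static final double VERTICAL_WEIGHT = 0.01;  // per metre, placeholder
    private static final double HORIZONTAL_WEIGHT = 0.005;

    /**
     * @param verticalDistance   vertical pass distance, in metres
     * @param horizontalDistance horizontal pass distance, in metres
     * @param marking            1.0 for tight marking, 0.5 for 'half' marking,
     *                           0.0 for no marker (pass assumed to succeed)
     */
    public static double of(double verticalDistance, double horizontalDistance, double marking) {
        if (marking == 0.0) {
            return 1.0; // unmarked: success assumed
        }
        // 'Half' marking attenuates the distance factors rather than the baseline.
        double distancePenalty =
                marking * (VERTICAL_WEIGHT * verticalDistance + HORIZONTAL_WEIGHT * horizontalDistance);
        return Math.max(0.0, BASELINE - distancePenalty);
    }

    public static void main(String[] args) {
        System.out.println(of(30, 10, 1.0)); // tightly marked long pass
        System.out.println(of(30, 10, 0.5)); // same pass, loosely marked
    }
}
```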

Categorize different directives

Currently, in the latest edition of the data tracking language (TTN - Tactical Tracking Notation), the match data fall broadly under two categories: statements and directives. Statements describe what in football analytics is called 'match events'. Directives, while descriptive in themselves, cover many different metadata categories, and these categories should be distinguished further. The only way this can be done at the moment is with application logic (a naming convention was actually added to distinguish one particular category), which is far from optimal. First, the different categories should be identified and documented, and then changes in syntax (and/or new keywords) will be introduced to reflect them.

Use pitch positions for transition evaluations

Using just the match phase states for chaining transitions is a handy hack; however, for both realism and accuracy we should also be using the pitch positions as a reference. This was the next step in the previous incarnation of this approach anyway. One interesting thing to check is whether we get overfitting, given the small amount of seed data.
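A tiny sketch of what keying transitions by both phase state and pitch position could look like. The zone labels echo the box naming discussed in another issue ('AB', 'DB') but the states and outcome pools are illustrative.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: key transitions by (phase state, pitch zone) instead of phase state alone.
public class TransitionKey {

    record Key(String phaseState, String pitchZone) {}

    public static void main(String[] args) {
        Map<Key, List<String>> observedOutcomes = new HashMap<>();
        // The same phase state can now lead to different outcome pools
        // depending on where on the pitch it occurs.
        observedOutcomes.put(new Key("Possession", "DB"), List.of("Long ball", "Goal kick"));
        observedOutcomes.put(new Key("Possession", "AB"), List.of("Goal attempt", "Corner kick"));
        System.out.println(observedOutcomes.get(new Key("Possession", "AB")));
    }
}
```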
