Here’s a brief guide to the statistical analysis that underpins the Times Predictor. The Predictor can take any two teams and assign a probability to each possible final score should they meet. From these, you can derive the probabilities of Win/Draw/Lose and, from those, the chances of finishing the Season in each League position. As our understanding of the statistical processes that drive the game becomes clearer, we aim to extend the Predictor to a wider range of forecasts (e.g. half-time score, shots at goal, and so on). In the meantime, below are the general principles used in the current "full time score" Predictor. There are two main parts: first, a model of the process by which full-time results are generated and, second, the fine-tuning of that model to existing results in order to make future predictions.

The Game into Numbers

Forecasting football necessarily starts with some kind of framework or model of the game. This framework obviously needs to be quantitative, but it also needs to compress a game efficiently into a few key team performance statistics. Ultimately, the need for this compression is about signal extraction. One of the fascinating things about football is that it is chaotic and unpredictable. As such, it doesn’t make sense to predict a game using only the previous results of matches between those two teams. Instead, we also want to look at how the two teams have fared more recently against other teams. To do this, we need some kind of transportable insight from a given game: what it tells us about each team’s current ability.

The framework we use is based around the average number of goals a team can expect to score against an opponent. Each team is given an "attack coefficient" and a "defence coefficient": the number of goals it would score or concede, respectively, against an average team. One team's attack coefficient is then multiplied by its opponent's defence coefficient to obtain the expected number of goals it scores in a specific game against that opponent.
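In symbols, the multiplication just described might be written as below. The notation is our own shorthand rather than anything taken from the Predictor: $a_i$ and $d_i$ are assumed names for team $i$'s attack and defence coefficients, and $\lambda_{i \to j}$ is the number of goals team $i$ is expected to score against team $j$. The home advantage coefficient is left out because the guide does not say how it enters the calculation.

```latex
\lambda_{i \to j} = a_i \times d_j ,
\qquad
\lambda_{j \to i} = a_j \times d_i
```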
For example, say two teams had the following coefficients:

Man United: Attack 1.7 / Defence 0.8
Fulham: Attack 0.8 / Defence 1.1

When Fulham plays Man United in this example, we expect Man United to score 1.7 × 1.1 ≈ 1.9 goals and Fulham to score 0.8 × 0.8 ≈ 0.6 goals. So a higher Attack coefficient indicates better forwards and a lower Defence coefficient indicates better backs. The actual analysis we conduct also includes a home advantage coefficient.

Converting this goal average into a distribution of goal outcomes is achieved using a statistical distribution based on the expected number of goals. The chances of the various goal outcomes for Fulham and Man United, assuming Man United are at home, appear in the graph below left. Fulham (in blue) have about a 55% chance of scoring nothing and just over a 30% chance of getting one goal. Weighting each possible number of goals by its chance and adding up the results gives back the forecast 0.6 goal average. Lastly, below right, we multiply the chances of these different goal outcomes for both teams and obtain the chances of each final score combination. The graph shows that the most likely outcome is a 2-0 home victory for Man United.

Forecasting Games

Based on coefficients for each team, the final score of a given match can be forecast. But where do these coefficients come from? The technique used to generate the coefficients is called maximum likelihood estimation and it is pretty much as it sounds: the coefficients are selected to maximise the likelihood of predicting the actual results of prior matches. This idea is illustrated in the table below. Say you have a bunch of teams and pick a set of coefficients for them. Then, using the calculations outlined before, you can predict the likelihood of any outcome of any game, according to the selected coefficients. You could then look at previous games and see what probability this combination of coefficients assigns to the actual outcomes that occurred.
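The step from a set of coefficients to the likelihood of any outcome can be sketched in Python using the Man United and Fulham figures. This is a minimal sketch, not the Predictor's actual code. The guide does not name the statistical distribution it uses, so a Poisson distribution is assumed here (a common choice for goal counts). The function names, the `max_goals` cut-off and the `home_advantage` multiplier are all hypothetical, as is the idea that home advantage simply scales the home side's expected goals.

```python
from math import exp, factorial


def poisson_pmf(k, mean):
    """Chance of exactly k goals under an ASSUMED Poisson distribution."""
    return exp(-mean) * mean ** k / factorial(k)


def score_matrix(home, away, home_advantage=1.0, max_goals=10):
    """Return (matrix, home_mean, away_mean).

    matrix[h][a] is the chance of the final score h-a, obtained by
    multiplying the two teams' goal chances as the guide describes.
    `home` and `away` are dicts with "attack" and "defence" coefficients.
    `home_advantage` is a hypothetical multiplier on the home side's goals.
    """
    home_mean = home["attack"] * away["defence"] * home_advantage
    away_mean = away["attack"] * home["defence"]
    home_probs = [poisson_pmf(k, home_mean) for k in range(max_goals + 1)]
    away_probs = [poisson_pmf(k, away_mean) for k in range(max_goals + 1)]
    matrix = [[ph * pa for pa in away_probs] for ph in home_probs]
    return matrix, home_mean, away_mean


def outcome_probabilities(matrix):
    """Collapse a score matrix into Win/Draw/Lose chances for the home side."""
    n = len(matrix)
    win = sum(matrix[h][a] for h in range(n) for a in range(n) if h > a)
    draw = sum(matrix[h][h] for h in range(n))
    lose = sum(matrix[h][a] for h in range(n) for a in range(n) if h < a)
    return {"home_win": win, "draw": draw, "away_win": lose}


def most_likely_score(matrix):
    """Return the (home_goals, away_goals) pair with the highest chance."""
    n = len(matrix)
    return max(((h, a) for h in range(n) for a in range(n)),
               key=lambda s: matrix[s[0]][s[1]])


# Coefficients from the worked example in the text.
man_united = {"attack": 1.7, "defence": 0.8}
fulham = {"attack": 0.8, "defence": 1.1}

matrix, home_mean, away_mean = score_matrix(man_united, fulham)
print(round(home_mean, 2), round(away_mean, 2))  # 1.87 0.64
print(round(poisson_pmf(0, away_mean), 2))       # 0.53, Fulham score nothing
print(most_likely_score(matrix))                 # (1, 0) with no home factor
print(outcome_probabilities(matrix))
```

With no home factor, this sketch makes 1-0 the single most likely score, whereas the graph described in the text, which includes home advantage, shows 2-0. The guide does not give the size of the home advantage coefficient. Under the Poisson assumption, any multiplier that lifts Man United's expected goals to between 2 and 3 would make 2-0 the most likely score.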
In the first column of the table, the chance of a Home win in the first game was forecast as 48%. Averaging the probabilities that this first set of coefficients assigned to the three actual outcomes (48%, 29% and 34%) gives 37%. Now suppose you choose a second set of coefficients and do the same thing. It turns out this set is more in line with the prior data than the first, since it averages a 44% prediction for the prior outcomes. Finally, you can look at a third set of coefficients, and so on, until you find the set of coefficients that provides the best possible predictions of the prior games. In effect, this is like choosing the tipster with the best track record.

It’s also worth noting that, in practice, the average past performance is weighted more heavily towards recent games than older ones. Nevertheless, the Predictor still looks back over nearly two Seasons to make its calculations, and is therefore simultaneously judging which team coefficients are most consistent with nearly 2,500 individual games.

Probabilities Forecast for Each Actual Outcome
By a Given Set of Team Coefficients

(Set1 to Set3 are the sets of team coefficients.)

| Game | Actual Result | Set1 | Set2 | Set3 |
|---|---|---|---|---|
| Game1 | Home Win | 48% | 35% | 25% |
| Game2 | Away Win | 29% | 47% | 16% |
| Game3 | Draw | 34% | 49% | 29% |
| Average | | 37% | 44% | 23% |
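The comparison in the table can be reproduced with a few lines of Python. This is a sketch of the illustration only, under stated assumptions. The variable and function names are hypothetical. The optional `weights` argument stands in for the recency weighting mentioned earlier, since the guide does not say what weights the Predictor uses. The simple average is the guide's own illustrative score. Maximum likelihood estimation proper usually maximises the product of these probabilities, or equivalently the sum of their logarithms, so that is shown alongside for comparison.

```python
from math import log

# Probability each coefficient set assigned to the ACTUAL result of each game,
# copied from the table above (Game1, Game2, Game3).
forecasts = {
    "Set1": [0.48, 0.29, 0.34],
    "Set2": [0.35, 0.47, 0.49],
    "Set3": [0.25, 0.16, 0.29],
}


def average_probability(probs, weights=None):
    """Weighted average of the chances given to the actual outcomes.

    With weights=None every game counts equally, as in the table.
    """
    if weights is None:
        weights = [1.0] * len(probs)
    return sum(w * p for w, p in zip(weights, probs)) / sum(weights)


def log_likelihood(probs, weights=None):
    """Weighted sum of log-probabilities, the usual likelihood criterion."""
    if weights is None:
        weights = [1.0] * len(probs)
    return sum(w * log(p) for w, p in zip(weights, probs))


for name, probs in forecasts.items():
    # Prints 37%, 44% and 23%, matching the Average row of the table.
    print(name, f"{average_probability(probs):.0%}")

best = max(forecasts, key=lambda name: average_probability(forecasts[name]))
print(best)  # Set2
```

On these three games the log-likelihood criterion also ranks Set2 first. The sketch only compares three given sets. Searching over all possible coefficient values, as the Predictor must, would need a numerical optimiser, which is not shown here.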