
We illustrate through a case study that regressive prediction is the best method to forecast sports outcomes. By taking predictions of promotion to first division soccer from a mathematician from one of the most famous sports websites in Brazil, we show that making Bayesian updates is misleading when we expect regression to the mean. The expert failed to realize that the more extreme the results are, the more regression is expected, because extremely good scores suggest very lucky days.

Sports competitions are an attractive field of study. A sports league forms a relatively isolated system, with few external and unrestrained influences. It is also replicated over time under almost the same conditions and the same rules. The large amount of data available makes it possible to learn its statistical patterns, and its popularity fuels a multibillion-dollar business that includes television, advertisements and a huge betting market [

For those reasons, it makes sense that sports media ask mathematicians and statisticians to predict tournament outcomes. In Brazil, this occurs especially in soccer, where such forecasts become a topic of discussion among fans and experts. This work takes one of those forecasts and investigates whether it uses regressive prediction. We show that it does not, and that this leads to misguided forecasts.

Section 2 presents the predictions used in the study; Section 3 discusses them and Section 4 concludes this report.

Soccer is the most popular sport in Brazil and in the world [

The website https://globoesporte.globo.com/, one of the most-followed sports websites in Brazil, published a series of articles with probabilities of promotion for the soccer clubs disputing the 2018 Brazilian second division (called Série B), with odds calculated by mathematician T.G. He is a specialist in soccer probabilities and his website contains predictions for the major Brazilian tournaments. The first online article was published with four games remaining for each club, the second after the 36th round and the last one before the 38th round. We show the Série B standings for the top 10 clubs in the last four rounds and the promotion probabilities assigned to each club in

There are 20 clubs in Série B. During the course of a season (from April to December), each club plays the others twice (a double round-robin system), once at its home stadium and once at its opponent’s, for a total of 38 games. Clubs receive three points for a win and one point for a draw. No points are awarded for a loss. The top four clubs are promoted to Brazil’s first division (Série A).
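The league arithmetic above can be sketched in a few lines of Python. This is only an illustration of the rules as stated; the function names are hypothetical and the point totals in the example are read from the standings table in this paper.

```python
# Season arithmetic for a 20-club double round-robin league.
# Function names are illustrative, not taken from any library.
N_CLUBS = 20
POINTS = {"win": 3, "draw": 1, "loss": 0}


def games_per_club(n_clubs: int = N_CLUBS) -> int:
    """Each club meets every other club twice (home and away)."""
    return 2 * (n_clubs - 1)


def points_from(wins: int, draws: int, losses: int) -> int:
    """Points earned from a given record under the 3-1-0 rule."""
    return (wins * POINTS["win"]
            + draws * POINTS["draw"]
            + losses * POINTS["loss"])


# Rounds 35 to 37: Ponte Preta won three games, CSA drew two and lost one.
ponte_preta_after_37 = 50 + points_from(wins=3, draws=0, losses=0)
csa_after_37 = 57 + points_from(wins=0, draws=2, losses=1)

print(games_per_club())                    # games each club plays in a season
print(ponte_preta_after_37, csa_after_37)  # both clubs level on points
```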

After 34 games played, Fortaleza clinched a promotion, while eight clubs were still in the race for the three remaining spots. The other three clubs completing the top four were given the best chance to qualify for first division soccer, namely CSA (91%), Avaí (81%) and Goiás (64%). The five remaining clubs were given relatively low odds: Vila Nova (20%), Londrina (18%), Atlético-GO (18%), Ponte Preta (6%) and Guarani (4%).

34th round | Pts | Prob. (%) | 35th round | Pts | 36th round | Pts | Prob. (%) | 37th round | Pts | Prob. (%) | Final round | Pts |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1. Fortaleza* | 64 | 100 | 1. Fortaleza* | 65 | 1. Fortaleza* | 68 | 100 | 1. Fortaleza* | 71 | 100 | 1. Fortaleza* | 71 |
2. CSA | 57 | 91 | 2. CSA | 58 | 2. CSA | 59 | 90 | 2. Goiás* | 60 | 100 | 2. CSA* | 62 |
3. Avaí | 56 | 81 | 3. Goiás | 57 | 3. Goiás | 57 | 70 | 3. Avaí | 60 | 90 | 3. Avaí* | 61 |
4. Goiás | 54 | 64 | 4. Avaí | 57 | 4. Avaí | 57 | 56 | 4. Ponte Preta | 59 | 65 | 4. Goiás* | 60 |
5. Vila Nova | 52 | 20 | 5. Londrina | 54 | 5. Ponte Preta | 56 | 44 | 5. CSA | 59 | 45 | 5. Ponte Preta | 60 |
6. Londrina | 51 | 18 | 6. Ponte Preta | 53 | 6. Londrina | 55 | 25 | 6. Atlético/GO | 56 | 1 | 6. Atlético/GO | 59 |
7. Atlético/GO | 51 | 18 | 7. Atlético/GO | 52 | 7. Vila Nova | 55 | 13 | 7. Vila Nova | 56 | 0 | 7. Vila Nova | 57 |
8. Ponte Preta | 50 | 6 | 8. Vila Nova | 52 | 8. Atlético/GO | 53 | 2 | 8. Londrina | 55 | 0 | 8. Londrina | 55 |
9. Guarani | 49 | 4 | 9. Guarani | 50 | 9. Guarani | 50 | 0 | 9. Guarani | 51 | 0 | 9. Guarani | 54 |
10. Coritiba | 46 | - | 10. São Bento | 46 | 10. Coritiba | 49 | - | 10. Coritiba | 49 | - | 10. Coritiba | 52 |

Notes: *Club achieved promotion.

Only after the 36th round was another online article released with new probabilities. One team dropped out of the race (Guarani) and one gained relevance (Ponte Preta). The latter rose to a 44% chance of promotion after two wins in a row. Still, the three teams with the best chances remained those in the top four: CSA (90%), Goiás (70%) and Avaí (56%).

Prior to the last round, a new online article was published. Only two spots were left because Goiás had clinched promotion. For the first time, the ranking of the clubs with the best chances of qualification changed: CSA (45%) gave way to Ponte Preta (65%). Avaí kept a high probability (90%) and Atlético-GO had a 1% chance.

In the end, the teams that achieved promotion were the ones that were in the top four with four rounds remaining: Fortaleza, CSA, Avaí and Goiás (first column in

As pointed out by Daniel Kahneman [

Wainer [

To support this view, Kahneman [

Suppose you are asked to predict his score on Day 2. You expect the golfer to retain the same level of talent on the second day, so your best guess will be “above average”. Luck, of course, is a different matter. Since you have no way of predicting the golfers’ luck on the second (or any) day, your best guess must be that it will be average, neither good nor bad. This means that in the absence of any other information, your best guess about the players’ score on Day 2 should not be a repeat of their performance on Day 1. The golfer who did well on Day 1 is likely to be successful on Day 2 as well, but less so than on the first day, because the unusual luck he probably enjoyed on Day 1 is unlikely to hold.
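The golfer example can be illustrated with a small simulation. The sketch below assumes a simple additive model in which each day's performance is a fixed talent plus independent luck, both normally distributed with equal spread, and higher values mean better play. These distributional choices, the sample size and the function names are illustrative assumptions, not part of the original argument.

```python
import random
import statistics


def simulate_two_days(n_golfers=10_000, talent_sd=1.0, luck_sd=1.0, seed=0):
    """Performance on each day = fixed talent + fresh, independent luck."""
    rng = random.Random(seed)
    talents = [rng.gauss(0.0, talent_sd) for _ in range(n_golfers)]
    day1 = [t + rng.gauss(0.0, luck_sd) for t in talents]
    day2 = [t + rng.gauss(0.0, luck_sd) for t in talents]
    return day1, day2


def top_group_means(day1, day2, fraction=0.1):
    """Mean Day 1 and Day 2 performance of the best Day 1 performers."""
    k = max(1, int(len(day1) * fraction))
    top = sorted(range(len(day1)), key=lambda i: day1[i], reverse=True)[:k]
    return (statistics.fmean(day1[i] for i in top),
            statistics.fmean(day2[i] for i in top))


day1, day2 = simulate_two_days()
mean_day1_top, mean_day2_top = top_group_means(day1, day2)
print(f"Top 10% on Day 1: mean {mean_day1_top:.2f} on Day 1, "
      f"{mean_day2_top:.2f} on Day 2")
```

Under these assumptions, the best Day 1 performers remain above average on Day 2 (their talent persists) but are less extreme than on Day 1 (their luck does not), which is the pattern the passage describes.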

Kahneman [

We can apply the theory of regression to the mean to the subject of our study. Based on the standings up to the 34th round, we can say that CSA was more talented than Ponte Preta and that the luck of both teams was evenly distributed across those games. In the following three rounds, Ponte Preta won all its games, whereas CSA drew two and lost one, so the two clubs reached the final round tied on points. Suppose you are asked to predict which of these two clubs (Ponte Preta or CSA) is going to achieve promotion to first division in the last round. Which one do you choose?

Under regressive prediction, the obvious choice would be CSA. Ponte Preta’s three wins in a row, a pattern the club had not shown before in the tournament, reflect random fluctuation in the quality of performance, so we can say the team was experiencing above-average luck. At some point, luck runs out. As observed, the more extreme the results, the more regression we expect, because those suggest very lucky days. So we could expect a different, probably worse, outcome for Ponte Preta’s last game, which is what happened: the team drew its last game.

We can exercise the opposite reasoning with CSA. They had three bad results in sequence, a sign of below average luck. The bad luck was not going to hold for a very long time, and the best guess was a better performance in the last round, which happened. They won and got the promotion to Série A. Over time, low scores on the first occasions will, on average, improve, whereas high scores will decline.
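The reasoning applied to the two clubs can be sketched as a shrinkage formula: start from a club's long-run points per game and move toward its recent form only in proportion to a weight `r`. The function name and the weight `r` are hypothetical. The paper does not estimate such a weight, and this is a minimal sketch rather than the forecasting model of any of the parties discussed. Point totals come from the standings table.

```python
import math


def regressive_ppg(season_points, season_games, recent_points, recent_games, r):
    """Points-per-game forecast: long-run baseline, moved toward recent form
    by a hypothetical weight r (0 = streak is pure luck, 1 = streak repeats)."""
    baseline = season_points / season_games
    recent = recent_points / recent_games
    return baseline + r * (recent - baseline)


# (points after 34 rounds, games played, points in rounds 35-37, games played)
clubs = {
    "CSA": (57, 34, 2, 3),          # two draws and one loss
    "Ponte Preta": (50, 34, 9, 3),  # three wins
}

for r in (0.0, 1.0):
    forecast = {name: regressive_ppg(*stats, r) for name, stats in clubs.items()}
    print(f"r = {r}: " + ", ".join(f"{n} {v:.2f}" for n, v in forecast.items()))
```

With `r = 0`, the three-game streaks are treated as pure luck, as in the argument above, and CSA's long-run average puts it ahead. With `r = 1`, the forecast simply repeats the recent streak and favors Ponte Preta. Where the appropriate weight lies between these extremes is an empirical question this sketch does not address.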

Pluchino and coauthors [

Kahneman and Tversky [

Kahneman and Tversky [

Kahneman and Tversky [

Kahneman [

As observed, the model supposedly developed by mathematician T.G. to predict the chances of each club to achieve promotion had a Bayesian touch. Gelman [

We analyzed the predictions made by a mathematician about the probabilities of club promotion to first division soccer in Brazil. Instead of performing regressive predictions, he made Bayesian updates, which led him to a misguided forecast in the last round of the tournament. The specialist failed to notice that in the presence of uncertainty, the best prediction for a repeated performance of a club is less extreme than the initial score.

Extremely good results suggest lucky days; therefore, regression to the mean is expected because the unusual luck is unlikely to continue. Of course, the accuracy of regressive prediction is not guaranteed, but at least it is reasonable. You still incur errors when your forecasts are unbiased, but those errors are reduced. Other procedures favor exceptional outcomes at the cost of missing most cases. We do not recognize regression effects for what they are. Even a mathematician can end up neglecting a statistical phenomenon.

Financial support from CNPq and Capes is acknowledged.

The authors declare no conflicts of interest regarding the publication of this paper.

Silva, M. and Da Silva, S. (2019) Regressive Prediction Is the Best Way to Forecast Sports Outcomes: Evidence from Brazilian Soccer. Open Access Library Journal, 6: e5264. https://doi.org/10.4236/oalib.1105264