AFL SuperCoach Scaling (Part 1)
A framework for backsolving the AFL KFC SuperCoach scaling algorithm
This series of articles is an attempt to backsolve the AFL KFC SuperCoach scaling algorithm. The short answer is no — I haven’t solved it yet. However, given that we are in the early days of the 2024 AFL Men’s premiership season, I’m hoping that the discipline of reviewing each round as the season evolves, and the ability to bootstrap learnings from prior rounds, will get me a lot closer to the final answer.
The KFC SuperCoach competition is a fantasy football competition using the statistics of AFL players as the scoring base. Fox Sports describes the competition as follows —
A unique scoring formula that rewards players who help their team win matches sets KFC SuperCoach apart from any other fantasy game. The scoring system has been developed by Champion Data and refined over many years to reward the most important actions in games.
Preliminary research into this has suggested that there are two main aspects which are less understood about the scoring formula, namely -
- the full list of player match statistics and their weightings which constitute the raw score for a given player
- the final scaling methodology at the end of each match which then determines the final player SuperCoach score
These two aspects are inter-linked, hence it’s not easy to solve one part of the problem without solving the other. This point underscores the framework of my approach — which is to bootstrap learnings and try to get to a better understanding in circular fashion (like a slinky). Hence in this article, I will outline my thought process, and throw in a bit of math and a bit of intuition about how to set up the problem and interpret the results step by step.
Assumptions
Given the quantity of unknowns, the easiest way to begin solving the problem is to start with a set of assumptions for the baseline model and then sequentially refine them over time in subsequent follow-up articles.
Assumption #1 : The KFC SuperCoach terms & conditions is the source of truth.
While I’ve come across a few articles by “those in the know” suggesting that there are more than the 20 statistics listed above making up the actual SuperCoach scores, let’s park that one for the time being, and assume that any extras fall into the category defined by Clauses 4.10 and 4.11 (above) of the terms and conditions.
Assumption #2 : The sum of all scores of a given match totals 3300.
This one isn’t in the terms and conditions, but it’s pretty much as close to fact as you can get — the total of all scores for a given match falls within +/- 5 points of 3300 in most, if not all, of the matches in the dataset that we are working with.
Data
The publicly available SuperCoach scores were scraped from the individual pages for each match on Footywire for past seasons. Additionally, the player statistics were obtained from the fitzRoy data package — these include both the “standard” as well as the “extended” set of statistics. I haven’t done an exact count, but there look to be around 60 different stats, some of which are derived from the raw numbers.
The Champion Data glossary provides definitions describing the player actions that underlie these statistics.
When we try to marry up the dataset against the terms & conditions, the philosophy of SuperCoach scoring becomes more obvious -
- Scoring favours “effective” actions — which is determined by whether the interaction takes place between two players of the same (effective) or different teams (ineffective).
- Scoring is also dependent on difficulty level — for example, higher weights are assigned to “contested” vs “uncontested” possessions.
Categorising the Data
I’ve organised the scoring system into three categories — getting rid of the ball, getting the ball and extra value behaviours. In doing so, there is an additional assumption that an action cannot be “counted” twice in the scoring system — which we will come back to in later analysis.
Assumption #3 : The SuperCoach scoring system comprises uncorrelated variables (statistics).
A layman’s example of this rule is where kicks = effective kicks + ineffective kicks. The scoring system cannot include a weight for kicks (total) because it already includes a weight for each component of kicks (ie effective kicks and ineffective kicks). From this we can either use an aggregated or disaggregated approach depending on availability of data.
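A quick numpy sketch (with made-up stat lines, not real match data) shows why this assumption matters: once a total is included alongside its components, the design matrix becomes rank-deficient and a regression can no longer assign unique weights to all three columns.

```python
import numpy as np

# Toy per-player stat lines: [effective kicks, ineffective kicks]
# (illustrative numbers only, not real match data)
components = np.array([
    [12, 3],
    [8, 5],
    [20, 2],
    [15, 6],
])

# Appending total kicks as a third column adds no new information:
totals = components.sum(axis=1, keepdims=True)
X = np.hstack([components, totals])

print(np.linalg.matrix_rank(components))  # 2 (full rank)
print(np.linalg.matrix_rank(X))           # still 2, despite 3 columns
```

This is the sense in which the scoring system must comprise "uncorrelated" (more precisely, non-redundant) variables: a model can use either the components or the total, but not both.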
Category #1 : Getting Rid of the Ball
An obvious description: this category includes both proper and improper disposal of the ball.
- We get to drop ineffective kicks and ineffective handballs from our analysis because these two have zero weights in the SuperCoach scoring system.
- While we don’t have clanger kicks and clanger handballs in our dataset, we do have clangers (total), and as this is the sum of the unknowns AND the weights applied are the same, we can simplify our analysis to just assigning a weight of -4 points to clangers.
So this category is easiest to have a complete dataset for.
Category #2 : Getting the Ball
This category contains possession-related statistics.
- The Champion Data glossary gives us the exact definition for ground ball gets, and applying the same logic as we did for clangers, we can drop the hard and loose ball gets from our model and replace it with ground ball gets with a weight of 4.5 points.
- We don’t have statistics for the sub-categories of contested and uncontested marks, however we do know that the weighted value is bounded. For the time being we will assume an average possession rate of 50% for both types and revisit this assumption at a later date.
- Handball receive and gather from hitouts — we know that the sum of these two stats must be equal to total possessions less ground ball gets less marks; given that the weightings are relatively similar, we can assume a 50/50 split between the two and revisit the assumption at a later date.
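The handball receive / hitout gather split can be sketched as follows. The column names here are hypothetical (the real fitzRoy field names may differ), and the 50/50 split is exactly the assumption flagged above for later revisiting.

```python
import pandas as pd

# Hypothetical column names and values -- the real fitzRoy fields may differ.
raw = pd.DataFrame({
    "possessions": [25, 18],
    "ground_ball_gets": [8, 5],
    "marks": [6, 4],
})

# Handball receives + gathers from hitouts must equal
# possessions less ground ball gets less marks.
residual = raw["possessions"] - raw["ground_ball_gets"] - raw["marks"]

# Assumption #4: split the residual 50/50 between the two stats.
raw["handball_receives_est"] = 0.5 * residual
raw["hitout_gathers_est"] = 0.5 * residual
print(raw)
```

Because the two weights are relatively similar, the error introduced by a wrong split here should be second-order relative to the scaling question.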
Category #3 : Extra Value Behaviours
This group of statistics rewards or penalises good/bad actions respectively. Fortunately we have access to all of these statistics, so there’s no need to delve further.
The problem can be simplified — in the words of Rumsfeld, separating the known knowns from the unknown unknowns — which transforms it into a linear regression that we can start to solve.
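One way to formalise this (a sketch under the assumptions above, with notation of my own choosing):

```latex
% s_i  : final SuperCoach score for player i
% x_{ij} : player i's count of statistic j (after the Category #1-#3 groupings)
% w_j  : the known weight for statistic j, per the terms & conditions
% c    : the scaling factor to be estimated
% \varepsilon_i : the residual, absorbing the unknown unknowns
s_i = c \sum_{j} w_j \, x_{ij} + \varepsilon_i
```

The known knowns are the weighted statistics; everything we cannot observe ends up in the residual, which is exactly what the later error analysis picks apart.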
While this has been a bit laborious, hopefully it’s an exercise I’ll only have to do once in this intended series. For completeness, I’m including the estimated possession rates (to own team) in my assumptions as something that we can revisit later on.
Assumption #4 : Assumes possession rate of 50% for each of contested and uncontested marks. Assumes 50/50 split between handball receive and gather from hitouts.
Linear Regression : Preamble
Going back to our initial quest, all of the above means that we can get a starting point in terms of understanding the quantum of scaling done for each player. To do this, we need to delve into the concept of scaling.
The thing that we do know about this form of scaling is that the intercept of our regression must be zero. This is known as a constrained regression (or regression through the origin). For those who need a refresher on the math — the intercept is the y-value (on the vertical axis) that corresponds to the x-value (the weighted stats on the horizontal axis) being zero. Hence it is a pure scaling, without adjustment upwards or downwards.
Figure 6 shows the difference between running a constrained and unconstrained linear regression over the data for R1 2024. Linear regressions typically solve for the smallest error term -
- the unconstrained (red) error term is 9.78, which is smaller than the constrained (green) error term of 10.41. A constrained regression will always have an error term at least as high as its unconstrained counterpart.
- unconstrained (red) intercept is 8.32 — which is intuitively incorrect
Why? In non-mathematical terms, the easiest way to explain this is using Harry Cunningham (Sydney Swans) in R2 as an example. Poor Harry was injured very early in the game, hence all of his statistics are zero which gives him a SuperCoach score of zero.
If we used an unconstrained equation, it implies that Harry’s score would be estimated as being equal to the intercept, ie 8.32 (round differences aside for now) — and the error term would be -8! This breaks a well-known “rule” of SuperCoach scoring — if you didn’t do anything on the field, you get a zero score, no questions asked.
Instead, we take comfort with using the constrained model because we know that we can attribute these errors to our unknown unknowns (including any unknown stats outside of the terms and conditions)— and then use these residual errors to give us a few more clues about the actual SuperCoach algorithm as a whole.
Forcing the linear regression to be unconstrained in the quest to reduce the mean absolute error in this type of problem is the equivalent of data mining with no understanding of basic principles of SuperCoach scoring. Hence we add to our list of assumptions.
Assumption # 5 : The intercept of the linear regression must pass through zero when determining the scaling factor.
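The constrained fit is easy to compute directly: with a zero intercept, least squares has the closed form c = Σ(x·y) / Σ(x²). The sketch below uses synthetic data (a made-up round with slope 0.85 and noise, not real scores) to contrast it with an ordinary unconstrained fit.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for one round: x = weighted raw stats,
# y = published SuperCoach scores (illustrative only).
x = rng.uniform(20, 150, size=44)
y = 0.85 * x + rng.normal(0, 8, size=44)

# Unconstrained: fit slope and intercept.
slope_u, intercept_u = np.polyfit(x, y, 1)

# Constrained through the origin: c = sum(x*y) / sum(x*x),
# i.e. least squares with no intercept term.
slope_c = (x @ y) / (x @ x)

print(f"unconstrained: y = {slope_u:.3f}x + {intercept_u:.2f}")
print(f"constrained:   y = {slope_c:.3f}x")

# A player with an all-zero stat line (poor Harry) scores exactly
# zero only under the constrained fit.
print(slope_c * 0)  # 0.0
```

Note the constrained slope is the quantity interpreted throughout as "the scaling factor".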
Baseline Model
The other thing that we know is that the scores for each match must add up to 3300. This leads us down the rabbit hole of exploring two possible theories -
- Theory #1 : all scores for a given match are scaled using the same factor.
- Theory #2 : all scores for a given team for a given match are scaled using the same factor; hence the two teams for a given match can have different scaling factors.
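Mechanically, both theories are just different ways of allocating the 3300-point budget. The sketch below uses made-up raw scores; under Theory #2 the team-level targets are unknown, so a hypothetical 60/40 outcome-based split is used purely to show the arithmetic.

```python
import numpy as np

# Illustrative raw (pre-scaling) weighted scores for 22 players
# per team in one match -- not real data.
rng = np.random.default_rng(1)
raw_home = rng.uniform(40, 180, size=22)
raw_away = rng.uniform(40, 180, size=22)

# Theory #1: one factor per match, chosen so the match totals 3300.
match_factor = 3300 / (raw_home.sum() + raw_away.sum())

# Theory #2: a factor per team. The team targets are unknown --
# a hypothetical 60/40 split stands in here to show the mechanics.
home_target, away_target = 3300 * 0.6, 3300 * 0.4
home_factor = home_target / raw_home.sum()
away_factor = away_target / raw_away.sum()

scaled = np.concatenate([home_factor * raw_home, away_factor * raw_away])
print(round(scaled.sum(), 6))  # 3300.0 either way
```

Either construction satisfies the 3300 rule, which is why the rule alone cannot distinguish the two theories — hence the comparison of fitted slopes below.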
Theory #2 on its own or in combination with Theory #1 supports anecdotal observations (that I’ve read on social media) that scaling has some relationship with the match outcome ie margin of victory.
Now that we have listed everything that we know, we can build a baseline model based on the framework and assumptions outlined above. Our baseline model will calculate the scaling results at the round, match and team level.
Figure 7 above shows that there is some merit to the theory that different scaling factors could be used for each team/match combination — where the scaling factor is represented by the slope of the line.
- At the match level : scaling is 0.792 to 0.943 — a range of +/-10% around the overall round scaling of 0.856
- At the team level : scaling is 0.754 to 0.951 — a range of +/-12% around the overall round scaling of 0.856
The ranges are wide relative to round-level scaling. What this means is that if you apply a round-level scaling factor to the overall dataset, this could imply errors in the order of 10% in the fitted estimates purely due to scaling factors, depending on whether the “real” SuperCoach algorithm scales on a match or team basis. For a player who has an actual score of 100, scaling differences alone can imply a fitted difference of around 10 points, error terms aside.
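To make the 10-point claim concrete, take a player whose weighted raw stats would fit to 100 under the round factor of 0.856, and re-scale with the team-level extremes from Figure 7:

```python
# Player fitted to 100 under the round-level factor of 0.856.
round_factor = 0.856
team_factors = (0.754, 0.951)  # team-level extremes from Figure 7

raw = 100 / round_factor
low, high = (f * raw for f in team_factors)
print(round(low, 1), round(high, 1))  # 88.1 111.1
```

Depending on which team-level factor actually applies, the same raw stat line fits anywhere from roughly 88 to 111 points — a swing of the order quoted above.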
The other thing we can observe is the concept of relative scaling, which can be expressed as the scaling used for one team vs the other in a given match, or for each team vs the match scaling factor. Intuitively this also means that even if the extra unknown stats are now known, the relative scaling factor should still be in the same order, assuming that the base known set of stats comprises the majority proportion of the SuperCoach score.
Error Analysis
Intuitively, when you look at the simple model results -
- The errors generated by the match and team level scaling are uni-directional — suggesting the estimates that I’ve used for possession rates for contested/uncontested marks are “consistently off” in one direction. The notion of extra statistics to explain the differences is still on the table, but is not necessarily a definitive yes.
- Round level scaling as a shortcut proxy will also have greater variances from the aggregate “rule” of 3300 per match. This means that using the residual error terms from this model to infer the weightings to be applied to any extra statistics seems flawed by definition.
Next Steps
Wow, that was a mouthful. Given the length of this article and the many questions that the above analysis now leads on to, it seems timely to stop here and continue in Part 2.
From my perspective, the obvious next steps are -
- revisiting assumption #4 to get a better feel for the sensitivity of model results to possession rates for contested and uncontested marks
- a deeper dive into the error terms generated by the baseline model for groups of players
- including the margin of victory and the scores for each team in the analysis
- going big-data style and analysing the factors collected above on a multi-round and multi-season basis — as well as the model stability over time
Stay tuned!