# How AFL Fantasy Pricing Works (Part 3)

## Understanding how initial player prices are determined using classification and regression algorithms

One of the problems in early preparation of my AFL fantasy team for the upcoming season is that player fantasy prices are not available on the Footywire website until season commencement.

When investigating how the prices are calculated, we find that the pricing specifications are incomplete: they exist only as separate articles scattered across the web. For most seasons the methodology for mature players was available, but only one article covered rookie players.

The purpose of this article is to deduce the pricing methodology used by the AFL fantasy gods from the data. The source of truth was the starting team lists for each season, from which we scraped the individual player statistics and prices. The data was scraped from the Footywire website for the 2018 and 2019 seasons.

**Our objective** is to use the player data to build a model that can predict the player group, and to interpret the results to find the factors that influence the price of each player cluster.

The data includes the initial season price, which makes this a supervised regression machine learning task:

- Supervised: we have access to the features and the target price and our goal is to train a model that can learn a mapping between the two
- Regression: The initial player price is a continuous variable

We want to develop a model that is both accurate — it can predict the player group close to the true value — and interpretable — we can understand the model predictions.

Reading the specifications, we see that the initial starting price equals the player's average score multiplied by a factor, the magic number, and then adjusted by a discount if the player did not play the full previous season. We initially identify **six** player groups which we need to solve for.
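Read this way, the specification reduces to a single expression. The symbols are our own shorthand rather than notation from the pricing articles: $P_0$ is the initial price, $\bar{s}$ the player's average score, $M$ the magic number and $d$ the discount fraction, which is zero for a player who completed the full previous season.

$$
P_0 = \bar{s} \times M \times (1 - d)
$$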

**Data**

The initial data set used is from the 2018 season — it excludes player identifying characteristics such as player name and team.

- AFprev1, AFprev2, AFlast — are the average scores for the previous, prior and 3rd most recent season for a given player
- nprev1, nprev2, nlast — are the number of games for the previous, prior and 3rd most recent season for a given player
- DraftYear, DraftPick, DraftType — describes the draft type of all players — both current mature and rookie players

**Classification Problem**

In order to classify the players, we created new features by transforming some of the data.

- *nprev1*, which was cut off at 10 games for the previous season. This transformation allows us to distinguish group 1 players.
- *nprev2*, which can be 1 or 0 depending on whether the player played in the prior season. This transformation allows us to distinguish between group 2 and group 3 players.
- *DraftYear*, which is the number of years between the year the player was drafted and the current season. This transformation allows us to distinguish between rookie and mature players.
- *DraftType*, a categorical data type which can be one of National Draft, Pre-Season Draft, Rookie Draft or Mini Draft. This column was cleaned up to include a None type for missing or unavailable data.
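The four transformations can be sketched in plain Python as below. This is a minimal illustration, not the code used for the analysis: the function name, the output key names and the dictionary-based record layout are our assumptions, and only the input column names come from the data set.

```python
def engineer_features(player, current_season):
    """Return the derived classification features for one player record.

    `player` is assumed to be a dict keyed by the raw column names.
    The output key names are hypothetical and chosen for readability.
    """
    draft_year = player.get("DraftYear")
    return {
        # Games in the previous season, cut off at 10.
        "nprev1_capped": min(player["nprev1"], 10),
        # 1 if the player played at all in the prior season, else 0.
        "played_prev2": 1 if player["nprev2"] > 0 else 0,
        # Years between the draft year and the current season.
        "years_since_draft": (
            current_season - draft_year if draft_year is not None else None
        ),
        # Missing or unavailable draft types become the label "None".
        "DraftType": player.get("DraftType") or "None",
    }


example = engineer_features(
    {"nprev1": 22, "nprev2": 0, "DraftYear": 2015, "DraftType": None},
    current_season=2018,
)
```

Here a player with 22 games last season is capped at 10, flagged as not having played the prior season, and given a draft age of 3 years.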

To set up the classification problem as a decision tree, we first roughly interpret the rules and calculate prices for each player for each of the six possible equations. We then assume that the players are classified according to the pricing group that produces the least absolute error.

Due to some crossover, our original data set of 804 players becomes a new data set of 1133 observations, where players may fall into more than one group — the decision tree predictions assign the player to the highest probability class by feature similarity.
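The labelling step described in the two paragraphs above can be sketched as follows. We do not know exactly how ties were handled in the original analysis, so treating every group within a tolerance of the smallest error as a match is our assumption, as are the function name and the example prices.

```python
def closest_groups(actual_price, candidate_prices, tolerance=0.0):
    """Return the group(s) whose candidate price is nearest the actual price.

    `candidate_prices` maps a group number to the price that group's
    equation would produce for this player. Groups whose absolute error
    is within `tolerance` of the smallest error are all returned, which
    is one way a single player can yield several observations.
    """
    errors = {
        group: abs(price - actual_price)
        for group, price in candidate_prices.items()
    }
    best = min(errors.values())
    return sorted(g for g, err in errors.items() if err - best <= tolerance)


# Hypothetical candidate prices for one player under four of the equations.
candidates = {1: 560000.0, 2: 392000.0, 3: 364000.0, 4: 476000.0}
labels = closest_groups(470000.0, candidates)

# Two equations giving the same price produce a crossover.
tied = closest_groups(558000.0, {1: 560000.0, 2: 392000.0, 4: 560000.0})
```

In the first call only group 4 is returned. In the second, groups 1 and 4 tie, so the player would contribute two rows to the expanded data set.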

The results above are pleasing: we now have explicit classification definitions for each player group. Specifically, the algorithm has defined the difference between rookie and mature players, where rookies have played fewer than 2 seasons (relative to the current season).

- At the top of the tree, we start out with only one observation for group 4 players and this gets “lost” in the classification algorithm result at the bottom of the tree as there is insufficient data to predict player grouping.
- Players in group 3 by definition can fall into group 1 or 2 depending on whether they played the most recent season; the results in the bottom row of the algorithm show the crossover between the groups.

**Regression Problem**

From the classification performed above we know which base price best corresponds to each cluster group. From this we can let the regression results figure out the appropriate discount factor for each group — the draft pick number for the rookies and the season number of games for mature players.

As the price adjustment term is non-linear, we add a new feature: (1) for mature players, the base price multiplied by the number of games, or (2) for rookies, the draft pick number multiplied by the draft premium.

The regressions are run using the base price and the price adjustments as inputs and the initial price as an output.
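To see why the product feature makes the problem linear, take a per-game discount rate r applied to each game under 10: price = base × (1 − r × (10 − games)) = (1 − 10r) × base + r × (base × games). A regression on the base price and the product term should therefore recover r as the second coefficient. The sketch below checks this on synthetic data. The magic number, the sample values and the choice of a fit without an intercept are all assumptions for illustration, not the article's actual data or model settings.

```python
def fit_two_feature_ols(x1, x2, y):
    """Least-squares fit of y = a*x1 + b*x2 (no intercept).

    Solves the 2x2 normal equations directly with Cramer's rule.
    """
    s11 = sum(u * u for u in x1)
    s22 = sum(v * v for v in x2)
    s12 = sum(u * v for u, v in zip(x1, x2))
    t1 = sum(u * w for u, w in zip(x1, y))
    t2 = sum(v * w for v, w in zip(x2, y))
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("features are collinear")
    a = (t1 * s22 - t2 * s12) / det
    b = (t2 * s11 - t1 * s12) / det
    return a, b


MAGIC_NUMBER = 7000.0  # hypothetical value, for illustration only

# Synthetic players who each played fewer than 10 games last season.
averages = [60.0, 75.0, 90.0, 50.0, 82.0]
games = [3, 5, 8, 9, 6]

base = [avg * MAGIC_NUMBER for avg in averages]
adjustment = [b0 * n for b0, n in zip(base, games)]
# Prices generated with a 3% discount for each game under 10.
price = [b0 * (1 - 0.03 * (10 - n)) for b0, n in zip(base, games)]

a, b = fit_two_feature_ols(base, adjustment, price)
```

Because the synthetic prices contain no noise, the fit returns a ≈ 0.70 and b ≈ 0.03, that is, the 3% per-game rate. On real prices the coefficients would differ slightly because of rounding.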

We see that the regressions yield a good fit for the data, with a very high R-squared for each of the clusters. Additionally, the results validate the pricing parameters discussed in the research articles.

## Interpretation and Conclusions

Importantly, the regressions (with a few differences due to rounding) validate the initial descriptions given by the articles, where our numbers and comments are in the square brackets.

- Group 1 : Players are priced based on their output last year. eg. 2017 average x magic number
- Group 2 : Players who missed a full season receive a discount of 30% [29%] on their 2016 average.
- Group 3 : Players who have missed two or more seasons received a discount of 35% on their last available average.
- Group 4 : Players playing less than 10 games receive a 3% [2.86%] discount on each game under 10. eg. Play 5 games in 2016, then they receive a 15% discount. This is applied to the higher of their season average from the last two years.
- Group 5 : [National] Draft players are priced at the minimum season price plus a premium for each pick number below 50. [Draft players are those who have played less than 2 seasons].
- Group 6 : All other rookie players are priced at the minimum season price. [These are players drafted in the mid-year draft, rookie draft etc].
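The six rules above can be collected into one function. This is a sketch of our reading of the rules, using the article-quoted discounts rather than the fitted values. The magic number, minimum price and per-pick premium are hypothetical placeholders, since their actual values are not stated here, and counting the premium as `50 - draft_pick` is our interpretation of "each pick number below 50".

```python
# Hypothetical constants, for illustration only.
MAGIC_NUMBER = 7000.0    # dollars per fantasy point (assumed)
MIN_PRICE = 170000.0     # minimum season price (assumed)
PICK_PREMIUM = 2000.0    # dollars per pick number below 50 (assumed)


def mature_price(average, discount=0.0):
    """Average score times the magic number, less a fractional discount."""
    return average * MAGIC_NUMBER * (1.0 - discount)


def initial_price(group, average=0.0, games=0, draft_pick=None):
    """Apply the pricing rule quoted for each of the six groups.

    For groups 2 to 4, `average` is whichever average the rule calls
    for (for group 4, the higher of the last two season averages).
    """
    if group == 1:    # priced on last year's output
        return mature_price(average)
    if group == 2:    # missed one full season: 30% discount
        return mature_price(average, 0.30)
    if group == 3:    # missed two or more seasons: 35% discount
        return mature_price(average, 0.35)
    if group == 4:    # 3% discount for each game under 10
        games_short = max(0, 10 - games)
        return mature_price(average, 0.03 * games_short)
    if group == 5:    # National Draft: minimum price plus pick premium
        picks_below_50 = max(0, 50 - draft_pick)
        return MIN_PRICE + PICK_PREMIUM * picks_below_50
    if group == 6:    # all other rookies: minimum price
        return MIN_PRICE
    raise ValueError(f"unknown group: {group}")


five_game_player = initial_price(4, average=80.0, games=5)
first_pick = initial_price(5, draft_pick=1)
```

With these placeholder constants, a player averaging 80 over 5 games is priced at 85% of the undiscounted figure, matching the 15% discount in the group 4 example.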

Analysis of the 2019 season confirms the classification results with similar regression coefficients. With the framework above, we now have the ability to compare players and select our preliminary team ahead of the start of the new 2020 fantasy season.

**Addendum** : As part of this analysis, we have developed a simple app which allows users to explore 2021 preseason pricing for all players.


**References**

(1) Footywire stats for teams by season (link)

(2) Footywire fantasy scores and prices for individual players (link)

(3) 2014 AFL Fantasy Pricing — What We Know by dreamteamtalk (link)

(4) DT Live’s Drawing Board 2018 by dreamteamtalk (link)

(5) Drawing Board 2019 by dreamteamtalk (link)

(6) Drawing Board 2020 by dreamteamtalk (link)