I have a complicated relationship with DFS. There is the thrill of possibly winning a large sum of money, the craftiness involved in building a winning lineup, and the anxiety of building the "perfect" lineup that will almost certainly fail. All of this is balanced out by the countless hours of research, statistics, and dread that go into creating that lineup. Each week, I use statistical methods to try to predict the upcoming NFL week. The predictions are mostly meant for DFS, but they apply to any fantasy situation. I post these results on my Twitter (@DFF_Koala) with varying amounts of success. This past week (Thanksgiving week) was an awesome week in my opinion, but I feel as if I did not properly communicate *how* to read the tool I posted. I am writing this article to explain how the tool works and how to read it.
First, let me explain the methodology behind my statistical methods. I have been using what is called the “buy-low” methodology. This method is dependent on regression towards the mean. In this sense, regression is a good thing. We want to identify players who have been playing below their predicted season averages and are going to “upswing” back to (or above) average. @friscojosh introduced me to this concept, and he does this using Airyards data each week. I recommend checking out how he applies this to receiving.
The second big thing is that I give WAY more information than you probably want, but it is needed. I predict every position and give multiple columns that can help identify special players in a week. Sometimes it isn’t as simple as, “who is going to regress upwards?” Sometimes we need more information to identify these special players.
Lastly, nothing in DFS is certain, and predictions *WILL* be wrong; this method is in no way perfect. Okay… let’s talk statistics…
I use two main methods: Logistic Regression and K-Means Clustering. Logistic Regression is a form of regression used to predict “categorical” variables. Variables like “Yes” or “No” are considered categorical. This version of categorical regression predicts the *probability* of the two options. If the probability is above .5, it is a “Yes.” If it is below .5, it is a “No” (for me dawg). “Yes” and “No” could be anything in this example; they could be “Good” or “Bad,” “Black” or “White,” “Cat” or “Dog” etc.… the list is infinite.
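As a minimal sketch of the idea above: a logistic model turns a weighted sum of inputs into a probability between 0 and 1, and the 0.5 cutoff turns that probability into "Yes" or "No." The function names, the two input variables, and the coefficients below are made up for illustration; they are not the values from my model.

```python
import math

def probability_of_yes(features, weights, intercept):
    """Logistic model: squash a weighted sum into a probability between 0 and 1."""
    z = intercept + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

def classify(probability, cutoff=0.5):
    """Above the cutoff is "Yes"; anything else is "No" (a tie at exactly 0.5 is an assumption)."""
    return "Yes" if probability > cutoff else "No"

# Hypothetical coefficients, for illustration only (not fitted to real data).
weights = [0.30, 0.02]   # e.g. targets per game, air yards per game
intercept = -4.0

p = probability_of_yes([9, 110], weights, intercept)
print(round(p, 3), classify(p))
```

In practice the weights come from fitting the model to past weeks rather than being typed in by hand.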
The next method I use is called K-Means Clustering (KMC). I use KMC to find a player’s distribution. I am not going to get into the math of KMC, but I do recommend looking it up. I use KMC to find sufficiently large groups (usually 3 groups) that I can place a player within. Most of the time, the groups are tiered: A is better than B, which is better than C. The names A, B, and C could just as easily be 1, 2, and 3; they are only labels that identify the clusters and have no meaning of their own.
Sometimes the clusters overlap, but almost 100% of the time a single cluster will emerge as the “best” choice. Players in this cluster are “predisposed” to more fantasy points than the others. If possible, we want to choose players in this top cluster because it usually contains the top-scoring players in a given week. It is crucial to note that a player from the worst cluster *CAN* outperform a player in the best cluster, but as a whole, the best cluster will almost always outperform the worst cluster.
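For anyone who wants to see the mechanics, here is a small sketch of k-means with 3 groups, followed by relabeling so that "A" is always the group with the highest center. It clusters on a single made-up number per player to keep it short, which is a simplification; the function names and data are hypothetical.

```python
def kmeans_1d(values, k=3, iterations=50):
    """Plain 1-D k-means (Lloyd's algorithm) with evenly spaced starting centers; needs k >= 2."""
    lo, hi = min(values), max(values)
    centers = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centers[c]))
            groups[nearest].append(v)
        # An empty group keeps its old center.
        new_centers = [sum(g) / len(g) if g else centers[i] for i, g in enumerate(groups)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers

def tier_labels(values, centers):
    """Label each value by its nearest center, where "A" is the cluster with the highest center."""
    ranked = sorted(range(len(centers)), key=lambda c: centers[c], reverse=True)
    letter = {c: "ABCDEFGHIJ"[rank] for rank, c in enumerate(ranked)}
    return [letter[min(range(len(centers)), key=lambda c: abs(v - centers[c]))] for v in values]

# Made-up average fantasy points for nine hypothetical players.
avg_fp = [6.1, 7.4, 8.0, 12.5, 13.2, 14.1, 20.3, 22.8, 24.0]
centers = kmeans_1d(avg_fp)
tiers = tier_labels(avg_fp, centers)
print(list(zip(avg_fp, tiers)))
```

The relabeling step is the point of the "labels have no meaning" comment: the algorithm only hands back group numbers, and the tier order has to be assigned afterward by comparing the groups.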
There are a few key variables to look at and interpret:
Averaged Predicted _______: There are a few of these variables on each position. This is a player’s *Predicted* Average of a certain variable based on 5-7 other variables. An example of this would be Receiving Yards. To predict average Receiving Yards, we could use a player’s 3-week moving average for: RecYards, Targets, Receptions, AirYards, AirYards per Target, etc.… so we can reliably predict this average using 3-week moving averages of other variables. We predict this variable using multiple linear regression.
_______ Diff: This is the difference between a player’s *actual* 3-week moving average in the desired variable and their predicted season average. These variables are the gold mine we are looking for. They can help us identify players who are underperforming their expectation. The more negative this variable, the more a player is underperforming. Players tend to “regress” towards their expectation, but nothing guarantees that they will. For some players, this number could be extremely negative simply because they are declining.
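The arithmetic behind a Diff column can be sketched in a few lines. The weekly numbers and the predicted average below are invented, and the predicted average is simply typed in here rather than produced by the regression.

```python
def moving_average(values, window=3):
    """Average of the most recent `window` values."""
    recent = values[-window:]
    return sum(recent) / len(recent)

def diff(actual_recent_avg, predicted_season_avg):
    """Negative means the player is running below expectation."""
    return actual_recent_avg - predicted_season_avg

# Made-up weekly receiving yards for a hypothetical receiver.
weekly_rec_yards = [88, 102, 75, 41, 52, 48]
predicted_avg = 72.0  # hypothetical stand-in for the regression's predicted average

recent = moving_average(weekly_rec_yards)      # (41 + 52 + 48) / 3
rec_yards_diff = diff(recent, predicted_avg)   # recent average minus predicted average
print(recent, rec_yards_diff)
```

A receiver like this one, well below expectation over the last three weeks, is the kind of player the Diff columns are meant to flag.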
Probability of Good: This is obtained by using logistic regression (explained above) on whether a player meets a certain threshold. I usually set the threshold at the season average for FP at a position, but I sometimes change it as an experiment. RBs this week, for example, were set at 15 to see if that helps us identify players better.
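Here is a rough sketch of how a "Probability of Good" number could be produced: label each past game as good (1) or not (0) using the threshold, fit a one-variable logistic regression, and read off a probability. The data, the single predictor, and the hand-rolled gradient descent fit are all assumptions made for illustration; the real model uses more variables.

```python
import math

def fit_logistic(xs, labels, learning_rate=0.05, steps=5000):
    """Fit p = sigmoid(a + b*x) by gradient descent on the average log loss."""
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for x, y in zip(xs, labels):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            grad_a += (p - y) / n
            grad_b += (p - y) * x / n
        a -= learning_rate * grad_a
        b -= learning_rate * grad_b
    return a, b

# Made-up RB data: recent average touches and the fantasy points that followed.
touches = [8, 10, 12, 14, 15, 17, 18, 20, 22, 24]
fantasy_points = [5.2, 7.9, 16.1, 9.4, 12.0, 18.3, 13.5, 21.0, 17.2, 25.6]

threshold = 15  # "Good" means meeting the threshold
labels = [1 if fp >= threshold else 0 for fp in fantasy_points]

mean_touches = sum(touches) / len(touches)
centered = [t - mean_touches for t in touches]   # centering keeps the fit stable
a, b = fit_logistic(centered, labels)

def probability_of_good(recent_touches):
    return 1.0 / (1.0 + math.exp(-(a + b * (recent_touches - mean_touches))))

print(round(probability_of_good(10), 2), round(probability_of_good(22), 2))
```

Raising the threshold makes "Good" harder to reach, so the probabilities drop across the board; that is the trade-off when moving it away from the season average.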
Clusters: Clusters help us identify a player’s “tier” or “distribution.” Certain clusters will outperform others, which means that over the long run, players in a better cluster will score more fantasy points than players in a worse cluster. Being in a better cluster just means a player is predisposed to scoring more fantasy points; it does not rule out a player from a worse cluster having a good week. In the graph below, we can see this trend: green is consistently better than red, which is better than blue. The pattern does not hold every week, though; Week 11 is an exception.
What to look for:
So how do you read it? Well, it depends on what your play style is and if you are playing Cash or GPP. Of course, keep your usual strategy, but this could help you identify special players!
GPPs: In GPPs, you want players with a decent probability and a strongly negative FP Diff. I sorted this table by Probability × FP Diff. Notice the names that stand out, and notice their clusters. Cooper, Davis, Baldwin, and JuJu all stood out before the week, and three of the four also stood out with their actual performances. That is one way I use the tool to identify strong GPP plays: these are players who have been underperforming and look out of place relative to their clusters. If you want to try to hit it big, you could look at the FP Diff column by itself and look for low-owned players who are underperforming. These are risky options, though; probability is a good proxy for how risky a player is.
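The GPP sort can be sketched as follows. The players and numbers are placeholders, and putting the most negative product first is an assumption about the sort direction.

```python
# Placeholder rows; prob_good and fp_diff stand in for the tool's columns.
players = [
    {"name": "Player A", "prob_good": 0.62, "fp_diff": -6.5},
    {"name": "Player B", "prob_good": 0.71, "fp_diff": -1.2},
    {"name": "Player C", "prob_good": 0.35, "fp_diff": -8.0},
    {"name": "Player D", "prob_good": 0.55, "fp_diff": 2.4},
]

# Probability x FP Diff: a large negative value needs both a decent
# probability and a strongly negative FP Diff.
for player in players:
    player["score"] = player["prob_good"] * player["fp_diff"]

# Most negative product first (assumed sort direction).
ranked = sorted(players, key=lambda player: player["score"])

for player in ranked:
    print(player["name"], round(player["score"], 2))
```

Player C has the most negative FP Diff in this made-up table but drops below Player A once the low probability is factored in, which is the risk trade-off described above.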
Cash: I usually have a few players I like, and I use the probabilities as a guide to who has a better chance to hit their floor. Sometimes, these players will even explode.
This is a *tool*. It does not simply hand you answers, but when used correctly it can identify special players. To give an idea of its success on Thanksgiving week, these are the players I identified for my main player pool, hits and misses alike, with their FanDuel points:
- Ben Roethlisberger – 22.3
- Baker Mayfield – 25.9
- Drew Brees – 21.4
- Andrew Luck – 25.1
- Philip Rivers – 20.7
- Christian McCaffrey – 41.2
- Leonard Fournette – 24.3
- Ezekiel Elliott – 22.8
- Saquon Barkley – 29.7
- James Conner – 9.5
- Marlon Mack – 10.6
- Phillip Lindsay – 17
- Golden Tate – 5
- Dede Westbrook – 16.2
- Stefon Diggs – 18.9
- DeSean Jackson – 4
- Corey Davis – 21.5
- Amari Cooper – 34
- JuJu Smith-Schuster – 31.4
Those were all players that I used or was high on after viewing the results of my model and clusters. I cannot promise results like this all the time, but historically, this model and the cluster formulas have performed very well.
I post a weekly thread with all of this information on Twitter @DFF_Koala on Saturdays, so if this interests you, be sure to check out my Twitter Saturday morning and follow me! I also have a Patreon (Patreon.com/KoalatyStats) where I send out the Excel file to all of my Patrons. Becoming a Patron is $1 a month. 🙂 I have big plans for future content so consider supporting me! I have a Bachelor of Science in Statistics from the University of Georgia and love applying my knowledge to football. I hope this article better breaks down exactly how to use my tool, and I hope it helps you win all the DFS. Nothing is guaranteed, though, and sometimes you’ve got to trust your gut. Thank you for reading this!