Stat Review: Completion Percentage Plus
Author: Billy Jones
Introduction
Welcome back to my blog about football
analytics (this is my first non-fantasy football blog). In
this post, I'll be exploring a topic that I find incredibly fascinating: a
statistical analysis of a sports statistic. Specifically, I'll be discussing
how we can use regression and R-squared values to review a commonly used
football statistic, completion percentage, and hopefully create a better
statistic to predict player performance.
So where did this come from? My favorite book
of all time is “Moneyball: The Art of Winning an Unfair Game” by Michael Lewis.
In the book, Billy Beane and the Oakland Athletics used on-base percentage as
a competitive advantage over their competitors, as on-base percentage had been
shown to be a more accurate measure of a player's offensive contributions than
batting average. On-base percentage takes into account not only how often a
player gets a hit, but also how often they get on base in other ways, such as
walks or being hit by a pitch. So, I wondered if the same idea could be applied
to a quarterback’s completion percentage by adding back things that weren’t
their fault… drops. That’s where I came up with my new stat called Completion
Percentage Plus, which adds back those drops, since the quarterback did their
job and got the receiver the ball.
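For readers who like to see the arithmetic, here is a minimal sketch of how the two statistics could be computed. It assumes Completion Percentage Plus is simply (completions + drops) / attempts, which is my reading of "adding back drops". The function names and the season line below are made up for illustration and are not real player data.

```python
def completion_pct(completions, attempts):
    """Standard completion percentage, expressed in percent."""
    return 100.0 * completions / attempts


def completion_pct_plus(completions, drops, attempts):
    """Completion Percentage Plus (assumed formula): drops count as completions."""
    return 100.0 * (completions + drops) / attempts


# Hypothetical season line, for illustration only (not real data)
cmp_pct = completion_pct(completions=350, attempts=550)
cmp_pct_plus = completion_pct_plus(completions=350, drops=22, attempts=550)
print(f"Completion %: {cmp_pct:.1f}  Completion % Plus: {cmp_pct_plus:.1f}")
```

The only difference between the two is the numerator, so a quarterback with unusually drop-prone receivers gets the largest bump.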
So, if you're a fantasy football enthusiast
looking for a new way to evaluate player performance, or someone interested in
statistical analysis in sports, or anywhere in between, read on to see if my
adjusted metric has value or is just a bit of meaningless nonsense!
Statistics Refresher
Regression is a statistical method that helps
us to understand the relationship between two or more variables. In football,
we might use regression to look at how one variable (such as completion
percentage) is related to another variable (such as yards per attempt or
touchdown rate) or we may use regression to determine how predictable (or “sticky”)
a statistic is year over year. We will be doing the latter today.
R-squared is a figure that comes out of a regression
analysis that helps us to understand how well the predicting variable (prior year)
fits the actual data (current year). The R-squared value ranges from 0 to 1,
with a value of 1 indicating a perfect fit and a value of 0 indicating no
relationship at all. A perfect fit would be highly predictive, while a value
of 0 indicates the statistic is not predictive year over year.
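To make the refresher concrete, here is a small sketch of how R-squared could be computed for a one-variable regression of the current year on the prior year. It uses an ordinary least-squares line and plain Python. The function name `r_squared` and the sample numbers are made up for illustration, and this is not necessarily the tool used for the charts in this post.

```python
def r_squared(x, y):
    """R-squared of a simple least-squares line predicting y from x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each have some variation")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Unexplained variation: squared gaps between actual and fitted values
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return 1.0 - ss_res / syy


# Hypothetical completion percentages (prior year, current year), not real data
prior_year = [60.1, 64.3, 67.0, 62.5, 69.2]
current_year = [61.0, 63.0, 66.1, 64.8, 68.0]
print(f"R-squared: {r_squared(prior_year, current_year):.3f}")
```

With a single predictor, this value equals the squared correlation between the two years, which is why it works as a quick "stickiness" score.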
Ground Rules
Before I jump back into the analytics, I would
like to set some ground rules for the readers. The data used for this analysis
was obtained from Pro-Football-Reference for 2018 through 2022. Also, to be
included in the regression analysis, the quarterback had to have at least 200
passing attempts in both year N and year N-1. This will help us remove some
anomalies caused by low attempt volume (small sample size).
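The ground rules above can be sketched as a small pairing step: match each quarterback's season with their previous season and keep the pair only if both seasons clear the attempts threshold. This is a sketch under assumptions. The dictionary keys ("player", "year", "attempts", "cmp_pct") and the rows are hypothetical rather than Pro-Football-Reference's actual column names, and I am reading "200 passing attempts" as "at least 200".

```python
MIN_ATTEMPTS = 200  # threshold from the ground rules, read as "at least 200"


def build_year_pairs(seasons, stat, min_attempts=MIN_ATTEMPTS):
    """Return (prior_year_value, current_year_value) pairs for qualifying QBs."""
    by_key = {(row["player"], row["year"]): row for row in seasons}
    pairs = []
    for (player, year), current in by_key.items():
        prior = by_key.get((player, year - 1))
        if prior is None:
            continue  # no season N-1 on record for this player
        if prior["attempts"] >= min_attempts and current["attempts"] >= min_attempts:
            pairs.append((prior[stat], current[stat]))
    return pairs


# Hypothetical rows, for illustration only (not real data)
seasons = [
    {"player": "QB A", "year": 2021, "attempts": 520, "cmp_pct": 64.0},
    {"player": "QB A", "year": 2022, "attempts": 480, "cmp_pct": 66.5},
    {"player": "QB B", "year": 2021, "attempts": 150, "cmp_pct": 58.0},
    {"player": "QB B", "year": 2022, "attempts": 410, "cmp_pct": 61.2},
    {"player": "QB C", "year": 2022, "attempts": 300, "cmp_pct": 63.0},
]
pairs = build_year_pairs(seasons, "cmp_pct")
print(pairs)
```

In this toy data only QB A qualifies: QB B falls short of the threshold in 2021, and QB C has no 2021 season to pair with.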
Visualizations and Analysis
As mentioned above, we will be using data from 2018 through 2022 for our analysis. This results in 4 regression periods: 1) 2018 (N-1) vs 2019 (N), 2) 2019 vs 2020, 3) 2020 vs 2021, and 4) 2021 vs 2022. The sample size for this population is 104 quarterbacks. In a regression analysis, a larger sample size generally produces more reliable results, so 104 feels pretty small, but I elected not to go too far into the past because the NFL is an ever-evolving league. With that stated, let’s take a look at a few regressions.
Analysis: From a year over year predictiveness perspective, completion percentage appears to be a more predictive statistic than completion percentage plus. The technical way to articulate this is… The proportion of the variance in year N completion percentage plus (the variable being predicted) that is explained by year N-1 completion percentage plus (the variable used to make the prediction) is 0.227, whereas the proportion of the variance in year N completion percentage that is explained by year N-1 completion percentage is 0.271… The less technical way to say this is… completion percentage is a stickier/more consistent statistic than completion percentage plus. Adding back the "random" drops didn't improve the predictiveness of the statistic. Seeing the statistic “fail” is disappointing, but this is partially expected when looking for insights in data.
As football analytics folks know, small sample size is a real issue when working with football data. The volume of events is much lower than in baseball or basketball, so it is important to have a robust enough dataset to find insights. As such, I wanted to show “what could have gone wrong” if I had only used 2021 vs 2022 in my analysis.
Analysis: From a year over year predictiveness perspective, completion percentage appears to be a LESS predictive statistic than completion percentage plus. Yup, that is right, the results flipped for 2021 vs 2022. The proportion of the variance in year N completion percentage plus (the variable being predicted) that is explained by year N-1 completion percentage plus (the variable used to make the prediction) is 0.3437, whereas the proportion of the variance in year N completion percentage that is explained by year N-1 completion percentage is 0.3358…
A few possible explanations for these results:
1) Small sample size issues causing fluky results.
2) The statistic of dropped passes is flawed. -> A dropped pass is generally charted as a pass that was catchable but was not caught by the intended receiver. Even with defined criteria, there is subjectivity involved in determining whether a pass is considered catchable. For example, a pass that is thrown behind a receiver or is poorly thrown may not be considered catchable, even if it technically touches the receiver's hands.
3) The statistic of dropped passes was flawed but now isn’t. -> As the age of data is booming right now, I have a theory that a stat like dropped passes was poorly recorded in the past and that the historical data is not truly an accurate depiction of what happened. Without rewatching every football game during the analysis period (my fiancé would not be happy if I did this), I’m stuck with the data I have, but I will most definitely be keen on reperforming this analysis after the upcoming season.
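The first explanation, small-sample noise, can be illustrated with a quick simulation. This is a sketch on synthetic data only, not the quarterback data. It assumes a fixed underlying year-over-year correlation of 0.5 (a true R-squared of 0.25, picked to sit near the values reported above) and compares how much the measured R-squared bounces around at two sample sizes: 104 pairs, and 26 pairs, which is roughly one period's share if the four periods are similar in size. All names and parameters here are hypothetical.

```python
import random


def r_squared(x, y):
    """Squared correlation, which equals R-squared for a one-variable regression."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return (sxy * sxy) / (sxx * syy)


def simulate_r2_range(sample_size, trials=2000, true_r=0.5, seed=0):
    """Return the 5th and 95th percentile of R-squared across simulated samples."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    noise_scale = (1 - true_r ** 2) ** 0.5
    results = []
    for _ in range(trials):
        x = [rng.gauss(0, 1) for _ in range(sample_size)]
        # y is built so its correlation with x is true_r in the long run
        y = [true_r * xi + noise_scale * rng.gauss(0, 1) for xi in x]
        results.append(r_squared(x, y))
    results.sort()
    return results[int(0.05 * trials)], results[int(0.95 * trials)]


for size in (26, 104):
    low, high = simulate_r2_range(size)
    print(f"n={size}: 90% of simulated R-squared values fall in [{low:.3f}, {high:.3f}]")
```

If the smaller sample produces a much wider range, then a gap as small as the one between the two statistics in a single period could plausibly be noise rather than a real difference.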
Bonus visual: The statistical value of completion percentage plus seems
inconclusive, but here is completion percentage vs completion percentage plus
for 2022 for those who are interested.
In conclusion, my exploration of “Completion Percentage Plus” has shown that it is not a more predictive statistic than standard Completion Percentage. However, this doesn't mean that the effort was a failure. In data science, it's important to embrace the unknown and explore new avenues of inquiry. Even when our hypotheses don't bear fruit, we can learn a great deal from the process. Moving forward, I plan to conduct many more of these statistical reviews of football statistics. Who knows? I may stumble upon a new metric that proves to be a game-changer. Until then, I look forward to continuing my exploration of football analytics.
Addendum
I finished up writing this blog in the Loon Mountain Ski Lodge and I couldn't help but think back to the last time I was at a ski lodge - when I was skiing with my cousins and hit a tree. While I won't go into details here, I just wanted to shout out my Wise family cousins who were there. It's moments like these that remind us of the importance of family, and I'm grateful to have all of you in my life.
*This blog post was enabled by ChatGPT. The text was generated by me, and the content is my own, but some sentences and wording were provided by the model. I take full responsibility for all information produced in this blog. More information about OpenAI and their technology can be found at https://openai.com.*