We're obsessed with grouping things together. We sort ourselves into groups based on which political candidate we support, which sports team we root for, and which country we happen to be born in. People also spend hours on the internet arguing over "tiers", or groupings, of their favorite athletes and sports teams. For example, which NBA players are "elite" vs. "great" vs. just "good"? Did Carmelo Anthony belong on the Banana Boat? When engaging in these arguments, we typically use statistics like points or rebounds per game to back up our claims, but at the end of the day, the groups are more or less arbitrary.
But what if there were a way to algorithmically sort observations into groups based on shared characteristics? Enter clustering, a machine learning method that sorts similar observations into groups, or "clusters", using a mathematical distance metric computed from a set of shared characteristics (in our NBA example, these characteristics could be statistics like points per game, rebounds per game, and assists per game). Clustering has many applications, but this article focuses on NFL Quarterbacks.
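To make "distance metric" concrete, here is a minimal sketch in Python. It assumes Euclidean distance, which is one common choice and not necessarily the one used for this article. The player names and stat lines are made up for illustration, and `closest_pair` is a hypothetical helper.

```python
import math

# Hypothetical per-game stat lines: (points, rebounds, assists).
# These values are invented for illustration, not real player data.
players = {
    "Player A": (27.0, 7.0, 7.0),
    "Player B": (25.0, 8.0, 6.0),
    "Player C": (9.0, 11.0, 1.0),
}

def euclidean(a, b):
    """Straight-line distance between two equal-length stat vectors."""
    return math.dist(a, b)

def closest_pair(stats):
    """Return the two names whose stat vectors are nearest to each other."""
    names = sorted(stats)
    pairs = [
        (euclidean(stats[x], stats[y]), x, y)
        for i, x in enumerate(names)
        for y in names[i + 1:]
    ]
    # Tuples compare by their first element, so min() picks the smallest distance.
    _, first, second = min(pairs)
    return first, second

print(closest_pair(players))
```

A small distance means two players have similar stat lines, and a clustering algorithm uses those distances to decide who ends up in the same group.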
NFL Quarterback Archetypes from 2015-2020
The argument over NFL Quarterback Archetypes has existed as long as the forward pass itself. Where do we draw the line between a pocket passer and a mobile quarterback? Racial biases often seep into our perceptions of NFL quarterbacks as well. Black quarterbacks are often perceived as more mobile and as weaker passers, even when that is not the case, while the reverse is true for White quarterbacks. Did you know that Aaron Rodgers and Patrick Mahomes average roughly the same rushing yards per year over their careers? Blake Bortles rushed for more than 100 more yards per year than Dak Prescott from 2015-2020.
Clustering allows us to take an agnostic approach to this problem. I analyze all NFL Quarterbacks who had three or more 2,500-yard seasons from 2015-2020 (plus Lamar Jackson) and consider the following statistics (each is an average from 2015-2020):
- Passing Yards
- Completion Rate
- Average yards per throw
- Touchdowns
- Interceptions
- Longest completion
- Sacks
- Rushing Yards
- Yards per carry
- Longest rush
The clustering algorithm considers these statistics and groups together players who are similar across these variables. For example, players with good completion percentages and high passing yards per year (i.e. good passers) will tend to be grouped together, while bad passers will tend to be grouped together. The underlying data can be found below:
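The article does not name the clustering algorithm, so the sketch below assumes k-means, a common choice, purely for illustration. It also standardizes each statistic first, because otherwise a stat measured in thousands (passing yards) would swamp one measured in single digits (yards per carry) in the distance calculation. The quarterbacks and numbers are invented, and only two of the ten statistics are used to keep the example short.

```python
import math
import statistics

def standardize(rows):
    """Rescale each column to mean 0 and standard deviation 1 (z-scores)."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # Fall back to 1.0 for a constant column to avoid dividing by zero.
    stdevs = [statistics.pstdev(col) or 1.0 for col in columns]
    return [
        tuple((v - m) / s for v, m, s in zip(row, means, stdevs))
        for row in rows
    ]

def kmeans(points, k, iterations=100):
    """Plain k-means. The first k points seed the centroids so runs repeat exactly."""
    centroids = [points[i] for i in range(k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(statistics.fmean(col) for col in zip(*members))
    return labels

# Hypothetical (passing yards, rushing yards) averages; invented values.
names = ["QB A", "QB B", "QB C", "QB D"]
stats = [(4500, 50), (3000, 600), (4400, 80), (3100, 550)]

labels = kmeans(standardize(stats), k=2)
for name, label in zip(names, labels):
    print(name, "-> cluster", label)
```

Here the two high-volume passers land in one cluster and the two runners in the other. Seeding the centroids with the first k points keeps the example deterministic; real implementations usually use random or smarter initialization.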
Results
The results were largely what you might expect, with maybe a surprise here or there depending on your familiarity with the NFL landscape. The clustering method produced 4 clearly defined Archetypes, which can be described easily by their Passing and Rushing characteristics.
Archetype 1: Hybrid Quarterbacks
Group average passer rating: 102.84
Group average rushing yards: 285.6
Group average YPC: 5.1
Quarterback members:
- Aaron Rodgers
- Dak Prescott
- Deshaun Watson
- Patrick Mahomes
- Russell Wilson
- Ryan Tannehill
Archetype 2: Pocket Passers
Group average passer rating: 95.2
Group average rushing yards: 39.8
Group average YPC: 1.6
Quarterback members:
- Ben Roethlisberger
- Drew Brees
- Eli Manning
- Jared Goff
- Kirk Cousins
- Matt Ryan
- Matthew Stafford
- Philip Rivers
- Tom Brady
Archetype 3: Elite Mobile Threats
Group average passer rating: 92.7
Group average rushing yards: 525
Group average YPC: 5.6
Quarterback members:
- Cam Newton
- Josh Allen
- Lamar Jackson
- Marcus Mariota
- Tyrod Taylor
Archetype 4: Mediocre Passers Who Can Run When Needed
Group average passer rating: 88.4
Group average rushing yards: 177.2
Group average YPC: 3.9
Quarterback members:
- Andy Dalton
- Baker Mayfield
- Blake Bortles
- Carson Wentz
- Derek Carr
- Jameis Winston
- Joe Flacco
- Ryan Fitzpatrick
Conclusion
There are some caveats in this analysis that I'll mention here. It only considers Quarterback seasons from 2015-2020, so for veteran quarterbacks like Ben Roethlisberger, it doesn't consider their early seasons. I also only considered seasons in which a quarterback threw for more than 2,500 yards, which discards "injury years" so that they don't weigh down the averages fed into the model.
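The season filter described above can be sketched in a few lines of Python. The rows below are invented, `qualifying_averages` is a hypothetical helper, and treating the cutoff as strictly greater than 2,500 yards is an assumption based on the wording of this caveat.

```python
from collections import defaultdict
from statistics import fmean

MIN_PASSING_YARDS = 2500  # cutoff described in the article; strict ">" is assumed

# Hypothetical (player, season, passing yards, rushing yards) rows; invented values.
seasons = [
    ("QB A", 2018, 4000, 300),
    ("QB A", 2019, 1200, 60),   # shortened season, excluded by the filter
    ("QB A", 2020, 4200, 340),
    ("QB B", 2019, 3600, 40),
    ("QB B", 2020, 3800, 60),
]

def qualifying_averages(rows, min_yards=MIN_PASSING_YARDS):
    """Average each player's stats over seasons above the passing-yard cutoff."""
    by_player = defaultdict(list)
    for player, _season, passing, rushing in rows:
        if passing > min_yards:
            by_player[player].append((passing, rushing))
    return {
        player: tuple(fmean(col) for col in zip(*kept))
        for player, kept in by_player.items()
    }

averages = qualifying_averages(seasons)
print(averages)
```

Without the filter, QB A's shortened 2019 season would pull the passing average down from 4,100 to about 3,133 yards, which is the kind of distortion the cutoff is meant to avoid.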
That being said, I think the results were pretty neat. We got the expected results for most of the quarterbacks in the group. The elite, mobile "hybrid" quarterbacks were grouped together, while the upper tier pocket passers were mostly grouped together. The data suggests that Aaron Rodgers belongs in the "mobile/hybrid" category, not the "pocket passer" category. The data also suggests that Lamar Jackson shares more characteristics with the run-first quarterbacks despite his high passer rating, which might irritate some Ravens fans. I'm fine with that. One thing that I also want to point out is that the clustering method sorts on commonalities in the input variables (passing stats, rushing stats), which means that Ryan Tannehill has been placed in the company of some pretty elite quarterbacks. He's not often in the discussion of the NFL's top quarterbacks, and is often treated as more of a game manager, but the data suggests he should be treated with a little more respect -- he shares characteristics with the likes of Aaron Rodgers and Patrick Mahomes.
Thanks for reading! If you like the content, be sure to check out my other articles.