
Determining NFL Quarterback Archetypes (with stats!)

We're obsessed with grouping things together. We self-select into groups based on which political candidate we support, which sports team we root for, and which arbitrary country we're born in. People also spend hours on the internet arguing over "tiers", or groupings, of their favorite athletes and sports teams. For example, which NBA players are "elite" vs. "great" vs. just "good"? Did Carmelo Anthony belong on the Banana Boat? When engaging in these arguments, we typically use statistics like points or rebounds per game to back up our claims, but at the end of the day, the groups are more or less arbitrary.

But what if there were a way to algorithmically sort observations into groups based on shared characteristics? Enter clustering, a family of machine learning methods that places similar observations into groups, or "clusters", using a mathematical distance metric computed from a set of shared characteristics (in our NBA example, these characteristics could be statistics like points per game, rebounds per game, and assists per game). There are a lot of applications for this, but this article focuses on NFL quarterbacks.
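To make the "distance metric" idea concrete, here is a minimal sketch using the NBA example. The stat lines are made up for illustration (they are not real players), and plain Euclidean distance is an assumption on my part; other distance metrics work too.

```python
import math

# Hypothetical per-game stat lines: (points, rebounds, assists).
# These numbers are invented for illustration, not real player data.
players = {
    "Player A": (27.0, 7.0, 7.5),
    "Player B": (25.5, 6.5, 8.0),
    "Player C": (12.0, 11.0, 1.5),
}

def euclidean(a, b):
    """Straight-line distance between two stat lines."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def closest_pair(table):
    """Return (distance, name1, name2) for the two most similar players."""
    names = sorted(table)
    pairs = [
        (euclidean(table[p], table[q]), p, q)
        for i, p in enumerate(names)
        for q in names[i + 1:]
    ]
    return min(pairs)

distance, first, second = closest_pair(players)
print(f"{first} and {second} are the most similar (distance {distance:.2f})")
```

The smaller the distance, the more alike two players are across the chosen stats. A clustering algorithm is essentially doing this comparison for every player at once.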

NFL Quarterback Archetypes from 2015-2020

The argument over NFL quarterback archetypes has existed as long as the forward pass itself. Where do we draw the line between a pocket passer and a mobile quarterback? Of course, racial biases often seep into our perceptions of NFL quarterbacks as well. Black quarterbacks are often perceived as more mobile and as worse passers, even when that isn't the case, while the reverse is true for White quarterbacks. Did you know that Aaron Rodgers and Patrick Mahomes average roughly the same rushing yards per year over their careers? Or that Blake Bortles rushed for over 100 more yards per year than Dak Prescott from 2015-2020?

Clustering allows us to take an agnostic approach to this problem. I analyze all NFL quarterbacks who had three or more 2,500-yard passing seasons from 2015-2020 (plus Lamar Jackson) and consider the following statistics (each is an average from 2015-2020):
  • Passing Yards
  • Completion Rate
  • Average yards per throw
  • Touchdowns
  • Interceptions
  • Longest completion
  • Sacks
  • Rushing Yards
  • Yards per carry
  • Longest rush
The clustering algorithm will consider these statistics and group together players who are similar across these variables. For example, players with good completion percentages and high passing yards per year (i.e. good passers) will tend to be grouped together, while bad passers will tend to be grouped together. The underlying data can be found below:
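Here's a rough sketch of what that grouping step can look like. Two big assumptions: the article doesn't name the clustering algorithm or say whether the stats were rescaled, so this uses plain k-means and z-score scaling as stand-ins. The quarterbacks and numbers are hypothetical, and only three of the ten statistics are included to keep it short.

```python
import math
import statistics

# Hypothetical quarterbacks: (passing yards, completion rate, rushing yards).
# Invented numbers for illustration only.
names = ["QB A", "QB B", "QB C", "QB D"]
rows = [
    (4400, 67.0, 50),
    (3100, 61.0, 600),
    (4200, 66.0, 80),
    (3000, 60.0, 650),
]

def standardize(data):
    """Z-score each column so big-number stats (yards) don't drown out rates."""
    cols = list(zip(*data))
    means = [statistics.fmean(c) for c in cols]
    stdevs = [statistics.pstdev(c) or 1.0 for c in cols]  # avoid dividing by zero
    return [tuple((v - m) / s for v, m, s in zip(row, means, stdevs)) for row in data]

def kmeans(points, k, iterations=50):
    """Plain k-means, seeded with the first k points so the result is repeatable."""
    centroids = [points[i] for i in range(k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # nothing moved, so the clusters have settled
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(statistics.fmean(dim) for dim in zip(*members))
    return labels

labels = kmeans(standardize(rows), k=2)
for name, label in zip(names, labels):
    print(name, "-> cluster", label)
```

The scaling step matters because passing yards run into the thousands while completion rate sits between 0 and 100. Without it, the distance between two players would be driven almost entirely by yardage.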


Results

The results were remarkably on point: they were pretty much what you might expect, with maybe a surprise here or there depending on your familiarity with the NFL landscape. The clustering method produced 4 clearly defined archetypes, which can be described easily by their passing and rushing characteristics.

Archetype 1: Hybrid Quarterbacks

Group average passer rating: 102.84
Group average rushing yards: 285.6
Group average YPC: 5.1
 
Quarterback members:
  • Aaron Rodgers
  • Dak Prescott
  • Deshaun Watson
  • Patrick Mahomes
  • Russell Wilson
  • Ryan Tannehill

Archetype 2: Pocket Passers

Group average passer rating: 95.2
Group average rushing yards: 39.8
Group average YPC: 1.6
 
Quarterback members:
  • Ben Roethlisberger
  • Drew Brees
  • Eli Manning
  • Jared Goff
  • Kirk Cousins
  • Matt Ryan
  • Matthew Stafford
  • Philip Rivers
  • Tom Brady

Archetype 3: Elite Mobile Threats

Group average passer rating: 92.7
Group average rushing yards: 525
Group average YPC: 5.6
 
Quarterback members:
  • Cam Newton
  • Josh Allen
  • Lamar Jackson
  • Marcus Mariota
  • Tyrod Taylor

Archetype 4: Mediocre Passers Who Can Run When Needed

Group average passer rating: 88.4
Group average rushing yards: 177.2
Group average YPC: 3.9
 
Quarterback members:
  • Andy Dalton
  • Baker Mayfield
  • Blake Bortles
  • Carson Wentz
  • Derek Carr
  • Jameis Winston
  • Joe Flacco
  • Ryan Fitzpatrick

Conclusion

There are some caveats in this analysis that I'll mention here. It only considers quarterback seasons from 2015-2020, so for veteran quarterbacks like Ben Roethlisberger, it doesn't consider their early seasons. I also only considered seasons where a quarterback had more than 2,500 yards, meaning that I discarded all "injury years" so that they wouldn't weigh down the averages that were fed into the model.
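For anyone who wants to reproduce the filtering described above, here is a minimal sketch under stated assumptions. The record layout, the field names (`pass_yds`, `rush_yds`), the `always_include` parameter, and the sample seasons are all hypothetical; the thresholds are the ones mentioned in this article.

```python
from collections import defaultdict
from statistics import fmean

MIN_YARDS = 2500    # seasons at or below this are treated as "injury years"
MIN_SEASONS = 3     # qualifying seasons needed to be included

def qualifying_averages(seasons, always_include=()):
    """Drop low-yardage seasons, then average what's left for each eligible player."""
    by_player = defaultdict(list)
    for season in seasons:
        if season["pass_yds"] > MIN_YARDS:
            by_player[season["player"]].append(season)

    averages = {}
    for player, kept in by_player.items():
        # always_include is a hypothetical hook for manual exceptions.
        if len(kept) >= MIN_SEASONS or player in always_include:
            averages[player] = {
                "pass_yds": fmean(s["pass_yds"] for s in kept),
                "rush_yds": fmean(s["rush_yds"] for s in kept),
            }
    return averages

# Invented season records for illustration only.
seasons = [
    {"player": "QB X", "pass_yds": 4000, "rush_yds": 100},
    {"player": "QB X", "pass_yds": 3000, "rush_yds": 200},
    {"player": "QB X", "pass_yds": 1200, "rush_yds": 50},   # discarded
    {"player": "QB X", "pass_yds": 3500, "rush_yds": 150},
    {"player": "QB Y", "pass_yds": 2600, "rush_yds": 700},
    {"player": "QB Y", "pass_yds": 2700, "rush_yds": 900},
]

averages = qualifying_averages(seasons)
with_exception = qualifying_averages(seasons, always_include=("QB Y",))
print(averages)
```

In this toy data, QB X's 1,200-yard season is dropped before averaging, and QB Y only makes the cut when added as a manual exception.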

That being said, I think the results were pretty neat. We pretty much got the expected results for most of the quarterbacks in the group. The elite, mobile "hybrid" quarterbacks were grouped together, while the upper tier pocket passers were mostly grouped together. The data suggests that Aaron Rodgers belongs in the "mobile/hybrid" category, not the "pocket passer" category. The data also suggests that Lamar Jackson shares more characteristics with the run-first quarterbacks despite his high passer rating, which might irritate some Ravens fans. I'm fine with that.

One thing that I also want to point out is that the clustering method sorts on commonalities in the input variables (passing stats, rushing stats), which means that Ryan Tannehill has been placed in the company of some pretty elite quarterbacks. He's not often in the discussion of the NFL's top quarterbacks, and is often treated as more of a game manager, but the data suggests he should be treated with a little more respect -- he shares characteristics with the likes of Aaron Rodgers and Patrick Mahomes.

Thanks for reading! If you like the content, be sure to check out my other articles.