
Coronavirus Strikes Back (ft. Delta)

It seems like it's been an eternity since the mad rush to get vaccinated in the United States. Some of us took desperate measures to skip ahead in line to get the first dose. Initially, the results were astounding -- in roughly four months, 100 million Americans (about a third of the population) had received their first dose. There was an unmistakable effect on the spread of COVID-19. Daily new cases dropped from a peak of roughly 250,000 in January to just 10,000 a day in July.

Then, Something Happened

What happened can best be illustrated using a chart (duh, this is Stats with Sasa). 


Just after it seemed we had gotten a handle on Coronavirus, it came back with a vengeance. Even though more Americans are fully vaccinated than ever, COVID is currently spreading twice as fast as last year's Second Wave. Although there were a few potential explanations, a clear one emerged: the virus had mutated. 

Enter the Delta Variant

Originating in India, the Delta strain of the Coronavirus has posed an imminent threat to the United States since its arrival on our shores. According to the CDC, it is more contagious, it may be more deadly, and perhaps most importantly: it may pose a threat to vaccinated individuals. Furthermore, while previous strains largely spared adolescents, the Delta variant seems to have the capability to infect children. 

From a scientific standpoint, more "successful" variants/mutations of a virus tend to dominate over time -- think of it as evolution. Since early May, the Delta variant has gone from composing less than 5% of COVID-19 cases in the U.S. to approximately 98% of them. In the same timeframe, daily COVID-19 cases doubled from 50,000 cases a day to over 100,000 cases a day. 

It's easy to blame the unvaccinated community for the resurgence, and feel safe in your own vaccination status. After all, just 7 Southern states accounting for 20% of the country's population have accounted for about 50% of the COVID-19 cases over the past two months. These same 7 states have vaccination rates that are seven percentage points lower than the remaining 43 states. 

(Fun fact: Florida alone accounts for almost 20% of the past two months' COVID-19 cases despite only accounting for 6% of the country's population.)
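
To make those shares concrete, here is a minimal Python sketch that converts "share of cases" and "share of population" into a relative per-capita case rate. The function name is mine and purely illustrative, and the inputs are the rounded figures quoted above ("about 50%", "almost 20%"), so treat the outputs as rough.

```python
def relative_per_capita_rate(case_share, population_share):
    """How many times higher a group's per-capita case rate is
    compared with everyone outside the group."""
    group_rate = case_share / population_share
    rest_rate = (1 - case_share) / (1 - population_share)
    return group_rate / rest_rate

# Rounded shares quoted in the text (approximate inputs).
southern_states = relative_per_capita_rate(case_share=0.50, population_share=0.20)
florida = relative_per_capita_rate(case_share=0.20, population_share=0.06)

print(f"7 Southern states vs. the rest: {southern_states:.1f}x")
print(f"Florida vs. the rest:           {florida:.1f}x")
```

With these rounded inputs, both groups come out at roughly four times the per-capita case rate of the rest of the country.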


Breaking Through the Vaccine

The reports (and states themselves) have stressed that breakthrough cases, where fully vaccinated individuals get COVID-19, are rare. The linked report by the Kaiser Family Foundation, which has been disseminated by high-profile news outlets like Vox and politicians like Hillary Clinton, shows that 1.3% of confirmed cases were "breakthrough" cases, suggesting that vaccinated individuals are relatively safe from catching COVID-19. However, taking a closer look at the report shows that it is rife with flaws. For one, the average "observation period" for the 24 states that gave breakthrough figures began on January 1st, 2021. For those of us who remember, the vaccine rollout had barely begun by then. In fact, the first full vaccinations were not completed until January 25th. We did not hit the 10% mark for full vaccinations until March 10th -- more than two months into the report's "observation period." As a result, for much of the Kaiser Report's "observation period", there simply wasn't a chance for breakthrough cases to occur because very few people were actually fully vaccinated.

It's easier to illustrate this flaw with an example. Say I develop an online videogame, but it's in closed offline beta for the first 20 days of the month. On the 21st day of the month, I release it online, and the servers are down for 2 of the remaining 10 days. I report to investors that the servers are relatively stable because they were only down 2 out of 30 days (6.7%). Is that a reliable figure? The servers have only really been tested for 10 days. This is somewhat analogous to what the states (and by extension, Kaiser) have been reporting.
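
To put numbers on the videogame example, here is a minimal Python sketch. The variable names are mine and purely illustrative; the figures come straight from the example.

```python
# Figures from the hypothetical videogame example.
days_in_month = 30
days_online = 10   # the servers only went live on day 21
days_down = 2

# Naive rate: counts the 20 offline beta days, when the servers could not fail.
naive_downtime = days_down / days_in_month

# Adjusted rate: counts only the days the servers were actually exposed.
adjusted_downtime = days_down / days_online

print(f"Naive downtime:    {naive_downtime:.1%}")
print(f"Adjusted downtime: {adjusted_downtime:.1%}")
```

The only thing that changes between the two figures is the denominator, which is exactly the issue with the observation periods.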

There are a couple of ways to correct for this mistake. Most easily, we can remove the days where there were no fully vaccinated people from each state's "observation period." If we do this, the percentage of COVID cases that involve fully vaccinated individuals jumps from 1.3% to 1.6%.

Even then, once the days with no fully vaccinated people have been removed, 20% of the "observation days" take place when less than 5% of a particular state's population is fully vaccinated. Yet almost 50% of confirmed cases occur during these days. Again, it seems like a large portion of these "observation days" occurred when the pool of fully vaccinated people was virtually nonexistent.

To hearken back to my videogame example, suppose that on days 21-24, the servers are online, but very few people are playing it because no one has heard of the game yet. Then a famous person tweets about the game, and on days 25-30 it surges in popularity, and the servers go down under stress in the last 2 days. Again - are all days equally comparable? 

To account for this, we can conduct a little thought exercise where we distribute the "breakthrough" cases based on the size of the pool of fully vaccinated people on any given day during the "observation period." Then, for that given day, given an estimated number of "breakthrough cases" and confirmed COVID cases, we can calculate an estimated "breakthrough rate" for that day. This methodology has a number of advantages: 
  • It allows us to account for days where there was a small pool of fully vaccinated people
  • It allows us to look at a time trend of the "breakthrough rate" and see what the "true rate" is approaching over time as we approach high vaccination rates
  • It holds constant the likelihood that a fully vaccinated person will get COVID-19 on any given day (this can be adjusted)
The following chart gives the results of this thought exercise/simulation. As a reminder, no extra breakthrough cases have been added here -- they have just been assigned to days throughout the observation periods in order to contextualize how the number of fully vaccinated people has changed over time.
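
The allocation idea can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the full simulation behind the chart: the function name and all of the input numbers are made up, there is a single "state", and every fully vaccinated person is assumed to face the same daily risk of a breakthrough case.

```python
def simulate_breakthrough_rates(fully_vaccinated, confirmed_cases, total_breakthroughs):
    """Spread a reported breakthrough total across days in proportion to the
    number of fully vaccinated people on each day, then return each day's
    estimated share of confirmed cases that are breakthroughs."""
    pool_total = sum(fully_vaccinated)  # total vaccinated person-days
    rates = []
    for vaccinated, cases in zip(fully_vaccinated, confirmed_cases):
        # Constant per-person daily risk: a day's share of the breakthroughs
        # equals its share of the vaccinated person-days.
        est_breakthroughs = total_breakthroughs * vaccinated / pool_total if pool_total else 0.0
        rates.append(est_breakthroughs / cases if cases else float("nan"))
    return rates

# Made-up toy inputs: the vaccinated pool grows while daily cases fall.
fully_vaccinated = [0, 1_000, 10_000, 50_000, 100_000]
confirmed_cases = [5_000, 4_000, 3_000, 2_000, 2_000]
total_breakthroughs = 161  # hypothetical reported total for the whole period

daily_rates = simulate_breakthrough_rates(fully_vaccinated, confirmed_cases, total_breakthroughs)
overall_rate = total_breakthroughs / sum(confirmed_cases)

print(f"Overall rate across the period: {overall_rate:.2%}")
for day, rate in enumerate(daily_rates, start=1):
    print(f"Day {day}: {rate:.2%}")
```

In this toy setup the period-wide figure sits near 1%, while the estimated rate on the last day, when the vaccinated pool is largest, is several times higher. That gap is the effect the thought exercise is meant to expose.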

The simulation passes the smell test in a lot of ways. For one, the median percentage of breakthroughs remains below 0.5% through March and 1% through April. This makes sense since at the time, the pool of fully vaccinated people was very small and the number of daily confirmed cases was high. Over time, as the pool of vaccinated people rises, so does the percentage of breakthroughs. The chart suggests that the percentage of COVID-19 cases where the individual is fully vaccinated is currently somewhere around 2.5% - 7.5%, not 1.3% as initially suggested by the Kaiser Family Foundation Report. Now this is a very rough exercise. It relies on a series of assumptions, but at the end of the day, I don't think they're overly strict. It's highly unlikely the breakthrough rate was ever in the 20% range or above as suggested by the simulation in June and July, but it does provide compelling evidence that the true "breakthrough rate" is substantially higher than the 1.3% reported by Kaiser, or even the 1.6% or so obtained after the simple correction above. In fact, the "vaccine breakthroughs" that the simulation places in June and July (which led to the high breakthrough rates in those months) likely happened more recently due to the Delta Variant, suggesting that the converging breakthrough rate might actually be higher.

Anecdotal evidence may support my conclusion. In one of the few mass COVID-19 outbreaks that the CDC has thoroughly studied recently, 469 individuals caught Coronavirus during a July 4th weekend celebration in the lovely town of Provincetown, Massachusetts. For my Boston friends familiar with Provincetown, it's hardly a conservative bastion that would be resistant to vaccination - in fact there's a MassLive article on how Provincetown managed to achieve a 114% vaccination rate. As you may expect given this, when the outbreak was examined, the CDC found that about 75% of the Coronavirus cases were among vaccinated individuals. This outbreak, while anecdotal, suggests the potential for a higher breakthrough rate than the reports imply.

The Rich Get Richer

Beyond our simulation, we can examine real world data on vaccine rates and COVID-19 cases as the Delta Variant has spread to see if the vaccines have withstood the test of the mutated virus. Specifically, we can use the time-tested tool of linear regression to examine the relationship between vaccine rates and COVID-19 spread. A linear regression will tell us which direction the relationship is going (is a higher vaccine rate associated with a lower COVID-19 spread?) and whether that relationship is statistically significant. Furthermore, we can make use of linear regressions over time as the Delta Variant spread to see how this relationship changed over time.

Given our thought experiment, one might expect the relationship between vaccine rate and COVID spread to be weakening over time as the Delta Variant has been spreading. However, the chart below gives the results of the regression experiment: 


In this chart, the blue dots represent the coefficients from the regression of COVID spread on vaccine rate. All of the coefficients are statistically significant. Given the units of the regression, a coefficient of -0.001 indicates that a 1 percentage point increase in vaccination rate is associated with a 0.001 decrease in new COVID-19 cases per thousand people per day.
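
For readers who want to see the mechanics, here is a minimal Python sketch of one such regression per date, with COVID spread (new cases per thousand people per day) as the outcome and vaccination rate (in percentage points) as the predictor. The county numbers are invented for illustration and the helper name is mine; this is a plain ordinary least squares fit written out by hand, not necessarily the exact specification behind the chart.

```python
import math

def ols_slope(x, y):
    """Simple OLS of y on x. Returns (slope, intercept, t_statistic)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residual_ss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    std_err = math.sqrt(residual_ss / (n - 2) / sxx)  # standard error of the slope
    t_stat = slope / std_err if std_err > 0 else math.copysign(math.inf, slope)
    return slope, intercept, t_stat

# Invented county snapshots: (vaccination rate in %, new cases per 1,000 per day).
snapshots = {
    "early (pre-Delta)": ([30, 40, 50, 60, 70], [0.10, 0.11, 0.10, 0.12, 0.11]),
    "late (Delta dominant)": ([30, 40, 50, 60, 70], [0.60, 0.50, 0.42, 0.29, 0.20]),
}

results = {}
for label, (vax_rate, cases_per_thousand) in snapshots.items():
    slope, intercept, t_stat = ols_slope(vax_rate, cases_per_thousand)
    results[label] = slope
    print(f"{label}: slope = {slope:+.4f}, t = {t_stat:+.2f}")
```

Each slope reads the same way as the coefficients in the chart: the change in daily cases per thousand people associated with a 1 percentage point higher vaccination rate. Repeating the fit for every date and plotting the slopes gives a time series of coefficients like the one charted.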

At first, there is a small but statistically significant positive relationship between vaccine rate and COVID spread. However, as the Delta Variant became more dominant, the gap between highly vaccinated counties and poorly vaccinated counties grew wide -- and a strong negative association quickly emerged, with a higher vaccine rate associated with fewer daily COVID cases. This brings us to the present day, where areas like the South -- which are notoriously undervaccinated -- are collapsing under the strain of the Delta Variant, while more highly vaccinated areas fare much better.

I'll wrap up with a final chart: a recent plot of vaccine rate against (log) daily COVID cases among counties, which shows the stark relationship.


Final Thoughts

This was a really challenging article to write -- I hope you enjoyed it. I believe that the vaccines are more susceptible to the Delta Variant than is common knowledge. While it's true that the majority of vaccinated COVID-19 cases are mild or asymptomatic, we still don't know the long-term effects of these seemingly benign cases - CAT scans have shown lung and heart abnormalities even in patients who have had asymptomatic COVID. Given this, it seems appropriate to start taking reasonable precautions again even if vaccinated.

For those of you who are not yet vaccinated, get vaccinated. The Delta Variant of the virus isn't playing around, and it's ravaging a lot of communities. Experts say a key reason these deadly mutations are emerging is that people aren't getting vaccinated.

For those who want the code to the simulation or want to discuss the methodology further, feel free to DM me at @statswithsasa. Feel free to follow me on Twitter, or subscribe below to receive an email whenever I post a new article!


And now to play us out, someone I've been bumping a lot lately: Toasty Digital. He's a maestro at remixing Kanye, and he's definitely worth checking out.
