
Coronavirus: How are we doing? (methodology)

To create my metric, I make use of a 2-parameter model pioneered by Viboud et al. Specifically, it models the cumulative number of cases on day t, C(t), using the following parameters: r is a measure of growth rate, A is a constant, and the "deceleration of growth" parameter is given by $p = 1-\frac{1}{m}$, where p is between 0 and 1. The parameter p has a useful property. If p is equal to 0, then the cumulative number of cases grows linearly. If p is equal to 1, then the cumulative number of cases grows exponentially. Everything in between displays sub-exponential growth. Cases with sub-exponential growth can be modeled using the following formula:
$C(t)=(\frac{r}{m}t+A)^{m}$
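To make the two limiting cases concrete, here is a minimal Python sketch of the formula above. The function name `cumulative_cases` and the parameter values are hypothetical, chosen only for illustration; the sketch rewrites m as 1/(1 - p), which follows from the definition of p.

```python
import numpy as np

def cumulative_cases(t, r, p, A):
    """Sub-exponential growth curve C(t) = (r/m * t + A)^m, with m = 1 / (1 - p)."""
    if not 0 <= p < 1:
        # At p = 1, m is infinite, so the closed form only covers 0 <= p < 1.
        raise ValueError("the closed form requires 0 <= p < 1")
    m = 1.0 / (1.0 - p)
    return (r / m * np.asarray(t, dtype=float) + A) ** m

t = np.arange(0, 30)

# p = 0 gives m = 1, so the curve is the straight line r*t + A.
linear = cumulative_cases(t, r=5.0, p=0.0, A=10.0)

# p close to 1 gives a large m, and the curve approaches exp(r*t) from below.
near_exp = cumulative_cases(t, r=0.2, p=0.99, A=1.0)
```

Setting t = 0 shows that A is tied to the starting case count, since C(0) = A^m.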

The parameters r and p can be estimated by fitting the above formula to current data with nonlinear least squares (NLS). In practice, as p approaches 0, a plot of cumulative cases is better represented by a linear model than an exponential one. Conversely, as p approaches 1, a plot of cumulative cases is better represented by an exponential model. This can be seen by plotting, for each country, the difference between the r-squared of a linear model and the r-squared of an exponential model against p. We see a strong negative correlation (correlation = -0.89). The crossover sits somewhere around p = 0.5, below which a linear model better explains the data.
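The estimation and the r-squared comparison described above can be sketched in Python roughly as follows. This is not the code behind the post: it is one possible approach that uses SciPy's `curve_fit` for the NLS step, runs on made-up synthetic counts, and assumes that the exponential model is fitted as a straight line on log(cases) with its r-squared then computed on the original scale. The helper names `growth_model` and `r_squared` are hypothetical.

```python
import numpy as np
from scipy.optimize import curve_fit

def growth_model(t, r, p, A):
    """C(t) = (r/m * t + A)^m, with m = 1 / (1 - p)."""
    m = 1.0 / (1.0 - p)
    return (r / m * t + A) ** m

def r_squared(y, y_hat):
    """Coefficient of determination of predictions y_hat against data y."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

# Synthetic cumulative counts, made up for illustration (not real data).
t = np.arange(30, dtype=float)
cases = growth_model(t, r=1.5, p=0.5, A=3.0)

# NLS fit. The bounds keep p inside [0, 1) and the base of the power positive.
(r_hat, p_hat, A_hat), _ = curve_fit(
    growth_model, t, cases,
    p0=[1.0, 0.3, 1.0],
    bounds=([1e-6, 0.0, 1e-6], [np.inf, 0.99, np.inf]),
)

# Linear model fitted on cases; exponential model fitted as a line on log(cases).
linear_fit = np.polyval(np.polyfit(t, cases, 1), t)
exp_fit = np.exp(np.polyval(np.polyfit(t, np.log(cases), 1), t))

# The quantity plotted against p: r-squared (linear) minus r-squared (exponential).
r2_diff = r_squared(cases, linear_fit) - r_squared(cases, exp_fit)
```

With real data, each country would contribute one `(p_hat, r2_diff)` pair, and the correlation across those pairs is what the text reports.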

As such, p can be used as a "metric" of whether the cumulative case profile of a country is better represented by a linear model or an exponential one. The difference between exponential and linear growth is so stark that r, the growth rate, is almost irrelevant by comparison. A higher p indicates a better fit to an exponential model than a linear one, and vice versa. Some examples may help further illustrate this point.

Spain (p = 1.00, ~8,000 cases):
Bahrain (p = 0.52, 214 cases):


