Reverse engineering a sample income distribution

This article walks through how to draw a realistic sample income distribution from two commonly available metrics: mean income and the Gini index. One caveat up front: I rely on the assumption that incomes can be approximated by a lognormal distribution. That's not an uncommon assumption, and a visual comparison shows that it's not a crazy one:

[Figure: Example lognormal distribution]

[Figure: Example income distribution (hh-inc-dist)]

Armed with this assumption, creating a sample income distribution is a simple exercise if you know the arithmetic mean and standard deviation of the distribution in question. Given these two parameters, one can calculate the lognormal mean $\mu$ and standard deviation $\sigma$ (that is, the mean and standard deviation of log income) with the following formulas:
$\mu = \ln\left(\frac{E[X]^2}{\sqrt{Var[X] + E[X]^2}}\right)$
$\sigma^2 = \ln\left(1+\frac{Var[X]}{E[X]^2}\right)$

Finding mean income for a society is pretty easy. Finding the standard deviation of income, however, is another beast. It's not an especially useful stat, as there are better measures of income inequality, so government agencies don't typically report it. Enter the Gini index. The Gini index is a measure of income or wealth inequality in a distribution, and it's commonly available for pretty much every society. In fact, the World Bank publishes estimates for most countries. Luckily for us, under the lognormal assumption the Gini index depends only on the distribution's lognormal standard deviation:
$Gini = erf(\frac{\sigma}{2})$
$= \frac{2}{\sqrt{\pi}} \int_0^\frac{\sigma}{2} e^{-t^2} dt$

Personally, I haven't done an integral since 2014, so I'm not equipped to invert this by hand to get $\sigma$ from a known Gini index. Wolfram Alpha, however, is more than capable of it. Given an arithmetic mean from the internet and a lognormal standard deviation from the Gini index, we can use the formulas above to solve for the arithmetic standard deviation and the lognormal mean.
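
Spelled out, that last step is just algebra on the two lognormal formulas above (this rearrangement is my own working, so check it against them). The $\sigma^2$ formula gives the arithmetic variance in terms of $\sigma$, and substituting that into the $\mu$ formula collapses the lognormal mean to a one-liner:
$Var[X] = (e^{\sigma^2} - 1) E[X]^2$
$\mu = \ln(E[X]) - \frac{\sigma^2}{2}$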

Let's walk through an example. Say we have a fictional society, Orlais, with a mean income of \$100,000.  We can't find data on Orlais' standard deviation of income, but we do know that its Gini index is 0.60.  First, let's find the lognormal standard deviation:
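
Here's a minimal sketch of that step in Python, using only the standard library. The function name `sigma_from_gini` is made up for illustration, and inverting erf through the normal quantile function is just one convenient route; Wolfram Alpha or any erf-inverse routine should give the same number.

```python
import math
from statistics import NormalDist


def sigma_from_gini(gini: float) -> float:
    """Invert Gini = erf(sigma / 2) for a lognormal distribution."""
    # erf(x) = 2 * Phi(x * sqrt(2)) - 1, so erfinv(g) = Phi^-1((g + 1) / 2) / sqrt(2),
    # and therefore sigma = 2 * erfinv(g) = sqrt(2) * Phi^-1((g + 1) / 2).
    return math.sqrt(2) * NormalDist().inv_cdf((gini + 1) / 2)


sigma = sigma_from_gini(0.60)
print(round(sigma, 2))  # 1.19

# Sanity check: plugging sigma back into erf should recover the Gini index.
print(round(math.erf(sigma / 2), 4))
```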


Okay, so we know that the lognormal standard deviation is approximately 1.19.  Let's solve for the arithmetic standard deviation and lognormal mean and then draw our distribution:
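
A rough sketch of this step, again in standard-library Python. The sample size of 100,000 and the seed are arbitrary choices of mine, and $\sigma$ is rounded to 1.19 to match the text; the computed arithmetic standard deviation should land close to the figure quoted below.

```python
import math
import random

mean_income = 100_000
sigma = 1.19  # lognormal standard deviation from the Gini step (rounded)

# Rearranging sigma^2 = ln(1 + Var[X] / E[X]^2) for the arithmetic variance.
variance = (math.exp(sigma**2) - 1) * mean_income**2
std_income = math.sqrt(variance)

# Lognormal mean: mu = ln(E[X]^2 / sqrt(Var[X] + E[X]^2)).
mu = math.log(mean_income**2 / math.sqrt(variance + mean_income**2))

# Draw the sample; seed and size are illustrative assumptions.
rng = random.Random(42)
incomes = [rng.lognormvariate(mu, sigma) for _ in range(100_000)]

print(round(std_income), round(mu, 3))
```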


The arithmetic mean and standard deviation of our distribution are 100,000 and 176,664, respectively. If we check the sample mean, standard deviation, and Gini index of our newly drawn distribution, we get 100,474, 173,903, and 0.60, which all check out. So we now have a sample income distribution with the desired properties, and can do whatever we want with it.
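
For anyone who wants to repeat that check, here's one way it might look. The `gini_index` helper is my own, built on the standard sorted-rank formula for a sample Gini index, and the seed and sample size are arbitrary, so the printed numbers will differ a little from the ones above.

```python
import math
import random
import statistics


def gini_index(values):
    """Sample Gini index via the sorted-rank formula."""
    xs = sorted(values)
    n = len(xs)
    # G = 2 * sum(i * x_i) / (n * sum(x_i)) - (n + 1) / n, with ranks i = 1..n
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * sum(xs)) - (n + 1) / n


mean_income, sigma = 100_000, 1.19
# Same lognormal mean as the mu formula in the text, with Var[X] eliminated.
mu = math.log(mean_income) - sigma**2 / 2

rng = random.Random(0)  # arbitrary seed
incomes = [rng.lognormvariate(mu, sigma) for _ in range(100_000)]

sample_mean = statistics.fmean(incomes)
sample_std = statistics.stdev(incomes)
sample_gini = gini_index(incomes)
print(round(sample_mean), round(sample_std), round(sample_gini, 2))
```

The sample standard deviation is the noisiest of the three because the lognormal has a heavy right tail, so expect it to wander more between seeds than the mean or the Gini index.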

