Reverse engineering a sample income distribution

This article walks you through drawing your very own realistic sample income distribution using two commonly available metrics: mean income and the Gini index. Let me preface it by saying I rely on the assumption that incomes can be approximated by a lognormal distribution. That's a fairly common modeling choice, and a visual inspection shows that it's not a crazy one:

Figure: Example lognormal distribution

Figure: Example income distribution

Armed with this assumption, creating a sample income distribution is a simple exercise if you know the arithmetic mean and standard deviation of the distribution in question. Given these two parameters, one can calculate lognormal mean and standard deviation with the following formulas:

$\mu = ln(\frac{E[X]^2}{\sqrt{Var[X] + E[X]^2}})$

$\sigma^2 = ln(1+\frac{Var[X]}{E[X]^2})$
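As a minimal sketch, the two formulas translate into Python like this. The function name `lognormal_params` and the input values are my own, made up purely for illustration. The last line is a sanity check based on the standard identity that a lognormal's mean is $e^{\mu + \sigma^2/2}$.

```python
import math

def lognormal_params(mean, sd):
    """Convert an arithmetic mean and standard deviation into lognormal (mu, sigma)."""
    variance = sd ** 2
    # mu = ln(E[X]^2 / sqrt(Var[X] + E[X]^2))
    mu = math.log(mean ** 2 / math.sqrt(variance + mean ** 2))
    # sigma^2 = ln(1 + Var[X] / E[X]^2)
    sigma = math.sqrt(math.log(1 + variance / mean ** 2))
    return mu, sigma

# Made-up inputs, purely for illustration
mu, sigma = lognormal_params(50_000, 40_000)

# Sanity check: exp(mu + sigma^2 / 2) should give the arithmetic mean back
recovered_mean = math.exp(mu + sigma ** 2 / 2)
```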

Finding mean income for a society is pretty easy. Finding the standard deviation of income, however, is another beast. It's not an especially useful stat, as there are better measures of income inequality, so government agencies don't typically report it. Enter the Gini index. The Gini index is a measure of income or wealth inequality in a distribution, and it's available for pretty much every society. In fact, the World Bank publishes estimates for most countries. Luckily for us, under the lognormal assumption the Gini index is directly related to the distribution's lognormal standard deviation:

$Gini = erf(\frac{\sigma}{2})$

$= \frac{2}{\sqrt{\pi}} \int_0^\frac{\sigma}{2} e^{-t^2} dt$

What we actually need is the reverse: given a Gini index, find the $\sigma$ that produces it. Personally, I haven't done an integral since 2014, so I'm not equipped to solve this by hand. Wolfram Alpha, however, is more than capable of it. Given an arithmetic mean from the internet and a lognormal standard deviation from the Gini index, we can use the formulas above to solve for the lognormal mean and the arithmetic standard deviation.
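For anyone who wants the algebra spelled out, here is my sketch of how the formulas rearrange. The $erf^{-1}$ notation for the inverse error function is my addition, and the second line relies on the standard lognormal identity $E[X] = e^{\mu + \sigma^2/2}$, which is consistent with the two conversion formulas above.

$\sigma = 2 \cdot erf^{-1}(Gini)$

$\mu = ln(E[X]) - \frac{\sigma^2}{2}$

$Var[X] = E[X]^2 (e^{\sigma^2} - 1)$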

Let's walk through an example. Say we have a fictional society, Orlais, with a mean income of \$100,000. We can't find data on Orlais' standard deviation of income, but we do know that its Gini index is 0.60. First, let's find the lognormal standard deviation:
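Here's a minimal Python sketch of that step. Instead of Wolfram Alpha, it uses the standard library's `statistics.NormalDist`, relying on the identity $erf(x) = 2\Phi(x\sqrt{2}) - 1$, where $\Phi$ is the standard normal CDF. The function name `sigma_from_gini` is mine.

```python
import math
from statistics import NormalDist

def sigma_from_gini(gini):
    """Invert Gini = erf(sigma / 2) for a lognormal distribution."""
    # erf(x) = 2 * Phi(x * sqrt(2)) - 1, so sigma = sqrt(2) * Phi^-1((gini + 1) / 2)
    return math.sqrt(2) * NormalDist().inv_cdf((gini + 1) / 2)

sigma = sigma_from_gini(0.60)
print(round(sigma, 2))

# Plugging sigma back into the erf formula should return the original Gini index
gini_check = math.erf(sigma / 2)
```

If you have SciPy installed, `2 * scipy.special.erfinv(0.60)` should give the same number more directly.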

Okay, so we know that the lognormal standard deviation is approximately 1.19. Let's solve for the arithmetic standard deviation and lognormal mean and then draw our distribution:
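A minimal sketch of this step in Python, using only the standard library. It assumes the rounded value of 1.19 for the lognormal standard deviation. The seed, the sample size, and the variable names are arbitrary choices of mine.

```python
import math
import random

mean_income = 100_000
sigma = 1.19  # lognormal standard deviation, rounded as in the text

# E[X] = exp(mu + sigma^2 / 2), so mu = ln(E[X]) - sigma^2 / 2
mu = math.log(mean_income) - sigma ** 2 / 2

# sigma^2 = ln(1 + Var[X] / E[X]^2), so Var[X] = E[X]^2 * (exp(sigma^2) - 1)
sd_income = mean_income * math.sqrt(math.exp(sigma ** 2) - 1)

# Draw the sample; the seed and sample size are arbitrary choices
rng = random.Random(42)
incomes = [rng.lognormvariate(mu, sigma) for _ in range(100_000)]
```

`random.lognormvariate` takes the lognormal mean and standard deviation (the parameters of the underlying normal), which is why we needed `mu` and `sigma` rather than the arithmetic values.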

The arithmetic mean and standard deviation of our distribution are 100,000 and 176,664, respectively. If we check the sample mean, standard deviation, and Gini index of our newly drawn distribution, we get: 100,474, 173,903, 0.60 -- which all check out. So we now have a sample income distribution with the desired properties, and can do whatever we want with it.
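One way to run that check yourself is sketched below. The `gini` helper is my own and uses the common sorted-rank formula for a sample Gini index. The block redraws a sample with an arbitrary seed so that it runs on its own, so expect numbers close to, but not identical to, the ones quoted above.

```python
import math
import random
import statistics

def gini(values):
    """Sample Gini index: 2 * sum(i * x_i) / (n * sum(x_i)) - (n + 1) / n, x sorted ascending."""
    xs = sorted(values)
    n = len(xs)
    weighted_sum = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted_sum / (n * sum(xs)) - (n + 1) / n

# Redraw a sample with the parameters from the example (sigma rounded to 1.19)
sigma = 1.19
mu = math.log(100_000) - sigma ** 2 / 2
rng = random.Random(42)  # arbitrary seed
incomes = [rng.lognormvariate(mu, sigma) for _ in range(100_000)]

sample_mean = statistics.fmean(incomes)
sample_sd = statistics.stdev(incomes)
sample_gini = gini(incomes)
print(round(sample_mean), round(sample_sd), round(sample_gini, 2))
```

The sample standard deviation tends to bounce around more than the mean or the Gini index from draw to draw, because it's sensitive to the handful of very large incomes in the right tail.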

Statistics with Sasa
