Understanding the Central Limit Theorem

$$ %---- MACROS FOR SETS ----% \newcommand{\znz}[1]{\mathbb{Z} / #1 \mathbb{Z}} \newcommand{\twoheadrightarrowtail}{\mapsto\mathrel{\mspace{-15mu}}\rightarrow} % popular set names \newcommand{\N}{\mathbb{N}} \newcommand{\Z}{\mathbb{Z}} \newcommand{\Q}{\mathbb{Q}} \newcommand{\R}{\mathbb{R}} \newcommand{\C}{\mathbb{C}} \newcommand{\I}{\mathbb{I}} % popular vector space notation \newcommand{\V}{\mathbb{V}} \newcommand{\W}{\mathbb{W}} \newcommand{\B}{\mathbb{B}} \newcommand{\D}{\mathbb{D}} %---- MACROS FOR FUNCTIONS ----% % linear algebra \newcommand{\T}{\mathrm{T}} \renewcommand{\ker}{\mathrm{ker}} \newcommand{\range}{\mathrm{range}} \renewcommand{\span}{\mathrm{span}} \newcommand{\rref}{\mathrm{rref}} \renewcommand{\dim}{\mathrm{dim}} \newcommand{\col}{\mathrm{col}} \newcommand{\nullspace}{\mathrm{null}} \newcommand{\row}{\mathrm{row}} \newcommand{\rank}{\mathrm{rank}} \newcommand{\nullity}{\mathrm{nullity}} \renewcommand{\det}{\mathrm{det}} \newcommand{\proj}{\mathrm{proj}} \renewcommand{\H}{\mathrm{H}} \newcommand{\trace}{\mathrm{trace}} \newcommand{\diag}{\mathrm{diag}} \newcommand{\card}{\mathrm{card}} \newcommand\norm[1]{\left\lVert#1\right\rVert} % differential equations \newcommand{\laplace}[1]{\mathcal{L}\{#1\}} \newcommand{\F}{\mathrm{F}} % misc \newcommand{\sign}{\mathrm{sign}} \newcommand{\softmax}{\mathrm{softmax}} \renewcommand{\th}{\mathrm{th}} \newcommand{\adj}{\mathrm{adj}} \newcommand{\hyp}{\mathrm{hyp}} \renewcommand{\max}{\mathrm{max}} \renewcommand{\min}{\mathrm{min}} \newcommand{\where}{\mathrm{\ where\ }} \newcommand{\abs}[1]{\vert #1 \vert} \newcommand{\bigabs}[1]{\big\vert #1 \big\vert} \newcommand{\biggerabs}[1]{\Bigg\vert #1 \Bigg\vert} \newcommand{\equivalent}{\equiv} \newcommand{\cross}{\times} % statistics \newcommand{\cov}{\mathrm{cov}} \newcommand{\var}{\mathrm{var}} \newcommand{\bias}{\mathrm{bias}} \newcommand{\E}{\mathrm{E}} \newcommand{\prob}{\mathrm{prob}} \newcommand{\unif}{\mathrm{unif}} \newcommand{\invNorm}{\mathrm{invNorm}} 
\newcommand{\invT}{\mathrm{invT}} % real analysis \renewcommand{\sup}{\mathrm{sup}} \renewcommand{\inf}{\mathrm{inf}} %---- MACROS FOR ALIASES AND REFORMATTING ----% % logic \newcommand{\forevery}{\ \forall\ } \newcommand{\OR}{\lor} \newcommand{\AND}{\land} \newcommand{\then}{\implies} % set theory \newcommand{\impropersubset}{\subseteq} \newcommand{\notimpropersubset}{\nsubseteq} \newcommand{\propersubset}{\subset} \newcommand{\notpropersubset}{\not\subset} \newcommand{\union}{\cup} \newcommand{\Union}[2]{\bigcup\limits_{#1}^{#2}} \newcommand{\intersect}{\cap} \newcommand{\Intersect}[2]{\bigcap\limits_{#1}^{#2}} \newcommand{\intersection}[2]{\bigcap\limits_{#1}^{#2}} \newcommand{\Intersection}[2]{\bigcap\limits_{#1}^{#2}} \newcommand{\closure}{\overline} \newcommand{\compose}{\circ} % linear algebra \newcommand{\subspace}{\le} \newcommand{\angles}[1]{\langle #1 \rangle} \newcommand{\identity}{\mathbb{1}} \newcommand{\orthogonal}{\perp} \renewcommand{\parallel}[1]{#1^{||}} % calculus \newcommand{\integral}[2]{\int\limits_{#1}^{#2}} \newcommand{\limit}[1]{\lim\limits_{#1}} \newcommand{\approaches}{\rightarrow} \renewcommand{\to}{\rightarrow} \newcommand{\convergesto}{\rightarrow} % algebra \newcommand{\summation}[2]{\sum\limits_{#1}^{#2}} \newcommand{\product}[2]{\prod\limits_{#1}^{#2}} \newcommand{\by}{\times} \newcommand{\integral}[2]{\int_{#1}^{#2}} % exists commands \newcommand{\notexist}{\nexists\ } \newcommand{\existsatleastone}{\exists\ } \newcommand{\existsonlyone}{\exists!} \newcommand{\existsunique}{\exists!} \let\oldexists\exists \renewcommand{\exists}{\ \oldexists\ } % statistics \newcommand{\distributed}{\sim} \newcommand{\onetoonecorresp}{\sim} \newcommand{\independent}{\perp\!\!\!\perp} \newcommand{\conditionedon}{\ |\ } \newcommand{\given}{\ |\ } \newcommand{\notg}{\ngtr} \newcommand{\yhat}{\hat{y}} \newcommand{\betahat}{\hat{\beta}} \newcommand{\sigmahat}{\hat{\sigma}} \newcommand{\muhat}{\hat{\mu}} \newcommand{\transmatrix}{\mathrm{P}} 
\renewcommand{\choose}{\binom} % misc \newcommand{\infinity}{\infty} \renewcommand{\bold}{\textbf} \newcommand{\italics}{\textit} $$

The Central Limit Theorem in the Wild

On many occasions, you’re trying to ask questions about the average of some population of unknown distribution in order to find a confidence interval or test a hypothesis. Imagine you’ve taken some relatively large IID (independent and identically distributed) random sample, $X_1, \dots, X_n$, from an unknown distribution, but you know that distribution has a finite mean, $\mu$, and a finite positive variance, $\sigma^2$.

So what can you say about the average, $\bar{X}$, of $X_1, \dots, X_n$? Well, $\bar{X}$ is a random variable itself, and like any other random variable there’s a distribution that describes it. Because $\bar{X}$ is a statistic, the distribution that describes it is called a sampling distribution. So imagine you take a random sample of size $n$ and calculate its mean, $\bar{X}$, and you repeat this process $m$ times so that you have a set of means, $\bar{X}_1, \dots, \bar{X}_m$.

Notice what happens when you take the values of these means and plot their relative frequencies: a shape starts to form that you’ll quickly recognize as resembling a normal distribution. And if you do this experiment again but with even larger random sample sizes, you’ll notice that the plot of relative frequencies looks even more like a normal distribution. Setosa has a great visual example of this experiment in action.
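The experiment above can be sketched in a few lines of Python. This is only an illustration under assumptions the article does not make: the population is taken to be Exponential(1), a strongly right-skewed distribution, and the names `sample_means`, `skewness`, and `skew_by_n` are made up for this sketch. Instead of plotting, it measures skewness, which is 0 for a symmetric shape like the normal, so a value shrinking toward 0 as the sample size grows is one sign that the distribution of the means is becoming more normal-looking.

```python
import random
import statistics


def sample_means(n, m, rng):
    """Draw m random samples of size n from Exponential(1); return their m means."""
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(m)
    ]


def skewness(values):
    """Sample skewness: the average cubed z-score (0 for a symmetric shape)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean(((v - mean) / sd) ** 3 for v in values)


rng = random.Random(0)  # fixed seed (an arbitrary choice) so the run is repeatable

# Repeat the experiment for three sample sizes, 5000 sample means each.
skew_by_n = {n: skewness(sample_means(n, 5000, rng)) for n in (2, 10, 100)}

for n, skew in skew_by_n.items():
    print(f"n = {n:>3}: skewness of the sample means = {skew:.3f}")
```

Swapping `rng.expovariate(1.0)` for another generator, such as `rng.random()`, is an easy way to try the same experiment on a different population.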

The Expected Value and Standard Deviation of the Sample Mean

Although you haven’t proved that the sampling distribution for $\bar{X}$ approaches a normal distribution, you’ve noticed it experimentally. What other questions can you ask about the statistic $\bar{X}$? What is its expected value? Well, you know that $\bar{X} = \frac{X_1 + \dots + X_n}{n}$. So it follows that

$$ \begin{align} \E(\bar{X}) &= \E\bigg(\frac{X_1 + \dots + X_n}{n}\bigg) \\
&= \frac{1}{n} \E(X_1 + \dots + X_n) \\
&= \frac{1}{n} \big(\E(X_1) + \dots + \E(X_n)\big) \\
&= \frac{1}{n} (\mu + \dots + \mu) \\
&= \frac{1}{n} \cdot n \mu \\
&= \mu \end{align} $$

Notice the significance of this result: the expected value of the mean of a random sample is equal to the mean of the population that the random sample comes from.

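Can you deduce the standard deviation of $\bar{X}$? Start with its variance. Because the $X_i$ are independent, the variance of their sum is the sum of their variances.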

$$ \begin{align} \var(\bar{X}) &= \var\bigg(\frac{X_1 + \dots + X_n}{n}\bigg) \\
&= \bigg(\frac{1}{n}\bigg)^2 \var(X_1 + \dots + X_n) \\
&= \frac{1}{n^2} (\var(X_1) + \dots + \var(X_n)) \\
&= \frac{1}{n^2} (\sigma^2 + \dots + \sigma^2) \\
&= \frac{1}{n^2} \cdot n \sigma^2 \\
&= \frac{\sigma^2}{n} \end{align} $$

Since the variance is $\frac{\sigma^2}{n}$, the standard deviation is $\frac{\sigma}{\sqrt{n}}$.
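Both results can be sanity-checked numerically. The sketch below is an illustration rather than a proof, and its specifics are assumptions: the population is Exponential(1), for which $\mu = 1$ and $\sigma = 1$, and the sample size and number of repetitions are arbitrary. It compares the mean and standard deviation of many simulated sample means with the predicted values $\mu$ and $\frac{\sigma}{\sqrt{n}}$.

```python
import math
import random
import statistics

MU, SIGMA = 1.0, 1.0  # mean and standard deviation of Exponential(1)
N = 25                # size of each random sample (arbitrary choice)
REPS = 20000          # number of repeated samples (arbitrary choice)

rng = random.Random(1)  # fixed seed so the run is repeatable

# One sample mean per repetition.
means = [
    statistics.fmean(rng.expovariate(1.0) for _ in range(N))
    for _ in range(REPS)
]

empirical_mean = statistics.fmean(means)
empirical_sd = statistics.stdev(means)
predicted_sd = SIGMA / math.sqrt(N)  # sigma / sqrt(n) from the derivation above

print(f"mean of the sample means: {empirical_mean:.4f} (predicted {MU:.4f})")
print(f"sd of the sample means:   {empirical_sd:.4f} (predicted {predicted_sd:.4f})")
```

The simulated values will not match the predictions exactly, since they are themselves estimates, but they should get closer as `REPS` grows.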

Forming the Central Limit Theorem

To recap, you’ve observed that the distribution of $\bar{X}$ seems to be normal and have proved that $\E(\bar{X}) = \mu$ and $\text{std}(\bar{X}) = \frac{\sigma}{\sqrt{n}}$. You know that any normal random variable, $X$, with mean $\mu$ and standard deviation $\sigma$ can be rewritten in its standard form (also called the standard score or z-score), $\frac{X - \mu}{\sigma}$, which follows the standard normal distribution, $N(0, 1)$. This standard score is a random variable that indicates how many standard deviations $X$ is from its mean. Since you suspect the distribution of $\bar{X}$ to be normal, you might infer its standard form to be $\frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$, since $\E(\bar{X}) = \mu$ and $\frac{\sigma}{\sqrt{n}}$ is the standard deviation of $\bar{X}$.

Now all you need to do is prove that the distribution of this standardized sample mean approaches the standard normal distribution, $N(0, 1)$, as the sample size approaches infinity. Mathematically, you want to prove:

$$\frac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \frac{\summation{i=1}{n} X_i - n \mu}{\sqrt{n} \sigma} \approaches N(0, 1) \text{ as } n \approaches \infinity$$

The relationship above, where the arrow denotes convergence in distribution, is famously known as the Central Limit Theorem, and it has indeed been proven. A classic proof uses moment generating functions and a bit of calculus. Fortunately for you, we won’t go through that proof here, but you should relish the fact that your experimental results hold up against theoretical rigor.
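Short of a proof, the convergence can at least be checked by simulation. The sketch below is a rough illustration under assumed choices: the population is Uniform(0, 1), for which $\mu = \frac{1}{2}$ and $\sigma = \sqrt{1/12}$, the sample size is 30, and the names `standardized_mean`, `empirical_cdf`, and `gaps` are made up here. It standardizes many simulated sample means and compares the fraction falling at or below a few cutoffs with the standard normal CDF, using `NormalDist` from Python's `statistics` module.

```python
import math
import random
from statistics import NormalDist, fmean

MU = 0.5                   # mean of Uniform(0, 1)
SIGMA = math.sqrt(1 / 12)  # standard deviation of Uniform(0, 1)
N = 30                     # size of each random sample (arbitrary choice)
REPS = 20000               # number of repeated samples (arbitrary choice)

rng = random.Random(2)  # fixed seed so the run is repeatable


def standardized_mean():
    """Draw one sample of size N and return (x_bar - mu) / (sigma / sqrt(n))."""
    x_bar = fmean(rng.random() for _ in range(N))
    return (x_bar - MU) / (SIGMA / math.sqrt(N))


z_scores = [standardized_mean() for _ in range(REPS)]
standard_normal = NormalDist(0, 1)


def empirical_cdf(z):
    """Fraction of simulated standardized means that are <= z."""
    return sum(score <= z for score in z_scores) / len(z_scores)


# Absolute gap between the simulated CDF and the N(0, 1) CDF at each cutoff.
gaps = {z: abs(empirical_cdf(z) - standard_normal.cdf(z)) for z in (-2, -1, 0, 1, 2)}

for z, gap in gaps.items():
    print(f"z = {z:>2}: simulated {empirical_cdf(z):.4f}, "
          f"N(0, 1) {standard_normal.cdf(z):.4f}, gap {gap:.4f}")
```

Small gaps at every cutoff are consistent with the theorem, though a simulation at one finite $n$ only suggests the limit and does not establish it.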

At the end of the day, what does the Central Limit Theorem mean for you? Well, it allows you to use probabilistic and statistical methods that you already know work for the normal distribution on problems involving other types of distributions. That random sample you took at the beginning of this article could have come from any distribution with a finite mean and a finite positive variance. Yet, for a large enough sample, the distribution of its mean is approximately normal. This theorem is indispensable because, among many reasons, it allows you to create confidence intervals for means and perform hypothesis tests - topics that will be covered in later posts. But until then, I hope this guide has been a sufficient introduction to the Central Limit Theorem.