Normal Random Variables
In the Exploratory Data Analysis unit of this course, we encountered data sets, such as lengths of human pregnancies, whose distributions naturally followed a symmetric unimodal bell shape, bulging in the middle and tapering off at the ends.
Many variables, such as pregnancy lengths, shoe sizes, foot lengths, and other human physical characteristics, exhibit these properties: symmetry indicates that the variable is just as likely to take a value a certain distance below its mean as it is to take a value that same distance above its mean; the bell shape indicates that values closer to the mean are more likely, and it becomes increasingly unlikely to take values far from the mean in either direction.
The particular shape exhibited by these variables has been studied since the early part of the nineteenth century, when such distributions were first called “normal” as a way of suggesting their depiction of a common, natural pattern.
There are many normal distributions. Even though all of them have the bell shape, they vary in their center and spread.
More specifically, the center of the distribution is determined by its mean (mu, μ) and the spread is determined by its standard deviation (sigma, σ).
Some observations we can make as we look at this graph are:
- The black and the red normal curves have the same mean, or center, at μ = 10. However, the red curve is more spread out and thus has a larger standard deviation. As you look at these two normal curves, notice that as the red curve is flattened, its spread gets larger, so the area under the curve remains the same.
- The black and the green normal curves have the same standard deviation or spread (the black curve extends from 6.5 to 13.5, and the green curve from 10.5 to 17.5, so each spans 7 units). The green curve is simply shifted to the right, which means it has a larger mean.
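These observations can be checked numerically. The following is a minimal sketch using Python's standard-library `statistics.NormalDist`. The standard deviations (1 and 2) and the green curve's mean (14) are assumptions chosen for illustration, since the exact parameters of the plotted curves are not stated in the text; `area_under_curve` is a hypothetical helper.

```python
from statistics import NormalDist

# Assumed parameters for illustration; the figure's exact values are not given.
black = NormalDist(mu=10, sigma=1)
red = NormalDist(mu=10, sigma=2)    # same center as black, larger spread
green = NormalDist(mu=14, sigma=1)  # same spread as black, shifted center


def area_under_curve(dist, lo, hi, steps=10_000):
    """Approximate the area under dist.pdf on [lo, hi] with the midpoint rule."""
    width = (hi - lo) / steps
    return sum(dist.pdf(lo + (i + 0.5) * width) for i in range(steps)) * width


for name, dist in [("black", black), ("red", red), ("green", green)]:
    peak = dist.pdf(dist.mean)  # a normal curve is highest at its mean
    # Integrate over 8 standard deviations on each side of the mean.
    area = area_under_curve(dist, dist.mean - 8 * dist.stdev, dist.mean + 8 * dist.stdev)
    print(f"{name}: peak height = {peak:.4f}, area = {area:.4f}")
```

Under these assumptions, the red curve's peak is half as high as the black curve's because its standard deviation is twice as large, while all three areas come out very close to 1.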
Even more important than the fact that many variables themselves follow the normal curve is the role played by the normal curve in sampling theory, as we’ll see in the next section in our unit on probability.
Understanding the normal distribution is an important step in the direction of our overall goal, which is to relate sample means or proportions to population means or proportions. The goal of this section is to better understand normal random variables and their distributions.
We began to get a feel for normal distributions in the Exploratory Data Analysis (EDA) section, when we introduced the Standard Deviation Rule (or the 68-95-99.7 rule) for how values in a normally-shaped sample data set behave relative to their sample mean (x-bar) and sample standard deviation (s).
This is the same rule that dictates how the distribution of a normal random variable behaves relative to its mean (mu, μ) and standard deviation (sigma, σ). Now we use probability language and notation to describe the random variable’s behavior.
For example, in the EDA section, we would have said “68% of pregnancies in our data set fall within 1 standard deviation (s) of their mean (x-bar).” The analogous statement now would be “If X, the length of a randomly chosen pregnancy, is normal with mean (mu, μ) and standard deviation (sigma, σ), then the probability is 0.68 that X falls within 1 standard deviation (sigma, σ) of its mean (mu, μ).”
In general, if X is a normal random variable, then the probability is
- 68% that X falls within 1 standard deviation (sigma, σ) of the mean (mu, μ)
- 95% that X falls within 2 standard deviations (sigma, σ) of the mean (mu, μ)
- 99.7% that X falls within 3 standard deviations (sigma, σ) of the mean (mu, μ).
Using probability notation, we may write:
- P(μ – σ < X < μ + σ) = 0.68
- P(μ – 2σ < X < μ + 2σ) = 0.95
- P(μ – 3σ < X < μ + 3σ) = 0.997
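The Standard Deviation Rule can be confirmed numerically for a normal random variable. The sketch below uses `statistics.NormalDist` from Python's standard library; the mean of 100 and standard deviation of 15 are hypothetical values, and `prob_within` is an illustrative helper name, not something defined in this course.

```python
from statistics import NormalDist

# Hypothetical parameters; the rule holds for any mean and standard deviation.
mu, sigma = 100.0, 15.0
X = NormalDist(mu, sigma)


def prob_within(dist, k):
    """Return P(mu - k*sigma < X < mu + k*sigma) for a normal random variable."""
    return dist.cdf(dist.mean + k * dist.stdev) - dist.cdf(dist.mean - k * dist.stdev)


for k in (1, 2, 3):
    print(f"P(X within {k} standard deviation(s) of the mean) = {prob_within(X, k):.4f}")
```

The computed probabilities are approximately 0.6827, 0.9545, and 0.9973, which the rule rounds to 68%, 95%, and 99.7%. Changing `mu` and `sigma` to any other values leaves these probabilities unchanged.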
Notice that the information from the rule can be interpreted from the perspective of the tails of the normal curve:
- Since 0.68 is the probability of being within 1 standard deviation of the mean, the remaining 1 – 0.68 = 0.32 is split equally between the two tails by symmetry, so (1 – 0.68) / 2 = 0.16 is the probability of being more than 1 standard deviation below the mean (or more than 1 standard deviation above the mean).
- Likewise, (1 – 0.95) / 2 = 0.025 is the probability of being more than 2 standard deviations below (or above) the mean.
- And (1 – 0.997) / 2 = 0.0015 is the probability of being more than 3 standard deviations below (or above) the mean.
The three figures below illustrate this.
Notice that there are two types of problems we may want to solve: those like (a), (d) and (e), in which a particular interval of values of a normal random variable is given, and we are asked to find a probability, and those like (b), (c) and (f), in which a probability is given and we are asked to identify what the normal random variable’s values would be.
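Both the tail probabilities and the two types of problems can be illustrated in a few lines. This is a minimal sketch, assuming a hypothetical normal random variable with mean 50 and standard deviation 10 (these values do not come from the course examples). It uses the `cdf` method of `statistics.NormalDist` to go from a value to a probability, and `inv_cdf` to go from a probability to a value.

```python
from statistics import NormalDist

# Hypothetical normal random variable; mu and sigma are assumed for illustration.
X = NormalDist(mu=50.0, sigma=10.0)

# Type 1: an interval of values is given, and we find the probability.
# Here: the probability of being more than k standard deviations from the mean.
for k in (1, 2, 3):
    lower_tail = X.cdf(X.mean - k * X.stdev)
    upper_tail = 1 - X.cdf(X.mean + k * X.stdev)  # matches lower_tail by symmetry
    print(f"k={k}: lower tail = {lower_tail:.4f}, upper tail = {upper_tail:.4f}")

# Type 2: a probability is given, and we find the value.
# Here: the value below which X falls with probability 0.025.
cutoff = X.inv_cdf(0.025)
print(f"Value with 2.5% of the distribution below it: {cutoff:.2f}")
```

The exact tail probabilities (roughly 0.159, 0.023, and 0.0013) are close to the rule's rounded values of 0.16, 0.025, and 0.0015. Likewise, the cutoff of about 30.40 is close to μ – 2σ = 30, as the rule would suggest.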
Let’s go back to our example of foot length:
Here is another familiar normal distribution: