Normal Random Variables

CO-6: Apply basic concepts of probability, random variation, and commonly used statistical probability distributions.
LO 6.2: Apply the standard deviation rule to the special case of distributions having the “normal” shape.

In the Exploratory Data Analysis unit of this course, we encountered data sets, such as lengths of human pregnancies, whose distributions naturally followed a symmetric unimodal bell shape, bulging in the middle and tapering off at the ends.

mod8-image110a

Many variables, such as pregnancy lengths, shoe sizes, foot lengths, and other human physical characteristics, exhibit these properties: symmetry indicates that the variable is just as likely to take a value a certain distance below its mean as it is to take a value that same distance above its mean; the bell shape indicates that values closer to the mean are more likely, and that values become increasingly unlikely the farther they lie from the mean in either direction.

The particular shape exhibited by these variables has been studied since the early part of the nineteenth century, when it was first called “normal” to suggest that it depicts a common, natural pattern.

Observations of Normal Distributions

There are many normal distributions. Although all of them have the bell shape, they vary in their center and spread.

mod8-image_normal1a

More specifically, the center of the distribution is determined by its mean (mu, μ) and the spread is determined by its standard deviation (sigma, σ).

Some observations we can make as we look at this graph are:

  • The black and the red normal curves have the same mean, or center, at μ = 10. However, the red curve is more spread out and thus has a larger standard deviation. As you look at these two normal curves, notice that the more spread out a curve is, the flatter it is, so the area under the curve remains the same.
  • The black and the green normal curves have the same standard deviation, or spread: the plotted portion of the black curve runs from about 6.5 to 13.5 and that of the green curve from about 10.5 to 17.5, a width of 7 in both cases. The two curves differ only in their centers.
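
To make these observations concrete, here is a minimal sketch that evaluates the normal density for a few curves. The parameter values are hypothetical, chosen only for illustration (they are not read off the figure), and the helper names `normal_pdf` and `area_under_curve` are our own.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def area_under_curve(mu, sigma, width=10.0, steps=20_000):
    """Trapezoid-rule area over mu +/- width*sigma (the tails beyond are negligible)."""
    lo, hi = mu - width * sigma, mu + width * sigma
    h = (hi - lo) / steps
    total = 0.5 * (normal_pdf(lo, mu, sigma) + normal_pdf(hi, mu, sigma))
    for i in range(1, steps):
        total += normal_pdf(lo + i * h, mu, sigma)
    return total * h

# Hypothetical (mu, sigma) pairs, for illustration only.
curves = {"narrow": (10, 1.0), "wide": (10, 2.0), "shifted": (14, 1.0)}
for name, (mu, sigma) in curves.items():
    peak = normal_pdf(mu, mu, sigma)
    area = area_under_curve(mu, sigma)
    print(f"{name}: peak height = {peak:.4f}, area = {area:.4f}")
```

Doubling the standard deviation halves the peak height while the area stays at 1, and changing only the mean slides the curve along the axis without altering its shape.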

Even more important than the fact that many variables themselves follow the normal curve is the role played by the normal curve in sampling theory, as we’ll see in the next section in our unit on probability.

Understanding the normal distribution is an important step in the direction of our overall goal, which is to relate sample means or proportions to population means or proportions. The goal of this section is to better understand normal random variables and their distributions.

The Standard Deviation Rule for Normal Random Variables

We began to get a feel for normal distributions in the Exploratory Data Analysis (EDA) section, when we introduced the Standard Deviation Rule (or the 68-95-99.7 rule) for how values in a normally-shaped sample data set behave relative to their sample mean (x-bar) and sample standard deviation (s).

This is the same rule that dictates how the distribution of a normal random variable behaves relative to its mean (mu, μ) and standard deviation (sigma, σ). Now we use probability language and notation to describe the random variable’s behavior.

For example, in the EDA section, we would have said “68% of pregnancies in our data set fall within 1 standard deviation (s) of their mean (x-bar).” The analogous statement now would be “If X, the length of a randomly chosen pregnancy, is normal with mean (mu, μ) and standard deviation (sigma, σ), then P(μ − σ < X < μ + σ) = 0.68.”

In general, if X is a normal random variable, then the probability is

  • 68% that X falls within 1 standard deviation (sigma, σ) of the mean (mu, μ)
  • 95% that X falls within 2 standard deviations (sigma, σ) of the mean (mu, μ)
  • 99.7% that X falls within 3 standard deviations (sigma, σ) of the mean (mu, μ).

Using probability notation, we may write

P(μ − σ < X < μ + σ) = 0.68
P(μ − 2σ < X < μ + 2σ) = 0.95
P(μ − 3σ < X < μ + 3σ) = 0.997

mod8-image121
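
The percentages 68%, 95%, and 99.7% are rounded values. As a small check, the sketch below computes the probabilities more precisely with the error function from Python's standard library, using the fact that the probability of falling within k standard deviations of the mean is erf(k/√2) for every normal random variable. The function name `prob_within` is our own.

```python
import math

def prob_within(k):
    """P(mu - k*sigma < X < mu + k*sigma) for a normal X; depends only on k."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s) of the mean: {prob_within(k):.4f}")
```

The results, roughly 0.6827, 0.9545, and 0.9973, do not depend on μ or σ, which is why one rule covers all normal distributions.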

Comment

  • Notice that the information from the rule can be interpreted from the perspective of the tails of the normal curve:
    • Since 0.68 is the probability of being within 1 standard deviation of the mean, (1 – 0.68) / 2 = 0.16 is the probability of being more than 1 standard deviation below the mean (or more than 1 standard deviation above the mean).
    • Likewise, (1 – 0.95) / 2 = 0.025 is the probability of being more than 2 standard deviations below (or above) the mean.
    • And (1 – 0.997) / 2 = 0.0015 is the probability of being more than 3 standard deviations below (or above) the mean.
  • The three figures below illustrate this.

mod8-image122 mod8-image123 mod8-image124
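
The tail arithmetic in the comment above can be sketched in a few lines. Symmetry is what justifies the division by 2: the probability left outside the central interval is split equally between the two tails. The names `RULE` and `one_tail` are our own.

```python
# Rounded probabilities from the Standard Deviation Rule, keyed by the
# number of standard deviations from the mean.
RULE = {1: 0.68, 2: 0.95, 3: 0.997}

def one_tail(k):
    """Probability of falling more than k standard deviations below the mean
    (by symmetry, the same as more than k standard deviations above it)."""
    return (1 - RULE[k]) / 2

for k in (1, 2, 3):
    print(f"more than {k} standard deviation(s) out, one side: {one_tail(k):.4f}")
```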

EXAMPLE: Foot Length

Suppose that X, the foot length (in inches) of a randomly chosen adult male, is a normal random variable with mean μ = 11 and standard deviation σ = 1.5. Then the Standard Deviation Rule lets us sketch the probability distribution of X as follows:

mod8-image127

(a) What is the probability that a randomly chosen adult male will have a foot length between 8 and 14 inches?

0.95, or 95%.

(b) An adult male is almost guaranteed (0.997 probability) to have a foot length between what two values?

6.5 and 15.5 inches.

(c) The probability is only 2.5% that an adult male will have a foot length greater than how many inches?

14 inches. (See the image below.)

mod8-image128
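
As a sketch of how the three answers follow from the rule, the code below builds the intervals μ ± kσ for the foot length example (μ = 11, σ = 1.5 inches). The names `interval` and `RULE` are our own.

```python
MU = 11.0     # mean foot length in inches (from the example)
SIGMA = 1.5   # standard deviation in inches (from the example)

# Rounded probabilities from the Standard Deviation Rule.
RULE = {1: 0.68, 2: 0.95, 3: 0.997}

def interval(k):
    """Endpoints of the interval within k standard deviations of the mean."""
    return (MU - k * SIGMA, MU + k * SIGMA)

# (a) 8 and 14 are the endpoints for k = 2, so the probability is 0.95.
low, high = interval(2)
print(f"(a) P({low} < X < {high}) = {RULE[2]}")

# (b) A probability of 0.997 corresponds to k = 3.
print(f"(b) interval with probability {RULE[3]}: {interval(3)}")

# (c) An upper tail of 2.5% is half of what lies outside the k = 2 interval.
upper_tail = (1 - RULE[2]) / 2
print(f"(c) P(X > {high}) = {upper_tail:.3f}")
```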

Now you should try a few. (Use the figure that is just before part (a) to help you.)

Comment

  • Notice that there are two types of problems we may want to solve: those like (a), (d), and (e), in which a particular interval of values of a normal random variable is given and we are asked to find a probability, and those like (b), (c), and (f), in which a probability is given and we are asked to identify what the normal random variable’s values would be.
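
The two problem types correspond to two directions of the same calculation. The sketch below illustrates both with `statistics.NormalDist` from Python's standard library (available since Python 3.8), which goes beyond the Standard Deviation Rule and is used here only as an illustration. The helper names are our own.

```python
from statistics import NormalDist

# Foot length example: mean 11 inches, standard deviation 1.5 inches.
foot = NormalDist(mu=11, sigma=1.5)

def prob_between(dist, a, b):
    """Type 1: an interval of values is given; find its probability."""
    return dist.cdf(b) - dist.cdf(a)

def value_with_upper_tail(dist, p):
    """Type 2: a probability is given; find the value with probability p above it."""
    return dist.inv_cdf(1 - p)

forward = prob_between(foot, 8, 14)           # like part (a)
inverse = value_with_upper_tail(foot, 0.025)  # like part (c)
print(f"P(8 < X < 14) = {forward:.4f}")
print(f"value with 2.5% of foot lengths above it = {inverse:.2f} inches")
```

The outputs are close to, but not exactly, the rule's answers of 0.95 and 14 inches, because the rule uses rounded values.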

Let’s go back to our example of foot length:

EXAMPLE: Foot Length

How likely or unlikely is it for a male’s foot length to be more than 13 inches?

mod8-image129

Since 13 inches doesn’t happen to be exactly 1, 2, or 3 standard deviations away from the mean, we would only be able to give a very rough estimate of the probability at this point.

Clearly, the Standard Deviation Rule only describes the tip of the iceberg, and while it serves well as an introduction to the normal curve, and gives us a good sense of what would be considered likely and unlikely values, it is very limited in the probability questions it can help us answer.
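
To see how rough the estimate is, note that 13 inches lies between μ + σ = 12.5 and μ + 2σ = 14, so the rule only tells us that the probability is somewhere between 0.025 and 0.16. The sketch below compares these bounds with a more precise value from `statistics.NormalDist` in Python's standard library, a tool this section has not introduced and that is used here purely for comparison.

```python
from statistics import NormalDist

# Foot length example: mean 11 inches, standard deviation 1.5 inches.
foot = NormalDist(mu=11, sigma=1.5)

# Bounds available from the Standard Deviation Rule alone.
upper_bound = (1 - 0.68) / 2   # P(X > 12.5), one standard deviation above the mean
lower_bound = (1 - 0.95) / 2   # P(X > 14), two standard deviations above the mean

# More precise value from the normal cumulative distribution function.
exact = 1 - foot.cdf(13)

print(f"the rule gives: {lower_bound:.3f} < P(X > 13) < {upper_bound:.3f}")
print(f"more precise value: P(X > 13) = {exact:.4f}")
```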

Here is another familiar normal distribution:

EXAMPLE: SAT Scores

mod8-image_normal_sat1

Suppose we are interested in knowing the probability that a randomly selected student will score 633 or more on the math portion of his or her SAT (this is represented by the red area). Again, 633 does not fall exactly 1, 2, or 3 standard deviations above the mean.

Notice, however, that an SAT score of 633 and a foot length of 13 inches are both about 1/3 of the way between 1 and 2 standard deviations above their respective means. As you continue to read, you’ll see that this positioning relative to the mean is the key to finding probabilities.
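
As a minimal sketch of what "positioning relative to the mean" means, the code below counts how many standard deviations a value lies from the mean, using the foot length example. The function name `sds_from_mean` is our own. The SAT mean and standard deviation appear only in the figure, so they are not reproduced here.

```python
def sds_from_mean(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

# Foot length example: mean 11 inches, standard deviation 1.5 inches.
position = sds_from_mean(13, mu=11, sigma=1.5)
print(f"13 inches is {position:.2f} standard deviations above the mean")

# How far along the way from 1 to 2 standard deviations is that?
fraction = position - 1
print(f"that is {fraction:.2f} of the way from 1 to 2 standard deviations")
```

A foot length of 13 inches works out to about 1.33 standard deviations above the mean, that is, about 1/3 of the way from 1 to 2.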