View Lecture Slides with Transcript – Unit 3B – Random Variables

This document linked from Unit 3B – Random Variables

Transcript – Normal Random Variables

This document linked from Normal Random Variables

We have almost reached the end of our discussion of probability. We were introduced to the important concept of **random variables**, which are quantitative variables whose value is determined by the outcome of a random experiment.

We discussed discrete and continuous random variables.

We saw that all the information about a **discrete random variable** is packed into its probability distribution. Using that, we can answer probability questions about the random variable and find its **mean and standard deviation**. We ended the part on discrete random variables by presenting a special class of discrete random variables – **binomial random variables.**

As we dove into **continuous random variables**, we saw how calculations can get complicated very quickly, because probabilities associated with a continuous random variable are found by calculating **areas under its density curve**.

As an example for a continuous random variable, we presented the **normal random variable**, and discussed it at length. The normal distribution is extremely important, not just because many variables in real life follow the normal distribution, but mainly because of the important role it plays in statistical inference, our ultimate goal of this course.

We learned how we can avoid calculus by using the **standard normal calculator or table** to find probabilities associated with the normal distribution, and learned how it can be used as an **approximation to the binomial** distribution under certain conditions.

A random variable is a variable whose values are numerical results of a random experiment.

- A **discrete random variable** is summarized by its probability distribution: a list of its possible values and their corresponding probabilities.

The sum of the probabilities of all possible values must be 1.

The probability distribution can be represented by a table, histogram, or sometimes a formula.

- The **probability distribution** of a random variable can be supplemented with numerical measures of the center and spread of the random variable.

**Center:** The center of a random variable is measured by its mean (which is sometimes also referred to as the **expected value**).

The mean of a random variable can be interpreted as its long run average.

The mean is a weighted average of the possible values of the random variable weighted by their corresponding probabilities.

**Spread:** The spread of a random variable is measured by its variance, or more typically by its standard deviation (the square root of the variance).

The standard deviation of a random variable can be interpreted as the typical (or long-run average) distance between the value that the random variable assumes and the mean of X.
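The definitions of mean and standard deviation above can be sketched in a few lines of Python. The distribution used here (values 0 to 3 with the listed probabilities) is made up purely for illustration and does not come from the course material.

```python
from math import sqrt

# Hypothetical distribution, invented for illustration only.
values = [0, 1, 2, 3]
probs = [0.1, 0.3, 0.4, 0.2]

# A valid probability distribution must sum to 1.
assert abs(sum(probs) - 1.0) < 1e-9

# Mean: weighted average of the values, weighted by their probabilities.
mean = sum(x * p for x, p in zip(values, probs))

# Variance: weighted average of the squared distances from the mean.
variance = sum((x - mean) ** 2 * p for x, p in zip(values, probs))

# Standard deviation: square root of the variance.
sd = sqrt(variance)

print(f"mean = {mean:.2f}, variance = {variance:.2f}, sd = {sd:.2f}")
# prints: mean = 1.70, variance = 0.81, sd = 0.90
```
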

- The binomial random variable is a type of discrete random variable that is quite common.

- The binomial random variable is defined in a random experiment that consists of n independent trials, each having two possible outcomes (called “success” and “failure”), and each having the same probability of success: p. Such a random experiment is called the binomial random experiment.

- The binomial random variable represents the number of successes (out of n) in a binomial experiment. It can therefore have values as low as 0 (if none of the n trials was a success) and as high as n (if all n trials were successes).

- There are “many” binomial random variables, depending on the number of trials (n) and the probability of success (p).

- The probability distribution of the binomial random variable is given in the form of a formula and can be used to find probabilities. Technology can be used as well.

- The mean and standard deviation of a binomial random variable can be easily found using short-cut formulas.
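The summary above mentions the binomial formula and the short-cut formulas without writing them out. As a sketch, assuming the standard results P(X = k) = C(n, k) p^k (1 − p)^(n − k), mean = np, and standard deviation = √(np(1 − p)), the following compares the short cuts with the general weighted-average definitions. The values n = 20 and p = 0.5 are chosen only for illustration.

```python
from math import comb, sqrt

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for a binomial random variable with n trials and success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 20, 0.5  # illustrative values

# Full probability distribution: one probability for each value 0, 1, ..., n.
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]
assert abs(sum(pmf) - 1.0) < 1e-9  # probabilities must sum to 1

# Short-cut formulas.
mean_shortcut = n * p
sd_shortcut = sqrt(n * p * (1 - p))

# The same quantities from the general definitions for a discrete random variable.
mean_direct = sum(k * q for k, q in enumerate(pmf))
sd_direct = sqrt(sum((k - mean_direct) ** 2 * q for k, q in enumerate(pmf)))

print(f"P(X = 10) = {binomial_pmf(10, n, p):.4f}")
print(f"mean: shortcut {mean_shortcut:.3f}, direct {mean_direct:.3f}")
print(f"sd:   shortcut {sd_shortcut:.3f}, direct {sd_direct:.3f}")
```

Both routes should agree up to floating-point error, which is why the short cuts are safe to use for any binomial random variable.
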

The probability distribution of a continuous random variable is represented by a probability density curve. The probability that the random variable takes a value in any interval of interest is the area above this interval and below the density curve.

An important example of a continuous random variable is the **normal random variable**, whose probability density curve is symmetric (bell-shaped), bulging in the middle and tapering at the ends.

- There are “many” normal random variables, each determined by its mean *μ* (mu) (which determines where the density curve is centered) and standard deviation σ (sigma) (which determines how spread out (wide) the normal density curve is).

- Any normal random variable follows the Standard Deviation Rule, which can help us find probabilities associated with the normal random variable.

- Another way to find probabilities associated with the normal random variable is using the standard normal table. This process involves finding the z-score of values, which tells us how many standard deviations below or above the mean the value is.

- An important application of the normal random variable is that it can be used as an approximation of the binomial random variable (under certain conditions). A continuity correction can improve this approximation.
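The Standard Deviation Rule mentioned above can be checked numerically. This is a minimal sketch using `statistics.NormalDist` from the Python standard library in place of the course's calculator or table; the helper name `within` is our own.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

def within(k: float) -> float:
    """Probability that a normal variable falls within k standard deviations of its mean."""
    return Z.cdf(k) - Z.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {within(k):.4f}")
# prints 0.6827, 0.9545 and 0.9973, i.e. roughly 68%, 95% and 99.7%
```

Because every normal random variable standardizes to the same Z, these three probabilities hold for any mean and standard deviation.
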

The applet used in this video is no longer available.

Work to understand the idea – we are now looking at x-bar and p-hat as our “data” and in order to get multiple measurements, we need to repeat the entire sampling process exactly. We need to repeat this process of sampling and recording our statistic until we have as many values as we require.

In practice we don’t do this; we only look at one sample. But the THEORY of frequentist statistics relies on the statistician understanding what happens if we repeat the sampling process.

- Slides 1-4

- Slides 5-8

- Slides 9-12

- Slides 13-17

- Slides 18-26: Applet: Sampling Distribution for p-hat, the sample proportion

- Slides 27-34: Applet: Sampling Distribution for x-bar, the sample mean

- Slide 35 – Summary

This document is linked from Sampling Distributions.

As mentioned in the introduction, this last concept in probability is the bridge between the probability section and inference. It focuses on the relationship between sample values (**statistics**) and population values (**parameters**). Statistics vary from sample to sample due to **sampling variability**, and therefore can be regarded as **random variables** whose distribution we call the **sampling distribution**.

In our discussion of sampling distributions, we focused on two statistics, the **sample proportion**, p-hat and the **sample mean**, x-bar. Our goal was to explore the sampling distribution of these two statistics relative to their respective population parameters, p and μ (mu), and we found in **both** cases that under certain conditions the **sampling distribution is approximately normal**. This result is known as the **Central Limit Theorem.** As we’ll see in the next section, the Central Limit Theorem is the foundation for statistical inference.

A **parameter** is a number that describes the population, and a **statistic** is a number that describes the sample.

- Parameters are fixed, and in practice, usually unknown.

- Statistics change from sample to sample due to sampling variability.

- The behavior of the possible values the statistic can take in repeated samples is called the **sampling distribution** of that statistic.

- The following table summarizes the important information about the two sampling distributions we covered. Both of these results follow from the **central limit theorem**, which basically states that as the sample size increases, the distribution of the average from a sample of size n becomes increasingly close to a normal distribution.
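The idea of repeating the sampling process many times can be imitated with a small simulation. This is only a sketch under stated assumptions: the population (an exponential distribution with mean 1 and standard deviation 1), the sample size, and the number of repetitions are all our own choices, and the comparison value σ/√n is the standard result for the standard deviation of the sample mean.

```python
import random
from math import sqrt
from statistics import mean, stdev

rng = random.Random(0)  # fixed seed so the run is repeatable

n = 50       # size of each sample (assumed for illustration)
reps = 2000  # how many times the whole sampling process is repeated

# Assumed population: exponential with rate 1, which is strongly right-skewed
# and has mean mu = 1 and standard deviation sigma = 1.
mu, sigma = 1.0, 1.0

# Each repetition draws a fresh sample of size n and records its x-bar.
sample_means = [
    mean([rng.expovariate(1.0) for _ in range(n)])
    for _ in range(reps)
]

print(f"average of the x-bar values: {mean(sample_means):.3f} (population mean {mu})")
print(f"sd of the x-bar values:      {stdev(sample_means):.3f} "
      f"(sigma / sqrt(n) = {sigma / sqrt(n):.3f})")
```

Even though individual observations come from a skewed population, a histogram of `sample_means` would look roughly bell-shaped, centered near the population mean.
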

According to the College Board website, the scores on the math part of the SAT (SAT-M) in a certain year had a mean of 507 and standard deviation of 111. Assume that SAT scores follow a normal distribution.

One of the criteria for admission to a certain engineering school is an SAT-M score in the top 2% of scores. How does this translate to an actual SAT-M score? In other words, how high must a student score on the SAT-M in order for his application to be considered? A different way to ask the same question is “What is the 98th percentile of the SAT-M distribution?”

Let’s work through this problem in a step-by-step manner….

http://phhp-faculty-cantrell.sites.medinfo.ufl.edu/files/2013/02/qzLBD_08016.swf
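As a rough check on the 98th-percentile question, here is a sketch using Python's `statistics.NormalDist` instead of the course's normal table. The variable names are our own, and the result may differ slightly from a table-based answer because the table rounds z to two decimal places.

```python
from statistics import NormalDist

sat_m = NormalDist(mu=507, sigma=111)  # SAT-M model from the problem statement

# Step 1: find the z-score with probability 0.98 below it.
z = NormalDist().inv_cdf(0.98)

# Step 2: "unstandardize" to get the SAT-M score that is z standard deviations above the mean.
cutoff = sat_m.mean + z * sat_m.stdev

print(f"z = {z:.2f}, required SAT-M score is about {cutoff:.0f}")
```
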

This document is linked from Normal Applications.

According to the College Board website, the scores on the math part of the SAT (SAT-M) in a certain year had a mean of 507 and standard deviation of 111. Assume that SAT scores follow a normal distribution.

http://phhp-faculty-cantrell.sites.medinfo.ufl.edu/files/2013/02/qzLBD_08014.swf

http://phhp-faculty-cantrell.sites.medinfo.ufl.edu/files/2013/02/qzLBD_08015.swf

This document is linked from Normal Applications.

In a much earlier example, we wondered,

“How likely or unlikely is a male foot length of more than 13 inches?” We were unable to solve the problem, because 13 inches didn’t happen to be one of the values featured in the Standard Deviation Rule.

Subsequently, we learned how to standardize a normal value (tell how many standard deviations below or above the mean it is) and how to use the normal calculator or table to find the probability of falling in an interval a certain number of standard deviations below or above the mean.

By combining these two skills, we will now be able to answer questions like the one above.

To convert between a non-standard normal (X) and the standard normal (Z), use the following equations, as needed:

**Z = (X − μ) / σ** and **X = μ + Zσ**

Male foot lengths have a normal distribution, with mean (mu, μ) = 11 inches, and standard deviation (sigma, σ) = 1.5 inches.

**(a)** What is the probability of a foot length of more than 13 inches?

First, we standardize: **z = (13 − 11) / 1.5 = +1.33**

The probability that we seek, P(X > 13), is the same as the probability that a normal variable takes a value greater than 1.33 standard deviations above its mean, i.e. P(Z > +1.33)

This can be solved with the normal calculator or table, after applying the property of symmetry:

**P(Z > +1.33) = P(Z < -1.33) = 0.0918.**

A male foot length of more than 13 inches is on the long side, but not too unusual: its probability is about 9%.

We can streamline the solution in terms of probability notation and write:

**P(X > 13) = P(Z > 1.33) = P(Z < −1.33) = 0.0918**

**(b)** What is the probability of a male foot length between 10 and 12 inches?

The standardized values of 10 and 12 are, respectively, **(10 − 11) / 1.5 = −0.67** and **(12 − 11) / 1.5 = +0.67**.

**Note:** The two z-scores in a “between” problem will not always have the same magnitude. In general you must calculate both; in this case, you could instead recognize that both values are the same distance from the mean and hence have z-scores that are equal in size but opposite in sign.

**P(-0.67 < Z < +0.67) = P(Z < +0.67) – P(Z < -0.67) = 0.7486 – 0.2514 = 0.4972.**

Or, if you prefer the streamlined notation,

**P(10 < X < 12) = P(−0.67 < Z < +0.67) = P( Z < +0.67) − P(Z < −0.67) = 0.7486 − 0.2514 = 0.4972.**
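Parts (a) and (b) can be reproduced in code as follows. This is a sketch that uses `statistics.NormalDist` as a stand-in for the normal calculator or table; the helper `z_score` is our own name, and z is rounded to two decimals only to mimic reading a table.

```python
from statistics import NormalDist

mu, sigma = 11, 1.5  # male foot length, in inches
Z = NormalDist()     # standard normal

def z_score(x: float, mu: float, sigma: float) -> float:
    """How many standard deviations x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

# (a) P(X > 13): standardize, then use symmetry, P(Z > z) = P(Z < -z).
z_a = round(z_score(13, mu, sigma), 2)   # 1.33, rounded as in a z table
p_a = Z.cdf(-z_a)

# (b) P(10 < X < 12): subtract the two "less than" probabilities.
z_lo = round(z_score(10, mu, sigma), 2)  # -0.67
z_hi = round(z_score(12, mu, sigma), 2)  # +0.67
p_b = Z.cdf(z_hi) - Z.cdf(z_lo)

print(f"(a) P(X > 13)      = {p_a:.4f}")
print(f"(b) P(10 < X < 12) = {p_b:.4f}")
```

Without rounding z, the answers shift slightly in the third decimal place, which is the usual small gap between table-based and software-based results.
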

**Comments:**

By solving the above example, we inadvertently discovered the quartiles of a normal distribution! P(Z < -0.67) = 0.2514 tells us that roughly 25%, or one quarter, of a normal variable’s values fall more than 0.67 standard deviations below the mean.

P(Z < +0.67) = 0.7486 tells us that roughly 75%, or three quarters, fall below the point 0.67 standard deviations above the mean.

And of course the median is equal to the mean: since the distribution is symmetric, the median is 0 standard deviations away from the mean.

Be sure to verify these results for yourself using the calculator or table!
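One possible way to verify the quartile z-scores in code is to ask for the z-values with 25%, 50%, and 75% of the area below them. The sketch below uses `statistics.NormalDist.inv_cdf` as a substitute for reading the table in reverse.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal

q1_z = Z.inv_cdf(0.25)      # z-score with one quarter of the area below it
median_z = Z.inv_cdf(0.50)  # z-score with half of the area below it
q3_z = Z.inv_cdf(0.75)      # z-score with three quarters of the area below it

print(f"Q1 at z = {q1_z:+.2f}, median at z = {median_z:+.2f}, Q3 at z = {q3_z:+.2f}")
# prints: Q1 at z = -0.67, median at z = +0.00, Q3 at z = +0.67
```
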

Let’s look at another example.

Length (in days) of a randomly chosen human pregnancy is a normal random variable with mean (mu, μ) = 266 and standard deviation (sigma, σ) = 16.

**(a)** Find Q1, the median, and Q3. Using the z-scores we found in the previous example we have

**Q1 = 266 – 0.67(16) = 255**

**median = mean = 266**

**Q3 = 266 + 0.67(16) = 277**

Thus, the probability is 1/4 that a pregnancy will last less than 255 days; 1/2 that it will last less than 266 days; 3/4 that it will last less than 277 days.

**(b)** What is the probability that a randomly chosen pregnancy will last less than 246 days?

Since (246 – 266) / 16 = -1.25, we write

**P(X < 246) = P(Z < −1.25) = 0.1056**

**(c)** What is the probability that a randomly chosen pregnancy will last longer than 240 days?

Since (240 – 266) / 16 = -1.63, we write

**P(X > 240) = P(Z > −1.63) = P(Z < +1.63) = 0.9484**

Since the mean is 266 and the standard deviation is 16, most pregnancies last longer than 240 days.

**(d)** What is the probability that a randomly chosen pregnancy will last longer than 500 days?

**Method 1:**

Common sense tells us that this would be **impossible**.

**Method 2:**

The standardized value of 500 is (500 – 266) / 16 = +14.625.

**P(X > 500) = P(Z > 14.625) ≈ 0.**

**(e)** Suppose a pregnant woman’s husband has scheduled his business trips so that he will be in town between the 235th and 295th days. What is the probability that the birth will take place during that time?

The standardized values are (235 – 266) / 16 = -1.94 and (295 – 266) / 16 = +1.81.

**P(235 < X < 295) = P(−1.94 < Z < +1.81) = P(Z < +1.81) − P(Z < −1.94) = 0.9649 − 0.0262 = 0.9387.**

There is close to a 94% chance that the husband will be in town for the birth.

Be sure to verify these results for yourself using the calculator or table!
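As one way to carry out that verification, the pregnancy-length example can be worked directly on the X scale. This sketch uses `statistics.NormalDist` with the stated mean and standard deviation, so no rounding of z is involved and the results can differ from the table-based answers in the third or fourth decimal place.

```python
from statistics import NormalDist

pregnancy = NormalDist(mu=266, sigma=16)  # length in days

# (a) Quartiles and median.
q1 = pregnancy.inv_cdf(0.25)
median = pregnancy.inv_cdf(0.50)
q3 = pregnancy.inv_cdf(0.75)

# (b) P(X < 246)
p_b = pregnancy.cdf(246)

# (c) P(X > 240): total area is 1, so subtract the "less than" probability.
p_c = 1 - pregnancy.cdf(240)

# (d) P(X > 500): so far out in the tail that it is 0 to many decimal places.
p_d = 1 - pregnancy.cdf(500)

# (e) P(235 < X < 295)
p_e = pregnancy.cdf(295) - pregnancy.cdf(235)

print(f"(a) Q1 = {q1:.0f}, median = {median:.0f}, Q3 = {q3:.0f}")
print(f"(b) {p_b:.4f}  (c) {p_c:.4f}  (d) {p_d:.4f}  (e) {p_e:.4f}")
```
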

The purpose of the next activity is to give you guided practice at solving word problems that involve normal random variables. In particular, we’ll solve problems like the examples you just went over, in which you are asked to find the probability that a normal random variable falls within a certain interval.

The previous examples mostly followed the same general form: given values of a normal random variable, you were asked to find an associated probability. The two basic steps in the solution process were to

**Standardize to Z;**

**Find associated probabilities using the standard normal calculator or table.**

The next example will be a different type of problem: given a certain probability, you will be asked to find the associated value of the normal random variable. The solution process will go more or less in reverse order from what it was in the previous examples.

Again, foot length of a randomly chosen adult male is a normal random variable with a mean of 11 and standard deviation of 1.5.

**(a)** The probability is 0.04 that a randomly chosen adult male foot length will be less than how many inches?

According to the normal calculator or table, a probability of 0.04 below (actually 0.0401) is associated with z = -1.75.

In other words, the probability is 0.04 that a normal variable takes a value lower than 1.75 standard deviations below its mean.

**For adult male foot lengths, this would be 11 – 1.75(1.5) = 8.375. The probability is 0.04 that an adult male foot length would be less than 8.375 inches.**

**(b)** The probability is 0.10 that an adult male foot will be longer than how many inches? Caution is needed here because of the word “longer.”

Once again, we must remind ourselves that the calculator and table only show the probability of a normal variable taking a value **lower than** a certain number of standard deviations below or above its mean. Adjustments must be made for problems that involve probabilities besides “lower than” or “less than.” As usual, we have a choice of invoking either symmetry or the fact that the total area under the normal curve is 1. Students should examine both methods and decide which they prefer to use for their own purposes.

**Method 1:**

According to the calculator or table, a probability of 0.10 **below** is associated with a z value of -1.28. By symmetry, it follows that a probability of 0.10 **above** has z = +1.28.

**We seek the foot length that is 1.28 standard deviations above its mean: 11 + 1.28(1.5) = 12.92, or just under 13 inches.**

**Method 2**: If the probability is 0.10 that a foot will be longer than the value we seek, then the probability is 0.90 that a foot will be shorter than that same value, since the probabilities must sum to 1.

According to the calculator or table, a probability of 0.90 below is associated with a z value of +1.28. Again, we seek the foot length that is 1.28 standard deviations above its mean, or 12.92 inches.
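The reverse-direction problems above can be sketched in code as well. Here `inv_cdf` from `statistics.NormalDist` plays the role of looking up a probability inside the table and reading off z; the variable names are our own. Both methods for part (b) are shown so they can be compared.

```python
from statistics import NormalDist

mu, sigma = 11, 1.5  # adult male foot length, in inches
Z = NormalDist()     # standard normal

# (a) Probability 0.04 below: find z, then unstandardize.
z_a = Z.inv_cdf(0.04)
x_a = mu + z_a * sigma

# (b) Method 1: probability 0.10 below gives a negative z; by symmetry,
# probability 0.10 above has the same z with the sign flipped.
z_method1 = -Z.inv_cdf(0.10)
x_method1 = mu + z_method1 * sigma

# (b) Method 2: probability 0.10 above means probability 0.90 below.
z_method2 = Z.inv_cdf(0.90)
x_method2 = mu + z_method2 * sigma

print(f"(a) z = {z_a:.2f}, foot length = {x_a:.2f} inches")
print(f"(b) method 1: {x_method1:.2f} inches, method 2: {x_method2:.2f} inches")
```
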

**Comment:**

**Part (a) in the above example** could have been re-phrased as: “0.04 is the **proportion** of all adult male foot lengths that are below what value?”, which takes the perspective of thinking about the probability as a proportion of occurrences in the long-run. As originally stated, it focuses on the chance of a randomly chosen individual having a normal value in a given interval.

A study reported that the amount of money spent each week for lunch by a worker in a particular city is a normal random variable with a mean of $35 and a standard deviation of $5.

(a) The probability is 0.97 that a worker will spend less than how much money in a week on lunch?

The z associated with a probability of 0.9700 below is +1.88. The amount that is 1.88 standard deviations above the mean is **35 + 1.88(5) = 44.4, or $44.40.**

(b) There is a 30% chance of spending more than how much for lunches in a week?

The z associated with a probability of 0.30 above is +0.52. The amount is **35 + 0.52(5) = 37.6, or $37.60.**

**Comment:**

- Another way of expressing **Example (part a.)** above would be to ask, “What is the 97th **percentile** for the amount (X) spent by workers in a week for their lunch?” Many normal variables, such as heights, weights, or exam scores, are often expressed in terms of percentiles.

The height X (in inches) of a randomly chosen woman is a normal random variable with a mean of 65 and a standard deviation of 2.5.

What is the height of a woman who is in the 80th percentile?

A probability of 0.7995 in the table corresponds to z = +0.84. Her height is **65 + 0.84(2.5) = 67.1 inches.**

By now we have had practice in solving normal probability problems in both directions: those where a normal value is given and we are asked to report a probability and those where a probability is given and we are asked to report a normal value. Strategies for solving such problems are outlined below:

- Given a normal value x, solve for probability:
  - Standardize: calculate z = (x − μ) / σ
  - If you are using the online calculator: Type the z-score for which you wish to find the area to the left and hit “compute.”
  - If you are using the table: Locate z in the margins of the normal table (ones and tenths for the row, hundredths for the column). Find the corresponding probability (given to four decimal places) of a normal random variable taking a value below z inside the table.
  - (Adjust if the problem involves something other than a “less-than” probability, by invoking either symmetry or the fact that the total area under the normal curve is 1.)

- Given a probability, solve for normal value x:
  - (Adjust if the problem involves something other than a “less-than” probability, by invoking either symmetry or the fact that the total area under the normal curve is 1.)
  - If you are using the table: Locate the probability (given to four decimal places) inside the normal table and find the corresponding z value in the margins (row for ones and tenths, column for hundredths).
  - If you are using the calculator: Provide the area to the left of the z-score you wish to find and hit “compute.”
  - “Unstandardize”: calculate x = μ + zσ
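The two strategies above can be sketched as a pair of small functions. The function names are hypothetical, and `statistics.NormalDist` stands in for the online calculator or table; the usage lines revisit the lunch-spending and height examples from earlier in this section.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal

def probability_below(x: float, mu: float, sigma: float) -> float:
    """Given a normal value x, return P(X < x): standardize, then look up z."""
    z = (x - mu) / sigma
    return Z.cdf(z)

def value_with_probability_below(p: float, mu: float, sigma: float) -> float:
    """Given a 'less-than' probability p, return x: find z, then unstandardize."""
    z = Z.inv_cdf(p)
    return mu + z * sigma

# Lunch spending (mean $35, sd $5): 97% of workers spend less than this amount.
lunch_97 = value_with_probability_below(0.97, 35, 5)

# 30% chance of spending MORE means 70% chance of spending less (total area is 1).
lunch_top_30 = value_with_probability_below(1 - 0.30, 35, 5)

# Women's heights (mean 65, sd 2.5): the 80th percentile.
height_80 = value_with_probability_below(0.80, 65, 2.5)

print(f"${lunch_97:.2f}, ${lunch_top_30:.2f}, {height_80:.1f} inches")

# Going back the other way recovers the probability we started from.
print(f"{probability_below(height_80, 65, 2.5):.2f}")
```

Any problem that is not a “less-than” question is first converted into one, exactly as in the adjustment step of each strategy.
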

This next activity is a continuation of the previous one, and will give you guided practice in solving word problems involving the normal distribution. In particular, we’ll solve problems like the ones you just solved, in which you are given a probability and you are asked to find the normal value associated with it.

The normal distribution can be used as a reasonable approximation to other distributions under certain circumstances. Here we will illustrate this approximation for the binomial distribution.

We will not do any calculations here as we simply wish to illustrate the concept. In the next section on sampling distributions, we will look at another measure related to the binomial distribution, the sample proportion, and at that time we will discuss the underlying normal distribution.

Consider the binomial probability distribution displayed below for n = 20 and p = 0.5.

Now we overlay a normal distribution with the same mean and standard deviation.

And in the final image, we can see the regions for the exact and approximate probabilities shaded.

Unfortunately, the approximated probability, 0.1867, is quite a bit different from the actual probability, 0.2517. However, this example constitutes something of a “worst-case scenario” according to the usual criteria for use of a normal approximation.

Probabilities for a binomial random variable X with n and p may be approximated by those for a normal random variable having the same mean and standard deviation as long as the sample size n is large enough relative to the proportions of successes and failures, p and 1 – p. Our Rule of Thumb will be to require that

**np ≥ 10 and n(1 − p) ≥ 10**

It is possible to improve the normal approximation to the binomial by adjusting for the discrepancy that arises when we make the shift from the areas of histogram rectangles to the area under a smooth curve. For example, if we want to find the binomial probability that X is less than **or equal to** 8, we are including the area of the entire rectangle over 8, which actually extends to 8.5. Our normal approximation only included the area up to 8.
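The effect of the continuity correction can be seen numerically for the example above (n = 20, p = 0.5, and the probability that X is at most 8). This is a sketch using the standard binomial formula and `statistics.NormalDist`; the variable names are our own. The uncorrected value comes out slightly below the 0.1867 quoted earlier because no rounding of z to two decimals is done here.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 20, 0.5

# Rule of thumb: np >= 10 and n(1 - p) >= 10 (here both equal 10, the borderline case).
rule_ok = n * p >= 10 and n * (1 - p) >= 10

# Normal distribution with the same mean and standard deviation as the binomial.
mu = n * p
sigma = sqrt(n * p * (1 - p))
approx = NormalDist(mu, sigma)

# Exact binomial probability P(X <= 8): add up the rectangles for 0, 1, ..., 8.
exact = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(9))

# Normal approximation without the correction: area up to 8.
plain = approx.cdf(8)

# With the continuity correction: area up to 8.5, covering the whole rectangle over 8.
corrected = approx.cdf(8.5)

print(f"rule of thumb satisfied: {rule_ok}")
print(f"exact = {exact:.4f}, uncorrected = {plain:.4f}, corrected = {corrected:.4f}")
```
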

This document is linked from Standard Normal Distribution.
