View Lecture Slides with Transcript – Estimation

This document is linked from Estimation.

The applet used in this video is no longer available.

Work to understand the idea – we are now looking at x-bar and p-hat as our “data,” and in order to get multiple measurements, we need to repeat the entire sampling process: draw a sample, record our statistic, and repeat until we have as many values as we require.

In practice we don’t do this – we look at only one sample – but the THEORY of frequentist statistics relies on the statistician understanding what happens if we repeat the sampling process.
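The repeat-the-whole-process idea can be sketched in a short simulation. This is only an illustration (the population values below are hypothetical, chosen for the sketch): each repetition draws a fresh random sample and records x-bar, exactly as the theory imagines.

```python
import random

random.seed(1)  # fixed seed so the illustration is reproducible

# Hypothetical population: a quantitative variable with mu = 69, sigma = 2.8.
def one_sample_mean(n, mu=69, sigma=2.8):
    """Repeat the ENTIRE process: draw a fresh random sample, record x-bar."""
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    return sum(sample) / n

# 1,000 repetitions of "take a random sample of 200, record x-bar":
xbars = [one_sample_mean(200) for _ in range(1000)]

# The recorded x-bar values vary from sample to sample (sampling
# variability) but cluster around the population mean mu.
print(min(xbars), max(xbars))
```

The list `xbars` plays the role of our “data” about the statistic itself – its distribution is the sampling distribution of x-bar.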

- Slides 1-4

- Slides 5-8

- Slides 9-12

- Slides 13-17

- Slides 18-26: Applet: Sampling Distribution for p-hat, the sample proportion

- Slides 27-34: Applet: Sampling Distribution for x-bar, the sample mean

- Slide 35: Summary

This document is linked from Sampling Distributions.


As mentioned in the introduction, this last concept in probability is the bridge between the probability section and inference. It focuses on the relationship between sample values (**statistics**) and population values (**parameters**). Statistics vary from sample to sample due to **sampling variability**, and therefore can be regarded as **random variables** whose distribution we call the **sampling distribution**.

In our discussion of sampling distributions, we focused on two statistics, the **sample proportion**, p-hat and the **sample mean**, x-bar. Our goal was to explore the sampling distribution of these two statistics relative to their respective population parameters, p and μ (mu), and we found in **both** cases that under certain conditions the **sampling distribution is approximately normal**. This result is known as the **Central Limit Theorem.** As we’ll see in the next section, the Central Limit Theorem is the foundation for statistical inference.

A **parameter** is a number that describes the population, and a **statistic** is a number that describes the sample.

- Parameters are fixed, and in practice, usually unknown.

- Statistics change from sample to sample due to sampling variability.

- The behavior of the possible values the statistic can take in repeated samples is called the **sampling distribution** of that statistic.

- The following table summarizes the important information about the two sampling distributions we covered. Both of these results follow from the **central limit theorem**, which states that as the sample size n increases, the distribution of the sample mean becomes increasingly close to a normal distribution.
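As a supplement to the table, here is a minimal simulation of the central limit theorem. The population is assumed (an exponential distribution with mean 1, chosen because it looks nothing like a normal curve); the sampling distribution of x-bar still ends up centered at μ with spread σ/√n.

```python
import random

random.seed(2)

# Assumed skewed population: exponential with mean 1 (so sigma = 1 as well).
def sample_mean(n):
    return sum(random.expovariate(1.0) for _ in range(n)) / n

n = 50
xbars = [sample_mean(n) for _ in range(2000)]

mean_of_xbars = sum(xbars) / len(xbars)
sd_of_xbars = (sum((x - mean_of_xbars) ** 2 for x in xbars) / (len(xbars) - 1)) ** 0.5

# Despite the skewed population, the sampling distribution of x-bar is
# centered near mu = 1 with spread near sigma / sqrt(n) = 1 / sqrt(50).
print(round(mean_of_xbars, 2), round(sd_of_xbars, 3))
```

Plotting a histogram of `xbars` (not shown) would display the familiar bell shape, even though the population itself is strongly skewed.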

The SAT-Verbal scores of a sample of 300 students at a particular university had a mean of 592 and standard deviation of 73.

According to the university’s reports, the SAT-Verbal scores of all its students had a mean of 580 and a standard deviation of 110.


**NOTE:** The following videos discuss all three pages related to sampling distributions.

**Review:** We will apply the concepts of normal random variables to **two random variables which are summary statistics from a sample** – these are the **sample mean (x-bar)** and the **sample proportion (p-hat)**.

Already on several occasions we have pointed out the important distinction between a **population** and a **sample**. In Exploratory Data Analysis, we learned to summarize and display values of a variable for a **sample**, such as displaying the blood types of 100 randomly chosen U.S. adults using a pie chart, or displaying the heights of 150 males using a histogram and supplementing it with appropriate numerical measures such as the sample mean (x-bar) and sample standard deviation (s).

In our study of Probability and Random Variables, we discussed the long-run behavior of a variable, considering the **population** of all possible values taken by that variable. For example, we talked about the distribution of blood types among all U.S. adults and the distribution of the random variable X, representing a male’s height.

Now we focus directly on the relationship between the values of a variable for a **sample** and its values for the entire **population** from which the sample was taken. This material is the bridge between probability and our ultimate goal of the course, statistical inference. In inference, we look at a sample and ask what we can say about the population from which it was drawn.

Now, we’ll pose the reverse question: **If I know what the population looks like, what can I expect the sample to look like?** Clearly, inference poses the more practical question, since in practice we can look at a sample, but rarely do we know what the whole population looks like. This material will be more theoretical in nature, since it poses a problem which is not really practical, but will present important ideas which are the foundation for statistical inference.

To better understand the relationship between sample and population, let’s consider the two examples that were mentioned in the introduction.

In the probability section, we presented the distribution of blood types in the entire U.S. **population**:

Assume now that we take a **sample** of 500 people in the United States, record their blood type, and display the sample results:

Note that the percentages (or proportions) that we found in our sample are slightly different than the population percentages. This is really not surprising. Since we took a sample of just 500, we cannot expect that our sample will behave exactly like the population, but if the sample is random (as it was), we expect to get results which are not that far from the population (as we did). If we took yet another sample of size 500:

we again get sample results that are slightly different from the population figures, and also different from what we found in the first sample. This very intuitive idea, that sample results change from sample to sample, is called **sampling variability.**
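Sampling variability is easy to see in a simulation. The sketch below assumes the blood-type example’s population proportion of type A (p = 0.42) and repeats the whole sampling process a few times; each repetition yields a slightly different sample proportion.

```python
import random

random.seed(3)

p, n = 0.42, 500  # population proportion of type A (from the example), sample size

def sample_proportion():
    # Sample 500 people; record the proportion who have blood type A.
    return sum(random.random() < p for _ in range(n)) / n

# Each repetition of the whole sampling process gives a slightly
# different p-hat -- this is sampling variability.
phats = [sample_proportion() for _ in range(5)]
print(phats)
```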

Let’s look at another example:

Heights among the population of all adult males follow a normal distribution with a mean μ (mu) = 69 inches and a standard deviation σ (sigma) = 2.8 inches. Here is a probability display of this population distribution:

A sample of 200 males was chosen, and their heights were recorded. Here are the sample results:

The sample mean (x-bar) is 68.7 inches and the sample standard deviation (s) is 2.95 inches.

Again, note that the sample results are slightly different from the population. The histogram for this sample resembles the normal distribution, but is not as fine, and also the sample mean and standard deviation are slightly different from the population mean and standard deviation. Let’s take another sample of 200 males:

The sample mean (x-bar) is 69.1 inches and the sample standard deviation (s) is 2.66 inches.

Again, as in Example 1 we see the idea of **sampling variability.** In this second sample, the results are pretty close to the population, but different from the results we found in the first sample.
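The heights example can be mirrored in a short simulation, using the population parameters given above (μ = 69, σ = 2.8). Two independent samples of 200 men give two slightly different pairs of statistics, both close to the parameters – a sketch, not the actual data from the example.

```python
import random

random.seed(4)

mu, sigma = 69, 2.8  # population mean and standard deviation (from the example)

def summarize(sample):
    """Return (x-bar, s) for a sample, using the n - 1 divisor for s."""
    n = len(sample)
    xbar = sum(sample) / n
    s = (sum((x - xbar) ** 2 for x in sample) / (n - 1)) ** 0.5
    return round(xbar, 1), round(s, 2)

# Two independent random samples of 200 men yield slightly different
# statistics, both near (but not equal to) the population parameters.
sample1 = [random.gauss(mu, sigma) for _ in range(200)]
sample2 = [random.gauss(mu, sigma) for _ in range(200)]
print(summarize(sample1))
print(summarize(sample2))
```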

In both the examples, we have numbers that describe the population, and numbers that describe the sample. In Example 1, the number 42% is the population proportion of blood type A, and 39.6% is the sample proportion (in sample 1) of blood type A. In Example 2, 69 and 2.8 are the population mean and standard deviation, and (in sample 1) 68.7 and 2.95 are the sample mean and standard deviation.

A **parameter** is a number that describes the population.

A **statistic** is a number that is computed from the sample.

In Example 1: 42% (0.42) is the parameter and 39.6% (0.396) is a statistic (and 43.2% is another statistic).

In Example 2: 69 and 2.8 are the parameters and 68.7 and 2.95 are statistics (69.1 and 2.66 are also statistics).

In this course, as in the examples above, we focus on the following parameters and statistics:

- population proportion and sample proportion
- population mean and sample mean
- population standard deviation and sample standard deviation

The following table summarizes the three pairs, and gives the notation

The only new notation here is p for the population proportion (p = 0.42 for type A in Example 1), and p-hat (using the “hat” symbol ∧ over the p) for the sample proportion (which is 0.396 in Example 1, sample 1).

**Comments:**

- Parameters are usually unknown, because it is impractical or impossible to know exactly what values a variable takes for every member of the population.

- Statistics are computed from the sample, and vary from sample to sample due to **sampling variability**.

In the last part of the course, statistical inference, we will learn how to use a statistic to draw conclusions about an unknown parameter, either by estimating it or by deciding whether it is reasonable to conclude that the parameter equals a proposed value.

Now we’ll learn about the behavior of the statistics assuming that we know the parameters. So, for example, if we know that the population proportion of blood type A in the population is 0.42, and we take a random sample of size 500, what do we expect the sample proportion p-hat to be? Specifically we ask:

- What is the distribution of all possible sample proportions from samples of size 500?
- Where is it centered?
- How much variation exists among different sample proportions from samples of size 500?
- How far off the true value of 0.42 might we expect to be?

Here are some more examples:

If students picked numbers completely at random from the numbers 1 to 20, the proportion of times that the number 7 would be picked is 0.05. When 15 students picked a number “at random” from 1 to 20, 3 of them picked the number 7. Identify the parameter and accompanying statistic in this situation.

The parameter is the population proportion of random selections resulting in the number 7, which is p = 0.05. The accompanying statistic is the sample proportion (p-hat) of selections resulting in the number 7, which is 3/15 = 0.20.

**Note:** Unrelated to our current discussion, this is an interesting illustration of how we (humans) are not very good at doing things randomly. I used to ask a similar question in introductory statistics courses where I asked students to RANDOMLY pick a number between 1 and 10. The number of students choosing 7 is almost always MUCH larger than would be predicted if the results were truly random.

Try it with some of your friends and family and see if you get similar results. We really like the number 7! Interestingly, if students were aware of this phenomenon, then they tended to pick 3 most often. This is interesting since if choices were truly random, we should see a relatively equal proportion for each number :-)

The length of human pregnancies has a mean of 266 days and a standard deviation of 16 days. A random sample of 9 pregnant women was observed to have a mean pregnancy length of 270 days, with a standard deviation of 14 days. Identify the parameters and accompanying statistics in this situation.

The parameters are the population mean μ (mu) = 266 and the population standard deviation σ (sigma) = 16. The accompanying statistics are the sample mean (x-bar) = 270 and the sample standard deviation (s) = 14.

The first step to drawing conclusions about parameters based on the accompanying statistics is to understand how sample statistics behave relative to the parameter(s) that summarizes the entire population. We begin with the behavior of sample proportion relative to population proportion (when the variable of interest is categorical). After that, we will explore the behavior of sample mean relative to population mean (when the variable of interest is quantitative).

In our Introduction to Inference we defined point estimates and interval estimates.

- In **point estimation**, we estimate an unknown parameter using a single number that is calculated from the sample data.

- In **interval estimation**, we estimate an unknown parameter using an interval of values that is likely to contain the true value of that parameter (and state how confident we are that this interval indeed captures the true value of the parameter).

In this section, we will introduce the concept of a confidence interval and learn to calculate confidence intervals for population means and population proportions (when certain conditions are met).

In Unit 4B, we will see that confidence intervals are useful whenever we wish to use data to estimate an unknown population parameter, even when this parameter is estimated using multiple variables (such as our cases: CC, CQ, QQ).

For example, we can construct confidence intervals for the slope of a regression equation or the correlation coefficient. In doing so we are always using our data to provide an interval estimate for an unknown population parameter (the TRUE slope, or the TRUE correlation coefficient).

Point estimation is the form of statistical inference in which, based on the sample data, we estimate the unknown parameter of interest using a **single **value (hence the name **point** estimation). As the following two examples illustrate, this form of inference is quite intuitive.

Suppose that we are interested in studying the IQ levels of students at Smart University (SU). In particular (since IQ level is a quantitative variable), we are interested in estimating µ (mu), the mean IQ level of all the students at SU.

A random sample of 100 SU students was chosen, and their (sample) mean IQ level was found to be 115 (x-bar).

If we wanted to estimate µ (mu), the population mean IQ level, by a single number based on the sample, it would make intuitive sense to use the corresponding quantity in the sample, the sample mean, which is 115. We say that 115 is the **point estimate** for µ (mu), and in general, we’ll always use the sample mean (x-bar) as the **point estimator** for µ (mu). (Note that when we talk about the **specific** value (115), we use the term **estimate**, and when we talk in general about the **statistic** x-bar, we use the term **estimator**.) The following figure summarizes this example:

Here is another example.

Suppose that we are interested in the opinions of U.S. adults regarding legalizing the use of marijuana. In particular, we are interested in the parameter p, the proportion of U.S. adults who believe marijuana should be legalized.

Suppose a poll of 1,000 U.S. adults finds that 560 of them believe marijuana should be legalized. If we wanted to estimate p, the population proportion, using a single number based on the sample, it would make intuitive sense to use the corresponding quantity in the sample, the sample proportion p-hat = 560/1000 = 0.56. We say in this case that 0.56 is the **point estimate** for p, and in general, we’ll always use p-hat as the **point estimator** for p. (Note, again, that when we talk about the **specific value** (0.56), we use the term **estimate**, and when we talk in general about the **statistic** p-hat, we use the term **estimator**.) Here is a visual summary of this example:

You may feel that since it is so intuitive, you could have figured out point estimation on your own, even without the benefit of an entire course in statistics. Certainly, our intuition tells us that the best estimator for the population mean (mu, µ) should be x-bar, and the best estimator for the population proportion p should be p-hat.

Probability theory does more than this; it actually gives an explanation (beyond intuition) **why** x-bar and p-hat are the good choices as point estimators for µ (mu) and p, respectively. In the Sampling Distributions section of the Probability unit, we learned about the sampling distribution of x-bar and found that **as long as a sample is taken at random**, the distribution of sample means is exactly centered at the value of population mean.

Our statistic, x-bar, is therefore said to be an **unbiased** estimator for µ (mu). Any particular sample mean might turn out to be less than the actual population mean, or it might turn out to be more. But in the long run, such sample means are “on target” in that they will not underestimate any more or less often than they overestimate.

Likewise, we learned that the sampling distribution of the sample proportion, p-hat, is centered at the population proportion p (as long as the sample is taken at random), thus making p-hat an unbiased estimator for p.
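What “centered at p” means in the long run can be checked with a quick simulation – a sketch, reusing the blood-type value p = 0.42 from Example 1: the average of p-hat over many repetitions of the whole sampling process sits essentially on top of p.

```python
import random

random.seed(5)

p, n = 0.42, 500

def phat():
    return sum(random.random() < p for _ in range(n)) / n

# Unbiasedness: averaged over many repetitions of the entire sampling
# process, p-hat neither systematically over- nor under-estimates p.
many_phats = [phat() for _ in range(4000)]
print(sum(many_phats) / len(many_phats))  # very close to p = 0.42
```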

As stated in the introduction, probability theory plays an essential role as we establish results for statistical inference. Our assertion above that sample mean and sample proportion are unbiased estimators is the first such instance.

Notice how important the principles of sampling and design are for our above results: if the sample of U.S. adults in (example 2 on the previous page) was not random, but instead included predominantly college students, then 0.56 would be a biased estimate for p, the proportion of all U.S. adults who believe marijuana should be legalized.

If the survey design were flawed, such as loading the question with a reminder about the dangers of marijuana leading to hard drugs, or a reminder about the benefits of marijuana for cancer patients, then 0.56 would be biased on the low or high side, respectively.

Our point estimates are truly **unbiased** estimates for the population parameter **only if the sample is random and the study design is not flawed**.

Not only are the sample mean and sample proportion on target as long as the samples are random, but **their precision improves as sample size increases**.

Again, there are two “layers” here for explaining this.

Intuitively, larger sample sizes give us more information with which to pin down the true nature of the population. We can therefore expect the sample mean and sample proportion obtained from a larger sample to be closer to the population mean and proportion, respectively. In the extreme, when we sample the whole population (which is called a census), the sample mean and sample proportion will exactly coincide with the population mean and population proportion.

There is another layer here that, again, comes from what we learned about the sampling distributions of the sample mean and the sample proportion. Let’s use the sample mean for the explanation.

Recall that the sampling distribution of the sample mean x-bar is, as we mentioned before, centered at the population mean µ (mu) and has a standard error (standard deviation of the statistic, x-bar) of σ/√n – the population standard deviation divided by the square root of the sample size.

As a result, as the sample size n increases, the sampling distribution of x-bar gets less spread out. This means that values of x-bar that are based on a larger sample are more likely to be closer to µ (mu) (as the figure below illustrates):

Similarly, since the sampling distribution of p-hat is centered at p and has a standard error of √(p(1 − p)/n), which decreases as the sample size gets larger, values of p-hat are more likely to be closer to p when the sample size is larger.
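The shrinking standard errors can be checked directly from the two formulas, σ/√n for x-bar and √(p(1 − p)/n) for p-hat, using the parameter values from the earlier examples (heights: σ = 2.8; blood type A: p = 0.42):

```python
import math

sigma = 2.8  # population sd of heights (from the earlier example)
p = 0.42     # population proportion of type A (from the earlier example)

# Quadrupling the sample size halves each standard error.
for n in [25, 100, 400, 1600]:
    se_mean = sigma / math.sqrt(n)          # standard error of x-bar
    se_prop = math.sqrt(p * (1 - p) / n)    # standard error of p-hat
    print(n, round(se_mean, 3), round(se_prop, 4))
```

Note the pattern in the output: because n sits under a square root, you must quadruple the sample size to cut either standard error in half.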

Another example of a point estimator is using the sample standard deviation, s = √(Σ(x − x̄)² / (n − 1)), to estimate the population standard deviation, σ (sigma).

In this course, we will not be concerned with estimating the population standard deviation for its own sake, but since we will often substitute the sample standard deviation (s) for σ (sigma) when standardizing the sample mean, it is worth pointing out that **s is an unbiased estimator for σ** (sigma).

If we had divided by n instead of n – 1 in our estimator for population standard deviation, then in the long run our sample variance would be guilty of a slight underestimation. Division by n – 1 accomplishes the goal of making this point estimator unbiased.

The reason that our formula for s, introduced in the Exploratory Data Analysis unit, involves division by n – 1 instead of by n is the fact that we wish to use unbiased estimators in practice.
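A small simulation makes the n versus n − 1 issue concrete. This is a sketch, assuming a standard normal population (true variance 1): averaging each version of the variance over many small samples shows the systematic underestimation that division by n causes, and how n − 1 corrects it.

```python
import random

random.seed(7)

mu, sigma = 0, 1  # assumed standard normal population, true variance = 1

def variances(sample):
    n = len(sample)
    xbar = sum(sample) / n
    ss = sum((x - xbar) ** 2 for x in sample)
    return ss / n, ss / (n - 1)  # biased (divide by n) vs. corrected (n - 1)

# Average each version over many small samples (n = 5).
biased, unbiased = [], []
for _ in range(20000):
    b, u = variances([random.gauss(mu, sigma) for _ in range(5)])
    biased.append(b)
    unbiased.append(u)

print(round(sum(biased) / len(biased), 2))    # falls noticeably below 1
print(round(sum(unbiased) / len(unbiased), 2))  # very close to 1
```

With n = 5, dividing by n shrinks the long-run average variance by the factor (n − 1)/n = 0.8, which is exactly what the simulation shows.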

- We use p-hat (sample proportion) as a point estimator for p (population proportion). It is an unbiased estimator: its long-run distribution is centered at p as long as the sample is random.

- We use x-bar (sample mean) as a point estimator for µ (mu, population mean). It is an unbiased estimator: its long-run distribution is centered at µ (mu) as long as the sample is random.

- In both cases, the larger the sample size, the more precise the point estimator is. In other words, the larger the sample size, the more likely it is that the sample mean (proportion) is close to the unknown population mean (proportion).

Point estimation is simple and intuitive, but also a bit problematic. Here is why:

When we estimate μ (mu) by the sample mean x-bar we are almost guaranteed to make some kind of error. Even though we know that the values of x-bar fall around μ (mu), it is very unlikely that the value of x-bar will fall exactly at μ (mu).

Given that such errors are a fact of life for point estimates (by the mere fact that we are basing our estimate on one sample that is a small fraction of the population), these estimates are in themselves of limited usefulness, unless we are able to quantify the extent of the estimation error. Interval estimation addresses this issue. The idea behind **interval estimation** is, therefore, to enhance the simple point estimates by supplying information about the size of the error attached.

In this introduction, we’ll provide examples that will give you a solid intuition about the basic idea behind interval estimation.

Consider the example that we discussed in the point estimation section:

Suppose that we are interested in studying the IQ levels of students attending Smart University (SU). In particular (since IQ level is a quantitative variable), we are interested in estimating μ (mu), the mean IQ level of all the students in SU. A random sample of 100 SU students was chosen, and their (sample) mean IQ level was found to be 115 (x-bar).

In point estimation we used x-bar = 115 as the point estimate for μ (mu). However, we had no idea of what the estimation error involved in such an estimation might be. Interval estimation takes point estimation a step further and says something like:

“I am 95% confident that by using the point estimate x-bar = 115 to estimate μ (mu), I am off by no more than 3 IQ points. In other words, I am 95% confident that μ (mu) is within 3 of 115, or between 112 (115 – 3) and 118 (115 + 3).”

Yet another way to say the same thing is: I am 95% confident that μ (mu) is somewhere in (or covered by) the interval (112,118). (**Comment:** At this point you should not worry about, or try to figure out, how we got these numbers. We’ll do that later. All we want to do here is make sure you understand the idea.)

Note that while point estimation provided just one number, 115, as an estimate for μ (mu), interval estimation provides a whole interval of “plausible values” for μ (mu) (between 112 and 118), and also attaches the level of our confidence that this interval indeed includes the value of μ (mu) (in our example, 95% confidence). The interval (112, 118) is therefore called “a 95% confidence interval for μ (mu).”
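Although the text derives these numbers in detail later, a minimal sketch of the arithmetic behind such an interval may help. The population standard deviation of IQ is NOT given in the example; the value σ = 15 below is an assumption (a common figure for IQ scales, and consistent with the stated margin of about 3).

```python
import math

# Assumed values: xbar and n come from the example; sigma = 15 is assumed.
xbar, sigma, n = 115, 15, 100
z = 1.96  # z* multiplier for 95% confidence

margin = z * sigma / math.sqrt(n)   # margin of error, about 3 IQ points
lo, hi = xbar - margin, xbar + margin
print(round(margin, 2), (round(lo, 2), round(hi, 2)))
```

The computed interval, roughly (112, 118), matches the one quoted in the example once the margin is rounded to 3.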

Let’s look at another example:

Let’s consider the second example from the point estimation section.

Suppose that we are interested in the opinions of U.S. adults regarding legalizing the use of marijuana. In particular, we are interested in the parameter p, the proportion of U.S. adults who believe marijuana should be legalized.

Suppose a poll of 1,000 U.S. adults finds that 560 of them believe marijuana should be legalized.

If we wanted to estimate p, the population proportion, by a single number based on the sample, it would make intuitive sense to use the corresponding quantity in the sample, the sample proportion p-hat = 560/1000 = 0.56.

Interval estimation would take this a step further and say something like:

“I am 90% confident that by using 0.56 to estimate the true population proportion, p, I am off by (or, I have an error of) no more than 0.03 (or 3 percentage points). In other words, I am 90% confident that the actual value of p is somewhere between 0.53 (0.56 – 0.03) and 0.59 (0.56 + 0.03).”

Yet another way of saying this is: “I am 90% confident that p is covered by the interval (0.53, 0.59).”

In this example, (0.53, 0.59) is a 90% confidence interval for p.
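The arithmetic behind this interval can also be sketched. Everything here comes from the example (p-hat = 0.56, n = 1,000) except the method itself, which the text develops later: the standard error is estimated by plugging p-hat in for p, and 1.645 is the z* multiplier for 90% confidence.

```python
import math

phat, n = 0.56, 1000
z = 1.645  # z* multiplier for 90% confidence

# Estimated standard error of p-hat, plugging in p-hat for the unknown p.
se = math.sqrt(phat * (1 - phat) / n)
margin = z * se
print(round(margin, 3), (round(phat - margin, 3), round(phat + margin, 3)))
```

The computed margin, about 0.026, is what the example rounds to 0.03 (3 percentage points), giving the interval (0.53, 0.59).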

The two examples showed us that the idea behind interval estimation is, instead of providing just one number for estimating an unknown parameter of interest, to provide an interval of plausible values of the parameter plus a level of confidence that the value of the parameter is covered by this interval.

We are now going to go into more detail and learn how these confidence intervals are created and interpreted in context. As you’ll see, the ideas that were developed in the “Sampling Distributions” section of the Probability unit will, again, be very important. Recall that for point estimation, our understanding of sampling distributions leads to verification that our statistics are unbiased and gives us precise formulas for the standard error of our statistics.

We’ll start by discussing confidence intervals for the population mean μ (mu), and later discuss confidence intervals for the population proportion p.
