As we mentioned in the introduction to Unit 4A, when the variable that we’re interested in studying in the population is categorical, the parameter about which we want to draw inferences is the population proportion (p) associated with that variable. We also learned that the point estimator for the population proportion p is the sample proportion p-hat.
To refresh your memory, here is a picture that summarizes an example we looked at.
We are now moving on to interval estimation of p. In other words, we would like to develop a set of intervals that, with different levels of confidence, will capture the value of p. We’ve actually done all the groundwork and discussed all the big ideas of interval estimation when we talked about interval estimation for μ (mu), so we’ll be able to go through it much faster. Let’s begin.
Recall that the general form of any confidence interval for an unknown parameter is:
estimate ± margin of error
Since the unknown parameter here is the population proportion p, the point estimator (as noted above) is the sample proportion p-hat. The confidence interval for p, therefore, has the form:
$$\hat{p} \pm m$$
(Recall that m is the notation for the margin of error.) The margin of error (m) gives us the maximum estimation error with a certain confidence. In this case it tells us that p-hat is different from p (the parameter it estimates) by no more than m units.
From our previous discussion on confidence intervals, we also know that the margin of error is the product of two components:
$$m = \text{confidence multiplier} \times \text{standard deviation of the estimator}$$
To figure out what these two components are, we need to go back to a result we obtained in the Sampling Distributions section of the Probability unit about the sampling distribution of p-hat. We found that under certain conditions (which we’ll come back to later), p-hat has a normal distribution with mean p, and a standard deviation of
$$\sqrt{\frac{p(1-p)}{n}}$$
This result makes things very simple for us, because it reveals what the two components are that the margin of error is made of:
- Since, like the sampling distribution of x-bar, the sampling distribution of p-hat is normal, the confidence multipliers that we’ll use in the confidence interval for p will be the same z* multipliers we use for the confidence interval for μ (mu) when σ (sigma) is known (using exactly the same reasoning and the same probability results). The multipliers we’ll use, then, are: 1.645, 1.96, and 2.576 at the 90%, 95% and 99% confidence levels, respectively.
- The standard deviation of our estimator p-hat is $\sqrt{p(1-p)/n}$.
Putting it all together, we find that the confidence interval for p should be:
$$\hat{p} \pm z^* \cdot \sqrt{\frac{p(1-p)}{n}}$$
We just have to solve one practical problem and we’re done. We’re trying to estimate the unknown population proportion p, so having it appear in the confidence interval doesn’t make any sense. To overcome this problem, we’ll do the obvious thing …
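We’ll replace p with its sample counterpart, p-hat, and work with the estimated standard error of p-hat,
$$\sqrt{\frac{\hat{p}(1-\hat{p})}{n}},$$
so that the confidence interval for p becomes
$$\hat{p} \pm z^* \cdot \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$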
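The interval p-hat ± z* · sqrt(p-hat(1 − p-hat)/n) can be sketched in a few lines of Python. This is only an illustration: the function name `proportion_ci`, the `Z_STAR` lookup table, and the sample counts (560 out of 1,000) are hypothetical and do not come from the course examples.

```python
import math

# z* multipliers quoted in the text for the 90%, 95%, and 99% confidence levels.
Z_STAR = {90: 1.645, 95: 1.96, 99: 2.576}


def proportion_ci(p_hat, n, level=95):
    """Return (lower, upper) for p_hat ± z* · sqrt(p_hat(1 − p_hat)/n)."""
    z_star = Z_STAR[level]
    # Estimated standard error: p is replaced by its sample counterpart p_hat.
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    margin = z_star * standard_error
    return p_hat - margin, p_hat + margin


# Hypothetical data: 560 of 1,000 sampled individuals fall in the category.
lower, upper = proportion_ci(560 / 1000, 1000, level=95)
print(f"95% CI for p: ({lower:.3f}, {upper:.3f})")
```

Calling the function with `level=90` or `level=99` uses the other two multipliers listed above.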
We would like to share with you the methodology portion of the official poll release for the Viagra example. We hope you see that you now have the tools to understand how poll results are analyzed:
“The results are based on telephone interviews with a randomly selected national sample of 1,005 adults, 18 years and older, conducted May 8-10, 1998. For results based on samples of this size, one can say with 95 percent confidence that the error attributable to sampling and other random effects could be plus or minus 3 percentage points. In addition to sampling error, question wording and practical difficulties in conducting surveys can introduce error or bias into the findings of public opinion polls.”
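As a rough check on the quoted "plus or minus 3 percentage points," the margin of error z* · sqrt(p-hat(1 − p-hat)/n) can be evaluated for a sample of 1,005 at 95% confidence. The sketch below assumes the worst-case value p-hat = 0.5, since the release quotes a single margin for all results and does not say which value was used; the variable names are illustrative.

```python
import math

n = 1005        # sample size quoted in the poll release
z_star = 1.96   # multiplier for 95% confidence

# Assumption: p_hat = 0.5, the value that makes p_hat * (1 - p_hat) largest.
worst_case_margin = z_star * math.sqrt(0.5 * 0.5 / n)

print(f"margin of error: {100 * worst_case_margin:.1f} percentage points")
```

Under that assumption the margin comes out at roughly 3 percentage points, in line with the release.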
Two important results that we discussed at length when we talked about the confidence interval for μ (mu) also apply here:
1. There is a trade-off between level of confidence and the width (or precision) of the confidence interval. The more precision you would like the confidence interval for p to have, the more you have to pay by having a lower level of confidence.
2. Since n appears in the denominator of the margin of error of the confidence interval for p, for a fixed level of confidence, the larger the sample, the narrower (and hence more precise) the interval is. This brings us naturally to our next point.
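Both results can be seen numerically by tabulating the margin of error z* · sqrt(p-hat(1 − p-hat)/n) for several sample sizes and confidence levels. The following is a minimal sketch; the helper `margin_of_error`, the value p-hat = 0.5, and the sample sizes are hypothetical choices made for illustration.

```python
import math


def margin_of_error(p_hat, n, z_star):
    """Margin of error z* · sqrt(p_hat(1 − p_hat)/n)."""
    return z_star * math.sqrt(p_hat * (1 - p_hat) / n)


p_hat = 0.5  # hypothetical sample proportion
multipliers = {"90%": 1.645, "95%": 1.96, "99%": 2.576}

for n in (250, 1000, 4000):
    # One row per sample size, one column per confidence level.
    row = {level: round(margin_of_error(p_hat, n, z), 3)
           for level, z in multipliers.items()}
    print(n, row)
```

Reading across a row, the margin grows with the confidence level; reading down a column, it shrinks as n grows. Because n enters through a square root, quadrupling the sample size only halves the margin.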
Just as we did for means, when we have some level of flexibility in determining the sample size, we can set a desired margin of error for estimating the population proportion and find the sample size that will achieve that.
For example, a final poll on the day before an election would want the margin of error to be quite small (with a high level of confidence) in order to predict the election results as precisely as possible. This is particularly relevant when the race between the candidates is close. The polling company needs to figure out how many eligible voters it must include in its sample to achieve that.
Let’s see how we do that.
(Comment: For our discussion here we will focus on a 95% confidence level (z* = 1.96), since this is the most commonly used level of confidence.)
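The confidence interval for p is
$$\hat{p} \pm 1.96 \cdot \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
The margin of error, then, is
$$m = 1.96 \cdot \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
Now we isolate n (i.e., express it as a function of m):
$$n = \frac{(1.96)^2 \, \hat{p}(1-\hat{p})}{m^2}$$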
There is a practical problem with this expression that we need to overcome.
Practically, you first determine the sample size, then you choose a random sample of that size, and then use the collected data to find p-hat.
So the fact that the expression above for determining the sample size depends on p-hat is problematic.
The way to overcome this problem is to take the conservative approach by setting p-hat = 1/2 = 0.5.
Why do we call this approach conservative?
It is conservative because the expression that appears in the numerator, $\hat{p}(1-\hat{p})$, is maximized when p-hat = 1/2 = 0.5.
That way, the n we get will work in giving us the desired margin of error regardless of what the value of p-hat is. This is a “worst case scenario” approach. So when we do that we get:
$$n = \frac{(1.96)^2 \cdot \frac{1}{2} \cdot \frac{1}{2}}{m^2} = \frac{(1.96)^2}{4m^2}$$
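The conservative rule n = (1.96)² / (4m²) can be sketched as follows. The function name `conservative_sample_size` and the target margin of 3 percentage points are hypothetical. Rounding up with `math.ceil` is a choice made here so that the achieved margin does not exceed the target.

```python
import math


def conservative_sample_size(m, z_star=1.96):
    """Smallest whole n with z* · sqrt(0.25 / n) <= m (worst case p_hat = 0.5)."""
    return math.ceil(z_star ** 2 / (4 * m ** 2))


m = 0.03  # hypothetical target: a margin of 3 percentage points
n = conservative_sample_size(m)
print(f"sample size needed: {n}")

# Check: with this n, the worst-case margin is no larger than the target.
achieved = 1.96 * math.sqrt(0.25 / n)
print(f"worst-case margin with n = {n}: {achieved:.4f}")
```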
As we mentioned before, one of the most important things to learn with any inference method is the conditions under which it is safe to use it.
As we did for the mean, the assumption we made in order to develop the methods in this unit was that the sampling distribution of the sample proportion, p-hat, is roughly normal. Recall from the Probability unit that this happens when the sample is large enough that both the expected number of successes, n · p, and the expected number of failures, n · (1 − p), are sufficiently large.
In general, a confidence interval for the unknown population proportion (p) is
$$\hat{p} \pm z^* \cdot \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
where z* is 1.645 for 90% confidence, 1.96 for 95% confidence, and 2.576 for 99% confidence.
To obtain a desired margin of error (m) in a confidence interval for an unknown population proportion, a conservative sample size is
$$n = \frac{(z^*)^2}{4m^2}$$
If a reasonable estimate of the true proportion is known, the sample size can be calculated using
$$n = \frac{(z^*)^2 \, \hat{p}(1-\hat{p})}{m^2}$$
where p-hat here denotes that estimate.
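The two sample-size rules, n = (z*)² / (4m²) and n = (z*)² · p-hat(1 − p-hat) / m², can be compared side by side. In this minimal sketch the function `sample_size`, the prior estimate of 0.2, and the target margin of 0.03 are all hypothetical. Passing 0.5 as the estimate reproduces the conservative rule.

```python
import math


def sample_size(m, p_estimate=0.5, z_star=1.96):
    """Whole-number n from n = z*^2 · p(1 − p) / m^2, rounded up.

    The default p_estimate of 0.5 gives the conservative (worst-case) size.
    """
    return math.ceil(z_star ** 2 * p_estimate * (1 - p_estimate) / m ** 2)


m = 0.03  # hypothetical target margin of error
print("conservative:", sample_size(m))
print("with a prior estimate of 0.2:", sample_size(m, p_estimate=0.2))
```

When the estimate is far from 0.5, the required sample is noticeably smaller than the conservative one. The saving is only as trustworthy as the estimate itself.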
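The methods developed in this unit are safe to use as long as the sampling distribution of p-hat is approximately normal, that is, as long as the sample is large enough that the expected numbers of successes and failures are both sufficiently large.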