Population Proportions

CO-4: Distinguish among different measurement scales, choose the appropriate descriptive and inferential statistical methods based on these distinctions, and interpret the results.
LO 4.30: Interpret confidence intervals for population parameters in context.
LO 4.32: Find confidence intervals for the population proportion using the formula (when required conditions are met) and perform sample size calculations.
CO-6: Apply basic concepts of probability, random variation, and commonly used statistical probability distributions.
LO 6.24: Explain the connection between the sampling distribution of a statistic, and its properties as a point estimator.
LO 6.25: Explain what a confidence interval represents and determine how changes in sample size and confidence level affect the precision of the confidence interval.

Confidence Intervals

As we mentioned in the introduction to Unit 4A, when the variable that we’re interested in studying in the population is categorical, the parameter we are trying to infer about is the population proportion (p) associated with that variable. We also learned that the point estimator for the population proportion p is the sample proportion p-hat.

To refresh your memory, here is a picture that summarizes an example we looked at.

A large circle represents the population of all U.S. adults. We are interested in the parameter p, the proportion who believe marijuana should be legalized. From this population we draw a sample of size n = 1000, represented by a smaller circle. In this sample, we find that p-hat (the point estimator) is 0.56. Using point estimation, we estimate p with this value.

We are now moving on to interval estimation of p. In other words, we would like to develop a set of intervals that, with different levels of confidence, will capture the value of p. We’ve actually done all the groundwork and discussed all the big ideas of interval estimation when we talked about interval estimation for μ (mu), so we’ll be able to go through it much faster. Let’s begin.

Recall that the general form of any confidence interval for an unknown parameter is:

estimate ± margin of error

Since the unknown parameter here is the population proportion p, the point estimator (as I reminded you above) is the sample proportion p-hat. The confidence interval for p, therefore, has the form:

$$\hat{p} \pm m$$

(Recall that m is the notation for the margin of error.) The margin of error (m) gives us the maximum estimation error with a certain confidence. In this case it tells us that p-hat is different from p (the parameter it estimates) by no more than m units.

From our previous discussion on confidence intervals, we also know that the margin of error is the product of two components:

$$m = \text{confidence multiplier} \times \text{standard deviation of the estimator}$$

To figure out what these two components are, we need to go back to a result we obtained in the Sampling Distributions section of the Probability unit about the sampling distribution of p-hat. We found that under certain conditions (which we’ll come back to later), p-hat has a normal distribution with mean p and a standard deviation of

$$\sqrt{\frac{p(1-p)}{n}}$$

This result makes things very simple for us, because it reveals what the two components are that the margin of error is made of:

  • Since, like the sampling distribution of x-bar, the sampling distribution of p-hat is normal, the confidence multipliers that we’ll use in the confidence interval for p will be the same z* multipliers we use for the confidence interval for μ (mu) when σ (sigma) is known (using exactly the same reasoning and the same probability results). The multipliers we’ll use, then, are: 1.645, 1.96, and 2.576 at the 90%, 95% and 99% confidence levels, respectively.
  • The standard deviation of our estimator p-hat is

    $$\sqrt{\frac{p(1-p)}{n}}$$

Putting it all together, we find that the confidence interval for p should be:

$$\hat{p} \pm z^* \sqrt{\frac{p(1-p)}{n}}$$

We just have to solve one practical problem and we’re done. We’re trying to estimate the unknown population proportion p, so having it appear in the confidence interval doesn’t make any sense. To overcome this problem, we’ll do the obvious thing …

We’ll replace p with its sample counterpart, p-hat, and work with the estimated standard error of p-hat:

$$\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

Now we’re done. The confidence interval for the population proportion p is:

$$\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$


The drug Viagra became available in the U.S. in May, 1998, in the wake of an advertising campaign that was unprecedented in scope and intensity. A Gallup poll found that by the end of the first week in May, 643 out of a random sample of 1,005 adults were aware that Viagra was an impotency medication (based on “Viagra A Popular Hit,” a Gallup poll analysis by Lydia Saad, May 1998).

Let’s estimate the proportion p of all adults in the U.S. who by the end of the first week of May 1998 were already aware of Viagra and its purpose by setting up a 95% confidence interval for p.

We first need to calculate the sample proportion p-hat. Out of 1,005 sampled adults, 643 knew what Viagra is used for, so p-hat = 643/1005 ≈ 0.64.

A large circle represents the population of all U.S. adults (18+). We are interested in the parameter p, the proportion who know what Viagra is used for. From this population we draw a sample of size n = 1005, represented by a smaller circle. In this sample, we find that p-hat (the point estimator) is 0.64.

Therefore, a 95% confidence interval for p is

$$0.64 \pm 1.96 \sqrt{\frac{0.64(1-0.64)}{1005}} \approx 0.64 \pm 0.03 = (0.61,\ 0.67)$$

We can be 95% confident that the proportion of all U.S. adults who were already familiar with Viagra by that time was between 0.61 and 0.67 (or 61% and 67%).

The fact that the margin of error equals 0.03 says we can be 95% confident that the unknown population proportion p is within 0.03 (3 percentage points) of the observed sample proportion 0.64 (64%). In other words, we are 95% confident that 64% is “off” by no more than 3 percentage points.
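The calculation above can be reproduced with a few lines of Python. This is a minimal sketch, not part of the original course materials; the function name `proportion_ci` and its parameters are illustrative choices, and it simply applies the interval p-hat ± z* · sqrt(p-hat(1 − p-hat)/n) to the poll counts.

```python
import math

def proportion_ci(successes, n, z_star=1.96):
    """Confidence interval for a population proportion.

    Hypothetical helper: returns (p_hat, margin, lower, upper) using
    p_hat +/- z_star * sqrt(p_hat * (1 - p_hat) / n).
    """
    p_hat = successes / n
    margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, margin, p_hat - margin, p_hat + margin

# Viagra poll: 643 of 1,005 sampled adults, 95% confidence (z* = 1.96)
p_hat, margin, lower, upper = proportion_ci(643, 1005)
print(f"p-hat = {p_hat:.2f}, m = {margin:.2f}, CI = ({lower:.2f}, {upper:.2f})")
# prints: p-hat = 0.64, m = 0.03, CI = (0.61, 0.67)
```

Passing `z_star=1.645` or `z_star=2.576` gives the 90% and 99% intervals for the same data.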


  • We would like to share with you the methodology portion of the official poll release for the Viagra example. We hope you see that you now have the tools to understand how poll results are analyzed:

“The results are based on telephone interviews with a randomly selected national sample of 1,005 adults, 18 years and older, conducted May 8-10, 1998. For results based on samples of this size, one can say with 95 percent confidence that the error attributable to sampling and other random effects could be plus or minus 3 percentage points. In addition to sampling error, question wording and practical difficulties in conducting surveys can introduce error or bias into the findings of public opinion polls.”

Two important results that we discussed at length when we talked about the confidence interval for μ (mu) also apply here:

1. There is a trade-off between level of confidence and the width (or precision) of the confidence interval. The more precision you would like the confidence interval for p to have, the more you have to pay by having a lower level of confidence.

2. Since n appears in the denominator of the margin of error of the confidence interval for p, for a fixed level of confidence, the larger the sample, the narrower (that is, the more precise) the interval. This brings us naturally to our next point.
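Both results can be checked numerically. The sketch below is an illustration only: the helper name `margin_of_error` is hypothetical, and p-hat = 0.64 and n = 1005 are borrowed from the Viagra example. It first varies the confidence level with n fixed, then varies n with the confidence level fixed at 95%.

```python
import math

def margin_of_error(p_hat, n, z_star):
    """Margin of error z* * sqrt(p_hat * (1 - p_hat) / n) (illustrative helper)."""
    return z_star * math.sqrt(p_hat * (1 - p_hat) / n)

p_hat = 0.64  # sample proportion from the Viagra example

# Result 1: higher confidence means a larger margin (wider interval), n fixed
for level, z_star in [("90%", 1.645), ("95%", 1.96), ("99%", 2.576)]:
    print(level, round(margin_of_error(p_hat, 1005, z_star), 4))

# Result 2: larger n means a smaller margin (narrower interval), level fixed
for n in [250, 1005, 4020]:
    print(n, round(margin_of_error(p_hat, n, 1.96), 4))
```

Because n sits under a square root, quadrupling the sample size (1005 to 4020) only halves the margin of error.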

Sample Size Calculations

Just as we did for means, when we have some level of flexibility in determining the sample size, we can set a desired margin of error for estimating the population proportion and find the sample size that will achieve that.

For example, a final poll on the day before an election would want the margin of error to be quite small (with a high level of confidence) in order to predict the election results as precisely as possible. This is particularly relevant when it is a close race between the candidates. The polling company needs to figure out how many eligible voters it needs to include in its sample in order to achieve that.

Let’s see how we do that.

(Comment: For our discussion here we will focus on a 95% confidence level (z* = 1.96), since this is the most commonly used level of confidence.)

The confidence interval for p is

$$\hat{p} \pm 1.96 \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

The margin of error, then, is

$$m = 1.96 \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

Now we isolate n (i.e., express it as a function of m).

$$n = \frac{(1.96)^2 \, \hat{p}(1-\hat{p})}{m^2}$$

There is a practical problem with this expression that we need to overcome.

Practically, you first determine the sample size, then you choose a random sample of that size, and then use the collected data to find p-hat.

A large circle represents the population, for which we wish to find p. Step I is to determine the sample size n needed. Step II is to choose an SRS of size n. This is represented by an arrow, which leads to a smaller circle representing the SRS of size n. Step III is to use the obtained data from the SRS to find p-hat.

So the fact that the expression above for determining the sample size depends on p-hat is problematic.

The way to overcome this problem is to take the conservative approach by setting p-hat = 1/2 = 0.5.

Why do we call this approach conservative?

It is conservative because the expression that appears in the numerator,

$$\hat{p}(1-\hat{p})$$

is maximized when p-hat = 1/2 = 0.5.
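A quick numerical check of this claim, offered as an illustrative sketch rather than a proof: evaluate p-hat(1 − p-hat) on a grid of values from 0 to 1 and look for the largest product. The grid spacing of 0.01 is an arbitrary choice. (Algebraically, p-hat(1 − p-hat) = 1/4 − (p-hat − 1/2)², which cannot exceed 1/4.)

```python
# Evaluate p * (1 - p) on a grid 0.00, 0.01, ..., 1.00
grid = [i / 100 for i in range(101)]
products = {p: p * (1 - p) for p in grid}

# Find the grid value with the largest product
best_p = max(products, key=products.get)
print(best_p, products[best_p])  # prints: 0.5 0.25
```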

That way, the n we get will give us the desired margin of error regardless of what the value of p-hat turns out to be. This is a “worst case scenario” approach. So when we do that we get:

$$n = \frac{(1.96)^2 \cdot \frac{1}{2} \cdot \frac{1}{2}}{m^2} = \frac{(1.96)^2}{4m^2}$$

In general, for any confidence level we have

  • If we know a reasonable estimate of the proportion we can use:

    $$n = \frac{(z^*)^2 \, \hat{p}(1-\hat{p})}{m^2}$$

  • If we choose the conservative estimate, assuming we know nothing about the true proportion, we use:

    $$n = \frac{(z^*)^2}{4m^2}$$


It seems like media polls usually use a sample size of 1,000 to 1,200. This could be puzzling.

How could the results obtained from, say, 1,100 U.S. adults give us information about the entire population of U.S. adults? 1,100 is such a tiny fraction of the actual population. Here is the answer:

What sample size n is needed if a margin of error m = 0.03 is desired?

$$n = \frac{(1.96)^2}{4(0.03)^2} \approx 1067.1 \rightarrow 1068$$

(remember, always round up). In fact, 0.03 is a very commonly used margin of error, especially for media polls. For this reason, most media polls work with a sample of around 1,100 people.
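The sample size calculation can be sketched in Python as follows. The function name `sample_size` and the parameter `p_guess` are illustrative, not from the course; `p_guess` defaults to the conservative value 0.5, and `math.ceil` does the rounding up.

```python
import math

def sample_size(m, z_star=1.96, p_guess=0.5):
    """Smallest n giving margin of error at most m (illustrative helper).

    Uses n = z_star^2 * p_guess * (1 - p_guess) / m^2, rounded up.
    p_guess = 0.5 is the conservative (worst-case) choice.
    """
    n_exact = (z_star ** 2) * p_guess * (1 - p_guess) / (m ** 2)
    return math.ceil(n_exact)

# Conservative sample size for a 3-point margin at 95% confidence
print(sample_size(0.03))  # prints: 1068

# With a prior estimate of the proportion (0.64, as in the Viagra poll)
print(sample_size(0.03, p_guess=0.64))  # prints: 984
```

The second call shows why the 0.5 choice is called conservative: any other guess for the proportion leads to a smaller required sample.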

When is it safe to use these methods?

As we mentioned before, one of the most important things to learn with any inference method is the conditions under which it is safe to use it.

As we did for the mean, the assumption we made in order to develop the methods in this unit was that the sampling distribution of the sample proportion, p-hat, is roughly normal. Recall from the Probability unit that the conditions under which this happens are that

$$n \cdot p \geq 10 \quad \text{and} \quad n \cdot (1-p) \geq 10$$

Since p is unknown, we will replace it with its estimate, the sample proportion, and set

$$n \cdot \hat{p} \geq 10 \quad \text{and} \quad n \cdot (1-\hat{p}) \geq 10$$

to be the conditions under which it is safe to use the methods we developed in this section.
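A minimal sketch of such a check is shown below. It assumes the common rule-of-thumb threshold of 10 for both n · p-hat and n · (1 − p-hat); the threshold is left as a parameter because textbooks differ on the exact cutoff, and the function name is hypothetical. Since n · p-hat is just the observed number of "successes", the check works directly with counts.

```python
def normal_approx_ok(successes, n, threshold=10):
    """Check whether n * p_hat and n * (1 - p_hat) both reach the threshold.

    Assumption: threshold=10 is a rule of thumb, not a universal constant.
    n * p_hat equals the success count and n * (1 - p_hat) the failure
    count, so integer counts are compared to avoid rounding issues.
    """
    failures = n - successes
    return successes >= threshold and failures >= threshold

print(normal_approx_ok(643, 1005))  # Viagra poll: prints True
print(normal_approx_ok(3, 40))      # only 3 successes: prints False
```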

Let’s summarize

In general, a confidence interval for the unknown population proportion (p) is

$$\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

where z* is 1.645 for 90% confidence, 1.96 for 95% confidence, and 2.576 for 99% confidence.

To obtain a desired margin of error (m) in a confidence interval for an unknown population proportion, a conservative sample size is

$$n = \frac{(z^*)^2}{4m^2}$$

If a reasonable estimate of the true proportion is known, the sample size can be calculated using

$$n = \frac{(z^*)^2 \, \hat{p}(1-\hat{p})}{m^2}$$

The methods developed in this unit are safe to use as long as

$$n \cdot \hat{p} \geq 10 \quad \text{and} \quad n \cdot (1-\hat{p}) \geq 10$$