In this section on estimation, we have discussed the basic process for constructing confidence intervals from point estimates. In doing so we must calculate the margin of error using the standard error (or estimated standard error) and a z* or t* value.
As we wrap up this topic, we want to revisit the interpretation of a confidence interval.
What do we mean by “confidence”?
Suppose we find a 95% confidence interval for an unknown parameter. What exactly does the 95% mean?
- If we repeated the process for every possible sample of this size from the population, 95% of the intervals we constructed would contain the parameter.
This is NOT the same as saying “the probability that μ (mu) is contained in (the interval constructed from my sample) is 95%.” Why?!
- Once we have a particular confidence interval, the true value is either in the interval constructed from our sample (probability = 1) or it is not (probability = 0). We simply do not know which it is. If we were to say “the probability that μ (mu) is contained in (the interval constructed from my sample) is 95%,” we know we would be incorrect since it is either 0 (No) or 1 (Yes) for any given sample. The probability comes from the “long run” view of the process.
- The probability we used to construct the confidence interval was based upon the fact that the sample statistic (x-bar, p-hat) will vary in a manner we understand (because we know the sampling distribution).
- The probability is associated with the randomness of our statistic. For a particular interval, we speak only of being “95% confident,” which is a statement about the process.
- In other words, in statistics, “95% confident” describes our confidence in the process: in the long run, using this process will give a correct interval 95% of the time and an incorrect one 5% of the time. For any one use of the process, we cannot know whether our interval is one of the 95% that are correct or one of the 5% that are incorrect. That is the statistical definition of confidence.
- We can say that in the long run, 95% of these intervals will contain the true parameter and 5% will not.
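The “long run” view can be checked with a small simulation. The sketch below is only an illustration under assumptions that are not part of the discussion above: a hypothetical normal population with mean 15 and standard deviation 5, samples of size 30, and a z-interval that treats the population standard deviation as known. The function name `coverage_rate` is ours.

```python
import random
from statistics import NormalDist, mean

def coverage_rate(mu=15.0, sigma=5.0, n=30, reps=4000, level=0.95, seed=1):
    """Fraction of simulated z-intervals that contain the true mean mu."""
    rng = random.Random(seed)
    # z* for the requested confidence level (about 1.96 for 95%)
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)
    # Margin of error; sigma is treated as known (an assumption of this sketch)
    margin = z_star * sigma / n ** 0.5
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = mean(sample)
        # Each individual interval either contains mu or it does not
        if x_bar - margin <= mu <= x_bar + margin:
            hits += 1
    return hits / reps

rate = coverage_rate()
print(f"Share of intervals containing mu: {rate:.3f}")
```

Each simulated interval either contains the true mean or misses it. The 95% shows up only as the share of intervals that succeed across many repetitions, which should land close to 0.95.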
Example: Suppose a 95% confidence interval for the proportion of U.S. adults who are not active at all is (0.23, 0.27).
- Correct Interpretation #1: We are 95% confident that the true proportion of U.S. adults who are not active at all is between 23% and 27%.
- Correct Interpretation #2: We are 95% confident that the true proportion of U.S. adults who are not active at all is covered by the interval (23%, 27%).
- A More Thorough Interpretation: Based upon our sample, the true proportion of U.S. adults who are not active at all is estimated to be 25%. With 95% confidence, this value could be as small as 23% or as large as 27%.
- A Common Interpretation in Journal Articles: Based upon our sample, the true proportion of U.S. adults who are not active at all is estimated to be 25% (95% CI 23%-27%).
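The 25% estimate quoted in the last two interpretations can be recovered from the interval itself. The short sketch below assumes the interval is symmetric about the point estimate (point estimate plus or minus margin of error), as with the z-based intervals in this section. The variable names are ours.

```python
# Endpoints of the 95% confidence interval from the example
low, high = 0.23, 0.27

# For a symmetric interval, the point estimate is the midpoint
point_estimate = (low + high) / 2
# and the margin of error is half the interval's width
margin_of_error = (high - low) / 2

print(f"Point estimate:  {point_estimate:.2f}")   # 0.25
print(f"Margin of error: {margin_of_error:.2f}")  # 0.02
```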
Now let’s look at an INCORRECT interpretation, which we have seen before:
- INCORRECT Interpretation: There is a 95% chance that the true proportion of U.S. adults who are not active at all is between 23% and 27%. We know this is incorrect because at this point, the true proportion and the numbers in our interval are fixed. The probability is either 1 or 0 depending on whether the interval is one of the 95% that cover the true proportion, or one of the 5% that do not.
For confidence intervals regarding a population mean, we have an additional caution to discuss about interpretations.
Example: Suppose a 95% confidence interval for the average minutes per day of exercise for U.S. adults is (12, 18).
- Correct Interpretation: We are 95% confident that the true mean minutes per day of exercise for U.S. adults is between 12 and 18 minutes.
- INCORRECT Interpretation: We are 95% confident that an individual U.S. adult exercises between 12 and 18 minutes per day. We must remember that our intervals are about the parameter, in this case the population mean. They do not apply to an individual as we expect individuals to have much more variation.
- INCORRECT Interpretation: We are 95% confident that U.S. adults exercise between 12 and 18 minutes per day. This interpretation implies that the statement is true for all U.S. adults. It is incorrect for the same reason as the previous incorrect interpretation!
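To see why an interval for the mean says little about individuals, we can compare the interval with the individual values it was built from. The sketch below uses made-up data, not the data behind the example: a hypothetical right-skewed sample of 400 daily exercise times with a mean near 15 minutes. Because the sample is large, it uses z* with the sample standard deviation as an approximation.

```python
import random
from statistics import NormalDist, mean, stdev

rng = random.Random(7)
# Hypothetical sample: 400 daily exercise times (minutes), mean about 15
minutes = [rng.expovariate(1 / 15) for _ in range(400)]

n = len(minutes)
x_bar = mean(minutes)
se = stdev(minutes) / n ** 0.5          # estimated standard error of the mean
z_star = NormalDist().inv_cdf(0.975)    # about 1.96 for 95% confidence
low, high = x_bar - z_star * se, x_bar + z_star * se

# Share of individuals whose own value falls inside the interval for the mean
inside = sum(low <= m <= high for m in minutes) / n

print(f"95% CI for the mean: ({low:.1f}, {high:.1f})")
print(f"Share of individuals inside that interval: {inside:.1%}")
```

In this sketch only a small share of individuals fall inside the interval, even though the interval is a reasonable estimate of the mean. Individual values vary far more than the sample mean does.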
As we continue to study inferential statistics, we will see that confidence intervals are used in many situations. The goal is always to provide confidence in our interval estimate of a quantity of interest. Population means and proportions are common parameters, but any quantity that can be estimated from data has a population counterpart that we may wish to estimate.