Population Means (Part 2)
95% is the most commonly used level of confidence. However, we may wish to increase our level of confidence and produce an interval that’s almost certain to contain μ (mu). Specifically, we may want to report an interval for which we are 99% confident that it contains the unknown population mean, rather than only 95%.
Using the same reasoning as in the last comment, in order to create a 99% confidence interval for μ (mu), we should ask: There is a probability of 0.99 that any normal random variable takes values within how many standard deviations of its mean? The precise answer is 2.576, and therefore, a 99% confidence interval for μ (mu) is: x-bar ± 2.576 · σ/√n
Another commonly used level of confidence is a 90% level of confidence. Since there is a probability of 0.90 that any normal random variable takes values within 1.645 standard deviations of its mean, the 90% confidence interval for μ (mu) is: x-bar ± 1.645 · σ/√n
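As a sketch of the computation, here is a short Python function for these intervals. The numbers are hypothetical (a sample mean of 115, a known population standard deviation of 15, and a sample of 100), chosen to be consistent with the intervals quoted later in this section; this assumes, as in the text, that σ is known.

```python
import math

def confidence_interval(xbar, sigma, n, z_star):
    """Return the (lower, upper) confidence interval for the population
    mean, given the sample mean xbar, the known population standard
    deviation sigma, the sample size n, and the confidence multiplier z*."""
    m = z_star * sigma / math.sqrt(n)  # margin of error
    return (xbar - m, xbar + m)

# Hypothetical sample: xbar = 115, sigma = 15, n = 100
print(confidence_interval(115, 15, 100, 2.576))  # 99% interval
print(confidence_interval(115, 15, 100, 1.645))  # 90% interval
```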
The purpose of this next activity is to give you guided practice at calculating and interpreting confidence intervals, and drawing conclusions from them.
Note from the previous example and the previous “Did I Get This?” activity that the more confidence I require, the wider the confidence interval for μ (mu) becomes. The 99% confidence interval is wider than the 95% confidence interval, which is wider than the 90% confidence interval.
This is not very surprising, given that in the 99% interval we multiply the standard deviation of the statistic by 2.576, in the 95% by 1.96, and in the 90% only by 1.645. Beyond this numerical explanation, there is a very clear intuitive explanation and an important implication of this result.
Let’s start with the intuitive explanation. The more certain I want to be that the interval contains the value of μ (mu), the more plausible values the interval needs to include in order to account for that extra certainty. I am 95% certain that the value of μ (mu) is one of the values in the interval (112.1, 117.9). In order to be 99% certain that one of the values in the interval is the value of μ (mu), I need to include more values, and thus provide a wider confidence interval.
In our example, the wider 99% confidence interval (111, 119) gives us a less precise estimate of the value of μ (mu) than the narrower 90% confidence interval (112.5, 117.5), because the smaller interval ‘narrows in’ on the plausible values of μ (mu).
The important practical implication here is that researchers must decide whether they prefer to state their results with a higher level of confidence or produce a more precise interval. In other words,
The price we have to pay for a higher level of confidence is that the unknown population mean will be estimated with less precision (i.e., with a wider confidence interval). If we would like to estimate μ (mu) with more precision (i.e. a narrower confidence interval), we will need to sacrifice and report an interval with a lower level of confidence.
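This trade-off can be seen numerically. Here is a short Python sketch using hypothetical values (x-bar = 115, σ = 15, n = 100, chosen to reproduce the intervals quoted above): raising the level of confidence widens the interval, and nothing else changes.

```python
import math

xbar, sigma, n = 115, 15, 100  # hypothetical values consistent with the example

for level, z_star in [(0.90, 1.645), (0.95, 1.96), (0.99, 2.576)]:
    m = z_star * sigma / math.sqrt(n)  # margin of error grows with z*
    print(f"{level:.0%} interval: ({xbar - m:.1f}, {xbar + m:.1f})")
```

Running this prints the 90%, 95%, and 99% intervals in order, each one wider than the last.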
So far we’ve developed the confidence interval for the population mean “from scratch” based on results from probability, and discussed the trade-off between the level of confidence and the precision of the interval. The price you pay for a higher level of confidence is a lower level of precision of the interval (i.e., a wider interval).
Is there a way to bypass this trade-off? In other words, is there a way to increase the precision of the interval (i.e., make it narrower) without compromising on the level of confidence? We will answer this question shortly, but first we’ll need to get a deeper understanding of the different components of the confidence interval and its structure.
We explored the confidence interval for μ (mu) for different levels of confidence, and found that in general, it has the following form: x-bar ± z* · σ/√n
where z* is a general notation for the multiplier that depends on the level of confidence. As we discussed before:
- For a 90% level of confidence, z* = 1.645
- For a 95% level of confidence, z* = 1.96
- For a 99% level of confidence, z* = 2.576
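These multipliers need not be memorized; each can be recovered from the standard normal distribution. A minimal sketch using Python's standard library (the inv_cdf method of statistics.NormalDist): for a two-sided interval, half of the leftover probability goes in each tail.

```python
from statistics import NormalDist

def z_multiplier(confidence):
    """Confidence multiplier z*: the value such that a standard normal
    variable falls within (-z*, z*) with the given probability."""
    # Two-sided interval: split the leftover probability between the tails.
    return NormalDist().inv_cdf((1 + confidence) / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: z* = {z_multiplier(level):.3f}")
# prints 1.645, 1.960, and 2.576
```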
To start our discussion about the structure of the confidence interval, let’s denote m = z* · σ/√n
The confidence interval, then, has the form: x-bar ± m
To summarize, we have x-bar ± m, where:
- x-bar is the sample mean, the point estimator for the unknown population mean (μ, mu).
- m is called the margin of error, since it represents the maximum estimation error for a given level of confidence. For example, for a 95% confidence interval, we are 95% confident that our estimate will not depart from the true population mean by more than m. The margin of error m is itself the product of two components: the confidence multiplier z* and the standard deviation of the statistic, σ/√n.
Here is a summary of the different components of the confidence interval and its structure: x-bar ± m, where m = z* · σ/√n.
This structure (estimate ± margin of error, where the margin of error is the product of a confidence multiplier and the standard deviation of the statistic or, as we’ll see, the standard error) is the general structure of all confidence intervals that we will encounter in this course.
Obviously, even though each confidence interval has the same components, the formula for these components is different from confidence interval to confidence interval, depending on what unknown parameter the confidence interval aims to estimate.
Since the structure of the confidence interval is such that it has a margin of error on either side of the estimate, it is centered at the estimate (in our current case, x-bar), and its width (or length) is exactly twice the margin of error: width = 2m
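Both facts can be checked directly. A small sketch, again with the hypothetical values x-bar = 115, σ = 15, n = 100, and a 95% level of confidence:

```python
import math

xbar, sigma, n, z_star = 115, 15, 100, 1.96  # hypothetical values

m = z_star * sigma / math.sqrt(n)   # margin of error
lower, upper = xbar - m, xbar + m   # the confidence interval

# The interval is centered at the estimate...
print(math.isclose((lower + upper) / 2, xbar))   # True
# ...and its width is exactly twice the margin of error.
print(math.isclose(upper - lower, 2 * m))        # True
```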
The margin of error, m, is therefore “in charge” of the width (or precision) of the confidence interval, and the estimate is in charge of its location (and has no effect on the width).
Let us now go back to the confidence interval for the mean, and more specifically, to the question that we posed at the beginning of the previous page:
Is there a way to increase the precision of the confidence interval (i.e., make it narrower) without compromising on the level of confidence?
Since the width of the confidence interval is a function of its margin of error, let’s look closely at the margin of error of the confidence interval for the mean and see how it can be reduced: m = z* · σ/√n
Since z* controls the level of confidence, we can rephrase our question above in the following way:
Is there a way to reduce this margin of error other than by reducing z*?
If you look closely at the margin of error, you’ll see that the answer is yes. We can do that by increasing the sample size n (since it appears in the denominator).
Let’s look at an example first and then explain why increasing the sample size is a way to increase the precision of the confidence interval without compromising on the level of confidence.
Let’s try to understand why a larger sample size reduces the margin of error for a fixed level of confidence. There are three ways to explain this: mathematically, using probability theory, and intuitively.
We’ve already alluded to the mathematical explanation: the margin of error is m = z* · σ/√n, and since n, the sample size, appears in the denominator, increasing n will reduce the margin of error.
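A quick sketch of how the margin of error shrinks as n grows, using a hypothetical σ = 15 and a 95% level of confidence:

```python
import math

sigma, z_star = 15, 1.96  # hypothetical: known sigma, 95% confidence

for n in (25, 100, 400, 1600):
    m = z_star * sigma / math.sqrt(n)  # margin of error for this sample size
    print(f"n = {n:4d}: margin of error = {m:.2f}")
```

Note that because n sits under a square root, quadrupling the sample size only halves the margin of error.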
As we saw in our discussion about point estimates, probability theory tells us that the standard deviation of x-bar is σ/√n, so as the sample size n increases, the sampling distribution of x-bar becomes more concentrated around μ (mu).
This explains why with a larger sample size the margin of error (which represents how far apart we believe x-bar might be from μ (mu) for a given level of confidence) is smaller.
On an intuitive level, if our estimate x-bar is based on a larger sample (i.e., a larger fraction of the population), we have more faith in it, or it is more reliable, and therefore we need to account for less error around it.
- While it is true that for a given level of confidence, increasing the sample size increases the precision of our interval estimation, in practice, increasing the sample size is not always possible.
- Consider a study in which there is a non-negligible cost involved for collecting data from each participant (an expensive medical procedure, for example). If the study has some budgetary constraints, which is usually the case, increasing the sample size from 100 to 400 is just not possible in terms of cost-effectiveness.
- Another instance in which increasing the sample size is impossible is when a larger sample is simply not available, even if we had the money to afford it. For example, consider a study on the effectiveness of a drug on curing a very rare disease among children. Since the disease is rare, there are a limited number of children who could be participants.
- This is the reality of statistics. Sometimes theory collides with reality, and you simply do the best you can.