Population Means (Part 1)

The General Case

CO-4: Distinguish among different measurement scales, choose the appropriate descriptive and inferential statistical methods based on these distinctions, and interpret the results.
LO 4.30: Interpret confidence intervals for population parameters in context.
LO 4.31: Find confidence intervals for the population mean using the normal distribution (Z) based confidence interval formula (when required conditions are met) and perform sample size calculations.
CO-6: Apply basic concepts of probability, random variation, and commonly used statistical probability distributions.
LO 6.24: Explain the connection between the sampling distribution of a statistic, and its properties as a point estimator.
LO 6.25: Explain what a confidence interval represents and determine how changes in sample size and confidence level affect the precision of the confidence interval.

As the introduction mentioned, we’ll start our discussion on interval estimation with interval estimation for the population mean μ (mu). We’ll start by showing how a 95% confidence interval is constructed, and later generalize to other levels of confidence. We’ll also discuss practical issues related to interval estimation.

Recall the IQ example:

EXAMPLE:

Suppose that we are interested in studying the IQ levels of students at Smart University (SU). In particular (since IQ level is a quantitative variable), we are interested in estimating μ (mu), the mean IQ level of all the students at SU.

We will assume that from past research on IQ scores in different universities, it is known that the IQ standard deviation in such populations is σ (sigma) = 15. In order to estimate μ (mu), a random sample of 100 SU students was chosen, and their (sample) mean IQ level is calculated (let’s assume, for now, that we have not yet found the sample mean).

A large circle represents the Population of all Students at SU. We are interested in the variable IQ, and the unknown parameter is μ, the population mean IQ level. In addition, we know that σ = 15. From this population we create a sample of size n=100, represented by a smaller circle. In this sample, we need to find x bar

We will now show the rationale behind constructing a 95% confidence interval for the population mean μ (mu).

  • We learned in the “Sampling Distributions” section of probability that according to the central limit theorem, the sampling distribution of the sample mean x-bar is approximately normal with a mean of μ (mu) and standard deviation of σ/sqrt(n) = sigma/sqrt(n). In our example, then, (where σ (sigma) = 15 and n = 100), the possible values of x-bar, the sample mean IQ level of 100 randomly chosen students, is approximately normal, with mean μ (mu) and standard deviation 15/sqrt(100) = 1.5.
  • Next, we recall and apply the Standard Deviation Rule for the normal distribution, and in particular its second part: There is a 95% chance that the sample mean we will find in our sample falls within 2 * 1.5 = 3 of μ (mu).

A distribution curve with a horizontal axis labeled "X bar." The curve is centered at X-bar=μ, and marked on the axis is μ+3 and μ-3. There is a 95% chance that any x-bar will fall between μ-3 and μ+3.

Obviously, if there is a certain distance between the sample mean and the population mean, we can describe that distance by starting at either value. So, if the sample mean (x-bar) falls within a certain distance of the population mean μ (mu), then the population mean μ (mu) falls within the same distance of the sample mean.

Therefore, the statement, “There is a 95% chance that the sample mean x-bar falls within 3 units of μ (mu)” can be rephrased as: “We are 95% confident that the population mean μ (mu) falls within 3 units of the x-bar we found in our sample.”

So, if we happen to get a sample mean of x-bar = 115, then we are 95% confident that μ (mu) falls within 3 units of 115, or in other words that μ (mu) is covered by the interval (115 – 3, 115 + 3) = (112,118).

(On later pages, we will use similar reasoning to develop a general formula for a confidence interval.)

Comment:

  • Note that the first phrasing is about x-bar, which is a random variable; that’s why it makes sense to use probability language. But the second phrasing is about μ (mu), which is a parameter, and thus is a “fixed” value that does not change, and that’s why we should not use probability language to discuss it. In these problems, it is our x-bar that will change when we repeat the process, not μ (mu). This point will become clearer after you do the activities which follow.

The General Case

Let’s generalize the IQ example. Suppose that we are interested in estimating the unknown population mean (μ, mu) based on a random sample of size n. Further, we assume that the population standard deviation (σ, sigma) is known.

Note: The assumption that the population standard deviation is known is not usually realistic, however, we make it here to be able to introduce the concepts in the simplest case. Later, we will discuss the changes which need to be made when we do not know the population standard deviation.

A large circle represents the population of interest. μ is unknown, but σ is known about the population. From the population we create a SRS of size n, represented by a smaller circle. We can find x-bar for this SRS.

The values of x-bar follow a normal distribution with (unknown) mean μ (mu) and standard deviation σ/sqrt(n)=sigma/sqrt(n) (known, since both σ, sigma, and n are known). In the standard deviation rule, we stated that approximately 95% of values fall within 2 standard deviations of μ (mu). From now on, we will be a little more precise and use the standard normal table to find the exact value for 95%.

Our picture is as follows:

mod11-z95

Try using the applet in the post for Learn by Doing – Normal Random Variables to find the cutoff illustrated above.

We can also verify the z-score using a calculator or table by finding the z-score with the area of 0.025 to the left (which would give us -1.96) or with the area to the left of 0.975 = 0.95 + 0.025 (which would give us +1.96).

mod11-z95L  mod11-z95U

mod11-z95Ltab

mod11-z95Utab

Thus, there is a 95% chance that our sample mean x-bar will fall within 1.96*σ/sqrt(n) = 1.96*sigma/sqrt(n) of μ (mu).

Which means we are 95% confident that μ (mu) falls within 1.96*σ/sqrt(n) = 1.96*sigma/sqrt(n) of our sample mean x-bar.

Here, then, is the general result:

Suppose a random sample of size n is taken from a normal population of values for a quantitative variable whose mean (μ, mu) is unknown, when the standard deviation (σ, sigma) is given.

A 95% confidence interval (CI) for μ (mu) is:

mod11-CI_mean95

Comment:

  • Note that for now we require the population standard deviation (σ, sigma) to be known. Practically, σ (sigma) is rarely known, but for some cases, especially when a lot of research has been done on the quantitative variable whose mean we are estimating (such as IQ, height, weight, scores on standardized tests), it is reasonable to assume that σ (sigma) is known. Eventually, we will see how to proceed when σ (sigma) is unknown, and must be estimated with sample standard deviation (s).

Let’s look at another example.

EXAMPLE:

An educational researcher was interested in estimating μ (mu), the mean score on the math part of the SAT (SAT-M) of all community college students in his state. To this end, the researcher has chosen a random sample of 650 community college students from his state, and found that their average SAT-M score is 475. Based on a large body of research that was done on the SAT, it is known that the scores roughly follow a normal distribution with the standard deviation σ (sigma) =100.

Here is a visual representation of this story, which summarizes the information provided:

A large circle represents the population of interest. μ is unknown, but σ is known about the population. From the population we create a SRS of size n, represented by a smaller circle. We can find x-bar for this SRS.

Based on this information, let’s estimate μ (mu) with a 95% confidence interval.

Using the formula we developed earlier

mod11-CI_mean95

the 95% confidence interval for μ (mu) is:

mod11-CI_mean95_ex1

We will usually provide information on how to round your final answer. In this case, one decimal place is enough precision for this scenario. You could also round to the nearest whole number without much loss of information here.

We are not done yet. An equally important part is to interpret what this means in the context of the problem.

We are 95% confident that the mean SAT-M score of all community college students in the researcher’s state is covered by the interval (467.3, 482.7). Note that the confidence interval was obtained by taking 475 ± 7.7. This means that we are 95% confident that by using the sample mean (x-bar = 475) to estimate μ (mu), our error is no more than 7.7 points.

You just gained practice computing and interpreting a confidence interval for a population mean. Note that the way a confidence interval is used is that we hope the interval contains the population mean μ (mu). This is why we call it an “interval for the population mean.”

The following activity is designed to help give you a better understanding of the underlying reasoning behind the interpretation of confidence intervals. In particular, you will gain a deeper understanding of why we say that we are “95% confident that the population mean is covered by the interval.”

We just saw that one interpretation of a 95% confidence interval is that we are 95% confident that the population mean (μ, mu) is contained in the interval. Another useful interpretation in practice is that, given the data, the confidence interval represents the set of plausible values for the population mean μ (mu).

EXAMPLE:

As an illustration, let’s return to the example of mean SAT-Math score of community college students. Recall that we had constructed the confidence interval (467.3, 482.7) for the unknown mean SAT-M score for all community college students.

Here is a way that we can use the confidence interval:

Do the results of this study provide evidence that μ (mu), the mean SAT-M score of community college students, is lower than the mean SAT-M score in the general population of college students in that state (which is 480)?

The 95% confidence interval for μ (mu) was found to be (467.3, 482.7). Note that 480, the mean SAT-M score in the general population of college students in that state, falls inside the interval, which means that it is one of the plausible values for μ (mu).

A number line, on which the 95% confidence interval for μ has been marked, from 467 to 483. At 480 is the mean SAT-M score in the general population of college students in the state.

This means that μ (mu) could be 480 (or even higher, up to 483), and therefore we cannot conclude that the mean SAT-M score among community college students in the state is lower than the mean in the general population of college students in that state. (Note that the fact that most of the plausible values for μ (mu) fall below 480 is not a consideration here.)

mod11-CI_mean95

the 95% confidence interval for μ (mu) is:

mod11-CI_mean95_ex1

We will usually provide information on how to round your final answer. In this case, one decimal place is enough precision for this scenario. You could also round to the nearest whole number without much loss of information here.

We are not done yet. An equally important part is to interpret what this means in the context of the problem.

We are 95% confident that the mean SAT-M score of all community college students in the researcher’s state is covered by the interval (467.3, 482.7). Note that the confidence interval was obtained by taking 475 ± 7.7. This means that we are 95% confident that by using the sample mean (x-bar = 475) to estimate μ (mu), our error is no more than 7.7 points.