Means (All Steps)
- Tests About μ (mu) When σ (sigma) is Unknown – The t-test for a Population Mean
- Step 1: State the hypotheses
- Step 2: Obtain data, check conditions, and summarize data
- Step 3: Find the p-value of the test by using the test statistic as follows
- Step 4: Conclusion
- The t-Distribution
So far we have talked about the logic behind hypothesis testing and then illustrated how this process proceeds in practice, using the z-test for the population proportion (p).
We are now moving on to discuss testing for the population mean (μ, mu), which is the parameter of interest when the variable of interest is quantitative.
A few comments about the structure of this section:
- The basic groundwork for carrying out hypothesis tests has already been laid in our general discussion and in our presentation of tests about proportions.
Therefore we can easily modify the four steps to carry out tests about means instead, without going into all of the details again.
We will use this approach for all future tests so be sure to go back to the discussion in general and for proportions to review the concepts in more detail.
- In our discussion about confidence intervals for the population mean, we made the distinction between whether the population standard deviation, σ (sigma) was known or if we needed to estimate this value using the sample standard deviation, s.
In this section, we will only discuss the second case as in most realistic settings we do not know the population standard deviation.
In this case we need to use the t-distribution instead of the standard normal distribution for the probability aspects of confidence intervals (choosing table values) and hypothesis tests (finding p-values).
- Although we will discuss some theoretical or conceptual details for some of the analyses we will learn, from this point on we will rely on software to conduct tests and calculate confidence intervals for us, while we focus on understanding which methods are used for which situations and what the results say in context.
If you are interested in more information about the z-test, where we assume the population standard deviation σ (sigma) is known, you can review the Carnegie Mellon Open Learning Statistics Course (you will need to click “ENTER COURSE”).
Like any other tests, the t-test for the population mean follows the four-step process:
- STEP 1: Stating the hypotheses Ho and Ha.
- STEP 2: Collecting relevant data, checking that the data satisfy the conditions which allow us to use this test, and summarizing the data using a test statistic.
- STEP 3: Finding the p-value of the test, the probability of obtaining data as extreme as those collected (or even more extreme, in the direction of the alternative hypothesis), assuming that the null hypothesis is true. In other words, how likely is it that the only reason for getting data like those observed is sampling variability (and not because Ho is not true)?
- STEP 4: Drawing conclusions, assessing the statistical significance of the results based on the p-value, and stating our conclusions in context. (Do we or don’t we have evidence to reject Ho and accept Ha?)
- Note: In practice, we should also always consider the practical significance of the results as well as the statistical significance.
We will now go through the four steps specifically for the t-test for the population mean and apply them to our two examples.
Tests About μ (mu) When σ (sigma) is Unknown – The t-test for a Population Mean
Only in a few cases is it reasonable to assume that the population standard deviation, σ (sigma), is known and so we will not cover hypothesis tests in this case. We discussed both cases for confidence intervals so that we could still calculate some confidence intervals by hand.
For this and all future tests we will rely on software to obtain our summary statistics, test statistics, and p-values for us.
The case where σ (sigma) is unknown is much more common in practice. What can we use to replace σ (sigma)? If you don’t know the population standard deviation, the best you can do is find the sample standard deviation, s, and use it instead of σ (sigma). (Note that this is exactly what we did when we discussed confidence intervals).
Is that it? Can we just use s instead of σ (sigma), and the rest is the same as the previous case? Unfortunately, it’s not that simple, but not very complicated either.
Here, when we use the sample standard deviation, s, as our estimate of σ (sigma) we can no longer use a normal distribution to find the cutoff for confidence intervals or the p-values for hypothesis tests.
We discussed this issue for confidence intervals. We will talk more about the t-distribution after we discuss the details of this test for those who are interested in learning more.
It isn’t really necessary for us to understand this distribution but it is important that we use the correct distributions in practice via our software.
We will wait until UNIT 4B to look at how to accomplish this test in the software. For now focus on understanding the process and drawing the correct conclusions from the p-values given.
Now let’s go through the four steps in conducting the t-test for the population mean.
Step 1: State the hypotheses
The null and alternative hypotheses for the t-test for the population mean (μ, mu) have exactly the same structure as the hypotheses for the z-test for the population proportion (p):
The null hypothesis has the form:
- Ho: μ = μ0 (mu = mu_zero)
(where μ0 (mu_zero) is often called the null value)
The alternative hypothesis takes one of the following three forms (depending on the context):
- Ha: μ < μ0 (mu < mu_zero) (one-sided)
- Ha: μ > μ0 (mu > mu_zero) (one-sided)
- Ha: μ ≠ μ0 (mu ≠ mu_zero) (two-sided)
where the choice of the appropriate alternative (out of the three) is usually quite clear from the context of the problem.
If you feel it is not clear, it is most likely a two-sided problem. Students are usually good at recognizing the “more than” and “less than” terminology, but differences can be more difficult to spot, sometimes because you have preconceived ideas of how you think it should be! You also cannot use the information from the sample to help you determine the hypotheses, since we would not know our data when we originally asked the question.
When setting up hypotheses, be sure to use only the information in the research question. We cannot use our sample data to help us set up our hypotheses.
For this test, it is still important to correctly choose the alternative hypothesis as “less than”, “greater than”, or “different”, although in practice two-sided tests are generally used.
Step 2: Obtain data, check conditions, and summarize data
Obtain data from a sample:
- In this step we would obtain data from a sample. This is not something we do much of in courses but it is done very often in practice!
Check the conditions:
- Then we check the conditions under which this test (the t-test for one population mean) can be safely carried out – which are:
- The sample is random (or at least can be considered random in context).
- We are in one of the three situations marked with a green check mark in the following table (which ensure that x-bar is at least approximately normal and that the test statistic using the sample standard deviation, s, therefore follows a t-distribution with n-1 degrees of freedom; proving this is beyond the scope of this course):
- For large samples, we don’t need to check for normality in the population. We can rely on the sample size as the basis for the validity of using this test.
- For small samples, we need to have data from a normal population in order for the p-values and confidence intervals to be valid.
In practice, for small samples, it can be very difficult to determine if the population is normal. Here is a simulation to give you a better understanding of the difficulties.
Comments:
- It is always a good idea to look at the data and get a sense of their pattern regardless of whether you actually need to do it in order to assess whether the conditions are met.
- This idea of looking at the data is relevant to all tests in general. In the next module—inference for relationships—conducting exploratory data analysis before inference will be an integral part of the process.
Calculate Test Statistic
Assuming that the conditions are met, we calculate the sample mean x-bar and the sample standard deviation, s (which estimates σ (sigma)), and summarize the data with a test statistic.
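The test statistic for the t-test for the population mean is: t = (x-bar − μ0) / (s / √n), where n is the sample size and s / √n is the estimated standard error of x-bar.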
Recall that such a standardized test statistic represents how many standard deviations above or below μ0 (mu_zero) our sample mean x-bar is.
Therefore our test statistic is a measure of how different our data are from what is claimed in the null hypothesis. This is an idea that we mentioned in the previous test as well.
Again we will rely on the p-value to determine how unusual our data would be if the null hypothesis is true.
As we mentioned, the test statistic in the t-test for a population mean does not follow a standard normal distribution. Rather, it follows another bell-shaped distribution called the t-distribution.
We will present the details of this distribution at the end for those interested but for now we will work on the process of the test.
Here are a few important facts.
- In statistical language we say that the null distribution of our test statistic is the t-distribution with (n-1) degrees of freedom. In other words, when Ho is true (i.e., when μ = μ0 (mu = mu_zero)), our test statistic has a t-distribution with (n-1) d.f., and this is the distribution under which we find p-values.
- For a large sample size (n), the null distribution of the test statistic is approximately Z, so whether we use t(n – 1) or Z to calculate the p-values does not make a big difference. However, software will use the t-distribution regardless of the sample size and so will we.
Although we will not calculate p-values by hand for this test, we can still easily calculate the test statistic.
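As a minimal sketch of that calculation, here is how the test statistic could be computed from raw data using only the Python standard library. The sample values and the null value of 250 are made up for illustration and are not from the course; with a sample this small, the p-value would only be valid if the population were normal.

```python
import math
import statistics


def t_statistic(sample, mu0):
    """Return (x_bar, s, t) for a one-sample t-test of Ho: mu = mu0."""
    n = len(sample)
    x_bar = statistics.mean(sample)      # sample mean
    s = statistics.stdev(sample)         # sample standard deviation (n - 1 divisor)
    standard_error = s / math.sqrt(n)    # estimated standard deviation of x-bar
    return x_bar, s, (x_bar - mu0) / standard_error


# Hypothetical measurements, for illustration only.
sample = [248, 251, 246, 249, 252, 245, 247, 250]
x_bar, s, t = t_statistic(sample, mu0=250)
print(f"x-bar = {x_bar:.2f}, s = {s:.3f}, t = {t:.3f}")
```

The sign of t shows the direction of the difference: a negative value means the sample mean falls below the null value.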
From this point in this course and certainly in practice we will allow the software to calculate our test statistics and we will use the p-values provided to draw our conclusions.
Step 3: Find the p-value of the test by using the test statistic as follows
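We will use software to obtain the p-value for this (and all future) tests. In each of the three cases corresponding to the three choices for our alternative hypothesis, the p-value is an area under the t(n-1) curve:
- For Ha: μ < μ0 (mu < mu_zero), the p-value is the area to the left of the test statistic t.
- For Ha: μ > μ0 (mu > mu_zero), the p-value is the area to the right of t.
- For Ha: μ ≠ μ0 (mu ≠ mu_zero), the p-value is the combined area in both tails, below -|t| and above |t|.
Note that due to the symmetry of the t distribution, for a given value of the test statistic t, the p-value for the two-sided test is twice as large as the smaller of the two one-sided p-values (the one in the direction of the observed test statistic). The same relationship holds whether p-values are calculated under the t distribution or under the Z distribution.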
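To make the three p-value calculations concrete, here is a rough sketch in Python. Statistical software uses its own t-distribution routines; this sketch instead approximates the area under the t curve by numerical integration (Simpson's rule), which is our own stand-in so that the example needs nothing beyond the standard library. The function names and the values t = -1.8 with 24 degrees of freedom are hypothetical.

```python
import math


def t_pdf(x, df):
    """Height of the t-distribution curve with df degrees of freedom at x."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=2000):
    """Approximate P(T <= x): Simpson's rule on [0, |x|], then symmetry."""
    a = abs(x)
    if a == 0:
        return 0.5
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3              # area under the curve between 0 and |x|
    return 0.5 + area if x > 0 else 0.5 - area


def p_value(t, df, alternative):
    """alternative is 'less', 'greater', or 'two-sided'."""
    if alternative == "less":         # Ha: mu < mu0, area to the left of t
        return t_cdf(t, df)
    if alternative == "greater":      # Ha: mu > mu0, area to the right of t
        return 1 - t_cdf(t, df)
    if alternative == "two-sided":    # Ha: mu != mu0, area in both tails
        return 2 * t_cdf(-abs(t), df)
    raise ValueError("alternative must be 'less', 'greater', or 'two-sided'")


# Hypothetical test statistic and degrees of freedom.
t, df = -1.8, 24
for alt in ("less", "greater", "two-sided"):
    print(f"{alt:>9}: p-value = {p_value(t, df, alt):.4f}")
```

Because t is negative here, the "less" p-value is the small one, and the two-sided p-value is exactly twice it.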
We will show some examples of p-values obtained from software in our examples. For now let’s continue our summary of the steps.
Step 4: Conclusion
As usual, based on the p-value (and some significance level of choice) we assess the statistical significance of results, and draw our conclusions in context.
To review what we have said before:
If p-value ≤ 0.05 then WE REJECT Ho
- Conclusion: There IS enough evidence that Ha is True
If p-value > 0.05 then WE FAIL TO REJECT Ho
- Conclusion: There IS NOT enough evidence that Ha is True
Where instead of Ha is True, we write what this means in the words of the problem, in other words, in the context of the current scenario.
This step has essentially two sub-steps:
(i) Based on the p-value, determine whether or not the results are statistically significant (i.e., the data present enough evidence to reject Ho).
(ii) State your conclusions in the context of the problem.
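Sub-step (i) is a simple comparison, which can be sketched as a small Python function. The function name and the default significance level of 0.05 are our own choices for illustration; sub-step (ii), restating the conclusion in the words of the problem, still has to be done by the analyst.

```python
def conclusion(p_value, alpha=0.05):
    """Return the generic Step 4 decision for a given p-value."""
    if not 0 <= p_value <= 1:
        raise ValueError("a p-value must lie between 0 and 1")
    if p_value <= alpha:
        return "Reject Ho: there is enough evidence that Ha is true."
    return "Fail to reject Ho: there is not enough evidence that Ha is true."


# The two p-values reported in the examples of this section.
print(conclusion(0.014))
print(conclusion(0.132))
```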
We are now ready to look at two examples.
EXAMPLE:
A certain prescription medicine is supposed to contain an average of 250 parts per million (ppm) of a certain chemical. If the concentration is higher than this, the drug may cause harmful side effects; if it is lower, the drug may be ineffective.
The manufacturer runs a check to see if the mean concentration in a large shipment conforms to the target level of 250 ppm or not.
A simple random sample of 100 portions is tested, and the sample mean concentration is found to be 247 ppm with a sample standard deviation of 12 ppm.
1. The hypotheses being tested are:
- Ho: μ = 250
- Ha: μ ≠ 250
- Where μ = population mean concentration, in parts per million, of the chemical in the entire shipment
2. The conditions that allow us to use the t-test are met since:
- The sample is random
- The sample size is large enough for the Central Limit Theorem to apply and ensure the normality of x-bar. We do not need normality of the population in order to be able to conduct this test for the population mean. We are in the 2nd column in the table below.
- The test statistic is: t = (x-bar − μ0) / (s / √n) = (247 − 250) / (12 / √100) = −3 / 1.2 = −2.5
- The data (represented by the sample mean) are 2.5 standard errors below the null value.
3. Finding the p-value.
- To find the p-value we use statistical software, and we calculate a p-value of 0.014.
4. Conclusions:
- The p-value is small (0.014), indicating that at the 5% significance level, the results are significant.
- We reject the null hypothesis.
- OUR CONCLUSION IN CONTEXT:
- There is enough evidence to conclude that the mean concentration in the entire shipment is not the required 250 ppm.
- It is difficult to comment on the practical significance of this result without more understanding of the practical considerations of this problem.
Comments:
- The 95% confidence interval for μ (mu) can be used here in the same way as for proportions: either to conduct the two-sided test (checking whether the null value falls inside or outside the confidence interval), or, following a t-test where Ho was rejected, to get insight into the value of μ (mu).
- We find the 95% confidence interval to be (244.619, 249.381). Since 250 is not in the interval we know we would reject our null hypothesis that μ (mu) = 250. The confidence interval gives additional information. By accounting for estimation error, it estimates that the population mean is likely to be between 244.62 and 249.38. This is lower than the target concentration and that information might help determine the seriousness and appropriate course of action in this situation.
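As a rough check on these numbers, the following Python sketch recomputes the two-sided p-value and the 95% confidence interval from the summary statistics of this example (n = 100, x-bar = 247, s = 12, null value 250). It is not how the course software works internally: the t-distribution areas are approximated by Simpson's rule and the critical value is found by bisection, both of which are our own stand-ins so that the sketch runs with the standard library alone.

```python
import math


def t_pdf(x, df):
    """Height of the t-distribution curve with df degrees of freedom at x."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=2000):
    """Approximate P(T <= x): Simpson's rule on [0, |x|], then symmetry."""
    a = abs(x)
    if a == 0:
        return 0.5
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x > 0 else 0.5 - area


def t_quantile(prob, df):
    """Value q with P(T <= q) = prob, for 0.5 < prob < 1, by bisection.

    The search is limited to [0, 50], which is ample for the usual
    confidence levels unless df is extremely small.
    """
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Summary statistics from the medicine example.
n, x_bar, s, mu0 = 100, 247, 12, 250
df = n - 1
standard_error = s / math.sqrt(n)

t = (x_bar - mu0) / standard_error
p = 2 * t_cdf(-abs(t), df)                   # two-sided p-value

t_star = t_quantile(0.975, df)               # critical value for 95% confidence
lower = x_bar - t_star * standard_error
upper = x_bar + t_star * standard_error

print(f"t = {t:.2f}, p-value = {p:.3f}")
print(f"95% CI: ({lower:.3f}, {upper:.3f})")
print("null value inside interval:", lower <= mu0 <= upper)
```

The test and the interval agree, as they must: the p-value is below 0.05 exactly when the null value lies outside the 95% confidence interval.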
In most situations in practice we use TWO-SIDED HYPOTHESIS TESTS, followed by confidence intervals to gain more insight.
For completeness in covering one-sample t-tests for a population mean, we still cover all three possible alternative hypotheses here; however, this will be the last test for which we will do so.
EXAMPLE:
A research study measured the pulse rates of 57 college men and found a mean pulse rate of 70 beats per minute with a standard deviation of 9.85 beats per minute.
Researchers want to know if the mean pulse rate for all college men is different from the current standard of 72 beats per minute.
1. The hypotheses being tested are:
- Ho: μ = 72
- Ha: μ ≠ 72
- Where μ = population mean pulse rate among college men
2. The conditions that allow us to use the t-test are met since:
- The sample is random.
- The sample size is large (n = 57) so we do not need normality of the population in order to be able to conduct this test for the population mean. We are in the 2nd column in the table below.
- The test statistic is: t = (x-bar − μ0) / (s / √n) = (70 − 72) / (9.85 / √57) ≈ −2 / 1.305 ≈ −1.53
- The data (represented by the sample mean) are 1.53 estimated standard errors below the null value.
3. Finding the p-value.
- Recall that in general the p-value is calculated under the null distribution of the test statistic, which, in the t-test case, is t(n-1). In our case, in which n = 57, the p-value is calculated under the t(56) distribution. Using statistical software, we find that the p-value is 0.132.
- We calculated the p-value using this online applet: http://homepage.stat.uiowa.edu/~mbognar/applets/t.html
4. Making conclusions.
- The p-value (0.132) is not small, indicating that the results are not significant.
- We fail to reject the null hypothesis.
- OUR CONCLUSION IN CONTEXT:
- There is not enough evidence to conclude that the mean pulse rate for all college men is different from the current standard of 72 beats per minute.
- The results from this sample do not appear to have any practical significance either. The sample mean pulse rate of 70 is very similar to the hypothesized value of 72, relative to the variation expected in pulse rates.
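The pulse-rate example can be reproduced approximately with the Python sketch below, which also compares the t(56) p-value with the one that the standard normal (Z) distribution would give. As before, the t-distribution area is approximated by Simpson's rule, which is our own stand-in for statistical software; small differences in the third decimal place relative to the reported 0.132 can come from rounding the test statistic.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Height of the t-distribution curve with df degrees of freedom at x."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=2000):
    """Approximate P(T <= x): Simpson's rule on [0, |x|], then symmetry."""
    a = abs(x)
    if a == 0:
        return 0.5
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x > 0 else 0.5 - area


# Summary statistics from the pulse-rate example.
n, x_bar, s, mu0 = 57, 70, 9.85, 72
t = (x_bar - mu0) / (s / math.sqrt(n))

p_t = 2 * t_cdf(-abs(t), n - 1)          # two-sided p-value under t(56)
p_z = 2 * NormalDist().cdf(-abs(t))      # same calculation under Z, for comparison

print(f"t = {t:.3f}")
print(f"p-value under t(56): {p_t:.3f}")
print(f"p-value under Z:     {p_z:.3f}")
```

With n = 57 the two p-values are close to each other and both are well above 0.05, so the conclusion does not depend on which of the two distributions is used.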
From this point in this course and certainly in practice we will allow the software to calculate our test statistic and p-value and we will use the p-values provided to draw our conclusions.
That concludes our discussion of hypothesis tests in Unit 4A.
In the next unit we will continue to use both confidence intervals and hypothesis tests to investigate the relationship between two variables in the cases we covered in Unit 1 on exploratory data analysis – we will look at Case CQ, Case CC, and Case QQ.
Before moving on, we will discuss the details about the t-distribution as a general object.
The t-Distribution
We have seen that variables can be visually modeled by many different sorts of shapes, and we call these shapes distributions. Several distributions arise so frequently that they have been given special names, and they have been studied mathematically.
So far in the course, the only one we’ve named, for continuous quantitative variables, is the normal distribution, but there are others. One of them is called the t-distribution.
The t-distribution is another bell-shaped (unimodal and symmetric) distribution, like the normal distribution; and the center of the t-distribution is standardized at zero, like the center of the standard normal distribution.
Like all distributions that are used as probability models, the normal and the t-distribution are both scaled, so the total area under each of them is 1.
So how is the t-distribution fundamentally different from the normal distribution?
- The spread.
The following picture illustrates the fundamental difference between the normal distribution and the t-distribution:
You can see in the picture that the t-distribution has slightly less area near the expected central value than the normal distribution does, and you can see that the t distribution has correspondingly more area in the “tails” than the normal distribution does. (It’s often said that the t-distribution has “fatter tails” or “heavier tails” than the normal distribution.)
This reflects the fact that the t-distribution has a larger spread than the normal distribution. The same total area of 1 is spread out over a slightly wider range on the t-distribution, making it a bit lower near the center compared to the normal distribution, and giving the t-distribution slightly more probability in the ‘tails’ compared to the normal distribution.
Therefore, the t-distribution ends up being the appropriate model in certain cases where there is more variability than would be predicted by the normal distribution. One of these cases is stock values, which have more variability (or “volatility,” to use the economic term) than would be predicted by the normal distribution.
There’s actually an entire family of t-distributions. They all have similar formulas (but the math is beyond the scope of this introductory course in statistics), and they all have slightly “fatter tails” than the normal distribution. But some are closer to normal than others.
The t-distributions that have higher “degrees of freedom” are closer to normal (degrees of freedom is a mathematical concept that we won’t study in this course, beyond merely mentioning it here). So, there’s a t-distribution “with one degree of freedom,” another t-distribution “with 2 degrees of freedom” which is slightly closer to normal, another t-distribution “with 3 degrees of freedom” which is a bit closer to normal than the previous ones, and so on.
The following picture illustrates this idea with just a couple of t-distributions (note that “degrees of freedom” is abbreviated “d.f.” on the picture):
The test statistic for our t-test for one population mean is a t-score which follows a t-distribution with (n – 1) degrees of freedom. Recall that each t-distribution is indexed according to “degrees of freedom.” Notice that, in the context of a test for a mean, the degrees of freedom depend on the sample size in the study.
Remember that we said that higher degrees of freedom indicate that the t-distribution is closer to normal. So in the context of a test for the mean, the larger the sample size, the higher the degrees of freedom, and the closer the t-distribution is to a normal z distribution.
As a result, in the context of a test for a mean, the effect of the t-distribution is most important for a study with a relatively small sample size.
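The effect of the degrees of freedom can be seen numerically with a short Python sketch. It computes the total area in the two tails beyond ±2 for several t-distributions and for the standard normal distribution. The cutoff of 2 and the list of degrees of freedom are arbitrary choices for illustration, and the t-distribution areas are approximated by Simpson's rule rather than taken from statistical software.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Height of the t-distribution curve with df degrees of freedom at x."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=2000):
    """Approximate P(T <= x): Simpson's rule on [0, |x|], then symmetry."""
    a = abs(x)
    if a == 0:
        return 0.5
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x > 0 else 0.5 - area


def two_tail_area(cutoff, df):
    """Area under the t(df) curve below -cutoff plus the area above +cutoff."""
    return 2 * t_cdf(-abs(cutoff), df)


cutoff = 2.0
degrees_of_freedom = (1, 2, 3, 10, 30, 100)
for df in degrees_of_freedom:
    print(f"t({df:>3}): area beyond ±{cutoff} = {two_tail_area(cutoff, df):.4f}")

normal_area = 2 * NormalDist().cdf(-cutoff)
print(f"normal: area beyond ±{cutoff} = {normal_area:.4f}")
```

The tail area shrinks as the degrees of freedom grow and approaches the normal value, which is the "fatter tails" idea expressed in numbers.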
We are now done introducing the t-distribution. What are the implications of all of this?
- The null distribution of our t-test statistic is the t-distribution with (n-1) d.f. In other words, when Ho is true (i.e., when μ = μ0 (mu = mu_zero)), our test statistic has a t-distribution with (n-1) d.f., and this is the distribution under which we find p-values.
- For a large sample size (n), the null distribution of the test statistic is approximately Z, so whether we use t(n – 1) or Z to calculate the p-values does not make a big difference.