Proportions (Step 4 & Summary)
- Step 4. Drawing Conclusions Based on the P-Value
- Many Students Wonder: Hypothesis Testing for the Population Proportion
- Let’s Summarize!!
- What’s next?
Step 4. Drawing Conclusions Based on the P-Value
This last part of the four-step process of hypothesis testing is the same across all statistical tests. We have already said nearly everything there is to say about it, but it can’t hurt to say it again.
The p-value is a measure of how much evidence the data present against Ho. The smaller the p-value, the more evidence the data present against Ho.
As we already mentioned, what counts as enough evidence against Ho is determined by the significance level (α, alpha), a cutoff point at or below which the p-value is considered small enough to reject Ho in favor of Ha. The most commonly used significance level is 0.05.
- If p-value ≤ 0.05 then WE REJECT Ho
- Conclusion: There IS enough evidence that Ha is True
- If p-value > 0.05 then WE FAIL TO REJECT Ho
- Conclusion: There IS NOT enough evidence that Ha is True
Here, instead of “Ha is True,” we write what this means in the words of the problem, that is, in the context of the current scenario.
It is important to mention again that this step has essentially two sub-steps:
- (i) Based on the p-value, determine whether or not the results are statistically significant (i.e., the data present enough evidence to reject Ho).
- (ii) State your conclusions in the context of the problem.
Note: We must always consider whether the results have any practical significance, particularly when they are statistically significant, because a statistically significant result that has no practical use is essentially meaningless!
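The two sub-steps above can be sketched in a few lines of Python. This is only an illustration: the function name `draw_conclusion`, its parameters, and the p-value of 0.03 in the usage line are hypothetical, and the default cutoff of 0.05 is simply the most commonly used significance level.

```python
def draw_conclusion(p_value, claim, alpha=0.05):
    """Apply the Step 4 decision rule.

    p_value : p-value found in Step 3
    claim   : Ha written in the words of the problem (hypothetical argument)
    alpha   : significance level (0.05 is a convention, not a requirement)
    """
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must be between 0 and 1")

    # Sub-step (i): are the results statistically significant?
    reject_h0 = p_value <= alpha

    # Sub-step (ii): state the conclusion in the context of the problem.
    if reject_h0:
        sentence = f"There IS enough evidence that {claim}."
    else:
        sentence = f"There IS NOT enough evidence that {claim}."
    return reject_h0, sentence


# Hypothetical usage with a made-up p-value.
reject, sentence = draw_conclusion(0.03, "the proportion has changed")
print(reject, sentence)
```

Note that the function says nothing about practical significance; that judgment still has to be made separately.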
Let’s go back to our three examples and draw conclusions.
EXAMPLE:
Has the proportion of defective products been reduced as a result of the repair?
We found that the p-value for this test was 0.023.
Since 0.023 is small (in particular, 0.023 < 0.05), the data provide enough evidence to reject Ho.
Conclusion:
- There IS enough evidence that the proportion of defective products is less than 20% after the repair.
The following figure is the complete story of this example, and includes all the steps we went through, starting from stating the hypotheses and ending with our conclusions:
EXAMPLE:
Is the proportion of marijuana users in the college higher than the national figure?
We found that the p-value for this test was 0.182.
Since 0.182 is not small (in particular, 0.182 > 0.05), the data do not provide enough evidence to reject Ho.
Conclusion:
- There IS NOT enough evidence that the proportion of students at the college who use marijuana is higher than the national figure.
Here is the complete story of this example:
EXAMPLE:
Did the proportion of U.S. adults who support the death penalty change between 2003 and a later poll?
We found that the p-value for this test was 0.021.
Since 0.021 is small (in particular, 0.021 < 0.05), the data provide enough evidence to reject Ho.
Conclusion:
- There IS enough evidence that the proportion of adults who support the death penalty for convicted murderers has changed since 2003.
Here is the complete story of this example:
Many Students Wonder: Hypothesis Testing for the Population Proportion
Many students wonder why 5% is often selected as the significance level in hypothesis testing, and why 1% is the next most typical level. This is largely a matter of convenience and tradition.
When Ronald Fisher (one of the founders of modern statistics) published one of his tables, he used a mathematically convenient scale that included 5% and 1%. Later, these same 5% and 1% levels were used by other people, in part just because Fisher was so highly esteemed. But mostly these are arbitrary levels.
The idea of selecting some sort of relatively small cutoff was historically important in the development of statistics, but it’s important to remember that there is really a continuous range of increasing confidence in the alternative hypothesis, not a single all-or-nothing value. There isn’t much meaningful difference, for instance, between a p-value of 0.049 and one of 0.051, and it would be foolish to declare one case definitely a “real” effect and the other definitely a “random” effect. In either case, results at least as extreme as those observed would occur by chance roughly 5% of the time if there were no actual effect.
Whether such a p-value is sufficient for us to reject a particular null hypothesis ultimately depends on the risk of making the wrong decision, and the extent to which the hypothesized effect might contradict our prior experience or previous studies.
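To see how much the verdict depends on the chosen cutoff, here is a small sketch that applies the same decision rule at the 5% and 1% levels. The p-values are the three from the examples above (0.023, 0.182, 0.021) plus the two borderline values just discussed (0.049 and 0.051); the helper name `decisions` is hypothetical.

```python
def decisions(p_values, alphas):
    """Map each p-value to its decision at each significance level."""
    table = {}
    for p in p_values:
        table[p] = {
            a: ("reject Ho" if p <= a else "fail to reject Ho")
            for a in alphas
        }
    return table


p_values = [0.021, 0.023, 0.049, 0.051, 0.182]
alphas = [0.05, 0.01]

table = decisions(p_values, alphas)
for p, row in table.items():
    verdicts = ", ".join(f"alpha={a}: {d}" for a, d in row.items())
    print(f"p-value {p:.3f} -> {verdicts}")
```

At the 1% level none of these p-values leads to rejecting Ho, and at the 5% level 0.049 and 0.051 land on opposite sides of the line even though they represent nearly identical evidence.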
Let’s Summarize!!
We have now gone through the four steps of hypothesis testing, and in particular we learned how they are applied to the z-test for the population proportion. Here is a brief summary:
- Step 1: State the hypotheses
State the null hypothesis:
Ho: p = p0
State the alternative hypothesis:
Ha: p < p0 (one-sided)
Ha: p > p0 (one-sided)
Ha: p ≠ p0 (two-sided)
where the choice of the appropriate alternative (out of the three) is usually quite clear from the context of the problem. If you feel it is not clear, it is most likely a two-sided problem. Students are usually good at recognizing the “more than” and “less than” terminology, but differences can be more difficult to spot, sometimes because you have preconceived ideas of how you think it should be! Use only the information given in the problem.
- Step 2: Obtain data, check conditions, and summarize data
Obtain data from a sample and:
(i) Check whether the data satisfy the conditions which allow you to use this test.
random sample (or at least a sample that can be considered random in context)
the conditions under which the sampling distribution of p-hat is normal are met
(ii) Calculate the sample proportion p-hat, and summarize the data using the test statistic:
z = (p-hat − p0) / sqrt( p0(1 − p0) / n )
where n is the sample size.
(Recall: This standardized test statistic represents how many standard deviations above or below p0 our sample proportion p-hat is.)
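As a rough sketch of Step 2 in Python, the code below checks a normality rule of thumb and computes the standardized test statistic. Several things here are assumptions rather than part of the text: the function names, the specific rule of thumb (n·p0 and n·(1 − p0) both at least 10), and the sample numbers, which are made up for illustration (only p0 = 0.20 echoes the defective-products example).

```python
import math


def conditions_met(n, p0, threshold=10):
    """Rule of thumb (an assumption here): the sampling distribution of
    p-hat is close to normal when n*p0 and n*(1 - p0) are both at least
    `threshold`. This does not check that the sample is random."""
    return n * p0 >= threshold and n * (1 - p0) >= threshold


def z_statistic(successes, n, p0):
    """Standardized test statistic: (p-hat - p0) / sqrt(p0 * (1 - p0) / n)."""
    p_hat = successes / n
    standard_error = math.sqrt(p0 * (1 - p0) / n)
    return (p_hat - p0) / standard_error


# Hypothetical sample: 64 "successes" out of 400, tested against p0 = 0.20.
n, successes, p0 = 400, 64, 0.20
print(conditions_met(n, p0))
print(round(z_statistic(successes, n, p0), 2))
```

With these made-up numbers the sample proportion is 0.16, about two standard deviations below p0. The standard error uses p0 rather than p-hat because the calculation assumes Ho is true.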
- Step 3: Find the p-value of the test by using the test statistic as follows
When the alternative hypothesis is “less than,” the p-value is the probability of observing a test statistic as small as that observed or smaller, assuming that the values of the test statistic follow a standard normal distribution. We will now represent this probability in symbols and also using the normal distribution: p-value = P(Z ≤ z), where z is the observed test statistic.
Looking at the shaded region, you can see why this is often referred to as a left-tailed test. We shaded to the left of the test statistic, since less than is to the left.
When the alternative hypothesis is “greater than,” the p-value is the probability of observing a test statistic as large as that observed or larger, assuming that the values of the test statistic follow a standard normal distribution. Again, we will represent this probability in symbols and using the normal distribution: p-value = P(Z ≥ z).
Looking at the shaded region, you can see why this is often referred to as a right-tailed test. We shaded to the right of the test statistic, since greater than is to the right.
When the alternative hypothesis is “not equal to,” the p-value is the probability of observing a test statistic which is as large in magnitude as that observed or larger, assuming that the values of the test statistic follow a standard normal distribution. In symbols: p-value = P(|Z| ≥ |z|) = 2P(Z ≥ |z|).
This is often referred to as a two-tailed test, since we shaded in both directions.
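The three cases can be computed with the standard normal distribution from Python's standard library (`statistics.NormalDist`, available from Python 3.8). This is a minimal sketch: the function name `p_value`, the labels "less", "greater", and "two-sided", and the z-score of -2.0 in the usage lines are illustrative choices, not part of the text.

```python
from statistics import NormalDist


def p_value(z, alternative):
    """p-value of a z-test, given the observed test statistic z.

    alternative: "less" (left-tailed), "greater" (right-tailed),
                 or "two-sided" (two-tailed).
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if alternative == "less":
        return std_normal.cdf(z)                   # area to the left of z
    if alternative == "greater":
        return 1 - std_normal.cdf(z)               # area to the right of z
    if alternative == "two-sided":
        return 2 * (1 - std_normal.cdf(abs(z)))    # both tails beyond |z|
    raise ValueError("alternative must be 'less', 'greater', or 'two-sided'")


# Hypothetical test statistic, evaluated under each alternative.
for alt in ("less", "greater", "two-sided"):
    print(alt, round(p_value(-2.0, alt), 3))
```

The same test statistic gives very different p-values depending on the alternative, which is why the alternative has to be settled in Step 1, before looking at the data.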
- Step 4: Conclusion
Reach a conclusion first regarding the statistical significance of the results, and then determine what it means in the context of the problem.
If p-value ≤ 0.05 then WE REJECT Ho
Conclusion: There IS enough evidence that Ha is True
If p-value > 0.05 then WE FAIL TO REJECT Ho
Conclusion: There IS NOT enough evidence that Ha is True
Recall that: If the p-value is small (in particular, smaller than the significance level, which is usually 0.05), the results are statistically significant (in the sense that there is a statistically significant difference between what was observed in the sample and what was claimed in Ho), and so we reject Ho.
If the p-value is not small, we do not have enough statistical evidence to reject Ho, and so we continue to believe that Ho may be true. (Remember: In hypothesis testing we never “accept” Ho.)
Finally, in practice, we should always consider the practical significance of the results as well as the statistical significance.
What’s next?
Before we move on to the next test, we are going to use the z-test for proportions to bring up and illustrate a few more very important issues regarding hypothesis testing. This might also be a good time to review the concepts of Type I error, Type II error, and Power before continuing on.