- Slides 1-6: Introduction; Effect of Sample Size on Hypothesis Testing
- Slides 7-11: Statistical Significance vs. Practical Importance
- Slides 12-17: Using Confidence Intervals to Conduct Hypothesis Tests
- Slides 18-21: What Confidence Intervals ADD to our analyses; Summary

This document is linked from More about Hypothesis Testing.

This document is linked from Proportions (Step 4 & Summary).

- Slides 1-7: Finding P-Values

**There is an error in the transcript for SLIDE 13. It says:** “We can enter 2.5 in for “x” and select P(X > x) from the list to calculate a probability of 0.01044.”

**It should read:** “We can enter 2.31 in for “x” and select P(X > x) from the list to calculate a probability of 0.01044.”

- Examples and Summary

This document is linked from Proportions (Step 3).

- Slides 1-10: Introduction and Test Statistics

- Slides 11-16: Check Conditions

This document is linked from Proportions (Step 2).

]]>In 2007, a Gallup poll estimated that 45% of U.S. adults rated their financial situation as “good.” Is the proportion different for this year? Which of the following samples could be used to test the null hypothesis p = 0.45? Mark each as valid (OK to use to test the hypothesis) or not valid (should not be used to test the hypothesis).

We plan to poll 200 students enrolled in statistics at your college by distributing surveys during class. Which of the following hypotheses could be tested with the survey results? Mark each as valid (OK to use to test the hypothesis) or not valid (should not be used to test the hypothesis).

This document is linked from Proportions (Step 2).

This document is linked from Unit 3A: Probability.

- Covers activity illustrating random assignment to treatments

This document is linked from Causation and Experiments.

The issues regarding hypothesis testing that we will discuss are:

- The effect of sample size on hypothesis testing.
- Statistical significance vs. practical importance.
- Hypothesis testing and confidence intervals—how are they related?

Let’s begin.

We have already seen the effect that the sample size has on inference, when we discussed point and interval estimation for the population mean (μ, mu) and population proportion (p). Intuitively …

Larger sample sizes give us more information to pin down the true nature of the population. We can therefore expect the **sample** mean and **sample** proportion obtained from a larger sample to be closer to the population mean and proportion, respectively. As a result, for the same level of confidence, we can report a smaller margin of error, and get a narrower confidence interval. What we’ve seen, then, is that a larger sample size gives a boost to how much we trust our sample results.
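As a quick numerical sketch of this effect (Python, standard library only; the sample proportion 0.6 is an arbitrary illustrative value, not one from the examples in this section): for a fixed confidence level, quadrupling the sample size halves the margin of error.

```python
from math import sqrt

def margin_of_error(p_hat, n, z_star=1.96):
    """95% margin of error for a sample proportion: z* * sqrt(p-hat(1 - p-hat)/n)."""
    return z_star * sqrt(p_hat * (1 - p_hat) / n)

m_small = margin_of_error(0.6, 100)   # margin based on n = 100
m_large = margin_of_error(0.6, 400)   # margin based on n = 400: exactly half as wide
```

This is the familiar square-root law: the standard error shrinks like 1/sqrt(n), so four times the data buys twice the precision.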

In hypothesis testing, larger sample sizes have a similar effect. We have also discussed that the power of our test increases when the sample size increases, all else remaining the same. This means we have a better chance of detecting the difference between the true value and the null value with larger samples.

The following two examples will illustrate that a larger sample size provides more convincing evidence (the test has greater power), and how the evidence manifests itself in hypothesis testing. Let’s go back to our example 2 (marijuana use at a certain liberal arts college).

Is the proportion of marijuana users in the college higher than the national figure?

We do **not** have enough evidence to conclude that the proportion of students at the college who use marijuana is higher than the national figure.

**Now, let’s increase the sample size.**

There are rumors that students in a certain liberal arts college are more inclined to use drugs than U.S. college students in general. Suppose that **in a simple random sample of 400 students from the college, 76 admitted to marijuana use**. Do the data provide enough evidence to conclude that the proportion of marijuana users among the students in the college (p) is **higher** than the national proportion, which is 0.157? (Reported by the Harvard School of Public Health).

Our results here are statistically **significant**. In other words, in example 2* the data provide enough evidence to reject Ho.

**Conclusion:** There is enough evidence that the proportion of marijuana users at the college is higher than among all U.S. students.

What do we learn from this?

We see that sample results that are based on a larger sample carry more weight (have greater power).

In example 2, we saw that a sample proportion of 0.19 based on a sample of size of 100 was not enough evidence that the proportion of marijuana users in the college is higher than 0.157. Recall, from our general overview of hypothesis testing, that this conclusion (not having enough evidence to reject the null hypothesis) **doesn’t** mean the null hypothesis is necessarily true (so, we never “accept” the null); it only means that the particular study didn’t yield sufficient evidence to reject the null. It **might** be that the sample size was simply too small to detect a statistically significant difference.

However, in example 2*, we saw that when the sample proportion of 0.19 is obtained from a sample of size 400, it carries much more weight, and in particular, provides enough evidence that the proportion of marijuana users in the college is higher than 0.157 (the national figure). In **this** case, the sample size of 400 **was** large enough to detect a statistically significant difference.
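The contrast between example 2 and example 2* can be checked numerically. Here is a sketch using only Python's standard library (the one-sided z-test for a proportion, with p-hat = 0.19 and null value 0.157 taken from the examples above):

```python
from math import sqrt
from statistics import NormalDist

def one_sided_p_value(p_hat, p0, n):
    """z statistic and p-value for Ha: p > p0 (z-test for one proportion)."""
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    return z, 1 - NormalDist().cdf(z)

z_100, p_100 = one_sided_p_value(0.19, 0.157, 100)  # z ≈ 0.91, p ≈ 0.182: not significant
z_400, p_400 = one_sided_p_value(0.19, 0.157, 400)  # z ≈ 1.81, p ≈ 0.035: significant
```

The same sample proportion moves from "not enough evidence" to "enough evidence" purely because the larger sample makes the null distribution of p-hat narrower.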

The following activity will allow you to practice the ideas and terminology used in hypothesis testing when a result is not statistically significant.

Now, we will address the issue of statistical significance versus practical importance (which also involves issues of sample size).

The following activity will let you explore the effect of the sample size on the statistical significance of the results yourself, and more importantly will discuss issue **2: Statistical significance vs. practical importance.**

This suggests that when interpreting the results of a test, you should always think not only about the statistical significance of the results but also about their practical importance.

The last topic we want to discuss is the relationship between hypothesis testing and confidence intervals. Even though the flavor of these two forms of inference is different (confidence intervals estimate a parameter, and hypothesis testing assesses the evidence in the data against one claim and in favor of another), there is a strong link between them.

We will explain this link (using the z-test and confidence interval for the population proportion), and then explain how confidence intervals can be used after a test has been carried out.

Recall that a confidence interval gives us a set of plausible values for the unknown population parameter. We may therefore examine a confidence interval to informally decide if a proposed value of population proportion seems plausible.

For example, if a 95% confidence interval for p, the proportion of all U.S. adults already familiar with Viagra in May 1998, was (0.61, 0.67), then it seems clear that we should be able to reject a claim that only 50% of all U.S. adults were familiar with the drug, since based on the confidence interval, 0.50 is not one of the plausible values for p.

In fact, the information provided by a confidence interval can be formally related to the information provided by a hypothesis test. (**Comment:** The relationship is more straightforward for two-sided alternatives, and so we will not present results for the one-sided cases.)

Suppose we want to carry out the **two-sided test:**

- Ho: p = p_{0}
- Ha: p ≠ p_{0}

using a significance level of 0.05.

An alternative way to perform this test is to find a 95% **confidence interval** for p and check:

- If p_{0} falls **outside** the confidence interval, **reject** Ho.
- If p_{0} falls **inside** the confidence interval, **do not reject** Ho.

In other words,

- If p_{0} is not one of the plausible values for p, we reject Ho.
- If p_{0} is a plausible value for p, we cannot reject Ho.

(**Comment:** Similarly, the results of a test using a significance level of 0.01 can be related to the 99% confidence interval.)
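As a sketch of this decision rule in code (Python standard library; the function name is my own, and the demo numbers are hypothetical values chosen to give an interval like the Viagra one above, not real poll data):

```python
from math import sqrt

def reject_h0_via_ci(p0, p_hat, n, z_star=1.96):
    """Two-sided test at the 0.05 level via a 95% confidence interval for p.

    Returns (lower, upper, reject): Ho is rejected exactly when
    the null value p0 falls outside the interval.
    """
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    lower, upper = p_hat - margin, p_hat + margin
    return lower, upper, not (lower <= p0 <= upper)

# Hypothetical data: p-hat = 0.64 from n = 900 gives a CI of roughly (0.61, 0.67),
# so the claim p = 0.50 falls outside it and is rejected.
lower, upper, reject = reject_h0_via_ci(p0=0.50, p_hat=0.64, n=900)
```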

Let’s look at an example:

Recall example 3, where we wanted to know whether the proportion of U.S. adults who support the death penalty for convicted murderers has changed since 2003, when it was 0.64.

We are testing:

**Ho:** p = 0.64 (No change from 2003).

**Ha:** p ≠ 0.64 (Some change since 2003).

and as the figure reminds us, we took a sample of 1,000 U.S. adults, and the data told us that 675 supported the death penalty for convicted murderers (p-hat = 0.675).

A 95% confidence interval for p, the proportion of **all** U.S. adults who support the death penalty, is:

Since the 95% confidence interval for p does not include 0.64 as a plausible value for p, we can reject Ho and conclude (as we did before) that there is enough evidence that the proportion of U.S. adults who support the death penalty for convicted murderers has changed since 2003.
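The interval used here can be reproduced directly. The following sketch (standard library only) uses p-hat = 0.675 and n = 1,000 from the example:

```python
from math import sqrt

p_hat, n, p0 = 0.675, 1000, 0.64
margin = 1.96 * sqrt(p_hat * (1 - p_hat) / n)
lower, upper = p_hat - margin, p_hat + margin   # ≈ (0.646, 0.704)
reject = not (lower <= p0 <= upper)             # True: 0.64 is not a plausible value
```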

You and your roommate are arguing about whose turn it is to clean the apartment. Your roommate suggests that you settle this by tossing a coin and takes one out of a locked box he has on the shelf. Suspecting that the coin might not be fair, you decide to test it first. You toss the coin 80 times, thinking to yourself that if, indeed, the coin is fair, you should get around 40 heads. Instead you get 48 heads. You are puzzled. You are not sure whether getting 48 heads out of 80 is enough evidence to conclude that the coin is unbalanced, or whether this a result that could have happened just by chance when the coin is fair.

Statistics can help you answer this question.

Let p be the true proportion (probability) of heads. We want to test whether the coin is fair or not.

We are testing:

**Ho:** p = 0.5 (the coin is fair).

**Ha:** p ≠ 0.5 (the coin is not fair).

The data we have are that out of n = 80 tosses, we got 48 heads, or that the sample proportion of heads is p-hat = 48/80 = 0.6.

A 95% confidence interval for p, the true proportion of heads for this coin, is:

Since in this case 0.5 is one of the plausible values for p, we cannot reject Ho. In other words, the data do not provide enough evidence to conclude that the coin is not fair.

**Comment**

The context of the last example is a good opportunity to bring up an important point that was discussed earlier.

Even though we use 0.05 as a cutoff to guide our decision about whether the results are statistically significant, we should not treat it as inviolable and we should always add our own judgment. Let’s look at the last example again.

It turns out that the p-value of this test is 0.0734. In other words, it is maybe not extremely unlikely, but it is quite unlikely (probability of 0.0734) that when you toss a fair coin 80 times you’ll get a sample proportion of heads of 48/80 = 0.6 (or even more extreme). It is true that using the 0.05 significance level (cutoff), 0.0734 is not considered small enough to conclude that the coin is not fair. However, if you really don’t want to clean the apartment, the p-value might be small enough for you to ask your roommate to use a different coin, or to provide one yourself!
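For the coin example, both the confidence interval and the p-value mentioned in this comment can be computed in a few lines (a sketch using Python's standard library; the small difference from 0.0734 is due to rounding):

```python
from math import sqrt
from statistics import NormalDist

p_hat, p0, n = 48 / 80, 0.5, 80
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)    # ≈ 1.79
p_value = 2 * (1 - NormalDist().cdf(z))       # ≈ 0.0736 (the text reports 0.0734)
margin = 1.96 * sqrt(p_hat * (1 - p_hat) / n)
ci = (p_hat - margin, p_hat + margin)         # ≈ (0.493, 0.707): contains 0.5
```

Since 0.5 lies inside the interval and the p-value exceeds 0.05, both routes lead to the same decision: do not reject Ho.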

**Here is our final point on this subject:**

When the data provide enough evidence to reject Ho, we can conclude (depending on the alternative hypothesis) that the population proportion is either less than, greater than, or not equal to the null value p_{0}. However, we do not get a more informative statement about its actual value. It might be of interest, then, to follow the test with a 95% confidence interval that will give us more insight into the actual value of p.

In our example 3,

we concluded that the proportion of U.S. adults who support the death penalty for convicted murderers has changed since 2003, when it was 0.64. It is probably of interest not only to know that the proportion has changed, but also to estimate what it has changed to. We’ve calculated the 95% confidence interval for p on the previous page and found that it is (0.646, 0.704).

We can combine our conclusions from the test and the confidence interval and say:

Data provide evidence that the proportion of U.S. adults who support the death penalty for convicted murderers has changed since 2003, and we are 95% confident that it is now between 0.646 and 0.704 (i.e., between 64.6% and 70.4%).

Let’s look at our example 1 to see how a confidence interval following a test might be insightful in a different way.

Here is a summary of example 1:

We conclude that as a result of the repair, the proportion of defective products has been reduced to below 0.20 (which was the proportion prior to the repair). It is probably of great interest to the company not only to know that the proportion of defective products has been reduced, but also to estimate what it has been reduced to, to get a better sense of how effective the repair was. A 95% confidence interval for p in this case is:

We can therefore say that the data provide evidence that the proportion of defective products has been reduced, and we are 95% confident that it has been reduced to somewhere between 12.4% and 19.6%. This is very useful information, since it tells us that even though the results were significant (i.e., the repair reduced the number of defective products), the repair might not have been effective enough, if it managed to reduce the number of defective products only to the range provided by the confidence interval. This, of course, ties back in to the idea of statistical significance vs. practical importance that we discussed earlier. Even though the results are statistically significant (Ho was rejected), practically speaking, the repair might still be considered ineffective.
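The interval quoted here follows directly from the sample values of example 1 (a sample proportion of 0.16 in a random sample of size 400). A quick check in Python:

```python
from math import sqrt

p_hat, n = 0.16, 400
margin = 1.96 * sqrt(p_hat * (1 - p_hat) / n)
ci = (p_hat - margin, p_hat + margin)   # ≈ (0.124, 0.196), i.e., 12.4% to 19.6%
```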

Even though this portion of the current section is about the z-test for population proportion, it is loaded with very important ideas that apply to hypothesis testing in general. We’ve already summarized the details that are specific to the z-test for proportions, so the purpose of this summary is to highlight the general ideas.

The process of hypothesis testing has **four steps**:

**I. Stating the null and alternative hypotheses (Ho and Ha).**

**II. Obtaining a random sample (or at least one that can be considered random) and collecting data. Using the data:**

**Check that the conditions** under which the test can be reliably used are met.

**Summarize the data using a test statistic. **

- The test statistic is a measure of the evidence in the data against Ho. The larger the test statistic is in magnitude, the more evidence the data present against Ho.

**III. Finding the p-value of the test. **The p-value is the probability of getting data like those observed (or even more extreme) assuming that the null hypothesis is true, and is calculated using the null distribution of the test statistic. The p-value is a measure of the evidence against Ho. The smaller the p-value, the more evidence the data present against Ho.

**IV. Making conclusions. **

Conclusions about the statistical **significance of the results:**

If the p-value is small, the data present enough evidence to reject Ho (and accept Ha).

If the p-value is not small, the data do not provide enough evidence to reject Ho.

To help guide our decision, we use the significance level as a cutoff for what is considered a small p-value. The significance cutoff is usually set at 0.05.

Conclusions should then be provided **in the context** of the problem.

**Additional Important Ideas about Hypothesis Testing**

- Results that are based on a larger sample carry more weight, and therefore **as the sample size increases, results become more statistically significant.**

- Even a very small and practically unimportant effect becomes statistically significant with a large enough sample size. The **distinction between statistical significance and practical importance** should therefore always be considered.

- **Confidence intervals can be used in order to carry out two-sided tests** (95% confidence for the 0.05 significance level). If the null value is not included in the confidence interval (i.e., is not one of the plausible values for the parameter), we have enough evidence to reject Ho. Otherwise, we cannot reject Ho.

- If the results are statistically significant, it might be of interest to **follow up the tests with a confidence interval** in order to get insight into the actual value of the parameter of interest.

- It is important to be aware that there are two types of errors in hypothesis testing (**Type I and Type II**) and that the **power** of a statistical test is an important measure of how likely we are to be able to detect a difference of interest to us in a particular problem.

This last part of the four-step process of hypothesis testing is the same across all statistical tests, and actually, we’ve already said basically everything there is to say about it, but it can’t hurt to say it again.

The p-value is a measure of how much evidence the data present against Ho. The smaller the p-value, the more evidence the data present against Ho.

We already mentioned that what determines what constitutes enough evidence against Ho is the significance level (α, alpha), a cutoff point below which the p-value is considered small enough to reject Ho in favor of Ha. The most commonly used significance level is 0.05.

- If p-value ≤ 0.05, then **WE REJECT** Ho.
  - Conclusion: There **IS** enough evidence that *Ha is True*.
- If p-value > 0.05, then **WE FAIL TO REJECT** Ho.
  - Conclusion: There **IS NOT** enough evidence that *Ha is True*.

Here, instead of *Ha is True*, we write what this means in the words of the problem; in other words, in the context of the current scenario.

It is important to mention again that this step has essentially two sub-steps:

- (i) Based on the p-value, determine whether or not the results are statistically significant (i.e., the data present enough evidence to reject Ho).
- (ii) State your conclusions in the context of the problem.

**Note:** We must always also consider whether the results have any practical significance, particularly if they are statistically significant, since a statistically significant result that has no practical use is essentially meaningless!

Let’s go back to our three examples and draw conclusions.

Has the proportion of defective products been reduced as a result of the repair?

We found that the p-value for this test was 0.023.

Since 0.023 is small (in particular, 0.023 < 0.05), the data provide enough evidence to reject Ho.

**Conclusion:**

- There **IS** enough evidence that *the proportion of defective products is less than 20% after the repair*.

The following figure is the complete story of this example, and includes all the steps we went through, starting from stating the hypotheses and ending with our conclusions:

Is the proportion of marijuana users in the college higher than the national figure?

We found that the p-value for this test was 0.182.

Since 0.182 is *not* small (in particular, 0.182 > 0.05), the data do not provide enough evidence to reject Ho.

**Conclusion:**

- There **IS NOT** enough evidence that *the proportion of students at the college who use marijuana is higher than the national figure*.

Here is the complete story of this example:

Did the proportion of U.S. adults who support the death penalty change between 2003 and a later poll?

We found that the p-value for this test was 0.021.

Since 0.021 is small (in particular, 0.021 < 0.05), the data provide enough evidence to reject Ho.

**Conclusion:**

- There **IS** enough evidence that *the proportion of adults who support the death penalty for convicted murderers has changed since 2003*.

Here is the complete story of this example:

Many students wonder why 5% is so often selected as the significance level in hypothesis testing, and why 1% is the next most typical level. This is largely a matter of convenience and tradition.

When Ronald Fisher (one of the founders of modern statistics) published one of his tables, he used a mathematically convenient scale that included 5% and 1%. Later, these same 5% and 1% levels were used by other people, in part just because Fisher was so highly esteemed. But mostly these are arbitrary levels.

The idea of selecting some sort of relatively small cutoff was historically important in the development of statistics; but it’s important to remember that there is really a continuous range of increasing confidence towards the alternative hypothesis, not a single all-or-nothing value. There isn’t much meaningful difference, for instance, between a p-value of 0.049 and one of 0.051, and it would be foolish to declare one case definitely a “real” effect and the other case definitely a “random” effect. In either case, the study results were roughly 5% likely by chance if there’s no actual effect.

Whether such a p-value is sufficient for us to reject a particular null hypothesis ultimately depends on the risk of making the wrong decision, and the extent to which the hypothesized effect might contradict our prior experience or previous studies.

We have now completed going through the four steps of hypothesis testing, and in particular we learned how they are applied to the z-test for the population proportion. Here is a brief summary:

**Step 1: State the hypotheses**

State the null hypothesis:

Ho: p = p_{0}

State the alternative hypothesis:

Ha: p < p_{0} **(one-sided)**

Ha: p > p_{0} **(one-sided)**

Ha: p ≠ p_{0} **(two-sided)**

where the choice of the appropriate alternative (out of the three) is usually quite clear from the context of the problem. If you feel it is not clear, it is most likely a two-sided problem. Students are usually good at recognizing the “more than” and “less than” terminology, but differences can sometimes be more difficult to spot; sometimes this is because you have preconceived ideas of how you think it should be! Use only the information given in the problem.

**Step 2: Obtain data, check conditions, and summarize data**

Obtain data from a sample and:

(i) Check whether the data satisfy the conditions which allow you to use this test.

- a random sample (or at least a sample that can be considered random in context)
- the conditions under which the sampling distribution of p-hat is normal are met

(ii) Calculate the sample proportion p-hat, and summarize the data using the test statistic:

(**Recall:** This standardized test statistic represents how many standard deviations above or below p_{0} our sample proportion p-hat is.)
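The statistic being described is the standard z statistic for one proportion, z = (p-hat − p0) / sqrt(p0(1 − p0)/n). A minimal sketch, checked against two examples from this section:

```python
from math import sqrt

def z_statistic(p_hat, p0, n):
    """How many null standard deviations p-hat lies above or below p0."""
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

z_ex1 = z_statistic(0.16, 0.20, 400)    # example 1 (defective products): z = -2.0
z_ex3 = z_statistic(0.675, 0.64, 1000)  # example 3 (death penalty): z ≈ 2.31
```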

**Step 3: Find the p-value of the test by using the test statistic as follows**

**When the alternative hypothesis is “less than,”** the p-value is the probability of observing a test statistic as **small as that observed or smaller**, assuming that the values of the test statistic follow a standard normal distribution. We will now represent this probability in symbols and also using the normal distribution.

Looking at the shaded region, you can see why this is often referred to as a **left-tailed** test. We shaded to the left of the test statistic, since less than is to the left.

**When the alternative hypothesis is “greater than,”** the p-value is the probability of observing a test statistic as **large as that observed or larger**, assuming that the values of the test statistic follow a standard normal distribution. Again, we will represent this probability in symbols and using the normal distribution.

Looking at the shaded region, you can see why this is often referred to as a **right-tailed** test. We shaded to the right of the test statistic, since greater than is to the right.

**When the alternative hypothesis is “not equal to,”** the p-value is the probability of observing a test statistic that is as large in **magnitude** as that observed or larger, assuming that the values of the test statistic follow a standard normal distribution.

This is often referred to as a **two-tailed** test, since we shaded in both directions.

**Step 4: Conclusion**

Reach a conclusion first regarding the statistical significance of the results, and then determine what it means in the context of the problem.

**If p-value ≤ 0.05 then WE REJECT Ho**

**Conclusion: There IS enough evidence that Ha is True**

**If p-value > 0.05 then WE FAIL TO REJECT Ho**

**Conclusion: There IS NOT enough evidence that Ha is True**

Recall that: If the p-value is small (in particular, smaller than the significance level, which is usually 0.05), the results are statistically significant (in the sense that there is a statistically significant difference between what was observed in the sample and what was claimed in Ho), and so we reject Ho.

If the p-value is not small, we do not have enough statistical evidence to reject Ho, and so we continue to believe that Ho **may** be true. (**Remember: In hypothesis testing we never “accept” Ho**).

Finally, in practice, we should always consider the **practical significance** of the results as well as the statistical significance.

Before we move on to the next test, we are going to use the z-test for proportions to bring up and illustrate a few more very important issues regarding hypothesis testing. This might also be a good time to review the concepts of Type I error, Type II error, and Power before continuing on.

So far we’ve talked about the p-value at the intuitive level: understanding what it is (or what it measures) and how we use it to draw conclusions about the statistical significance of our results. We will now go more deeply into how the p-value is calculated.

It should be mentioned that eventually we will rely on technology to calculate the p-value for us (as well as the test statistic), but in order to make intelligent use of the output, it is important to first **understand** the details, and only then let the computer do the calculations for us. Again, our goal is to use this simple example to give you the tools you need to understand the process entirely. Let’s start.

Recall that so far we have said that the p-value is the probability of obtaining data like those observed assuming that Ho is true. Like the test statistic, the p-value is, therefore, a measure of the evidence against Ho. In the case of the **test statistic**, the **larger** it is in magnitude (positive or negative), the further p-hat is from p_{0}, and the **more evidence we have against Ho**. In the case of the **p-value**, it is the opposite: the **smaller** it is, the more unlikely it is to get data like those observed when Ho is true, and the **more evidence it is against Ho**.

One can actually draw conclusions in hypothesis testing just using the test statistic, and as we’ll see, the p-value is, in a sense, just another way of looking at the test statistic. The reason we take the extra step in this course and derive the p-value from the test statistic is that even though in this case (the test about the population proportion) and some other tests the value of the test statistic has a very clear and intuitive interpretation, there are some tests where its value is not as easy to interpret. The p-value, on the other hand, keeps its intuitive appeal across **all** statistical tests.

**How is the p-value calculated?**

Intuitively, the p-value is the **probability** of observing **data like those observed** assuming that Ho is true. Let’s be a bit more formal:

- Since this is a probability question about the **data**, it makes sense that the calculation will involve the data summary, the **test statistic**.
- What do we mean by **“like”** those observed? By “like” we mean **“as extreme or even more extreme.”**

Putting it all together, we get that in **general:**

The** p-value **is the** probability of observing a test statistic as extreme as that observed (or even more extreme) assuming that the null hypothesis is true.**

By **“extreme”** we mean extreme **in the direction(s) of the alternative** hypothesis.

**Specifically**, for the z-test for the population proportion:

- If the alternative hypothesis is Ha: p < p_{0} **(less than)**, then “extreme” means **small or less than**, and the p-value is: the probability of observing a test statistic **as small as that observed or smaller** if the null hypothesis is true.
- If the alternative hypothesis is Ha: p > p_{0} **(greater than)**, then “extreme” means **large or greater than**, and the p-value is: the probability of observing a test statistic **as large as that observed or larger** if the null hypothesis is true.
- If the alternative is Ha: p ≠ p_{0} **(different from)**, then “extreme” means extreme in either direction, **either small or large (i.e., large in magnitude)**, and the p-value therefore is: the probability of observing a test statistic **as large in magnitude as that observed or larger** if the null hypothesis is true. (Examples: If z = -2.5, the p-value is the probability of observing a test statistic as small as -2.5 or smaller, or as large as 2.5 or larger. If z = 1.5, the p-value is the probability of observing a test statistic as large as 1.5 or larger, or as small as -1.5 or smaller.)
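The three cases above can be collected into a single helper built on the standard normal CDF (a sketch; the `alternative` labels are my own naming, not the course's):

```python
from statistics import NormalDist

def p_value(z, alternative):
    """p-value of the z-test for the given alternative hypothesis."""
    phi = NormalDist().cdf
    if alternative == "less":
        return phi(z)                 # left tail: P(Z <= z)
    if alternative == "greater":
        return 1 - phi(z)             # right tail: P(Z >= z)
    if alternative == "two-sided":
        return 2 * (1 - phi(abs(z)))  # both tails: 2 * P(Z >= |z|)
    raise ValueError(alternative)

# The z = -2.5 example above, under the two-sided alternative:
p = p_value(-2.5, "two-sided")        # ≈ 0.0124
```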

**OK, hopefully that makes (some) sense. But how do we actually calculate it?**

Recall the important comment from our discussion about our test statistic,

which said that when the null hypothesis is true (i.e., when p = p_{0}), the possible values of our test statistic follow a standard normal (N(0,1), denoted by Z) distribution. Therefore, the p-value calculations (which assume that Ho is true) are simply standard normal distribution calculations for the 3 possible alternative hypotheses.

**If the alternative is “less than”:** the probability of observing a test statistic as **small as that observed or smaller**, assuming that the values of the test statistic follow a standard normal distribution. We will now represent this probability in symbols and also using the normal distribution.

Looking at the shaded region, you can see why this is often referred to as a **left-tailed** test. We shaded to the left of the test statistic, since less than is to the left.

**If the alternative is “greater than”:** the probability of observing a test statistic as **large as that observed or larger**, assuming that the values of the test statistic follow a standard normal distribution. Again, we will represent this probability in symbols and using the normal distribution.

Looking at the shaded region, you can see why this is often referred to as a **right-tailed** test. We shaded to the right of the test statistic, since greater than is to the right.

**If the alternative is “not equal to”:** the probability of observing a test statistic that is as large in **magnitude** as that observed or larger, assuming that the values of the test statistic follow a standard normal distribution.

This is often referred to as a **two-tailed** test, since we shaded in both directions.

Next, we will apply this to our three examples. But first, work through the following activities, which should help your understanding.

Has the proportion of defective products been reduced as a result of the repair?

The p-value in this case is:

- The probability of observing a test statistic as small as -2 or smaller, assuming that Ho is true.

**OR (recalling what the test statistic actually means in this case),**

- The probability of observing a sample proportion that is 2 standard deviations or more below the null value (p_{0} = 0.20), assuming that p_{0} is the true population proportion.

**OR, more specifically,**

- The probability of observing a sample proportion of 0.16 or lower in a random sample of size 400, when the true population proportion is p_{0} = 0.20.

In either case, the p-value is found as shown in the following figure:

To find P(Z ≤ -2), we can use either the calculator or the table we learned to use in the probability unit for normal random variables. Eventually, after we understand the details, we will use software to run the test for us, and the output will give us all the information we need. The p-value that the statistical software provides for this specific example is 0.023. The p-value tells us that it is pretty unlikely (probability of 0.023) to get data like those observed (a test statistic of -2 or less) assuming that Ho is true.

Is the proportion of marijuana users in the college higher than the national figure?

The p-value in this case is:

- The probability of observing a test statistic as large as 0.91 or larger, assuming that Ho is true.

**OR (recalling what the test statistic actually means in this case),**

- The probability of observing a sample proportion that is 0.91 standard deviations or more above the null value (p₀ = 0.157), assuming that p₀ is the true population proportion.

**OR, more specifically,**

- The probability of observing a sample proportion of 0.19 or higher in a random sample of size 100, when the true population proportion is p₀ = 0.157.

In either case, the p-value is found as shown in the following figure:

Again, at this point we can use either the calculator or the table to find that the p-value, P(Z ≥ 0.91), is 0.182.

The p-value tells us that it is not very surprising (probability of 0.182) to get data like those observed (which yield a test statistic of 0.91 or higher) assuming that the null hypothesis is true.
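As before, the calculation can be reproduced in Python (an illustrative sketch, using the standard library's error function in place of statistical software), with this example's values n = 100, sample proportion 0.19, and null value 0.157:

```python
from math import erf, sqrt

def phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

# Marijuana-use example: right-tailed test of Ho: p = 0.157
p0, p_hat, n = 0.157, 0.19, 100
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)  # test statistic
p_val = 1 - phi(z)                          # right-tailed: P(Z >= z)
print(round(z, 2), round(p_val, 3))         # 0.91 0.182
```

Working from the unrounded test statistic, the p-value matches the 0.182 quoted above.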

Did the proportion of U.S. adults who support the death penalty change between 2003 and a later poll?

The p-value in this case is:

- The probability of observing a test statistic as large as 2.31 (or larger) or as small as -2.31 (or smaller), assuming that Ho is true.

**OR (recalling what the test statistic actually means in this case),**

- The probability of observing a sample proportion that is 2.31 standard deviations or more away from the null value (p₀ = 0.64), assuming that p₀ is the true population proportion.

**OR, more specifically,**

- The probability of observing a sample proportion as different as 0.675 is from 0.64, or even more different (i.e., as high as 0.675 or higher, or as low as 0.605 or lower) in a random sample of size 1,000, when the true population proportion is p₀ = 0.64.

In either case, the p-value is found as shown in the following figure:

Again, at this point we can use either the calculator or the table to find that the p-value is 0.021. This is P(Z ≤ -2.31) + P(Z ≥ 2.31) = 2 * P(Z ≥ 2.31).

The p-value tells us that it is pretty unlikely (probability of 0.021) to get data like those observed (test statistic as high as 2.31 or higher or as low as -2.31 or lower) assuming that Ho is true.
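Once more, the calculation can be reproduced in Python (an illustrative sketch, not the course's software), with this example's values n = 1,000, sample proportion 0.675, and null value 0.64:

```python
from math import erf, sqrt

def phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

# Death-penalty example: two-tailed test of Ho: p = 0.64
p0, p_hat, n = 0.64, 0.675, 1000
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)  # test statistic
p_val = 2 * (1 - phi(abs(z)))               # two-tailed: 2 * P(Z >= |z|)
print(round(z, 2), round(p_val, 3))         # 2.31 0.021
```

Shading both tails, rather than one, is what doubles the single-tail area here.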

**Comment:**

- We’ve just seen that finding p-values involves probability calculations about the value of the test statistic assuming that Ho is true. In this case, when Ho is true, the values of the test statistic follow a standard normal distribution (i.e., the sampling distribution of the test statistic when the null hypothesis is true is N(0,1)). Therefore, p-values correspond to areas (probabilities) under the standard normal curve.

Similarly, in **any test**, p-values are found using the sampling distribution of the test statistic when the null hypothesis is true (also known as the “null distribution” of the test statistic). In this case, it was relatively easy to argue that the null distribution of our test statistic is N(0,1). As we’ll see, in other tests, other distributions come up (like the t-distribution and the F-distribution), which we will just mention briefly, and rely heavily on the output of our statistical package for obtaining the p-values.
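The claim that the null distribution of the test statistic is N(0,1) can also be checked by simulation. The sketch below is purely illustrative (the settings p₀ = 0.20 and n = 400 are borrowed from the first example, and the number of repetitions is an arbitrary choice): it repeatedly draws samples from a population where Ho is true, computes the test statistic each time, and checks that the resulting values have mean near 0 and variance near 1.

```python
import random
from math import sqrt

random.seed(1)                   # for reproducibility
p0, n, reps = 0.20, 400, 5000    # assumed settings for illustration
se = sqrt(p0 * (1 - p0) / n)     # standard error under Ho

z_values = []
for _ in range(reps):
    # one random sample of size n from a population where Ho is true
    successes = sum(random.random() < p0 for _ in range(n))
    z_values.append((successes / n - p0) / se)

mean = sum(z_values) / reps
var = sum((z - mean) ** 2 for z in z_values) / reps
print(round(mean, 2), round(var, 2))  # both should be close to 0 and 1
```

A histogram of `z_values` would trace out the familiar bell curve, which is exactly why areas under N(0,1) give the p-values.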

We’ve just completed our discussion about the p-value, and how it is calculated both in general and more specifically for the z-test for the population proportion. Let’s go back to the four-step process of hypothesis testing and see what we’ve covered and what still needs to be discussed.

**STEP 1:** State the appropriate null and alternative hypotheses, Ho and Ha.

**STEP 2:** Obtain a random sample, collect relevant data, and **check whether the data meet the conditions under which the test can be used**. If the conditions are met, summarize the data using a test statistic.

**STEP 3:** Find the p-value of the test.

**STEP 4:** Based on the p-value, decide whether or not the results are statistically significant and **draw your conclusions in context.**

**Note:** In practice, we should always consider the practical significance of the results as well as the statistical significance.

With respect to the z-test for the population proportion:

Step 1: Completed

Step 2: Completed

Step 3: Completed

Step 4: This is what we will work on next.