Proportions (Step 2)
Step 2. Collect Data, Check Conditions, and Summarize Data
After the hypotheses have been stated, the next step is to obtain a sample (on which the inference will be based), collect relevant data, and summarize them.
It is extremely important that our sample is representative of the population about which we want to draw conclusions. Choosing the sample at random is the best way to achieve this. Beyond the practical issue of ensuring representativeness, choosing a random sample has theoretical importance that we will mention later.
In the case of hypothesis testing for the population proportion (p), we will collect data on the relevant categorical variable from the individuals in the sample and start by calculating the sample proportion p-hat (the natural quantity to calculate when the parameter of interest is p).
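As a minimal sketch (the function name and the ten responses below are hypothetical, not data from the examples), p-hat is the number of sampled individuals in the category of interest divided by the sample size:

```python
def sample_proportion(responses, category):
    """Return p-hat: the fraction of responses equal to `category`."""
    if not responses:
        raise ValueError("sample must not be empty")
    count = sum(1 for r in responses if r == category)
    return count / len(responses)


# Hypothetical sample of 10 products, each recorded as "defective" or "ok".
sample = ["ok", "defective", "ok", "ok", "ok",
          "ok", "defective", "ok", "ok", "ok"]
p_hat = sample_proportion(sample, "defective")
print(p_hat)  # 0.2
```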
Let’s go back to our three examples and add this step.
EXAMPLE:
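Has the proportion of defective products been reduced as a result of the repair?
Data: A random sample of 400 products was selected after the repair, and 64 of them were defective, so p-hat = 64/400 = 0.16.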
EXAMPLE:
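Is the proportion of marijuana users in the college higher than the national figure?
Data: A random sample of 100 students from the college was selected, and 19 of them use marijuana, so p-hat = 19/100 = 0.19.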
EXAMPLE:
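Did the proportion of U.S. adults who support the death penalty change between 2003 and a later poll?
Data: A random sample of 1000 U.S. adults was polled, and 675 of them support the death penalty for convicted murderers, so p-hat = 675/1000 = 0.675.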
As we mentioned earlier without going into details, when we summarize the data in hypothesis testing, we go a step beyond calculating the sample statistic and summarize the data with a test statistic. Every test has a test statistic, which to some degree captures the essence of the test. In fact, the p-value, which so far we have looked upon as “the king” (in the sense that everything is determined by it), is actually determined by (or derived from) the test statistic. We will now introduce the test statistic.
The test statistic is a measure of how far the sample proportion p-hat is from the null value p0, the value that the null hypothesis claims is the value of p. In other words, since p-hat is what the data estimates p to be, the test statistic can be viewed as a measure of the “distance” between what the data tells us about p and what the null hypothesis claims p to be.
Let’s use our examples to understand this:
EXAMPLE:
Has the proportion of defective products been reduced as a result of the repair?
The parameter of interest is p, the proportion of defective products following the repair.
The data estimate p to be p-hat = 0.16
The null hypothesis claims that p = 0.20
The data are therefore 0.04 (or 4 percentage points) below the null hypothesis value.
It is hard to evaluate whether this difference of 4 percentage points in defective products is enough evidence to say that the repair was effective at reducing the proportion of defective products, but clearly, the larger the difference, the more evidence it provides against the null hypothesis. So if, for example, our sample proportion of defective products had been 0.10 instead of 0.16, then I think you would all agree that cutting the proportion of defective products in half (from 20% to 10%) would be extremely strong evidence that the repair was effective at reducing the proportion of defective products.
EXAMPLE:
Is the proportion of marijuana users in the college higher than the national figure?
The parameter of interest is p, the proportion of students in a college who use marijuana.
The data estimate p to be p-hat = 0.19
The null hypothesis claims that p = 0.157
The data are therefore 0.033 (or 3.3 percentage points) above the null hypothesis value.
EXAMPLE:
Did the proportion of U.S. adults who support the death penalty change between 2003 and a later poll?
The parameter of interest is p, the proportion of U.S. adults who support the death penalty for convicted murderers.
The data estimate p to be p-hat = 0.675
The null hypothesis claims that p = 0.64
There is a difference of 0.035 (or 3.5 percentage points) between the data and the null hypothesis value.
The problem with looking only at the difference between the sample proportion, p-hat, and the null value, p0, is that we have not taken into account the variability of our estimator p-hat, which, as we know from our study of sampling distributions, depends on the sample size.
For this reason, the test statistic cannot simply be the difference between p-hat and p0, but must be some form of that formula that accounts for the sample size. In other words, we need to somehow standardize the difference so that comparison between different situations will be possible. We are very close to revealing the test statistic, but before we construct it, let’s be reminded of the following two facts from probability:
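Fact 1: When we take a random sample of size n from a population with population proportion p, then the possible values of the sample proportion p-hat (when certain conditions are met) have approximately a normal distribution with a mean of p and a standard deviation of
$$\sqrt{\frac{p(1-p)}{n}}$$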
Fact 2: The z-score of any normal value (a value that comes from a normal distribution) is calculated by finding the difference between the value and the mean and then dividing that difference by the standard deviation (of the normal distribution associated with the value). The z-score represents how many standard deviations below or above the mean the value is.
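Fact 2 translates directly into code. This is a minimal sketch; the function name and the numbers in the usage line are illustrative only:

```python
def z_score(value, mean, sd):
    """How many standard deviations `value` lies above (+) or below (-) `mean`."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (value - mean) / sd


# Illustrative numbers only: a value of 130 from a normal distribution with
# mean 100 and standard deviation 15 lies 2 standard deviations above the mean.
print(z_score(130, 100, 15))  # 2.0
```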
Thus, our test statistic should be a measure of how far the sample proportion p-hat is from the null value p0 relative to the variation of p-hat (as measured by the standard error of p-hat).
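To see why the raw difference alone is not enough, the sketch below (the helper name is ours) measures the same gap between p-hat = 0.16 and p0 = 0.20 in units of the standard error of p-hat when p = p0, which is $\sqrt{p_0(1-p_0)/n}$. The identical raw difference of -0.04 is one, two, or four standard errors depending on the sample size:

```python
import math


def standardized_difference(p_hat, p0, n):
    """Difference p_hat - p0 measured in standard errors of p-hat (assuming p = p0)."""
    standard_error = math.sqrt(p0 * (1 - p0) / n)
    return (p_hat - p0) / standard_error


# The same raw difference of -0.04 (0.16 vs. 0.20) at three sample sizes.
for n in (100, 400, 1600):
    print(n, round(standardized_difference(0.16, 0.20, n), 2))
# 100 -1.0
# 400 -2.0
# 1600 -4.0
```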
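Recall that the standard error is the standard deviation of the sampling distribution for a given statistic. For p-hat, we know the following:
- Center: the mean of the sampling distribution of p-hat is p.
- Spread: the standard error of p-hat is $\sqrt{\frac{p(1-p)}{n}}$.
- Shape: the sampling distribution of p-hat is approximately normal when certain conditions are met.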
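A quick simulation can make these facts concrete. This sketch is our own illustration, using Python's standard `random` and `statistics` modules with arbitrarily chosen values p = 0.20, n = 400, 5,000 repetitions, and a fixed seed. It draws many random samples and compares the mean and standard deviation of the resulting p-hat values with p and $\sqrt{p(1-p)/n}$:

```python
import math
import random
import statistics


def simulate_p_hats(p, n, repetitions, seed=0):
    """Draw `repetitions` random samples of size n and return their sample proportions."""
    rng = random.Random(seed)
    p_hats = []
    for _ in range(repetitions):
        # Each individual falls in the category of interest with probability p.
        successes = sum(1 for _ in range(n) if rng.random() < p)
        p_hats.append(successes / n)
    return p_hats


p, n = 0.20, 400
p_hats = simulate_p_hats(p, n, repetitions=5_000)
print("mean of p-hat:", round(statistics.mean(p_hats), 4))     # close to p = 0.20
print("sd of p-hat:  ", round(statistics.stdev(p_hats), 4))    # close to the formula below
print("formula:      ", round(math.sqrt(p * (1 - p) / n), 4))  # 0.02
```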
To find the p-value, we will need to determine how surprising our value is, assuming the null hypothesis is true. We already have the tools needed for this process from our study of sampling distributions, as summarized above.
EXAMPLE:
Has the proportion of defective products been reduced as a result of the repair?
If we assume the null hypothesis is true, we can specify that the center of the distribution of all possible values of p-hat from samples of size 400 would be 0.20 (our null value).
We can calculate the standard error, assuming p = 0.20, as
$$\sqrt{\frac{p_0(1-p_0)}{n}} = \sqrt{\frac{0.20(1-0.20)}{400}} = \sqrt{0.0004} = 0.02$$
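The arithmetic above can be checked in a few lines of Python (a minimal sketch; the variable names are ours):

```python
import math

# Example 1 (defective products): null value p0 = 0.20, sample size n = 400.
p0, n = 0.20, 400
standard_error = math.sqrt(p0 * (1 - p0) / n)
print(round(standard_error, 4))  # 0.02
```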
The following picture represents the sampling distribution of all possible values of p-hat of samples of size 400, assuming the true proportion p is 0.20 and our other requirements for the sampling distribution to be normal are met (we will review these during the next step).
In order to calculate probabilities for the picture above, we would need to find the z-score associated with our result.
This z-score is the test statistic! In this example, the numerator of our z-score is the difference between p-hat (0.16) and the null value (0.20), which we found earlier to be -0.04. The denominator of our z-score is the standard error calculated above (0.02), and thus we quickly find the z-score, our test statistic, to be
$$z = \frac{0.16 - 0.20}{0.02} = \frac{-0.04}{0.02} = -2$$
The sample proportion based upon this data is 2 standard errors below the null value.
Hopefully you now understand more about the reasons we need probability in statistics!
Now we will formalize the definition and look at our remaining examples before moving on to the next step, which will be to determine if a normal distribution applies and calculate the p-value.
The test statistic for hypothesis tests for one proportion is:
$$z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}}$$
It represents the difference between the sample proportion and the null value, measured in standard deviations (standard error of p-hat).
The picture above is a representation of the sampling distribution of p-hat assuming p = p0. In other words, this is a model of how p-hat behaves if we are drawing random samples from a population for which Ho is true.
Notice the center of the sampling distribution is at p0, which is the hypothesized proportion given in the null hypothesis (Ho: p = p0). We could also mark the axis in standard error units; on that scale, p0 sits at 0, and the position of p-hat is exactly the test statistic.
For example, if our null hypothesis claims that the proportion of U.S. adults supporting the death penalty is 0.64, then the sampling distribution is drawn as if the null is true. We draw a normal distribution centered at 0.64 (p0) with a standard error that depends on the sample size n:
$$\sqrt{\frac{0.64(1-0.64)}{n}}$$
For the sample of n = 1000 adults, this is approximately 0.0152.
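The formula for the test statistic can be wrapped in a small function. This is a minimal sketch, not a library API; the name `one_proportion_z` is hypothetical. Note that the standard error uses the null value p0, because the statistic is computed assuming Ho is true:

```python
import math


def one_proportion_z(p_hat, p0, n):
    """z-test statistic for one proportion: (p_hat - p0) / sqrt(p0 * (1 - p0) / n).

    The standard error uses the null value p0, because the statistic is
    computed assuming Ho: p = p0 is true.
    """
    if not 0 < p0 < 1:
        raise ValueError("p0 must be strictly between 0 and 1")
    standard_error = math.sqrt(p0 * (1 - p0) / n)
    return (p_hat - p0) / standard_error


# Example 1 (defective products): p-hat = 0.16, p0 = 0.20, n = 400.
print(round(one_proportion_z(0.16, 0.20, 400), 2))  # -2.0
```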
Important Comment:
- Note that under the assumption that Ho is true (and if the conditions for the sampling distribution to be normal are satisfied), the test statistic follows a N(0,1) (standard normal) distribution. Another common way to say the same thing is: “The null distribution of the test statistic is N(0,1).”
By “null distribution,” we mean the distribution under the assumption that Ho is true. As we’ll see and stress again later, the null distribution of the test statistic is what the calculation of the p-value is based on.
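To see the null distribution in action, the sketch below (our own illustration, with arbitrarily chosen p0 = 0.20, n = 400, 5,000 repetitions, and a fixed seed) repeatedly samples from a population in which Ho is true and computes the test statistic each time. If the statement above holds, the z values should have a mean near 0, a standard deviation near 1, and roughly 95% of them should fall between -1.96 and 1.96. Because p-hat can take only the values k/400, the match with N(0,1) is approximate:

```python
import math
import random
import statistics


def null_z_values(p0, n, repetitions, seed=1):
    """Simulate the test statistic when samples come from a population with p = p0."""
    rng = random.Random(seed)
    standard_error = math.sqrt(p0 * (1 - p0) / n)
    zs = []
    for _ in range(repetitions):
        p_hat = sum(1 for _ in range(n) if rng.random() < p0) / n
        zs.append((p_hat - p0) / standard_error)
    return zs


zs = null_z_values(p0=0.20, n=400, repetitions=5_000)
share_inside = sum(1 for z in zs if -1.96 <= z <= 1.96) / len(zs)
print("mean:", round(statistics.mean(zs), 3))             # near 0
print("sd:", round(statistics.stdev(zs), 3))              # near 1
print("share in [-1.96, 1.96]:", round(share_inside, 3))  # near 0.95
```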
Let’s go back to our remaining two examples and find the test statistic in each case:
EXAMPLE:
Is the proportion of marijuana users in the college higher than the national figure?
Since the null hypothesis is Ho: p = 0.157, the standardized (z) score of p-hat = 0.19 is
$$z = \frac{0.19 - 0.157}{\sqrt{\frac{0.157(1-0.157)}{100}}} = \frac{0.033}{0.0364} \approx 0.91$$
This is the value of the test statistic for this example.
We interpret this to mean that, assuming that Ho is true, the sample proportion p-hat = 0.19 is 0.91 standard errors above the null value (0.157).
EXAMPLE:
Did the proportion of U.S. adults who support the death penalty change between 2003 and a later poll?
Since the null hypothesis is Ho: p = 0.64, the standardized (z) score of p-hat = 0.675 is
$$z = \frac{0.675 - 0.64}{\sqrt{\frac{0.64(1-0.64)}{1000}}} = \frac{0.035}{0.01518} \approx 2.31$$
This is the value of the test statistic for this example.
We interpret this to mean that, assuming that Ho is true, the sample proportion p-hat = 0.675 is 2.31 standard errors above the null value (0.64).
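The test statistics for all three examples can be reproduced with a short script. This is a minimal sketch; the labels and the `results` dictionary are our own naming:

```python
import math

# (description, p_hat, p0, n) for the three examples in this section.
examples = [
    ("defective products", 0.16, 0.20, 400),
    ("marijuana users", 0.19, 0.157, 100),
    ("death penalty support", 0.675, 0.64, 1000),
]

results = {}
for name, p_hat, p0, n in examples:
    # Standard error computed under Ho, i.e., using the null value p0.
    standard_error = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / standard_error
    results[name] = z
    print(f"{name}: SE = {standard_error:.4f}, z = {z:.2f}")
# defective products: SE = 0.0200, z = -2.00
# marijuana users: SE = 0.0364, z = 0.91
# death penalty support: SE = 0.0152, z = 2.31
```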
Comments about the Test Statistic:
- We mentioned earlier that to some degree, the test statistic captures the essence of the test. In this case, the test statistic measures the difference between p-hat and p0 in standard errors. This is exactly what this test is about. Get data, and look at the discrepancy between what the data estimates p to be (represented by p-hat) and what Ho claims about p (represented by p0).
- You can think about this test statistic as a measure of evidence in the data against Ho. The larger the test statistic in absolute value (in the direction specified by Ha), the “further the data are from Ho” and therefore the more evidence the data provide against Ho.
Comments:
- It should now be clear why this test is commonly known as the z-test for the population proportion. The name comes from the fact that it is based on a test statistic that is a z-score.
- Recall Fact 1, which we used to construct the z-test statistic. Here is part of it again:
When we take a random sample of size n from a population with population proportion p0, the possible values of the sample proportion p-hat (when certain conditions are met) have approximately a normal distribution with a mean of p0… and a standard deviation of
$$\sqrt{\frac{p_0(1-p_0)}{n}}$$
This result provides the theoretical justification for constructing the test statistic the way we did, and therefore the assumptions under which this result holds (a random sample, and “when certain conditions are met”) are the conditions that our data need to satisfy so that we can use this test. These two conditions are:
i. The sample has to be random.
ii. The conditions under which the sampling distribution of p-hat is normal are met. In other words:
$$n p_0 \ge 10 \quad \text{and} \quad n(1-p_0) \ge 10$$
- Here we will pause to say more about condition (i) above, the need for a random sample. In the Probability Unit we discussed sampling plans based on probability (such as simple random, cluster, or stratified sampling) that produce an unbiased sample, which can be safely used to make inferences about a population. We noted in the Probability Unit that, in practice, other (non-random) sampling techniques are sometimes used when random sampling is not feasible. When these techniques are used, though, it is important to be aware of the type of bias they introduce, and thus the limitations of the conclusions that can be drawn from them. For our purpose here, we will focus on one such practice: the situation in which a sample is not really chosen randomly, but in the context of the categorical variable being studied, the sample is regarded as random. For example, say that you are interested in the proportion of students at a certain college who suffer from seasonal allergies. For that purpose, the students in a large engineering class could be considered a random sample, since there is nothing about being in an engineering class that makes you more or less likely to suffer from seasonal allergies. Technically, the engineering class is a convenience sample, but it is treated as a random sample in the context of this categorical variable. On the other hand, if you are interested in the proportion of students in the college who have math anxiety, then the class of engineering students clearly could not be viewed as a random sample, since engineering students probably have a much lower incidence of math anxiety than the college population overall.
Let’s check the conditions in our three examples.
EXAMPLE:
Has the proportion of defective products been reduced as a result of the repair?
i. The 400 products were chosen at random.
ii. n = 400, p0 = 0.2 and therefore:
$$n p_0 = 400(0.2) = 80 \ge 10 \quad \text{and} \quad n(1-p_0) = 400(1-0.2) = 320 \ge 10$$
EXAMPLE:
Is the proportion of marijuana users in the college higher than the national figure?
i. The 100 students were chosen at random.
ii. n = 100, p0 = 0.157 and therefore:
$$n p_0 = 100(0.157) = 15.7 \ge 10 \quad \text{and} \quad n(1-p_0) = 100(1-0.157) = 84.3 \ge 10$$
EXAMPLE:
Did the proportion of U.S. adults who support the death penalty change between 2003 and a later poll?
i. The 1000 adults were chosen at random.
ii. n = 1000, p0 = 0.64 and therefore:
$$n p_0 = 1000(0.64) = 640 \ge 10 \quad \text{and} \quad n(1-p_0) = 1000(1-0.64) = 360 \ge 10$$
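Condition (ii) is simple arithmetic, so it can be checked in code. This minimal sketch (the function name is ours) tests n·p0 ≥ 10 and n·(1 - p0) ≥ 10 for the three examples. Condition (i) cannot be checked this way, since whether a sample is random depends on how the data were collected:

```python
def normal_conditions_met(n, p0, threshold=10):
    """Check n*p0 >= threshold and n*(1 - p0) >= threshold.

    Randomness of the sample (condition i) cannot be checked from these
    numbers; it depends on how the data were collected.
    """
    return n * p0 >= threshold and n * (1 - p0) >= threshold


for name, n, p0 in [("defective products", 400, 0.20),
                    ("marijuana users", 100, 0.157),
                    ("death penalty support", 1000, 0.64)]:
    print(f"{name}: n*p0 = {n * p0:.1f}, n*(1-p0) = {n * (1 - p0):.1f}, "
          f"met: {normal_conditions_met(n, p0)}")
```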
Checking that our data satisfy the conditions under which the test can be reliably used is a very important part of the hypothesis testing process. Be sure to consider this for every hypothesis test you conduct in this course and certainly in practice.
The Four Steps in Hypothesis Testing
- STEP 1: State the appropriate null and alternative hypotheses, Ho and Ha.
- STEP 2: Obtain a random sample, collect relevant data, and check whether the data meet the conditions under which the test can be used. If the conditions are met, summarize the data using a test statistic.
- STEP 3: Find the p-value of the test.
- STEP 4: Based on the p-value, decide whether or not the results are statistically significant and draw your conclusions in context.
- Note: In practice, we should always consider the practical significance of the results as well as the statistical significance.
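For the z-test for one proportion, Step 2 can be sketched as a single function that checks the conditions and then summarizes the data. This is an illustration only: the function name `step2_summary` and the `sampled_randomly` flag are hypothetical, the threshold of 10 follows the conditions used in this section, and whether a sample can be regarded as random is a judgment about the study design that code cannot verify. The usage line uses 675 of 1000 adults, the count implied by p-hat = 0.675 and n = 1000:

```python
import math


def step2_summary(successes, n, p0, sampled_randomly, threshold=10):
    """Step 2 of the z-test for one proportion: check conditions, then summarize.

    Returns (p_hat, z). Raises ValueError if a condition fails.
    `sampled_randomly` is supplied by the analyst; code cannot verify it.
    """
    if not sampled_randomly:
        raise ValueError("condition (i) fails: sample cannot be regarded as random")
    if n * p0 < threshold or n * (1 - p0) < threshold:
        raise ValueError("condition (ii) fails: n*p0 or n*(1-p0) is below the threshold")
    p_hat = successes / n
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    return p_hat, z


# Death penalty example: 675 of 1000 sampled adults, Ho: p = 0.64.
p_hat, z = step2_summary(successes=675, n=1000, p0=0.64, sampled_randomly=True)
print(p_hat, round(z, 2))  # 0.675 2.31
```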
With respect to the z-test for the population proportion that we are currently discussing, we have:
Step 1: Completed
Step 2: Completed
Step 3: This is what we will work on next.