We are now done with case C→Q.
The following table summarizes when each of the three standard tests covered in this module is used:
The following summary discusses each of the above-named subcases of C→Q within the context of the hypothesis testing process.
We need to check that the conditions under which the test can be reliably used are met.
For the paired t-test (a special case of the one-sample t-test), the conditions are:
For the two-sample t-test, the conditions are:
For an ANOVA, the conditions are:
Now we summarize the data using a test statistic.
For the paired t-test, the test statistic is:

t = x̄_d / (s_d / √n)

where x̄_d and s_d are the sample mean and standard deviation of the n observed differences.
For the two-sample t-test assuming equal variances, the test statistic is:

t = (x̄_1 − x̄_2) / (s_p √(1/n_1 + 1/n_2))

where

s_p = √(((n_1 − 1)s_1² + (n_2 − 1)s_2²) / (n_1 + n_2 − 2))

is the pooled standard deviation.
For the two-sample t-test assuming unequal variances, the test statistic is:

t = (x̄_1 − x̄_2) / √(s_1²/n_1 + s_2²/n_2)
For an ANOVA, the test statistic is:

F = (variation among the sample means) / (variation within the groups)

computed as the mean square for groups divided by the mean square for error.
Use statistical software to determine the p-value.
The p-values for the three C→Q tests are obtained from the output.
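The course reads these p-values from SAS, SPSS, or Minitab output. Purely as an illustrative alternative, here is a sketch of how the three tests can be run in Python with SciPy; the data below are simulated, not from the course examples:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Paired design: two measurements on the same 30 subjects.
before = rng.normal(50, 5, 30)
after = before + rng.normal(1, 2, 30)
t_paired, p_paired = stats.ttest_rel(before, after)

# Two independent samples.
g1 = rng.normal(50, 5, 40)
g2 = rng.normal(53, 5, 40)
t_pooled, p_pooled = stats.ttest_ind(g1, g2)                 # assumes equal variances
t_welch, p_welch = stats.ttest_ind(g1, g2, equal_var=False)  # unequal variances

# More than two independent samples: one-way ANOVA F-test.
g3 = rng.normal(55, 5, 40)
f_stat, p_anova = stats.f_oneway(g1, g2, g3)
```

Each call returns the test statistic together with its p-value, matching the summary table above.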
Conclusions about the significance of the results:
Conclusions should always be stated in the context of the problem and can all be written in the basic form below:
Related SAS Tutorials
Related SPSS Tutorials
In this part, we continue to handle situations involving one categorical explanatory variable and one quantitative response variable, which is case C→Q.
Here is a summary of the tests we have covered for the case where k = 2. Methods in BOLD are our main focus in this unit.
So far we have discussed the two samples and matched pairs designs, in which the categorical explanatory variable is two-valued. As we saw, in these cases, examining the relationship between the explanatory and the response variables amounts to comparing the mean of the response variable (Y) in two populations, which are defined by the two values of the explanatory variable (X). The difference between the two samples and matched pairs designs is that in the former, the two samples are independent, and in the latter, the samples are dependent.
Independent Samples (More Emphasis)
* Standard tests
* Non-parametric test

Dependent Samples (Less Emphasis)
* Standard test
* Non-parametric tests
We now move on to the case where k > 2 with independent samples. Here is a summary of the tests we will learn for this case. Note that we will not cover the dependent samples case in this course.
Independent Samples (Only Emphasis)
* Standard test: ANOVA F-test
* Non-parametric test: Kruskal-Wallis test

Dependent Samples (Not Discussed)
* Standard test
Here, as in the twovalued case, making inferences about the relationship between the explanatory (X) and the response (Y) variables amounts to comparing the means of the response variable in the populations defined by the values of the explanatory variable, where the number of means we are comparing depends, of course, on the number of values of X.
Unlike the two-valued case, where we looked at two subcases, (1) when the samples are independent (the two samples design) and (2) when the samples are dependent (the matched pairs design), here we are only going to discuss the case where the samples are independent. In other words, we are simply going to extend the two samples design to more than two independent samples.
The inferential method for comparing more than two means that we will introduce in this part is called ANalysis Of VAriance (abbreviated as ANOVA), and the test associated with this method is called the ANOVA Ftest.
In most software, the data need to be arranged in "long" format: each row contains one observation, with one variable recording X and another variable recording Y.
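As a small illustration of this long-format arrangement (with made-up column names and values, and pandas/SciPy standing in for the course's software):

```python
import pandas as pd
from scipy import stats

# Hypothetical long-format data: one row per observation,
# one column recording X (group) and one recording Y (response).
df = pd.DataFrame({
    "major":       ["Business", "Business", "English", "English", "Math", "Math"],
    "frustration": [7, 9, 12, 14, 11, 13],
})

# Split Y into one array per level of X and feed the arrays to the test.
groups = [g["frustration"].to_numpy() for _, g in df.groupby("major")]
f_stat, p_value = stats.f_oneway(*groups)
```

The same arrangement works for any number of groups, since the software determines k from the number of distinct values of X.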
As we mentioned earlier, the test that we will present is called the ANOVA F-test, and as you'll see, this test differs in two ways from all the tests we have presented so far: it has only one possible alternative hypothesis, and its test statistic does not have the familiar (sample estimate − null value) / (standard error) form, but a different structure that captures the essence of the F-test and clarifies where the name "analysis of variance" comes from.
The question we need to answer is: Are the differences among the sample means due to true differences among the μ’s (alternative hypothesis), or merely due to sampling variability or random chance (null hypothesis)?
Here are two sets of boxplots representing two possible scenarios:
Scenario #1
Scenario #2
Thus, in the language of hypothesis tests, we would say that if the data were configured as they are in scenario 1, we would not reject the null hypothesis that population means were equal for the k groups.
If the data were configured as they are in scenario 2, we would reject the null hypothesis, and we would conclude that not all population means are the same for the k groups.
Let’s summarize what we learned from this.
In order to answer this question using data, we need to look at the variation among the sample means, but this alone is not enough.
We need to look at the variation among the sample means relative to the variation within the groups. In other words, we need to look at the quantity:

(variation among the sample means) / (variation within the groups)
which measures to what extent the difference among the sample means for our groups dominates over the usual variation within sampled groups (which reflects differences in individuals that are typical in random samples).
When the variation within groups is large (as in scenario 1), the variation (differences) among the sample means may become negligible, resulting in data that provide very little evidence against H_{0}. When the variation within groups is small (as in scenario 2), the variation among the sample means dominates over it, and the data provide stronger evidence against H_{0}.
It has a different structure from all the test statistics we’ve looked at so far, but it is similar in that it is still a measure of the evidence against H_{0}. The larger F is (which happens when the denominator, the variation within groups, is small relative to the numerator, the variation among the sample means), the more evidence we have against H_{0}.
Looking at this ratio of variations is the idea behind comparing more than two means; hence the name analysis of variance (ANOVA).
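To make this ratio concrete, here is a small sketch (with made-up numbers) that computes the F statistic by hand as the mean square for groups over the mean square for error, then checks it against SciPy's one-way ANOVA:

```python
import numpy as np
from scipy import stats

# Three made-up samples (k = 3 groups, 3 observations each)
groups = [np.array([4., 5., 6.]), np.array([7., 8., 9.]), np.array([5., 6., 7.])]
k = len(groups)
n_total = sum(len(g) for g in groups)
grand_mean = np.concatenate(groups).mean()

# Variation AMONG the sample means (mean square for groups)
ss_groups = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ms_groups = ss_groups / (k - 1)

# Variation WITHIN the groups (mean square for error)
ss_error = sum(((g - g.mean()) ** 2).sum() for g in groups)
ms_error = ss_error / (n_total - k)

f_by_hand = ms_groups / ms_error
f_scipy, p_value = stats.f_oneway(*groups)  # agrees with the hand computation
```

With these numbers the ratio works out to F = 7: the variation among the three sample means (5, 8, and 6) dominates the small within-group variation.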
Now test your understanding of this idea.
Comments
Here is a full statement of the process for the ANOVA F-test:
Step 1: State the hypotheses
The null hypothesis claims that there is no relationship between X and Y. Since the relationship is examined by comparing the means of Y in the populations defined by the values of X (μ_{1}, μ_{2}, …, μ_{k}), no relationship would mean that all the means are equal.
Therefore, the null hypothesis of the F-test is:

H_{0}: μ_{1} = μ_{2} = … = μ_{k}
As we mentioned earlier, here we have just one alternative hypothesis, which claims that there is a relationship between X and Y. In terms of the means μ_{1}, μ_{2}, …, μ_{k}, it simply says the opposite of the null hypothesis, that not all the means are equal, and we simply write:

H_{a}: not all the μ's are equal
Comments:
Step 2: Obtain data, check conditions, and summarize data
The ANOVA F-test can be safely used as long as the following conditions are met:
(i) Each of the populations is normal, or more specifically, the distribution of the response Y in each population is normal, and the samples are random (or at least can be considered as such). In practice, checking normality in the populations is done by looking at each of the samples using a histogram and checking for signs that the populations are not normal, such as extreme skewness and/or extreme outliers.
(ii) Alternatively, the populations are known or discovered not to be normal, but the sample size of each of the random samples is large enough (we can use the rule of thumb that a sample size greater than 30 is considered large enough).
(iii) The populations all have the same standard deviation. We can check this condition using the rule of thumb that the ratio of the largest sample standard deviation to the smallest is less than 2. If that is the case, this condition is considered to be satisfied.
We can also check this condition using a formal test, similar to the one used in the two-sample t-test, although we will not cover any formal tests here.
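As a small illustration of the rule of thumb (not a formal test), using the largest and smallest sample standard deviations reported later in the frustration example:

```python
# Rule-of-thumb check for the equal-spread condition:
# the ratio of the largest sample SD to the smallest should be below 2.
def spreads_roughly_equal(sample_sds):
    return max(sample_sds) / min(sample_sds) < 2

# Largest and smallest sample SDs from the frustration-study example
sds = [3.082, 2.088]
ratio = max(sds) / min(sds)          # about 1.48
ok = spreads_roughly_equal(sds)      # condition considered satisfied
```

The function name and layout here are our own; the rule itself is exactly the one stated above.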
Test Statistic
Step 3: Find the p-value of the test by using the test statistic as follows
Step 4: Conclusion
As usual, we base our conclusion on the pvalue.
Final Comment
Note that when we reject H_{0} in the ANOVA F-test, all we can conclude is that not all of the population means are equal.
However, the ANOVA F-test does not provide any immediate insight into why H_{0} was rejected; in other words, it does not tell us in what way the population means of the groups are different. As an exploratory (or visual) aid to get that insight, we may take a look at the confidence intervals for the group population means. More specifically, we can look at which of the confidence intervals overlap and which do not.
Multiple Comparisons:
Now let’s look at some examples using real data.
A college dean believes that students with different majors may experience different levels of academic frustration. Random samples of size 35 of Business, English, Mathematics, and Psychology majors are asked to rate their level of academic frustration on a scale of 1 (lowest) to 20 (highest).
The figure highlights what we have already mentioned: examining the relationship between major (X) and frustration level (Y) amounts to comparing the mean frustration levels among the four majors defined by X. Also, the figure reminds us that we are dealing with a case where the samples are independent.
Step 1: State the hypotheses
The correct hypotheses are:

H_{0}: μ_{1} = μ_{2} = μ_{3} = μ_{4}
H_{a}: not all the μ's are equal
Step 2: Obtain data, check conditions, and summarize data
Data: SPSS format, SAS format, Excel format, CSV format
In our example all the conditions are satisfied:
The rule of thumb is satisfied since 3.082 / 2.088 < 2. We will look at the formal test in the software.
Test statistic: (Minitab output)
Step 3: Find the p-value of the test by using the test statistic as follows
Step 4: Conclusion
As a followup, we can construct confidence intervals (or conduct multiple comparisons as we will do in the software). This allows us to understand better which population means are likely to be different.
In this case, the business majors are clearly lower on the frustration scale than other majors. It is also possible that English majors are lower than psychology majors based upon the individual 95% confidence intervals in each group.
SAS Output and SAS Code (Includes Non-Parametric Test)
Here is another example
Do advertisers alter the reading level of their ads based on the target audience of the magazine they advertise in?
In 1981, a study of magazine advertisements was conducted (F.K. Shuptrine and D.D. McVicker, “Readability Levels of Magazine Ads,” Journal of Advertising Research, 21:5, October 1981). Researchers selected random samples of advertisements from each of three groups of magazines:
The measure that the researchers used to assess the level of the ads was the number of words in the ad. Eighteen ads were randomly selected from each of the magazine groups, and the number of words per ad was recorded.
The following figure summarizes this problem:
Our question of interest is whether the number of words in ads (Y) is related to the educational level of the magazine (X). To answer this question, we need to compare μ_{1}, μ_{2}, and μ_{3}, the mean number of words in ads of the three magazine groups. Note in the figure that the sample means are provided. It seems that what the data suggest makes sense; the magazines in group 1 have the largest number of words per ad (on average) followed by group 2, and then group 3.
The question is whether these differences between the sample means are significant. In other words, are the differences among the observed sample means due to true differences among the μ's or merely due to sampling variability? To answer this question, we need to carry out the ANOVA F-test.
Step 1: Stating the hypotheses.
We are testing:

H_{0}: μ_{1} = μ_{2} = μ_{3}
H_{a}: not all the μ's are equal
Conceptually, the null hypothesis claims that the number of words in ads is not related to the educational level of the magazine, and the alternative hypothesis claims that there is a relationship.
Step 2: Checking conditions and summarizing the data.
In order to check the next two conditions, we’ll need to look at the data (condition ii), and calculate the sample standard deviations of the three samples (condition iii).
Using the above, we can address conditions (ii) and (iii)
Before we move on, let’s look again at the graph. It is easy to see the trend of the sample means (indicated by red circles).
However, there is so much variation within each of the groups that there is almost a complete overlap between the three boxplots, and the differences between the means are overshadowed and seem like something that could have happened just by chance.
Let’s move on and see whether the ANOVA Ftest will support this observation.
Step 3. Finding the p-value.
Step 4: Making conclusions in context.
Now try one for yourself.
The ANOVA F-test does not provide any insight into why H_{0} was rejected; it does not tell us in what way μ_{1}, μ_{2}, …, μ_{k} are not all equal. We would like to know which pairs of μ's are not equal. As an exploratory (or visual) aid to get that insight, we may take a look at the confidence intervals for the group population means μ_{1}, μ_{2}, …, μ_{k} that appear in the output. More specifically, we should look at the positions of the confidence intervals and the overlap (or lack of overlap) between them.
* If the confidence interval for, say, μ_{i} overlaps with the confidence interval for μ_{j}, then μ_{i} and μ_{j} share some plausible values, which means that based on the data we have no evidence that these two μ's are different.
* If the confidence interval for μ_{i} does not overlap with the confidence interval for μ_{j}, then μ_{i} and μ_{j} do not share plausible values, which means that the data suggest that these two μ's are different.
Furthermore, if, as in the figure above, the confidence interval (set of plausible values) for μ_{i} lies entirely below the confidence interval (set of plausible values) for μ_{j}, then the data suggest that μ_{i} is smaller than μ_{j}.
Consider our first example on the level of academic frustration.
Based on the small p-value, we rejected H_{0} and concluded that not all four frustration level means are equal, or in other words, that frustration level is related to the student's major. To get more insight into that relationship, we can look at the confidence intervals above (marked in red). The top confidence interval is the set of plausible values for μ_{1}, the mean frustration level of business students. The confidence interval below it is the set of plausible values for μ_{2}, the mean frustration level of English students, etc.
What we see is that the business confidence interval is way below the other three (it doesn’t overlap with any of them). The math confidence interval overlaps with both the English and the psychology confidence intervals; however, there is no overlap between the English and psychology confidence intervals.
This gives us the impression that the mean frustration level of business students is lower than the mean in the other three majors. Within the other three majors, we get the impression that the mean frustration of math students may not differ much from the mean of both English and psychology students, however the mean frustration of English students may be lower than the mean of psychology students.
Note that this is only an exploratory/visual way of getting an impression of why H_{0} was rejected, not a formal one. There is a formal way of doing it that is called "multiple comparisons," which is beyond the scope of this course. An extension to this course will include this topic in the future.
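For readers curious about what such a procedure looks like in practice, here is a hedged sketch using Tukey's HSD as implemented in SciPy (version 1.8 or later). The data are simulated stand-ins, not the course's frustration data:

```python
import numpy as np
from scipy import stats  # stats.tukey_hsd requires SciPy >= 1.8

rng = np.random.default_rng(2)
business = rng.normal(8, 2, 35)     # simulated frustration scores
english = rng.normal(12, 2, 35)
psychology = rng.normal(14, 2, 35)

# Pairwise comparisons of all group means, with the p-values
# adjusted to control the overall (familywise) error rate.
res = stats.tukey_hsd(business, english, psychology)
# res.pvalue[i, j] is the adjusted p-value for comparing group i with group j
```

Unlike eyeballing confidence-interval overlap, this adjusts for the fact that many comparisons are being made at once.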
We will look at one non-parametric test in the k > 2 independent samples setting. We will cover more details later (Details for Non-Parametric Alternatives).
The Kruskal-Wallis test is a general test for comparing multiple distributions in independent samples and is a common alternative to the one-way ANOVA.
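As a quick sketch with made-up data, the Kruskal-Wallis test is available in SciPy:

```python
from scipy import stats

# Three small made-up independent samples
g1 = [12, 15, 14, 10]
g2 = [20, 22, 19, 24]
g3 = [16, 18, 17, 15]

# Kruskal-Wallis compares the groups using ranks rather than means,
# so it does not require the normality condition of the ANOVA F-test.
h_stat, p_value = stats.kruskal(g1, g2, g3)
```

As with the ANOVA F-test, a small p-value is evidence that the groups do not all come from the same distribution.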