The last procedures we studied (two-sample t, paired t, ANOVA, and their nonparametric alternatives) all involve the relationship between a categorical explanatory variable and a quantitative response variable (case C→Q). In all of these procedures, the result is a comparison of the quantitative response variable (Y) among the groups defined by the categorical explanatory variable (X). The standard tests result in a comparison of the population means of Y within each group defined by X.
Next, we will consider inferences about the relationships between two categorical variables, corresponding to case C→C.
For case C→C, we will learn the following tests:
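Independent Samples (Only Emphasis)
Standard Tests: chi-square test for independence (with a continuity correction for 2×2 tables)
Nonparametric Test: Fisher’s exact test
Dependent Samples (Not Discussed)
Standard Test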
In the Exploratory Data Analysis unit of the course, we summarized the relationship between two categorical variables for a given data set (using a two-way table and conditional percents), without trying to generalize beyond the sample data.
Now we will perform statistical inference for two categorical variables, using the sample data to draw conclusions about whether or not we have evidence that the variables are related in the larger population from which the sample was drawn.
In other words, we would like to assess whether the relationship between X and Y that we observed in the data is due to a real relationship between X and Y in the population, or if it is something that could have happened just by chance due to sampling variability.
Before moving into the statistical tests, let’s look at a few (fake) examples.
Suppose our explanatory variable X has r levels and our response variable Y has c levels. We usually arrange our table with the explanatory variable in the rows and the response variable in the columns.
Suppose we have the following partial (fake) data summarized in a two-way table using X = BMI category (r = 4 levels) and Y = Diabetes Status (c = 3 levels).
No Diabetes  PreDiabetes  Diabetes  Total  
Underweight  100  
Normal  400  
Overweight  300  
Obese  200  
Total  700  200  100  1000 
From our study of probability we can determine:
P(No Diabetes) = 700/1000 = 0.7
P(PreDiabetes) = 200/1000 = 0.2
P(Diabetes) = 100/1000 = 0.1
In the test we are going to use, our null hypothesis will be:
Ho: There is no relationship between X and Y.
Which in this case would be:
Ho: There is no relationship between BMI category (X) and diabetes status (Y).
If there were no relationship between X and Y, this would imply that the distribution of diabetes status is the same for each BMI category.
In this case (C→C), the distribution of diabetes status consists of the probability of each diabetes status group and the null hypothesis becomes:
Ho: BMI category (X) and diabetes status (Y) are INDEPENDENT.
Since the probability of “No Diabetes” is 0.7 in the entire dataset, if there were no differences in the distribution of diabetes status between BMI categories, we would obtain the same proportion in each row. Using the row totals we can find the EXPECTED counts as follows.
Notice the formula used below is simply the formula for the mean or expected value of a binomial random variable with n “trials” and probability of “success” p, which was μ = np, the expected number of successes in a sample of size n. Here n is the row total and p is the overall proportion for the column.
No Diabetes  PreDiabetes  Diabetes  Total  
Underweight  100(0.7) = 70  100  
Normal  400(0.7) = 280  400  
Overweight  300(0.7) = 210  300  
Obese  200(0.7) = 140  200  
Total  700  200  100  1000 
Notice that these do indeed add to 700.
Similarly we can determine the EXPECTED counts for the remaining two columns since 20% of our sample were classified as having prediabetes and 10% were classified as having diabetes.
No Diabetes  PreDiabetes  Diabetes  Total  
Underweight  70  100(0.2) = 20  100(0.1) = 10  100 
Normal  280  400(0.2) = 80  400(0.1) = 40  400 
Overweight  210  300(0.2) = 60  300(0.1) = 30  300 
Obese  140  200(0.2) = 40  200(0.1) = 20  200 
Total  700  200  100  1000 
What we have created, using only the row totals, column totals, and column percents, is a table of what we would expect to happen if the null hypothesis of no relationship between X and Y were true. Here is the final result.
No Diabetes  PreDiabetes  Diabetes  Total  
Underweight  70  20  10  100 
Normal  280  80  40  400 
Overweight  210  60  30  300 
Obese  140  40  20  200 
Total  700  200  100  1000 
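The expected counts above can be reproduced with a few lines of code. The following is a minimal sketch in Python (the course itself relies on SAS and SPSS, so the language and the function name `expected_counts` are our own choices for illustration). It uses only the row and column totals from the fake BMI table.

```python
# Marginal totals from the (fake) BMI by diabetes status table above.
row_totals = {"Underweight": 100, "Normal": 400, "Overweight": 300, "Obese": 200}
col_totals = {"No Diabetes": 700, "PreDiabetes": 200, "Diabetes": 100}


def expected_counts(row_totals, col_totals):
    """Expected count for each cell if X and Y were independent.

    Each cell is (row total) * (column proportion), which is the same as
    (row total) * (column total) / (grand total).
    """
    n = sum(row_totals.values())
    return {
        row: {col: r_total * c_total / n for col, c_total in col_totals.items()}
        for row, r_total in row_totals.items()
    }


expected = expected_counts(row_totals, col_totals)
for row, cells in expected.items():
    print(row, cells)
```

Each row of the result has the same 70% / 20% / 10% split as the overall sample, which is exactly what "no relationship" means here.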
Suppose we gather data and find the following (expected counts are in parentheses for easy comparison):
No Diabetes  PreDiabetes  Diabetes  Total  
Underweight  65 (70)  22 (20)  13 (10)  100 
Normal  285 (280)  78 (80)  37 (40)  400 
Overweight  216 (210)  53 (60)  31 (30)  300 
Obese  134 (140)  47 (40)  19 (20)  200 
Total  700  200  100  1000 
If we compare our counts to the expected counts they are fairly close. This data would not give much evidence of a difference in the distribution of diabetes status among the levels of BMI categories. In other words, this data would not give much evidence of a relationship (or association) between BMI categories and diabetes status.
The standard test we will learn in case C→C is based upon comparing the OBSERVED cell counts (our data) to the EXPECTED cell counts (using the method discussed above).
We want you to see how the expected cell counts are created so that you will understand what kind of evidence is being used to reject the null hypothesis in case C→C.
Suppose instead that we gather data and we obtain the following counts (expected counts are in parentheses and row percentages are provided):
No Diabetes  PreDiabetes  Diabetes  Total  
Underweight  90 (70) 90%  7 (20) 7%  3 (10) 3%  100 
Normal  340 (280) 85%  40 (80) 10%  20 (40) 5%  400 
Overweight  180 (210) 60%  90 (60) 30%  30 (30) 10%  300 
Obese  90 (140) 45%  63 (40) 31.5%  47 (20) 23.5%  200 
Total  700  200  100  1000 
In this case, most of the differences are drastic and there seems to be clear evidence that the distribution of diabetes status is not the same among the four BMI categories.
Although this data is entirely fabricated, it illustrates the kind of evidence we need to reject the null hypothesis in case C→C.
One special case occurs when we have two categorical variables where both of these variables have two levels. Two-level categorical variables are often called binary variables or dichotomous variables and when possible are usually coded as 1 for “Yes” or “Success” and 0 for “No” or “Failure.”
Here is another (fake) example.
Suppose we have the following partial (fake) data summarized in a two-way table using X = treatment and Y = significant improvement in symptoms.
No Improvement  Improvement  Total  
Control  100  
Treatment  100  
Total  120  80  200 
From our study of probability we can determine:
P(No Improvement) = 120/200 = 0.6
P(Improvement) = 80/200 = 0.4
Since the probability of “No Improvement” is 0.6 in the entire dataset and the probability for “Improvement” is 0.4, if there was no difference we would obtain the same proportion in each row. Using the row totals we can find the EXPECTED counts as follows.
No Improvement  Improvement  Total  
Control  100(0.6) = 60  100(0.4) = 40  100 
Treatment  100(0.6) = 60  100(0.4) = 40  100 
Total  120  80  200 
Suppose we obtain the following data:
No Improvement  Improvement  Total  
Control  80  20  100 
Treatment  40  60  100 
Total  120  80  200 
In this example we are interested in the probability of improvement and the above data seem to indicate the treatment provides a greater chance for improvement than the control.
We use this example to mention two ways of comparing probability (sometimes “risk”) in 2×2 tables. Many of you may remember these topics from Epidemiology or may see these topics again in Epidemiology courses in the future!
Risk Difference:
For this data, a larger proportion of subjects in the treatment group showed improvement compared to the control group. In fact, the estimated probability of improvement is 0.4 higher for the treatment group than the control group.
This value (0.4) is called a risk difference and is one common measure in 2×2 tables. Estimates and confidence intervals can be obtained.
For a fixed sample size, the larger this difference, the more evidence against our null hypothesis (no relationship between X and Y).
The population risk difference is often denoted $p_{1} - p_{2}$, and is the difference between two population proportions. We estimate these proportions in the same manner as Unit 1, once for each sample.
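For the current example, we obtain, for the treatment group,
$$\hat{p}_{1} = \frac{60}{100} = 0.6$$
and, for the control group,
$$\hat{p}_{2} = \frac{20}{100} = 0.2$$
from which we find the risk difference
$$\hat{p}_{1} - \hat{p}_{2} = 0.6 - 0.2 = 0.4$$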
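As a rough illustration of how an estimate and confidence interval for the risk difference might be computed, here is a short Python sketch using the fake treatment data (60 of 100 improved with treatment, 20 of 100 with control). The interval shown is the large-sample (Wald) interval; that choice, the 1.96 multiplier for 95% confidence, and the function name `risk_difference` are assumptions made for this example, and software may report a different interval.

```python
import math


def risk_difference(x1, n1, x2, n2, z=1.96):
    """Estimated risk difference p1 - p2 with a large-sample (Wald) interval.

    x1, x2 are the numbers of "successes" and n1, n2 the group sizes.
    z = 1.96 corresponds to roughly 95% confidence (an assumed default).
    """
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return diff, (diff - z * se, diff + z * se)


# Group 1 = treatment (60 of 100 improved), group 2 = control (20 of 100).
diff, (lower, upper) = risk_difference(60, 100, 20, 100)
print(f"risk difference = {diff:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

For these data the interval lies well above 0, in line with the impression that treatment improves the chance of improvement.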
Odds Ratio:
Another common measure in 2×2 tables is the odds ratio, which is defined as the odds of the event occurring in one group divided by the odds of the event occurring in another group.
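In this case, the odds of improvement in the treatment group are
$$\text{odds}_{\text{treatment}} = \frac{0.6}{1 - 0.6} = \frac{0.6}{0.4} = 1.5$$
and the odds of improvement in the control group are
$$\text{odds}_{\text{control}} = \frac{0.2}{1 - 0.2} = \frac{0.2}{0.8} = 0.25$$
so the odds ratio to compare the treatment group to the control group is
$$\text{OR} = \frac{1.5}{0.25} = 6$$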
This value means that the odds of improvement in the treatment group are 6 times the odds of improvement in the control group.
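The odds and odds ratio calculation can be sketched in Python as follows, again using the fake treatment data. The helper names `odds` and `odds_ratio` are ours, chosen only for this illustration.

```python
def odds(successes, n):
    """Odds of the event: (number with the event) / (number without it)."""
    return successes / (n - successes)


def odds_ratio(x1, n1, x2, n2):
    """Odds of the event in group 1 divided by the odds in group 2."""
    return odds(x1, n1) / odds(x2, n2)


# Treatment: 60 of 100 improved. Control: 20 of 100 improved.
treatment_odds = odds(60, 100)   # 60 / 40
control_odds = odds(20, 100)     # 20 / 80
ratio = odds_ratio(60, 100, 20, 100)
print(treatment_odds, control_odds, ratio)
```

Note that swapping the two groups gives the reciprocal odds ratio (1/6 here), so it is worth stating which group is in the numerator.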
Properties of Odds Ratios:
Step 1: State the hypotheses
The hypotheses are:
Ho: There is no relationship between the two categorical variables. (They are independent.)
Ha: There is a relationship between the two categorical variables. (They are not independent.)
Note: for 2×2 tables, these hypotheses can be formulated the same as for population means except using population proportions. This can be done for RxC tables as well but is not common as it requires more notation to compare multiple group proportions.
Step 2: Obtain data, check conditions, and summarize data
(i) The sample should be random with independent observations (all observations are independent of all other observations).
(ii) In general, the larger the sample, the more precise and reliable the test results are. There are different versions of what the conditions are that will ensure reliable use of the test, all of which involve the expected counts. One version of the conditions says that all expected counts need to be greater than 1, and at least 80% of expected counts need to be greater than 5. A more conservative version requires that all expected counts are larger than 5. Some software packages will provide a warning if the sample size is “too small.”
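The two versions of the expected-count condition described above can be checked mechanically. The sketch below is one possible Python helper (the name `check_expected_counts` and the returned keys are hypothetical); it takes a table of expected counts and reports whether each version of the condition holds. The second table is made up purely to show a failing case.

```python
def check_expected_counts(expected):
    """Check both versions of the expected-count condition.

    Lenient version: all expected counts > 1 and at least 80% are > 5.
    Conservative version: all expected counts > 5.
    """
    cells = [e for row in expected for e in row]
    share_above_5 = sum(e > 5 for e in cells) / len(cells)
    return {
        "min_expected": min(cells),
        "lenient_ok": min(cells) > 1 and share_above_5 >= 0.8,
        "conservative_ok": min(cells) > 5,
    }


# Expected counts from the fake BMI by diabetes status example.
bmi_expected = [[70, 20, 10], [280, 80, 40], [210, 60, 30], [140, 40, 20]]
print(check_expected_counts(bmi_expected))

# A made-up table with small expected counts, for contrast.
small_expected = [[0.8, 4.2], [3.2, 16.8]]
print(check_expected_counts(small_expected))
```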
Test Statistic of the Chi-square Test for Independence:
The single number that summarizes the overall difference between observed and expected counts is the chi-square statistic, which tells us in a standardized way how far what we observed (data) is from what would be expected if Ho were true.
Here it is:
$$\chi^{2} = \sum_{\text{all cells}} \frac{(\text{observed count} - \text{expected count})^{2}}{\text{expected count}}$$
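The chi-square statistic adds up, over every cell of the table, the squared difference between the observed and expected count divided by the expected count. As an illustration, the Python sketch below computes it for the two fake BMI by diabetes status tables from earlier (function and variable names are ours). No continuity correction is applied here.

```python
def expected_table(observed):
    """Expected counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square_statistic(observed):
    """Sum over all cells of (observed - expected)^2 / expected."""
    expected = expected_table(observed)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Rows: Underweight, Normal, Overweight, Obese.
# Columns: No Diabetes, PreDiabetes, Diabetes.
close_to_expected = [[65, 22, 13], [285, 78, 37], [216, 53, 31], [134, 47, 19]]
far_from_expected = [[90, 7, 3], [340, 40, 20], [180, 90, 30], [90, 63, 47]]

chi2_close = chi_square_statistic(close_to_expected)
chi2_far = chi_square_statistic(far_from_expected)
print(round(chi2_close, 3), round(chi2_far, 3))
```

The first table, whose counts sit close to the expected counts, gives a statistic of roughly 4.4, while the second gives roughly 148.7, which reflects how much further those data are from what independence would predict.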
Step 3: Find the p-value of the test by using the test statistic as follows
We will rely on software to obtain this value for us. We can also request the expected counts using software.
The p-values are calculated using a chi-square distribution with (r − 1)(c − 1) degrees of freedom (where r = number of levels of the row variable and c = number of levels of the column variable).
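In practice software does this step for us (for example SAS, SPSS, or a routine such as `scipy.stats.chi2_contingency` in Python). To show what is happening behind the scenes, here is a sketch that computes the upper-tail chi-square probability using only the Python standard library. It relies on closed-form expressions for the chi-square tail that hold for whole-number degrees of freedom; the function name `chi_square_p_value` is ours, and the example statistic of 4.375 is the value the chi-square formula gives for the first fake BMI table (the one close to its expected counts).

```python
import math


def chi_square_p_value(statistic, df):
    """Upper-tail probability P(chi-square with df degrees of freedom >= statistic)."""
    if df < 1 or statistic < 0:
        raise ValueError("df must be >= 1 and statistic must be >= 0")
    half = statistic / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: normal-tail part plus a finite series.
    p = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * statistic / math.pi) * math.exp(-half)
    for r in range(1, (df - 1) // 2 + 1):
        p += term
        term *= statistic / (2 * r + 1)
    return p


# 4 BMI categories by 3 diabetes status categories.
r, c = 4, 3
df = (r - 1) * (c - 1)
p_value = chi_square_p_value(4.375, df)
print(df, round(p_value, 3))
```

With 6 degrees of freedom, a statistic of 4.375 gives a p-value of about 0.63, so those data provide no real evidence against Ho.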
IMPORTANT NOTE
Step 4: Conclusion
As usual, we use the magnitude of the p-value to draw our conclusions. A small p-value indicates that the evidence provided by the data is strong enough to reject Ho and conclude (beyond a reasonable doubt) that the two variables are related. In particular, if a significance level of 0.05 is used, we will reject Ho if the p-value is less than 0.05.
We will look at one nonparametric test in case C→C. Fisher’s exact test is an exact method of obtaining a p-value for the hypotheses tested in a standard chi-square test for independence. This test is often used when the sample size requirement of the chi-square test is not satisfied and can be used for 2×2 and RxC tables.
Step 1: State the hypotheses
The hypotheses are:
Ho: There is no relationship between the two categorical variables. (They are independent.)
Ha: There is a relationship between the two categorical variables. (They are not independent, they are dependent.)
Step 2: Obtain data, check conditions, and summarize data
The sample should be random with independent observations (all observations are independent of all other observations).
Step 3: Find the p-value of the test by using the test statistic as follows
The p-values are calculated using a distribution specific to this test. We will rely on software to obtain the p-value for this test. The p-value measures the chance of obtaining a table as or more extreme (against the null hypothesis) than our table.
Step 4: Conclusion
As usual, we use the magnitude of the p-value to draw our conclusions. A small p-value indicates that the evidence provided by the data is strong enough to reject Ho and conclude (beyond a reasonable doubt) that the two variables are related. In particular, if a significance level of 0.05 is used, we will reject Ho if the p-value is less than 0.05.
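To make the idea of "a table as or more extreme" concrete, here is a sketch of Fisher's exact test for the 2×2 case only, written in plain Python. With the row and column totals held fixed, each possible table has a hypergeometric probability under Ho. This sketch uses one common two-sided convention, summing the probabilities of all tables that are no more likely than the observed one; that convention, the small tolerance for ties, and the function name `fisher_exact_2x2` are assumptions of this example, and RxC tables are not handled.

```python
from math import comb


def fisher_exact_2x2(table):
    """Two-sided Fisher's exact p-value for a 2x2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(k):
        # Probability that the top-left cell equals k, given fixed margins.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    k_min = max(0, col1 - row2)
    k_max = min(row1, col1)
    p_observed = prob(a)
    # Sum over tables at least as unlikely as the observed one
    # (the small tolerance guards against floating-point ties).
    return sum(
        p for p in (prob(k) for k in range(k_min, k_max + 1))
        if p <= p_observed * (1 + 1e-7)
    )


# A tiny made-up table where the chi-square conditions would fail.
p_small = fisher_exact_2x2([[3, 1], [1, 3]])

# The fake treatment example: rows Control / Treatment,
# columns No Improvement / Improvement.
p_treatment = fisher_exact_2x2([[80, 20], [40, 60]])
print(round(p_small, 4), p_treatment)
```

For the tiny table the p-value is 34/70, or about 0.49, so there is no evidence of a relationship, whereas the treatment example gives a p-value very close to 0.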
Now let’s look at some examples with real data.
Low birth weight is an outcome of concern because infant mortality rates and birth defect rates are very high for babies with low birth weight. A woman’s behavior during pregnancy (including diet, smoking habits, and obtaining prenatal care) can greatly alter her chances of carrying the baby to term and, consequently, of delivering a baby of normal birth weight.
In this example, we will use a 1986 study (Hosmer and Lemeshow (2000), Applied Logistic Regression: Second Edition) in which data were collected from 189 women (of whom 59 had low birth weight infants) at the Baystate Medical Center in Springfield, MA. The goal of the study was to identify risk factors associated with giving birth to a low birth weight baby.
Data: SPSS format, SAS format, Excel format
Response Variable:
Possible Explanatory Variables (variables we will use in this example are in bold):
Results:
Step 1: State the hypotheses
The hypotheses are:
Ho: There is no relationship between the categorical explanatory variable and presence of low birth weight. (They are independent.)
Ha: There is a relationship between the categorical explanatory variable and presence of low birth weight. (They are not independent, they are dependent.)
Steps 2 & 3: Obtain data, check conditions, summarize data, and find the p-value
Explanatory Variable  Which Test is Appropriate?  P-value  Decision 
RACE  Min. Expected Count = 8.12; 3×2 table; Use Pearson Chi-square (since RxC)  0.0819 (Chi-square – SAS); 0.082 (Chi-square – SPSS)  Fail to Reject Ho 
SMOKE  Min. Expected Count = 23.1; 2×2 table; Use Continuity Correction (since 2×2)  0.040 (Continuity Correction – SPSS); 0.0396 (Continuity Adj – SAS)  Reject Ho 
PTL  Min. Expected Count = 0.31; 4×2 table; Fisher’s Exact test is more appropriate  3.106E-04 = 0.0003106 (Fisher’s – SAS); 0.000 (Fisher’s – SPSS); 0.0008 (Chi-square – SAS); 0.001 (Chi-square – SPSS)  Reject Ho 
HT  Min. Expected Count = 3.75; 2×2 table; Fisher’s Exact test may be more appropriate  0.0516 (Fisher’s – SAS); 0.052 (Fisher’s – SPSS)  Fail to Reject Ho (Barely) 
UI  Min. Expected Count = 8.74; 2×2 table; Use Continuity Correction  0.0355 (Continuity Adj. – SAS); 0.035 (Continuity Correction – SPSS)  Reject Ho 
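The choices in the "Which Test is Appropriate?" column follow a simple pattern, which can be sketched as a small Python helper. This is only our reading of the table, not an official rule: the cutoff of 5 for the minimum expected count (the more conservative version of the condition) and the function name `suggest_test` are assumptions, and borderline cases such as HT still call for judgment.

```python
def suggest_test(n_rows, n_cols, min_expected, threshold=5.0):
    """Suggest a test from the table shape and the minimum expected count.

    threshold=5.0 is an assumed cutoff (the conservative version of the
    expected-count condition); other cutoffs are defensible.
    """
    if min_expected < threshold:
        return "Fisher's exact test"
    if n_rows == 2 and n_cols == 2:
        return "chi-square with continuity correction"
    return "Pearson chi-square"


# (variable, rows, columns, minimum expected count) taken from the table above.
variables = [
    ("RACE", 3, 2, 8.12),
    ("SMOKE", 2, 2, 23.1),
    ("PTL", 4, 2, 0.31),
    ("HT", 2, 2, 3.75),
    ("UI", 2, 2, 8.74),
]
for name, rows, cols, min_exp in variables:
    print(name, "->", suggest_test(rows, cols, min_exp))
```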
Step 4: Conclusion
When considered individually, presence of uterine irritability, history of premature labor, and smoking during pregnancy are all significantly associated (p-value < 0.05) with the presence/absence of a low birth weight infant whereas history of hypertension and race were only marginally significant (0.05 ≤ p-value < 0.10).
Practical Significance:
Explanatory Variable  Comparison of Conditional Percentages of Low Birth Weight 
RACE  Race = White: 23.96%; Race = Black: 42.31%; Race = Other: 37.31% 
SMOKE  Smoke = No: 25.22%; Smoke = Yes: 40.54% 
PTL  History of Premature Labor = 0: 25.79%; History of Premature Labor = 1: 66.67%; History of Premature Labor = 2: 40.00% (Note small sample size of 5 for this row); History of Premature Labor = 3: 0.00% (Note small sample size of 1 for this row) 
HT  Hypertension = No: 29.38%; Hypertension = Yes: 58.33% (Note small sample size of 12 for this row) 
UI  Presence of uterine irritability = No: 27.95%; Presence of uterine irritability = Yes: 50.00% 
If, instead of simply analyzing the “looks vs. personality” rating scale, we categorized the responses into groups then we would be in case C→C instead of case C→Q (see previous example in Case C→Q for Two Independent Samples).
Recall the rating score was from 1 to 25 with 1 = personality most important (looks not important at all) and 25 = looks most important (personality not important at all). A score of 13 would indicate that looks and personality are equally important, and scores around 13 should indicate looks and personality are nearly equal in importance.
For our purposes we will use a rating of 16 or larger to indicate that looks were indeed more important than personality (by enough to matter).
Data: SPSS format, SAS format
Response Variable:
Results:
Step 1: State the hypotheses
The hypotheses are:
Ho: The proportion of college students who find looks more important than personality is the same for males and females. (The two variables are independent)
Ha: The proportion of college students who find looks more important than personality is different for males and females. (The two variables are dependent)
Steps 2 & 3: Obtain data, check conditions, summarize data, and find the pvalue
The minimum expected cell count is 13.38. This is a 2×2 table so we will use the continuity corrected chi-square statistic.
The p-value is found to be 0.001 (SPSS) or 0.0007 (SAS).
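For readers curious about what the continuity correction does, the sketch below shows one standard form (Yates' correction), in which each |observed − expected| is reduced by 0.5 before squaring. Because the cell counts for the looks vs. personality data are not listed here, the example uses the fake 2×2 treatment table from earlier instead. The function names are ours, and the p-value uses the fact that a chi-square distribution with 1 degree of freedom has upper-tail probability erfc(√(x/2)).

```python
import math


def expected_table(observed):
    """Expected counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square_2x2(observed, continuity_correction=True):
    """Chi-square statistic and p-value (1 df) for a 2x2 table."""
    expected = expected_table(observed)
    adjust = 0.5 if continuity_correction else 0.0
    statistic = sum(
        # Shrink each |O - E| by 0.5 (never below 0) before squaring.
        max(abs(o - e) - adjust, 0.0) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Fake treatment data: rows Control / Treatment,
# columns No Improvement / Improvement.
table = [[80, 20], [40, 60]]
corrected, p_corrected = chi_square_2x2(table)
uncorrected, p_uncorrected = chi_square_2x2(table, continuity_correction=False)
print(round(corrected, 4), round(uncorrected, 4))
```

The corrected statistic is always a little smaller than the uncorrected one, so the corrected p-value is a little larger and the test slightly more cautious.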
Step 4: Conclusion
There is a significant association between gender and whether or not the individual rated looks more important than personality.
Among males, 27.1% rated looks higher than personality while among females this value was only 9.3%.
For fun: The odds ratio here is
$$\text{OR} = \frac{0.271 / (1 - 0.271)}{0.093 / (1 - 0.093)} \approx 3.6$$
which means, based upon our data, we estimate that the odds of rating looks more important than personality among males are about 3.6 times the odds among females.
Practical Significance:
It seems clear that the difference between 27.1% and 9.3% is practically significant as well as statistically significant. This difference is large and likely represents a meaningful difference in the views of males and females regarding the importance of looks compared to personality.