# Two Independent Samples

As we mentioned at the end of the Introduction to Unit 4B, we will focus only on two-sided tests for the remainder of this course. One-sided tests are often possible but rarely used in clinical research.
CO-4: Distinguish among different measurement scales, choose the appropriate descriptive and inferential statistical methods based on these distinctions, and interpret the results
LO 4.35: For a data analysis situation involving two variables, choose the appropriate inferential method for examining the relationship between the variables and justify the choice.
LO 4.36: For a data analysis situation involving two variables, carry out the appropriate inferential method for examining relationships between the variables and draw the correct conclusions in context.
CO-5: Determine preferred methodological alternatives to commonly used statistical methods when assumptions are not met.
REVIEW: Unit 1 Case C-Q
Video: Two Independent Samples (38:56)

Related SAS Tutorials

Related SPSS Tutorials

## Introduction

Here is a summary of the tests we will learn for the scenario  where k = 2. Methods in BOLD will be our main focus.

We have completed our discussion on dependent samples (2nd column) and now we move on to independent samples (1st column).

| Independent Samples | Dependent Samples (Less Emphasis) |
| --- | --- |
| Standard Tests: Two Sample T-Test Assuming Equal Variances; Two Sample T-Test Assuming Unequal Variances | Standard Test: Paired T-Test |
| Non-Parametric Test: Mann-Whitney U (or Wilcoxon Rank-Sum) Test | Non-Parametric Tests: Sign Test; Wilcoxon Signed-Rank Test |

## Dependent vs. Independent Samples

LO 4.37: Identify and distinguish between independent and dependent samples.

We have discussed the dependent sample case where observations are matched/paired/linked between the two samples. Recall that in that scenario observations can be the same individual or two individuals who are matched between samples. To analyze data from dependent samples, we simply took the differences and analyzed the difference using one-sample techniques. Now we will discuss the independent sample case. In this case, all individuals are independent of all other individuals in their sample as well as all individuals in the other sample. This is most often accomplished by either:

• Taking a random sample from each of the two groups under study. For example to compare heights of males and females, we could take a random sample of 100 females and another random sample of 100 males. The result would be two samples which are independent of each other.
• Taking a random sample from the entire population and then dividing it into two sub-samples based upon the grouping variable of interest. For example, we take a random sample of U.S. adults and then split them into two samples based upon gender. This results in a sub-sample of females and a sub-sample of males which are independent of each other.

## Comparing Two Means – Two Independent Samples T-test

LO 4.38: In a given context, determine the appropriate standard method for comparing groups and provide the correct conclusions given the appropriate software output.
LO 4.39: In a given context, set up the appropriate null and alternative hypotheses for comparing groups.

Recall that here we are interested in the effect of a two-valued (k = 2) categorical variable (X) on a quantitative response (Y). Random samples from the two sub-populations (defined by the two categories of X) are obtained and we need to evaluate whether or not the data provide enough evidence for us to believe that the two sub-population means are different.

In other words, our goal is to test whether the means μ1 and μ2 (which are the means of the variable of interest in the two sub-populations) are equal or not, and in order to do that we have two samples, one from each sub-population, which were chosen independently of each other.

The test that we will learn here is commonly known as the two-sample t-test. As the name suggests, this is a t-test, which as we know means that the p-values for this test are calculated under some t-distribution.

Here are figures that illustrate some of the examples we will cover. Notice how the original variables X (categorical variable with two levels) and Y (quantitative variable) are represented. Think about the fact that we are in case C → Q!

As in our discussion of dependent samples, we will often simplify our terminology and simply use the terms “population 1” and “population 2” instead of referring to these as sub-populations. Either terminology is fine.

## Many Students Wonder: Two Independent Samples

Question: Does it matter which population we label as population 1 and which as population 2?

Answer: No, it does not matter as long as you are consistent, meaning that you do not switch labels in the middle.

• BUT… considering how you label the populations is important in stating the hypotheses and in the interpretation of the results.

## Steps for the Two-Sample T-test

Recall that our goal is to compare the means μ1 and μ2 based on the two independent samples.

• Step 1: State the hypotheses

The hypotheses represent our goal to compare μ1 and μ2.

The null hypothesis is always:

Ho: μ1 – μ2 = 0 (which is the same as μ1 = μ2)
(There IS NO association between the categorical explanatory variable and the quantitative response variable)

We will focus on the two-sided alternative hypothesis of the form:

Ha: μ1 – μ2 ≠ 0 (which is the same as μ1 ≠ μ2) (two-sided)
(There IS AN association between the categorical explanatory variable and the quantitative response variable)

Note that the null hypothesis claims that there is no difference between the means. Conceptually, Ho claims that there is no relationship between the two relevant variables (X and Y).

Our parameter of interest in this case (the parameter about which we are making an inference) is the difference between the means (μ1 – μ2) and the null value is 0. The alternative hypothesis claims that there is a difference between the means.

Did I Get This? What do our hypotheses mean in context?
• Step 2: Obtain data, check conditions, and summarize data

The two-sample t-test can be safely used as long as the following conditions are met:

The two samples are indeed independent.

We are in one of the following two scenarios:

(i) Both populations are normal, or more specifically, the distribution of the response Y in both populations is normal, and both samples are random (or at least can be considered as such). In practice, checking normality in the populations is done by looking at each of the samples using a histogram and checking whether there are any signs that the populations are not normal. Such signs could be extreme skewness and/or extreme outliers.

(ii) The populations are known or discovered not to be normal, but the sample size of each of the random samples is large enough (we can use the rule of thumb that a sample size greater than 30 is considered large enough).

Did I Get This? Conditions for Two Independent Samples

Assuming that we can safely use the two-sample t-test, we need to summarize the data, and in particular, calculate our data summary—the test statistic.

Test Statistic for Two-Sample T-test:

There are two choices for our test statistic, and we must choose the appropriate one to summarize our data. We will see how to choose between the two test statistics in the next section.

We use the following notation to describe our samples: n1 and n2 are the sample sizes, ȳ1 and ȳ2 are the sample means, and s1 and s2 are the sample standard deviations of the samples from population 1 and population 2.

Here are the two cases for our test statistic.

(A) Equal Variances: If it is safe to assume that the two populations have equal standard deviations, we can pool our estimates of this common population standard deviation and use the following test statistic:

$$t = \frac{\bar{y}_1 - \bar{y}_2 - 0}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$

where

$$s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$$

(B) Unequal Variances: If it is NOT safe to assume that the two populations have equal standard deviations, we must use the following test statistic:

$$t = \frac{\bar{y}_1 - \bar{y}_2 - 0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$

• It is possible to never assume equal variances; however, if the assumption of equal variances is satisfied, the equal variances t-test will have greater power to detect the difference of interest.
• We will not be calculating the values of these test statistics by hand in this course. We will instead rely on software to obtain the value for us.
• Both of these test statistics measure (in standard errors) how far our data are (represented by the difference of the sample means) from the null hypothesis (represented by the null value, 0).
• These test statistics have the same general form as others we have discussed. We will not discuss the derivation of the standard errors in each case but you should understand this general form and be able to identify each component for a specific test statistic.

• Step 3: Find the p-value of the test by using the test statistic as follows

Each of these tests relies on a particular t-distribution under which the p-values are calculated. In the case where equal variances are assumed, the degrees of freedom are simply n1 + n2 – 2, whereas in the case of unequal variances, the formula for the degrees of freedom is more complex. We will rely on the software to obtain the degrees of freedom in both cases and provide us with the correct p-value (usually this will be a two-sided p-value).
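Although we rely on software in this course, a short sketch can make the two test statistics and their degrees of freedom concrete. The code below assumes the standard textbook forms of the pooled and unequal-variances (Satterthwaite) statistics; the function names and the summary statistics are hypothetical and are not taken from the course data.

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Equal-variances (pooled) t statistic and its degrees of freedom."""
    df = n1 + n2 - 2
    # Pooled estimate of the common population variance
    sp2 = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return (mean1 - mean2 - 0) / se, df

def unequal_var_t(mean1, sd1, n1, mean2, sd2, n2):
    """Unequal-variances t statistic with Satterthwaite degrees of freedom."""
    v1, v2 = sd1**2 / n1, sd2**2 / n2
    se = math.sqrt(v1 + v2)
    # Satterthwaite approximation (assumed standard form)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return (mean1 - mean2 - 0) / se, df

# Hypothetical summary statistics (not from the course data sets)
t_a, df_a = pooled_t(10.0, 4.0, 40, 12.0, 5.0, 35)
t_b, df_b = unequal_var_t(10.0, 4.0, 40, 12.0, 5.0, 35)
print(f"pooled:  t = {t_a:.3f}, df = {df_a}")
print(f"unequal: t = {t_b:.3f}, df = {df_b:.1f}")
```

Both functions return the difference in sample means divided by a standard error; only the standard error and the degrees of freedom change between the two cases.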

• Step 4: Conclusion

As usual, we draw our conclusion based on the p-value. Be sure to write your conclusions in context by specifying your current variables and/or precisely describing the difference in population means in terms of the current variables.

If the p-value is small, there is a statistically significant difference between what was observed in the sample and what was claimed in Ho, so we reject Ho.

Conclusion: There is enough evidence that the categorical explanatory variable is related to (or associated with) the quantitative response variable. More specifically, there is enough evidence that the difference in population means is not equal to zero.

If the p-value is not small, we do not have enough statistical evidence to reject Ho.

Conclusion: There is NOT enough evidence that the categorical explanatory variable is related to (or associated with) the quantitative response variable. More specifically, there is NOT enough evidence that the difference in population means is not equal to zero.

In particular, if a cutoff probability, α (significance level), is specified, we reject Ho if the p-value is less than α. Otherwise, we do not reject Ho.

LO 4.41: Based upon the output for a two-sample t-test, correctly interpret in context the appropriate confidence interval for the difference between population means

As in previous methods, we can follow-up with a confidence interval for the difference between population means, μ1 – μ2 and interpret this interval in the context of the problem.

Interpretation: We are 95% confident that the population mean for (one group) is between __________________ compared to the population mean for (the other group).

Confidence intervals can also be used to determine whether or not to reject the null hypothesis of the test based upon whether or not the null value of zero falls outside the interval or inside.

If the null value, 0, falls outside the confidence interval, Ho is rejected. (Zero is NOT a plausible value based upon the confidence interval)

If the null value, 0, falls inside the confidence interval, Ho is not rejected. (Zero IS a plausible value based upon the confidence interval)

NOTE: Be careful to choose the correct confidence interval about the difference between population means using the same assumption (variances equal or variances unequal) and not the individual confidence intervals for the means in the groups themselves.
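The confidence interval decision rule described above can be sketched in a few lines. The function name and the two intervals are hypothetical and only illustrate the rule; the interval itself would still come from the software output.

```python
def decision_from_ci(lower, upper, null_value=0.0):
    """Reject Ho exactly when the null value lies outside the interval."""
    if lower <= null_value <= upper:
        return "do not reject Ho"
    return "reject Ho"

# Hypothetical 95% confidence intervals for mu1 - mu2
print(decision_from_ci(-3.2, -0.8))  # zero is outside the interval
print(decision_from_ci(-1.1, 2.4))   # zero is inside the interval
```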

## Test for Equality of Variances (or Standard Deviations)

LO 4.42: Based upon the output for a two-sample t-test, determine whether to use the results assuming equal variances or those assuming unequal variances.

Since we have two possible tests we can conduct, based upon whether or not we can assume the population standard deviations (or variances) are equal, we need a method to determine which test to use.

Although you can make a reasonable guess using information from the data (i.e. look at the distributions and estimates of the standard deviations and see if you feel they are reasonably equal), we have a test which can help us here, called the test for Equality of Variances. This output is automatically displayed in many software packages when a two-sample t-test is requested although the particular test used may vary.

The hypotheses of this test are:

Ho: σ1 = σ2 (the standard deviations in the two populations are the same)

Ha: σ1 ≠ σ2 (the standard deviations in the two populations are not the same)

• If the p-value of this test for equal variances is small, there is enough evidence that the standard deviations in the two populations are different and we cannot assume equal variances.
• IMPORTANT! In this case, when we conduct the two-sample t-test to compare the population means, we use the test statistic for unequal variances.
• If the p-value of this test is large, there is not enough evidence that the standard deviations in the two populations are different. In this case we will assume equal variances since we have no clear evidence to the contrary.
• IMPORTANT! In this case, when we conduct the two-sample t-test to compare the population means, we use the test statistic for equal variances.
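The choice between the two versions of the t-test can be written as a small rule. In this sketch the function name is hypothetical, and the cutoff of 0.05 for what counts as a "small" p-value is an assumption, since the text above does not fix one for the equality of variances test.

```python
def choose_t_test(p_equal_variances, alpha=0.05):
    """Pick the two-sample t-test version from the variance-test p-value.

    alpha = 0.05 is an assumed cutoff for a "small" p-value.
    """
    if p_equal_variances < alpha:
        return "unequal variances"   # evidence that the standard deviations differ
    return "equal variances"         # no clear evidence against equality

# Hypothetical p-values from a test for equality of variances
print(choose_t_test(0.62))
print(choose_t_test(0.003))
```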

Now let’s look at a complete example of conducting a two-sample t-test, including the embedded test for equality of variances.

## EXAMPLE: What is more important — personality or looks?

This question was asked of a random sample of 239 college students, who were to answer on a scale of 1 to 25. An answer of 1 means personality has maximum importance and looks no importance at all, whereas an answer of 25 means looks have maximum importance and personality no importance at all. The purpose of this survey was to examine whether males and females differ with respect to the importance of looks vs. personality.

Note that the data have the following format:

| Score (Y) | Gender (X) |
| --- | --- |
| 15 | Male |
| 13 | Female |
| 10 | Female |
| 12 | Male |
| 14 | Female |
| 14 | Male |
| 6 | Male |
| 17 | Male |
| etc. | |

The format of the data reminds us that we are essentially examining the relationship between the two-valued categorical variable, gender, and the quantitative response, score. The two values of the categorical explanatory variable (k = 2) define the two populations that we are comparing: males and females. The comparison is with respect to the response variable score.

Here is a figure that summarizes the example:

• Note that this figure emphasizes that, because our explanatory variable is a two-valued categorical variable, in practice we are comparing two populations (defined by these two values) with respect to our response Y.
• Note that even though the problem description just says that we had 239 students, the figure tells us that there were 85 males in the sample, and 150 females.
• Following up on comment 2, note that 85 + 150 = 235 and not 239. In these data (which are real) there are four “missing observations,” that is, 4 students for whom we do not have the value of the response variable, “importance.” This could be due to a number of reasons, such as recording error or nonresponse. The bottom line is that even though data were collected from 239 students, effectively we have data from only 235. (Recommended: Go through the data file and note that there are 4 cases of missing observations: students 34, 138, 179, and 183).
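To see how missing observations reduce the effective sample size, here is a minimal sketch that drops incomplete records and then splits the rest by group. The records are hypothetical values in the (score, gender) layout shown above, with `None` marking a missing response; they are not the actual survey data.

```python
# Hypothetical records in the (score, gender) layout shown above;
# None marks a missing response.
records = [(15, "Male"), (13, "Female"), (None, "Female"), (12, "Male"),
           (14, "Female"), (None, "Male"), (6, "Male"), (17, "Male")]

# Keep only the records where the response was observed
complete = [(score, gender) for score, gender in records if score is not None]

# Split the complete records into one sample per value of the grouping variable
groups = {}
for score, gender in complete:
    groups.setdefault(gender, []).append(score)

for gender, scores in groups.items():
    print(gender, "n =", len(scores))
print("collected:", len(records), "usable:", len(complete))
```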

Step 1: State the hypotheses

Recall that the purpose of this survey was to examine whether the opinions of females and males differ with respect to the importance of looks vs. personality. The hypotheses in this case are therefore:

Ho: μ1 – μ2 = 0 (which is the same as μ1 = μ2)

Ha: μ1 – μ2 ≠ 0 (which is the same as μ1 ≠ μ2)

where μ1 represents the mean “looks vs personality score” for females and μ2 represents the mean “looks vs personality score” for males.

It is important to understand that conceptually, the two hypotheses claim:

Ho: Score (of looks vs. personality) is not related to gender

Ha: Score (of looks vs. personality) is related to gender

Step 2: Obtain data, check conditions, and summarize data

• Data: Looks (SPSS format, SAS format, Excel format, CSV format)
• Let’s first check whether the conditions that allow us to safely use the two-sample t-test are met.
• Here, 239 students were chosen and were naturally divided into a sample of females and a sample of males. Since the students were chosen at random, the sample of females is independent of the sample of males.
• Here we are in the second scenario — the sample sizes (150 and 85), are definitely large enough, and so we can proceed regardless of whether the populations are normal or not.
• In the output below we first look at the test for equality of variances (outlined in orange). The two-sample t-test results we will use are outlined in blue.
• There are TWO TESTS represented in this output and we must make the correct decision for BOTH of these tests to correctly proceed.
• SOFTWARE OUTPUT In SPSS:
• The p-value for the test of equality of variances is reported as 0.849 in the SIG column under Levene’s test for equality of variances. (Note that this differs from the p-value found using SAS because the two programs use different tests by default.)
• So we fail to reject the null hypothesis that the variances, or equivalently the standard deviations, are equal (Ho: σ1 = σ2).
• Conclusion to test for equality of variances: We cannot conclude there is a difference in the variance of looks vs. personality score between males and females.
• This results in using the row for Equal variances assumed to find the t-test results including the test statistic, p-value, and confidence interval for the difference. (Outlined in BLUE)
• The output might also be broken up if you export or copy the items in certain ways. The results are the same but it can be more difficult to read.
• SOFTWARE OUTPUT In SAS:
• The p-value for the test of equality of variances is reported as 0.5698 in the Pr > F column under equality of variances. (Note that this differs from the p-value found using SPSS because the two programs use different tests by default.)
• So we fail to reject the null hypothesis that the variances, or equivalently the standard deviations, are equal (Ho: σ1 = σ2).
• Conclusion to test for equality of variances: We cannot conclude there is a difference in the variance of looks vs. personality score between males and females.
• This results in using the row for POOLED method where equal variances are assumed to find the t-test results including the test statistic, p-value, and confidence interval for the difference. (Outlined in BLUE)
• TEST STATISTIC for Two-Sample T-test: In all of the results above, we determine that we will use the test which assumes the variances are EQUAL, and we find our test statistic of t = -4.58.

Step 3: Find the p-value of the test by using the test statistic as follows

• We will let the software find the p-value for us, and in this case, the p-value is less than our significance level of 0.05; in fact, it is practically 0.
• This is found in SPSS in the equal variances assumed row under t-test in the SIG. (two-tailed) column given as 0.000 and in SAS in the POOLED ROW under Pr > |t| column given as <0.0001.
• A p-value which is practically 0 means that it would be almost impossible to get data like that observed (or even more extreme) had the null hypothesis been true.
• More specifically, in our example, if there were no differences between females and males with respect to whether they value looks vs. personality, it would be almost impossible (probability approximately 0) to get data where the difference between the sample means of females and males is -2.6 (that difference is 10.73 – 13.33 = -2.6) or more extreme.
• Comment: Note that the output tells us that the difference between the sample means, which is our estimate of μ1 – μ2, is approximately -2.6. But more importantly, we want to know if this difference is statistically significant. To answer this, we use the fact that this difference is 4.58 standard errors below the null value.
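The numbers quoted in this step can be tied together with a little arithmetic. The sketch below uses only the values reported above (sample means of 10.73 and 13.33 and t = -4.58). The p-value it computes is a large-sample normal approximation, which is an assumption made here for illustration; it is not how SPSS or SAS obtain the exact t-based p-value.

```python
from statistics import NormalDist

mean_females, mean_males = 10.73, 13.33   # sample means reported in the output
t_stat = -4.58                            # pooled test statistic from the output

diff = mean_females - mean_males          # observed difference in sample means
se = diff / t_stat                        # implied standard error, since t = diff / se

# Large-sample approximation: with 150 + 85 - 2 = 233 degrees of freedom the
# t-distribution is close to the standard normal distribution.
p_approx = 2 * (1 - NormalDist().cdf(abs(t_stat)))

print(f"difference = {diff:.2f}")
print(f"implied standard error = {se:.3f}")
print(f"approximate two-sided p-value = {p_approx:.1e}")
```

The approximate p-value is far below 0.0001, which agrees with the software reporting it as 0.000 or <0.0001.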

Step 4: Conclusion

As usual a small p-value provides evidence against Ho. In our case our p-value is practically 0 (which is smaller than any level of significance that we will choose). The data therefore provide very strong evidence against Ho so we reject it.

• Conclusion: There is enough evidence that the mean Importance score (of looks vs personality) of males differs from that of females. In other words, males and females differ with respect to how they value looks vs. personality.

As a follow-up to this conclusion, we can construct a confidence interval for the difference between population means. In this case we will construct a confidence interval for μ1 – μ2 the population mean “looks vs personality score” for females minus the population mean “looks vs personality score” for males.

• Using statistical software, we find that the 95% confidence interval for μ1 – μ2 is roughly (-3.7, -1.5).
• This is found in SPSS in the equal variances assumed row under 95% confidence interval columns given as -3.712 to -1.480 and in SAS in the POOLED ROW under 95% CL MEAN column given as -3.7118 to -1.4804 (be careful NOT to choose the confidence interval for the standard deviation in the last column, 95% CL Std Dev).
• Interpretation:
• We are 95% confident that the population mean “looks vs personality score” for females is between 3.7 and 1.5 points lower than that of males.
• OR
• We are 95% confident that the population mean “looks vs personality score” for males is between 3.7 and 1.5 points higher than that of females.
• The confidence interval therefore quantifies the effect that the explanatory variable (gender) has on the response (looks vs personality score).
• Since low values correspond to personality being more important and high values correspond to looks being more important, the result of our investigation suggests that, on average, females place personality higher than do males. Alternatively we could say that males place looks higher than do females.
• Note: The confidence interval does not contain zero (both values are negative based upon how we chose our groups) and thus using the confidence interval we can reject the null hypothesis here.

Practical Significance:

We should definitely ask ourselves if this is practically significant.

• Is a true difference in population means as represented by our estimate from this data meaningful here? I will let you consider and answer for yourself.

SPSS Output for this example (Non-Parametric Output for Examples 1 and 2)

SAS Output and SAS Code (Includes Non-Parametric Test)

Here is another example.

## EXAMPLE: BMI vs. Gender in Heart Attack Patients

A study was conducted which enrolled and followed heart attack patients in a certain metropolitan area. In this example we are interested in determining if there is a relationship between Body Mass Index (BMI) and gender. Individuals presenting to the hospital with a heart attack were randomly selected to participate in the study.

Step 1: State the hypotheses

Ho: μ1 – μ2 = 0 (which is the same as μ1 = μ2)

Ha: μ1 – μ2 ≠ 0 (which is the same as μ1 ≠ μ2)

where μ1 represents the mean BMI for males and μ2 represents the mean BMI for females.

It is important to understand that conceptually, the two hypotheses claim:

Ho: BMI is not related to gender in heart attack patients

Ha: BMI is related to gender in heart attack patients

Step 2: Obtain data, check conditions, and summarize data

• Data: WHAS500 (SPSS format, SAS format)
• Let’s first check whether the conditions that allow us to safely use the two-sample t-test are met.
• Here, subjects were chosen and were naturally divided into a sample of females and a sample of males. Since the subjects were chosen at random, the sample of females is independent of the sample of males.
• Here, we are in the second scenario — the sample sizes are extremely large, and so we can proceed regardless of whether the populations are normal or not.
• In the output below we first look at the test for equality of variances (outlined in orange). The two-sample t-test results we will use are outlined in blue.
• There are TWO TESTS represented in this output and we must make the correct decision for BOTH of these tests to correctly proceed.
• SOFTWARE OUTPUT In SPSS:
• The p-value for the test of equality of variances is reported as 0.001 in the SIG column under Levene’s test for equality of variances.
• So we reject the null hypothesis that the variances, or equivalently the standard deviations, are equal (Ho: σ1 = σ2).
• Conclusion to test for equality of variances: We conclude there is enough evidence of a difference in the variance of BMI between males and females.
• This results in using the row for Equal variances NOT assumed to find the t-test results including the test statistic, p-value, and confidence interval for the difference. (Outlined in BLUE)
• SOFTWARE OUTPUT In SAS:
• The p-value for the test of equality of variances is reported as 0.0004 in the Pr > F column under equality of variances.
• So we reject the null hypothesis that the variances, or equivalently the standard deviations, are equal (Ho: σ1 = σ2).
• Conclusion to test for equality of variances: We conclude there is enough evidence of a difference in the variance of BMI between males and females.
• This results in using the row for SATTERTHWAITE method where UNEQUAL variances are assumed to find the t-test results including the test statistic, p-value, and confidence interval for the difference. (Outlined in BLUE)
• TEST STATISTIC for Two-Sample T-test: In all of the results above, we determine that we will use the test which assumes the variances are UNEQUAL, and we find our test statistic of t = 3.21.

Step 3: Find the p-value of the test by using the test statistic as follows

• We will let the software find the p-value for us, and in this case, the p-value is less than our significance level of 0.05.
• This is found in SPSS in the Equal variances NOT assumed row under t-test in the SIG. (two-tailed) column given as 0.001 and in SAS in the SATTERTHWAITE ROW under Pr > |t| column given as 0.0015.
• This p-value means that it would be extremely rare to get data like that observed (or even more extreme) had the null hypothesis been true.
• More specifically, in our example, if there were no differences between females and males with respect to BMI, it would be highly unlikely (probability approximately 0.001) to get data where the difference between the sample mean BMIs of males and females is 1.64 or more extreme.
• Comment: Note that the output tells us that the difference between the sample means, which is our estimate of μ1 – μ2, is approximately 1.64. But more importantly, we want to know if this difference is statistically significant. To answer this, we use the fact that this difference is 3.21 standard errors above the null value.

Step 4: Conclusion

As usual a small p-value provides evidence against Ho. In our case our p-value is 0.001 (which is smaller than any level of significance that we will choose). The data therefore provide very strong evidence against Ho so we reject it.

• Conclusion:  The mean BMI of males differs from that of females. In other words, males and females differ with respect to BMI among heart attack patients.

As a follow-up to this conclusion, we can construct a confidence interval for the difference between population means. In this case we will construct a confidence interval for μ1 – μ2 the population mean BMI for males minus the population mean BMI for females.

• Using statistical software, we find that the 95% confidence interval for μ1 – μ2 is roughly (0.63, 2.64).
• This is found in SPSS in the Equal variances NOT assumed row under 95% confidence interval columns and in SAS in the SATTERTHWAITE ROW under 95% CL MEAN column.
• Interpretation:
• We are 95% confident that the population mean BMI for males is between 0.63 and 2.64 units larger than that of females.
• OR
• We are 95% confident that the population mean BMI for females is between 0.63 and 2.64 units smaller than that of males.
• The confidence interval therefore quantifies the effect of the explanatory variable (gender) on the response (BMI). Notice that we cannot imply a causal effect of gender on BMI based upon this result alone as there could be many lurking variables, unaccounted for in this analysis, which might be partially or even completely responsible for this difference.
• Note: The confidence interval does not contain zero (both values are positive based upon how we chose our groups) and thus using the confidence interval we can reject the null hypothesis here.
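As a rough consistency check, the reported interval and test statistic can be related through the general form estimate ± multiplier × standard error. This sketch uses only the values quoted above, (0.63, 2.64) and t = 3.21; the variable names are hypothetical, and the implied multiplier is back-calculated here rather than read from the software output.

```python
lower, upper = 0.63, 2.64   # 95% confidence interval reported for mu1 - mu2
t_stat = 3.21               # unequal-variances test statistic from the output

estimate = (lower + upper) / 2     # midpoint = estimated difference in means
margin = (upper - lower) / 2       # margin of error
se = estimate / t_stat             # implied standard error, since t = estimate / se
multiplier = margin / se           # implied critical value, expected to be near 2

print(f"estimate = {estimate:.3f}, margin of error = {margin:.3f}")
print(f"implied standard error = {se:.3f}, multiplier = {multiplier:.2f}")
print("zero inside interval:", lower <= 0 <= upper)
```

The midpoint agrees with the observed difference of about 1.64, and zero lies outside the interval, matching the decision to reject Ho.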

Practical Significance:

• We should definitely ask ourselves if this is practically significant.
• Is a true difference in population means as represented by our estimate from this data meaningful here? Is a difference in BMI of between 0.63 and 2.64 of interest?
• I will let you consider and answer for yourself.

SPSS Output for this example (Non-Parametric Output for Examples 1 and 2)

SAS Output and SAS Code (Includes Non-Parametric Test)
Note: In the SAS output the variable gender is not formatted; in this case Males = 0 and Females = 1.

You might ask yourself: “Where do we use the test statistic?”

It is true that for all practical purposes all we have to do is check that the conditions which allow us to use the two-sample t-test are met, lift the p-value from the output, and draw our conclusions accordingly.

However, we feel that it is important to mention the test statistic for two reasons:

• The test statistic is what’s behind the scenes; based on its null distribution and its value, the p-value is calculated.
• Apart from being the key for calculating the p-value, the test statistic is also itself a measure of the evidence stored in the data against Ho. As we mentioned, it measures (in standard errors) how different our data are from what is claimed in the null hypothesis.

Now try some more activities for yourself.

Did I Get This? Two-Sample T-test and Related Confidence Interval