This short video contains an overview of calculating conditional percentages.

The original slides are not available.

This document is linked from Case C-C.

This document is linked from Case C-Q.

This short video elaborates upon the information displayed in a boxplot.

The original slides are not available.

This document is linked from Boxplots.

Optional: Create your own solutions using your software for extra practice.

In this activity, we will use the collected data to:

- build a two-way table and compute conditional percentages.
- interpret the data in terms of the relationship between a young child’s nighttime exposure to light and later nearsightedness.

An Associated Press article captured the attention of readers with the headline “Night lights bad for kids?” The article was based on a 1999 study at the University of Pennsylvania and Children’s Hospital of Philadelphia, in which parents were surveyed about the lighting conditions under which their children slept between birth and age 2 (lamp, night-light, or no light) and whether or not their children developed nearsightedness (myopia). The purpose of the study was to explore the effect of a young child’s nighttime exposure to light on later nearsightedness.

nightlight.xls or nightlight.csv

**Create Two-Way tables (SPSS):** ANALYZE > DESCRIPTIVE STATISTICS > CROSSTABS, then complete the wizard (4 times) to obtain:

- Two-way table with the count (frequency) and percent (out of total)
- Two-way table with the count (frequency) and row percents
- Two-way table with the count (frequency) and column percents
- Two-way table with the count (frequency), row and column percents

**Create Two-Way tables (SAS):** Use PROC FREQ and the TABLES statement to create a two-way table of light and nearsightedness.
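If SPSS or SAS is not at hand, the same family of tables (counts with percent of total, row percents, and column percents) can be computed from a table of counts. The sketch below is a minimal illustration, not the software's own procedure. It assumes the counts are stored as a nested dictionary; the function name `percent_tables` and the generic labels ("Group A", "Yes", and so on) are hypothetical, and the counts are made-up placeholders, not the nightlight study's data.

```python
# Hypothetical counts with generic labels; these are NOT the nightlight data.
counts = {
    "Group A": {"Yes": 10, "No": 30},
    "Group B": {"Yes": 20, "No": 40},
}

def percent_tables(counts):
    """Return (percent of total, row percents, column percents) for a table
    given as {row label: {column label: count}}."""
    rows = list(counts)
    cols = list(counts[rows[0]])
    row_tot = {r: sum(counts[r].values()) for r in rows}       # Total column
    col_tot = {c: sum(counts[r][c] for r in rows) for c in cols}  # Total row
    grand = sum(row_tot.values())                              # overall total
    total_pct = {r: {c: 100 * counts[r][c] / grand for c in cols} for r in rows}
    row_pct = {r: {c: 100 * counts[r][c] / row_tot[r] for c in cols} for r in rows}
    col_pct = {r: {c: 100 * counts[r][c] / col_tot[c] for c in cols} for r in rows}
    return total_pct, row_pct, col_pct

total_pct, row_pct, col_pct = percent_tables(counts)
print(total_pct["Group A"]["Yes"])          # 10.0  (10 out of 100 overall)
print(row_pct["Group A"]["Yes"])            # 25.0  (10 out of 40 in Group A)
print(round(col_pct["Group A"]["Yes"], 1))  # 33.3  (10 out of 30 "Yes")
```

The fourth requested table (row and column percents together) is simply `row_pct` and `col_pct` shown side by side with the counts.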

This document is linked from Case C-C.

The following tables are used in the next question.

The United States federal government collects information on Americans who do not have health insurance. Data from 2004 are broken down into 4 regions of the country. These data are summarized in the table provided below. Using this table, answer the following questions.

Answer these questions:

This document is linked from Case C-C.

The purpose of this activity is:

- to practice comparing and contrasting distributions, and
- to help you gain more intuition about variability through the interpretation of your results in context.

The percentage of each entering freshman class that graduated on time was recorded for each of six colleges at a major university over a period of several years. (Source: These data are distributed with the software package Data Desk (1993, Ithaca, NY: Data Description, Inc.) and appeared in the Data and Story Library.)

To compare the graduation rates among the different colleges, we will create side-by-side boxplots (graduation rate by college) and supplement the graph with numerical measures. Answer the questions based on the SPSS output provided.

This document is linked from Boxplots.

**BACKGROUND INFORMATION**

A study was conducted to determine whether pamphlets containing information for cancer patients are written at a level that the cancer patients can understand.

Tests were administered to measure the reading levels of 63 cancer patients, and the readability levels of 30 cancer pamphlets were evaluated based on such factors as the lengths of the sentences and the number of polysyllabic words.

Both the reading and readability levels correspond to grade levels, but patients’ reading levels below grade 3 or above grade 12 cannot be determined exactly. (Source: Short, Moriarty, and Cooly (1995). “Readability of Educational Materials for Cancer Patients.” Journal of Statistics Education, v.3, n.2.)

The following tables indicate the number of patients at each reading level and the number of pamphlets at each readability level.

**Comment:**

- Note that the data are presented in a grouped form; the actual readability data, for example, are: 6 6 6 7 7 7 8 8 8 8 8 8 8 8 9 9 9 9, etc.
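Because the tables are grouped, each level must be counted as many times as its frequency before finding measures of center. The sketch below expands a grouped table into raw values and computes the median, and also computes the mean directly from the grouped form. It uses only the levels spelled out in full in the comment (grade 6: 3 pamphlets, grade 7: 3, grade 8: 8); the higher levels are left out, so the results describe this partial list only, not all 30 pamphlets. The helper names `expand_grouped` and `grouped_mean` are hypothetical.

```python
from statistics import median

def expand_grouped(freq):
    """Expand a grouped table {value: count} into the raw list of values."""
    raw = []
    for value in sorted(freq):
        raw.extend([value] * freq[value])  # repeat each level 'count' times
    return raw

def grouped_mean(freq):
    """Mean computed directly from the grouped form: sum(value * count) / n."""
    n = sum(freq.values())
    return sum(value * count for value, count in freq.items()) / n

# Partial data: only the fully listed levels from the comment above.
partial = {6: 3, 7: 3, 8: 8}
raw = expand_grouped(partial)
print(raw)                    # [6, 6, 6, 7, 7, 7, 8, 8, 8, 8, 8, 8, 8, 8]
print(median(raw))            # 8.0
print(grouped_mean(partial))  # same value as the mean of 'raw'
```

Expanding the data is convenient for the median, which depends on position; the mean can be computed either way.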

Answer the following questions:

This document is linked from Measures of Center.

**Related SAS Tutorials**

- 6A – (3:07) Two-Way (Contingency) Tables – EDA

**Related SPSS Tutorials**

- 6A – (7:57) Two-Way (Contingency) Tables – EDA

Recall the role-type classification table for framing our discussion about the relationship between two variables:

We are done with case C→Q, and will now move on to case C→C, where we examine the relationship between two categorical variables.

Earlier in the course, when we discussed the distribution of a **single** categorical variable, we examined the data obtained when a random sample of 1,200 U.S. college students was asked about their body image (underweight, overweight, or about right). We are now returning to this example to address the following questions:

If we had separated our sample of 1,200 U.S. college students by gender and looked at **males and females separately**, would we have found a similar distribution across body-image categories? More specifically, are men and women just as likely to think their weight is about right? Among those students who do not think their weight is about right, is there a difference between the genders in feelings about body image?

Answering these questions requires us to **examine the relationship between two categorical variables**, gender and body image. Because the question of interest is whether there is a gender effect on body image,

- the **explanatory** variable is **gender**, and
- the **response** variable is **body image**.

Here is what the raw data look like when we include the gender of each student:

Once again, the raw data are a long list of 1,200 genders and responses and thus are not very useful in that form.

To start our exploration of how body image is related to gender, we need an informative display that summarizes the data. In order to summarize the relationship between two categorical variables, we create a display called a **two-way table** or **contingency table**.

Here is the two-way table for our example:

The table has the possible genders in the rows and the possible responses regarding body image in the columns. At each intersection of a row and a column, we record how many times that combination of gender and body image occurred in the data. We sum across each row to fill in the Total column, and we sum down each column to fill in the Total row.
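The construction just described (count each gender and body-image combination, then sum rows and columns for the totals) can be sketched in a few lines. This is a minimal illustration, not how SPSS or SAS build the table internally. The raw data below are six made-up students, not the 1,200-student sample, and the function name `two_way_table` is hypothetical.

```python
from collections import Counter

def two_way_table(pairs):
    """pairs: iterable of (row_value, column_value), one tuple per individual.
    Returns (cell counts, row totals, column totals, grand total)."""
    cells = Counter(pairs)          # count each (gender, body image) combination
    row_totals = Counter()
    col_totals = Counter()
    for (r, c), n in cells.items():
        row_totals[r] += n          # summing across a row gives the Total column
        col_totals[c] += n          # summing down a column gives the Total row
    return cells, row_totals, col_totals, sum(cells.values())

# Hypothetical raw data for six students (not the actual sample).
raw = [
    ("Female", "About right"), ("Female", "Overweight"),
    ("Female", "About right"), ("Male", "Underweight"),
    ("Male", "About right"), ("Male", "About right"),
]
cells, row_tot, col_tot, n = two_way_table(raw)
print(cells[("Female", "About right")])            # 2
print(row_tot["Male"], col_tot["About right"], n)  # 3 4 6
```

As a consistency check, the Total column and the Total row must both add up to the grand total.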

Complete the following activities related to this data.

**Comments:**

Note that from the way the two-way table is constructed, the Total row or column is a summary of one of the two categorical variables, ignoring the other. In our example:

- The Total row gives the summary of the categorical variable body image. (These are the same counts we found earlier in the course when we looked at the single categorical variable body image and did not consider gender.)

- The Total column gives the summary of the categorical variable gender.

So far we have organized the raw data in a much more informative display — the two-way table:

Remember, though, that our primary goal is to explore how body image is related to gender. Exploring the relationship between two categorical variables (in this case body image and gender) amounts to comparing the distributions of the response variable (in this case body image) across the different values of the explanatory variable (in this case males and females):

Note that it doesn’t make sense to compare raw counts, because there are more females than males overall. For example, it is not very informative to say “there are 560 females who responded ‘about right’ compared to only 295 males,” since the 560 females are out of a total of 760, while the 295 males are out of a total of only 440.

We need to supplement our display, the two-way table, with some numerical measures that will allow us to compare the distributions. These numerical measures are found by simply **converting the counts to percents within (or restricted to) each value of the explanatory variable separately.**

In our example: We look at each gender separately, and convert the counts to percents **within that gender.** Let’s start with females:

Note that each count is converted to a percent by dividing it by the total number of females, 760, and multiplying by 100. These numerical measures are called **conditional percents**, since we find them by “conditioning” on one of the genders.
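Converting each row to percents of its own row total is straightforward to sketch. The full female and male rows are not reproduced in this text, so the example uses a collapsed two-category version built only from counts stated earlier: 560 of 760 females and 295 of 440 males answered “about right,” and “not about right” is the remainder obtained by subtraction. The function name `row_percents` and the collapsed category are illustrative assumptions, not part of the original table.

```python
def row_percents(table):
    """Convert each row of counts to percents of that row's total."""
    result = {}
    for row, counts in table.items():
        total = sum(counts.values())  # e.g., 760 for females
        result[row] = {col: 100 * n / total for col, n in counts.items()}
    return result

# Collapsed table from counts stated in the text; the second column is
# the remainder (760 - 560 for females, 440 - 295 for males).
body_image = {
    "Female": {"About right": 560, "Not about right": 760 - 560},
    "Male": {"About right": 295, "Not about right": 440 - 295},
}
pcts = row_percents(body_image)
for gender, row in pcts.items():
    print(gender, {k: round(v, 1) for k, v in row.items()})
# Female {'About right': 73.7, 'Not about right': 26.3}
# Male {'About right': 67.0, 'Not about right': 33.0}
```

Each row of conditional percents adds up to 100%, which is what makes rows of different sizes comparable.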

Now complete the following activities to calculate the row percentages for males.

**Comments:**

- In our example, we chose to organize the data with the explanatory variable gender in rows and the response variable body image in columns, and thus our conditional percents were **row percents**, calculated within each row separately. Similarly, if the explanatory variable happens to sit in columns and the response variable in rows, our conditional percents will be **column percents**, calculated within each column separately. For an example, see the “Did I Get This?” exercises below.

- Another way to visualize the conditional percents, instead of a table, is the **double bar chart**. This display is quite common in newspapers.
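When the explanatory variable sits in the columns, the conditional percents are computed within each column instead. The sketch below uses the same collapsed counts derived from the text (560 of 760 females and 295 of 440 males “about right,” with the remainders 760 - 560 and 440 - 295 as “not about right”), but oriented with gender in the columns. The function name `column_percents` and the collapsed category are hypothetical; the point is that the orientation changes which totals we divide by, not the resulting conditional percents.

```python
def column_percents(table):
    """table: {row label: {column label: count}}.
    Returns percents computed within each column separately."""
    cols = list(next(iter(table.values())))
    col_totals = {c: sum(table[r][c] for r in table) for c in cols}
    return {r: {c: 100 * table[r][c] / col_totals[c] for c in cols} for r in table}

# Same collapsed counts, with the explanatory variable (gender) in columns.
by_column = {
    "About right": {"Female": 560, "Male": 295},
    "Not about right": {"Female": 760 - 560, "Male": 440 - 295},
}
pct = column_percents(by_column)
print(round(pct["About right"]["Female"], 1))  # 73.7
print(round(pct["About right"]["Male"], 1))    # 67.0
```

These match the row percents obtained when gender sits in the rows, and each column now adds up to 100%.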

Now that we have summarized the relationship between the categorical variables gender and body image, let’s go back and interpret the results in the context of the questions that we posed.

For additional practice complete the following activities.

- The relationship between two categorical variables is summarized using:
  - **Data display:** a two-way table, supplemented by
  - **Numerical measures:** conditional percentages.

- Conditional percentages are calculated for each value of the explanatory variable separately. They can be row percents, if the explanatory variable “sits” in the rows, or column percents, if the explanatory variable “sits” in the columns.
- When we try to understand the relationship between two categorical variables, we compare the distributions of the response variable across the values of the explanatory variable. In particular, we look at how the pattern of conditional percentages differs between the values of the explanatory variable.