This short video contains a brief discussion of how the mean and median compare. It also gives some additional guidance on how to judge the difference between these two measures of center.

The original slides are not available.

Transcript – Live Measures of Center

This document is linked from Measures of Center.

Here is an interactive demonstration that lets you build a histogram and compare the mean and median for various shapes.

This material is from Interactivate.


In the workplace, depression is a leading cause of absenteeism and loss of productivity (Greenberg et al., 1993). To assess the degree to which people suffer from depression before receiving treatment, data were collected on the number of days that 105 patients were depressed prior to starting a new treatment. These data are displayed in the following table and histogram:

Days | Count |
---|---|
[20-60] | 5 |
[60-100] | 10 |
[100-140] | 20 |
[140-180] | 30 |
[180-220] | 16 |
[220-260] | 10 |
[260-300] | 6 |
[300-340] | 4 |
[340-380] | 2 |
[380-420] | 0 |
[420-460] | 0 |
[460-500] | 0 |
[500-540] | 2 |



**BACKGROUND INFORMATION**

A study was conducted in order to find out whether pamphlets containing information for cancer patients are written at a level that the cancer patients can understand.

Tests were administered to measure the reading levels of 63 cancer patients, and the readability levels of 30 cancer pamphlets were evaluated based on such factors as the lengths of the sentences and the number of polysyllabic words.

Both the reading and readability levels correspond to grade levels, but patients’ reading levels of less than grade 3 and above grade 12 cannot be determined exactly. (Source: Short, Moriarty, and Cooly. (1995). “Readability of Educational Materials for Cancer Patients.” Journal of Statistics Education, v.3, n.2)

The following tables indicate the number of patients at each reading level and the number of pamphlets at each readability level.

**Comment:**

- Note that the data are presented in a grouped form; the actual readability data, for example, are: 6 6 6 7 7 7 8 8 8 8 8 8 8 8 9 9 9 9, etc.



**Related SAS Tutorials**

- 5A – (3:01) Numeric Measures using PROC MEANS

**Related SPSS Tutorials**

- 5A – (8:00) Numeric Measures using EXPLORE

Intuitively speaking, a numerical measure of center describes a “typical value” of the distribution.

The two main numerical measures for the center of a distribution are the **mean** and the **median**.

In this unit on Exploratory Data Analysis, we will be calculating these results based upon a **sample** and so we will often emphasize that the values calculated are the **sample mean** and **sample median**.

Each one of these measures is based on a completely different idea of describing the center of a distribution.

We will first present each one of the measures, and then compare their properties.

The **mean** is the **average** of a set of observations (i.e., the sum of the observations divided by the number of observations).

If the n observations are written as

$$x_1, x_2, \ldots, x_n$$

their mean can be written mathematically as:

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n}$$

We read the symbol $\bar{x}$ as “x-bar.” The bar notation is commonly used to represent the **sample mean**, i.e., the mean of the sample.

Using any appropriate letter to represent the variable (x, y, etc.), we can indicate the sample mean of this variable by adding a bar over the variable notation.

We will continue with the Best Actress Oscar winners example (Link to the Best Actress Oscar Winners data).

34 34 26 37 42 41 35 31 41 33 30 74 33 49 38 61 21 41 26 80 43 29 33 35 45 49 39 34 26 25 35 33

The mean age of the 32 actresses is:

$$\bar{x} = \frac{34 + 34 + 26 + \cdots + 35 + 33}{32} = \frac{1233}{32} \approx 38.5$$

We add all of the ages to get **1233** and **divide by** the number of ages, which is **32**, to get **38.5**.

We denote this result as **x-bar** and call it the **sample mean**.
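The arithmetic above can be checked with a few lines of Python. This is only a minimal sketch; the variable names (`ages`, `total`, `x_bar`) are chosen here for illustration and are not part of the course materials.

```python
# Ages of the 32 Best Actress Oscar winners, copied from the list above.
ages = [34, 34, 26, 37, 42, 41, 35, 31, 41, 33, 30, 74, 33, 49, 38, 61,
        21, 41, 26, 80, 43, 29, 33, 35, 45, 49, 39, 34, 26, 25, 35, 33]

n = len(ages)       # number of observations
total = sum(ages)   # sum of the observations
x_bar = total / n   # sample mean: sum divided by count

print(n, total, round(x_bar, 1))  # 32 1233 38.5
```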

Note that the sample mean gives a measure of center which is higher than our approximation of the center from looking at the histogram (which was 35). The reason for this will be clear soon.

Often we have large sets of data and use a frequency table to display the data more efficiently.

Data were collected from the last three World Cup soccer tournaments. A total of 192 games were played. The table below lists the number of goals scored per game (not including any goals scored in shootouts).

Total # Goals/Game | Frequency |
---|---|
0 | 17 |
1 | 45 |
2 | 51 |
3 | 37 |
4 | 25 |
5 | 11 |
6 | 3 |
7 | 2 |
8 | 1 |

**To find the mean** number of goals scored per game, we would need to **find the sum of all 192 numbers, and then divide that sum by 192.**

Rather than add 192 numbers, we use the fact that the same numbers appear many times. For example, the number 0 appears 17 times, the number 1 appears 45 times, the number 2 appears 51 times, etc.

If we add up 17 zeros, we get 0. If we add up 45 ones, we get 45. If we add up 51 twos, we get 102. Repeated addition is multiplication.

Thus, the **sum of the 192 numbers = 0(17) + 1(45) + 2(51) + 3(37) + 4(25) + 5(11) + 6(3) + 7(2) + 8(1) = 453**.

The **sample mean** is then **453 / 192 = 2.359**.
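The same calculation can be sketched in Python by multiplying each value by its frequency, as described above. The dictionary name `goal_counts` and the other variable names are assumptions made for this illustration.

```python
# Frequency table from the text: goals per game -> number of games.
goal_counts = {0: 17, 1: 45, 2: 51, 3: 37, 4: 25, 5: 11, 6: 3, 7: 2, 8: 1}

# Total number of games is the sum of the frequencies.
n_games = sum(goal_counts.values())

# Repeated addition is multiplication: each value times its frequency.
total_goals = sum(goals * freq for goals, freq in goal_counts.items())

mean_goals = total_goals / n_games
print(n_games, total_goals, round(mean_goals, 3))  # 192 453 2.359
```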

Note that, in this example, the values of 1, 2, and 3 are the most common and our average falls in this range representing the bulk of the data.

The **median** M is the midpoint of the distribution. It is the number such that half of the observations fall above, and half fall below.

To find the median:

- Order the data from smallest to largest.
- Consider whether n, the number of observations, is even or odd.
  - If n is **odd**, the median M is the center observation in the ordered list. This observation is the one “sitting” in the **(n + 1) / 2** spot in the ordered list.
  - If n is **even**, the median M is the **mean** of the **two center observations** in the ordered list. These two observations are the ones “sitting” in the **(n / 2)** and **(n / 2) + 1** spots in the ordered list.
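The steps for finding the median can be sketched as a small Python function. This is an illustrative sketch rather than a reference implementation: the function name `median` and the two small datasets are hypothetical, and the only subtlety is that Python lists are indexed from 0 while the "spots" in the text are counted from 1.

```python
def median(values):
    """Return the median using the odd/even rule described in the text."""
    ordered = sorted(values)  # step 1: order the data
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty dataset is undefined")
    if n % 2 == 1:
        # Odd n: the (n + 1) / 2 spot, shifted by 1 for 0-based indexing.
        return ordered[(n + 1) // 2 - 1]
    # Even n: mean of the (n / 2) and (n / 2) + 1 spots.
    lower = ordered[n // 2 - 1]
    upper = ordered[n // 2]
    return (lower + upper) / 2


# Hypothetical data, not from the course materials.
print(median([5, 1, 9, 3, 7]))      # 5   (n = 5, third spot)
print(median([5, 1, 9, 3, 7, 11]))  # 6.0 (n = 6, mean of 5 and 7)
```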

For a simple visualization of the location of the median, consider the following two simple cases of n = 7 and n = 8 ordered observations, with each observation represented by a solid circle:

**Comments:**

- In the images above, the dots are equally spaced; this need not indicate that the data values are actually equally spaced, as we are only interested in listing them in order.

- In fact, in the above pictures, two consecutive dots could have exactly the same value.

- It is clear that the value of the median will be in the same position regardless of the distance between data values.

To find the median age of the Best Actress Oscar winners, we first need to order the data.

It would be useful, then, to use the stemplot, a diagram in which the data are already ordered.

- Here n = 32 (an even number), so the median M will be the mean of the two center observations.
- These are located at the (n / 2) = 32 / 2 = **16th** and (n / 2) + 1 = (32 / 2) + 1 = **17th** positions.

Counting from the top, we find that:

- the 16th ranked observation is 35
- the 17th ranked observation also happens to be 35

Therefore, the median M = (35 + 35) / 2 = 35
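As a check on the counting above, the following sketch sorts the ages and picks out the 16th and 17th ranked observations directly, then compares the result with `statistics.median` from the Python standard library. The variable names are chosen for this illustration.

```python
import statistics

# Ages of the 32 Best Actress Oscar winners, copied from the text.
ages = [34, 34, 26, 37, 42, 41, 35, 31, 41, 33, 30, 74, 33, 49, 38, 61,
        21, 41, 26, 80, 43, 29, 33, 35, 45, 49, 39, 34, 26, 25, 35, 33]

ordered = sorted(ages)
n = len(ordered)                 # 32, an even number

sixteenth = ordered[n // 2 - 1]  # 16th ranked observation (index 15)
seventeenth = ordered[n // 2]    # 17th ranked observation (index 16)
m = (sixteenth + seventeenth) / 2

print(sixteenth, seventeenth, m)  # 35 35 35.0
print(statistics.median(ages))    # 35.0
```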

As we have seen, the **mean** and the **median**, the most common **measures of center**, each describe the center of a distribution of values in a different way.

- The mean describes the center as an average value, in which the **actual values** of the data points play an important role.
- The median, on the other hand, locates the middle value as the center, and the **order** of the data is the key.

To get a deeper understanding of the differences between these two measures of center, consider the following example. Here are two datasets:

Dataset | Observations |
---|---|
A | 64 65 66 68 70 71 73 |
B | 64 65 66 68 70 71 730 |

For dataset A, the mean is 68.1, and the median is 68.

Looking at dataset B, notice that all of the observations except the last one are close together. The observation 730 is very large, and is certainly an outlier.

In this case, the median is still 68, but the mean will be influenced by the high outlier, and shifted up to 162.
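The effect of the single outlier can be reproduced with a short Python sketch. The list names and the `results` dictionary are assumptions made for this illustration; `statistics.median` is from the standard library.

```python
import statistics

dataset_a = [64, 65, 66, 68, 70, 71, 73]
dataset_b = [64, 65, 66, 68, 70, 71, 730]  # same data, last value replaced by an outlier

results = {}
for name, data in (("A", dataset_a), ("B", dataset_b)):
    mean = sum(data) / len(data)      # uses the actual values
    med = statistics.median(data)     # uses only the order
    results[name] = (round(mean, 1), med)
    print(name, round(mean, 1), med)
# A 68.1 68
# B 162.0 68
```

Only the mean changes between the two datasets, because replacing 73 with 730 changes the sum but not which observation sits in the middle of the ordered list.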

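The message that we should take from this example is that the mean is very sensitive to outliers (because it factors in their magnitude), while the median is resistant to outliers.

Therefore: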

- For symmetric distributions with no outliers: the mean is approximately equal to the median.

- For skewed right distributions and/or datasets with high outliers: the mean is greater than the median.

- For skewed left distributions and/or datasets with low outliers: the mean is less than the median.
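The second rule can be illustrated with the two datasets used earlier. Both contain a few unusually high values (ages 74 and 80, and games with 6 to 8 goals), so we would expect the mean to exceed the median in each case. This is a sketch only; the variable names and the `compare` helper are hypothetical.

```python
import statistics

# Best Actress Oscar winner ages, copied from the text.
ages = [34, 34, 26, 37, 42, 41, 35, 31, 41, 33, 30, 74, 33, 49, 38, 61,
        21, 41, 26, 80, 43, 29, 33, 35, 45, 49, 39, 34, 26, 25, 35, 33]

# World Cup frequency table, expanded back into 192 individual values.
goal_counts = {0: 17, 1: 45, 2: 51, 3: 37, 4: 25, 5: 11, 6: 3, 7: 2, 8: 1}
goals = [g for g, freq in goal_counts.items() for _ in range(freq)]


def compare(data):
    """Return (mean, median) for a list of numbers."""
    return sum(data) / len(data), statistics.median(data)


for label, data in (("ages", ages), ("goals", goals)):
    mean, med = compare(data)
    print(label, round(mean, 3), med, mean > med)
# ages 38.531 35.0 True
# goals 2.359 2.0 True
```

For the ages, this is also why the sample mean (38.5) came out higher than the center of about 35 suggested by the histogram.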

**Conclusions… When to use which measures?**

- Use the sample mean as a measure of center for symmetric distributions with no outliers.
- Otherwise, the median will be a more appropriate measure of the center of our data.

- The two main numerical measures for the center of a distribution are the mean and the median. The mean is the average value, while the median is the middle value.

- The mean is very sensitive to outliers (as it factors in their magnitude), while the median is resistant to outliers.

- The mean is an appropriate measure of center for symmetric distributions with no outliers. In all other cases, the median is often a better measure of the center of the distribution.