Variables can be broadly classified into one of two **types**:

- Quantitative

- Categorical

Below we define these two main types of variables and provide further sub-classifications for each type.

**Categorical variables** take **category** or **label** values, and place an individual into one of several **groups**.

Categorical variables are often further classified as either:

**Nominal**, when there **is no natural ordering among the categories**.

Common examples would be gender, eye color, or ethnicity.

**Ordinal**, when there **is a natural order among the categories**, such as ranking scales or letter grades.

However, ordinal variables are still categorical and do not provide precise measurements.

Differences between categories are not precisely meaningful. For example, if one student earns an A and another a B on an assignment, we cannot say precisely how far apart their scores are, only that an A is higher than a B.
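As a minimal sketch of this idea, the Python snippet below treats letter grades as an ordinal variable: it supports comparing the order of two grades but deliberately offers no way to subtract them. The five-grade scale and the names `GRADE_ORDER`, `grade_rank`, and `is_higher` are assumptions made for illustration only.

```python
# Letter grades as an ordinal variable: order is meaningful, spacing is not.
# The scale below (F lowest, A highest) is an assumed example scale.
GRADE_ORDER = ["F", "D", "C", "B", "A"]


def grade_rank(grade: str) -> int:
    """Return the position of a grade in the ordering (0 = lowest)."""
    return GRADE_ORDER.index(grade)


def is_higher(grade_a: str, grade_b: str) -> bool:
    """Return True if grade_a ranks above grade_b."""
    return grade_rank(grade_a) > grade_rank(grade_b)


# We can say that an A is higher than a B...
print(is_higher("A", "B"))  # True
# ...but the gap between ranks is not a measurement of how much higher.
```

The ranks are used only for ordering; treating the difference between two ranks as a score difference would impose a spacing that the data do not carry.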

**Quantitative variables** take **numerical** values, and represent some kind of **measurement**.

Quantitative variables are often further classified as either:

**Discrete**, when the variable takes on a **countable** number of values.

Most often these variables indeed represent some kind of **count** such as the number of prescriptions an individual takes daily.

**Continuous**, when the variable **can take on any value in some range of values**.

Our precision in measuring these variables is often limited by our instruments.

Units should be provided.

Common examples would be height (inches), weight (pounds), or time to recovery (days).

One special variable type occurs when a variable has only two possible values.

A variable is said to be **Binary** or **Dichotomous** when there are only two possible levels.

These variables can usually be phrased in a “yes/no” question. Gender is an example of a binary variable.

Currently we are primarily concerned with classifying variables as either categorical or quantitative.

Sometimes, however, we will need to consider further and sub-classify these variables as defined above.

These concepts will be discussed and reviewed as needed but here is a quick practice on sub-classifying categorical and quantitative variables.

Let’s revisit the dataset showing medical records for a sample of patients.

In our example of medical records, there are several variables of each type:

- Age, Weight, and Height are **quantitative** variables.

- Race, Gender, and Smoking are **categorical** variables.

**Comments:**

- Notice that the values of the **categorical** variable Smoking have been **coded** as the numbers 0 or 1.

It is quite common to code the values of a categorical variable as numbers, but you should remember that these are just codes.

They have no arithmetic meaning (i.e., it does not make sense to add, subtract, multiply, divide, or compare the magnitude of such values).

Usually, if such a coding is used, all categorical variables will be coded, and we will tend to use this type of coding for datasets in this course.

- Sometimes, **quantitative** variables are **divided into groups** for analysis. In such a situation, although the original variable was quantitative, the variable analyzed is categorical.

A common example is to provide information about an individual’s Body Mass Index by stating whether the individual is underweight, normal, overweight, or obese.

This categorized BMI is an example of an ordinal categorical variable.
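The grouping step can be sketched in Python as below. The cut points (18.5, 25, and 30) are commonly used adult BMI thresholds, but they are not given in this text, so treat them as an assumption; the function name `bmi_category` is also hypothetical.

```python
def bmi_category(bmi: float) -> str:
    """Map a quantitative BMI value to an ordinal category.

    Cut points are assumed (commonly used adult thresholds), not taken
    from the course text.
    """
    if bmi < 18.5:
        return "underweight"
    if bmi < 25.0:
        return "normal"
    if bmi < 30.0:
        return "overweight"
    return "obese"


# The input is quantitative; the output is an ordinal categorical value.
for value in (17.0, 22.0, 27.5, 31.0):
    print(value, bmi_category(value))
```

Once the variable has been categorized this way, the precise BMI values are no longer available to the analysis, only the ordered groups.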

**Categorical** variables are sometimes called qualitative variables, but in this course we’ll use the term “categorical.”

The **types of variables** you are analyzing **directly relate to the available** descriptive and inferential **statistical methods**.

It is important to:

**assess how you will measure the effect of interest** and **know how this determines the statistical methods you can use.**

As we proceed in this course, we will continually emphasize the **types of variables** that are **appropriate for each method we discuss**.

For example:

To compare the number of polio cases in the two treatment arms of the Salk Polio vaccine trial, you could use

- Fisher’s Exact Test
- Chi-Square Test

To compare blood pressures in a clinical trial evaluating two blood pressure-lowering medications, you could use

- Two-sample t-Test
- Wilcoxon Rank-Sum Test
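One way to keep this relationship in mind is a small lookup keyed by the type of the outcome variable. The Python sketch below covers only the two-group comparisons named in the examples above; it is illustrative rather than a complete decision rule, and the names `TWO_GROUP_METHODS` and `candidate_methods` are hypothetical.

```python
# Methods named in the two examples above, keyed by the type of the outcome
# variable when comparing two groups. Illustrative only, not exhaustive.
TWO_GROUP_METHODS = {
    # e.g., polio case vs. no polio case in the two treatment arms
    "categorical": ["Fisher's Exact Test", "Chi-Square Test"],
    # e.g., blood pressure under two medications
    "quantitative": ["Two-sample t-Test", "Wilcoxon Rank-Sum Test"],
}


def candidate_methods(outcome_type: str) -> list[str]:
    """Return the example methods listed for a given outcome variable type."""
    key = outcome_type.strip().lower()
    if key not in TWO_GROUP_METHODS:
        raise ValueError(f"Unknown variable type: {outcome_type!r}")
    return TWO_GROUP_METHODS[key]


print(candidate_methods("categorical"))
print(candidate_methods("quantitative"))
```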

For this first activity with data you will need EXCEL (or Open Office) to view the .xls file. You can view the .csv file in EXCEL or any text editor.

Very soon you will need SAS or SPSS (depending upon which course you are taking) and you will learn to import data from .xls and/or .csv files.

It is not necessary to have EXCEL to import .xls files. However, if you wish to view the original file, you will need EXCEL or another program that can open or view these files.

In this course, when you are given data in .xls/.csv format, it is extremely important to check the raw dataset against your imported data, as different versions of programs and different computer settings can cause issues with the data import process.
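The course itself uses SAS or SPSS for importing, but the kind of check described here can be sketched in Python with only the standard library. The snippet below compares the number of non-blank data lines in the raw text with the number of rows actually parsed, and confirms the column names. The header spelling and column order in `EXPECTED_COLUMNS` are assumptions based on the variable names used in this activity, and the two sample rows are made-up values for illustration, not records from depression.csv.

```python
import csv
import io

# Assumed header; the real file may spell or order its columns differently.
EXPECTED_COLUMNS = ["Hospt", "Treat", "Outcome", "Time", "AcuteT", "Age", "Gender"]


def import_report(raw_text: str) -> dict:
    """Parse CSV text and summarize the import for comparison with the raw file."""
    raw_lines = [line for line in raw_text.splitlines() if line.strip()]
    reader = csv.DictReader(io.StringIO(raw_text))
    rows = list(reader)
    # A row is flagged if any field is missing, blank, or spilled into
    # extra unnamed columns (DictReader stores those as a list).
    problems = sum(
        1
        for row in rows
        if any(not isinstance(v, str) or not v.strip() for v in row.values())
    )
    return {
        "raw_data_lines": len(raw_lines) - 1,  # exclude the header line
        "imported_rows": len(rows),
        "columns_match": reader.fieldnames == EXPECTED_COLUMNS,
        "rows_with_problems": problems,
    }


# Made-up illustration rows, NOT taken from the actual dataset.
SAMPLE = (
    "Hospt,Treat,Outcome,Time,AcuteT,Age,Gender\n"
    "1,Lithium,Recurrence,36,211,33,1\n"
    "2,Placebo,No Recurrence,105,176,49,2\n"
)

print(import_report(SAMPLE))
```

To check a downloaded file, you would read its text (for example with `open(path, newline="").read()`) and pass it to `import_report`. A mismatch between `raw_data_lines` and `imported_rows`, or a nonzero `rows_with_problems`, is a prompt to open the raw file and look more closely.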

Clinical depression is the most common mental illness in the United States, affecting 19 million adults each year (Source: NIMH, 1999). Nearly 50% of individuals who experience a major episode will have a recurrence within 2-3 years. Researchers are interested in comparing therapeutic solutions that could delay or reduce the incidence of recurrence.

In a study conducted by the National Institutes of Health, 109 clinically depressed patients were separated into three groups, and each group was given one of two active drugs (imipramine or lithium) or no drug at all. For each patient, the dataset contains the treatment used, the outcome of the treatment, and several other interesting characteristics.

Here is a summary of the variables in our dataset:

**Hospt:** The patient’s hospital, represented by a code for each of the 5 hospitals (1, 2, 3, 5, or 6)

**Treat:** The treatment received by the patient (Lithium, Imipramine, or Placebo)

**Outcome:** Whether or not a recurrence occurred during the patient’s treatment (Recurrence or No Recurrence)

**Time:** Either the time (days) till recurrence, or if no recurrence, the length (days) of the patient’s participation in the study.

**AcuteT:** The time (days) that the patient was depressed prior to the study.

**Age:** The age of the patient in years, when the patient entered the study.

**Gender:** The patient’s gender (1 = Female, 2 = Male)

**Note:** In this dataset some of the categorical variables use numeric codes and others use a text description. Often, if numeric codes are to be used, then ALL of the categorical variables will be coded. When we work in software, we will learn how to have the program translate these codes for us in our analyses.
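As a rough illustration of what translating codes means, the Python sketch below maps the Gender codes listed above (1 = Female, 2 = Male) to their labels and then summarizes them with a frequency table rather than a mean. The helper names `decode` and `frequency_table` are hypothetical, and the list of codes is made up for the example rather than taken from the dataset.

```python
from collections import Counter

# Coding stated in the variable summary above.
GENDER_LABELS = {1: "Female", 2: "Male"}


def decode(codes, labels):
    """Replace each numeric code with its text label."""
    return [labels[code] for code in codes]


def frequency_table(values):
    """Count how often each category occurs."""
    return dict(Counter(values))


# Made-up codes for illustration, NOT values from the dataset.
example_codes = [1, 2, 1, 1]
labels = decode(example_codes, GENDER_LABELS)
print(labels)                   # ['Female', 'Male', 'Female', 'Female']
print(frequency_table(labels))  # {'Female': 3, 'Male': 1}
```

Counting categories is a meaningful summary of a coded categorical variable; averaging the codes would not be, since the numbers are only labels.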

To open the data, right-click on the file name, depression.xls, and choose “Save Link As” (or “Save Target As”) to download the file to your computer. Then find the downloaded file and double-click it to open it in Excel (or Open Office, etc.).

This dataset is also available as a comma-separated values (CSV) file, depression.csv, which can be opened in any text editor, although the data are not as visually organized in this type of file.

In future assignments you will need to download datasets in this manner in order to import them; that is, you will need to have the file saved to your computer.

In Excel, the dataset is in tabular form. Each row contains the values of the variables associated with a single individual, and the different variables are separated into columns. It is helpful if the columns are labeled with the variable names, as we have in this case.

Which variables are categorical and which are quantitative?

