Learn By Doing – Exploring a Dataset (Depression Data)

Published: August 2nd, 2012

Category: Activity 1: Learn By Doing, Learning Statistical Software

Datasets will often be provided as .xls (Excel) files or .csv (comma-separated values) files.


For this first activity with data, you will need Excel (or OpenOffice) to view the .xls file. You can view the .csv file in Excel or in any text editor.

Very soon you will need SAS or SPSS (depending upon which course you are taking) and you will learn to import data from .xls and/or .csv files.

You do not need Excel to import .xls files; however, if you wish to view the original file, you will need Excel or another program that can open these files.

In this course, whenever you are given data in .xls or .csv format, it is extremely important to check the raw dataset against your imported data, because different versions of programs and different computer settings can cause problems during the import process.
It is your responsibility to verify that your imported dataset is accurate.
The original dataset and your imported dataset should be identical in all respects.
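One way to make this check concrete: if you export your imported data back to CSV, you can compare it cell by cell with the original file. The Python sketch below only illustrates that idea (the course itself uses SAS or SPSS). The function name `compare_tables` is hypothetical, and the sketch assumes both tables are available as CSV text with the same row and column order.

```python
import csv
import io


def compare_tables(original_csv: str, imported_csv: str) -> list[str]:
    """Return a list of differences between two CSV texts.

    Hypothetical helper: pass the text of the original file and the text of
    your re-exported, imported data. An empty list means the tables match
    cell for cell.
    """
    original = list(csv.reader(io.StringIO(original_csv)))
    imported = list(csv.reader(io.StringIO(imported_csv)))
    problems = []

    # A dropped or duplicated row is a common import failure.
    if len(original) != len(imported):
        problems.append(f"row count differs: {len(original)} vs {len(imported)}")

    # Compare cell by cell over the rows present in both tables.
    for r, (row_a, row_b) in enumerate(zip(original, imported), start=1):
        if len(row_a) != len(row_b):
            problems.append(f"row {r}: {len(row_a)} vs {len(row_b)} columns")
            continue
        for c, (a, b) in enumerate(zip(row_a, row_b), start=1):
            # Exact string comparison: "identical in all respects".
            if a != b:
                problems.append(f"row {r}, column {c}: {a!r} vs {b!r}")
    return problems
```

Because the comparison is exact, a value written as 30 in one file and 30.0 in the other is reported as a difference. That is deliberate: each reported cell is one you should inspect by hand before trusting the import.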

Background Information for Dataset

Clinical depression is the most common mental illness in the United States, affecting 19 million adults each year (Source: NIMH, 1999). Nearly 50% of individuals who experience a major episode will have a recurrence within 2-3 years. Researchers are interested in comparing therapeutic solutions that could delay or reduce the incidence of recurrence.

In a study conducted by the National Institutes of Health, 109 clinically depressed patients were separated into three groups: two groups each received one of two active drugs (imipramine or lithium), and the third received a placebo (no active drug). For each patient, the dataset contains the treatment used, the outcome of the treatment, and several other patient characteristics.

Here is a summary of the variables in our dataset:

  • Hospt: The patient’s hospital, represented by a code for each of the 5 hospitals (1, 2, 3, 5, or 6)
  • Treat: The treatment received by the patient (Lithium, Imipramine, or Placebo)
  • Outcome: Whether or not a recurrence occurred during the patient’s treatment (Recurrence or No Recurrence)
  • Time: Either the time (days) until recurrence or, if no recurrence occurred, the length (days) of the patient’s participation in the study.
  • AcuteT: The time (days) that the patient was depressed prior to the study.
  • Age: The patient’s age in years when the patient entered the study.
  • Gender: The patient’s gender (1 = Female, 2 = Male)

Note: In this dataset some of the categorical variables use numeric codes and others use a text description. Often, if numeric codes are to be used, then ALL of the categorical variables will be coded. When we work in software, we will learn how to have the program translate these codes for us in our analyses.
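In SAS this translation is usually done with formats, and in SPSS with value labels. As a language-neutral illustration, the Python sketch below translates the Gender codes from the variable summary into labels and checks that each Hospt code is one of the five listed hospitals. The names `GENDER_LABELS`, `VALID_HOSPITALS`, `decode_gender`, and `is_valid_hospital` are hypothetical, and the sketch assumes the codes are read in as text.

```python
# Hypothetical lookup tables built from the codes in the variable summary.
GENDER_LABELS = {"1": "Female", "2": "Male"}
VALID_HOSPITALS = {"1", "2", "3", "5", "6"}  # there is no hospital 4


def decode_gender(code: str) -> str:
    """Translate a Gender code ("1" or "2") into its text label."""
    label = GENDER_LABELS.get(code.strip())
    if label is None:
        # An unknown code usually signals an import or data-entry problem.
        raise ValueError(f"unexpected Gender code: {code!r}")
    return label


def is_valid_hospital(code: str) -> bool:
    """Return True if code is one of the five hospital codes."""
    return code.strip() in VALID_HOSPITALS
```

Note that Hospt and Gender are stored as numbers but behave as labels: the codes name categories, so arithmetic on them (such as an "average hospital") has no meaning.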

Open the Dataset

To open the data, right-click on the file name, depression.xls, and choose “Save Link As” (or “Save Target As”) to download the file to your computer. Then find the downloaded file and double-click it to open it in Excel (or OpenOffice, etc.).

This dataset is also available as a comma-separated values (CSV) file, depression.csv, which can be opened in any text editor, although the data are not as visually organized in this type of file.

In future assignments you will need to download datasets in this way before importing them; that is, the file must be saved to your computer.

In Excel, the dataset is in tabular form. Each row contains the values of the variables associated with a single individual, and the different variables are separated into columns. It is helpful if the columns are labeled with the variable names, as we have in this case.
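This row-per-individual, column-per-variable layout maps naturally onto a list of records. The Python sketch below (an illustration only; `load_records` is a hypothetical helper) reads CSV text and returns one dictionary per row, keyed by column name. It assumes the first line of depression.csv holds the seven variable names from the summary above, in any order.

```python
import csv
import io

# Assumption: the CSV header uses exactly these variable names.
EXPECTED_COLUMNS = {"Hospt", "Treat", "Outcome", "Time", "AcuteT", "Age", "Gender"}


def load_records(csv_text: str) -> list[dict[str, str]]:
    """Parse CSV text into one dict per row, i.e. one dict per patient."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # fieldnames comes from the header line (None if the text is empty).
    found = set(reader.fieldnames or [])
    if found != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    # Each remaining line becomes a dict such as {"Treat": ..., "Age": ...}.
    return list(reader)
```

For the actual file, you would pass in the text of depression.csv; since the study enrolled 109 patients, you would expect `len(records)` to be 109. Every value comes back as text, so any conversion of Time, AcuteT, or Age to numbers is a separate step.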

Classify Variable Types

Which variables are categorical and which are quantitative?

Learn By Doing: Types of Variables