Learn By Doing: Random Assignment to Treatment (Software)

Published: December 29th, 2012

Category: Activity 1: Learn By Doing, Learning Statistical Software

Use the solutions provided to answer the questions below.

Optional: Create your own solutions using your software for extra practice.


The purpose of this activity is to explore how effective randomization is at creating similar treatment groups, that is, at balancing the groups with respect to other variables that we did not control for.


Use the following output to answer the questions that follow.

Background Information for Dataset

A local internet service provider (ISP) created two new versions of its software, with alternative ways of implementing a new feature. To find the product that would lead to the highest satisfaction among customers, the ISP conducted an experiment comparing users’ preferences for the two new versions versus the existing software.

Ideally, the ISP wants to find out which of the three software products causes the highest user satisfaction. It has identified three major potential lurking variables that might affect user satisfaction: gender, age, and hours per week of computer use.

In this activity, we will use adults in a hypothetical city as the population of interest to the ISP. We will:

  • create a simple random sample as the basis for the experimental study of the population,
  • use randomization to assign individuals to treatment groups, and
  • check whether randomization produced three treatment groups that are similar with respect to the most obvious lurking variables.

The dataset includes the following variables:

  • age: in years
  • gender: female or male
  • comp: hours per week of computer use

The company must rely upon sampling to study its customers’ preferences, since the entire population cannot be assigned to treatments. Therefore, we will first choose a simple random sample (SRS) of 450 people for the subjects in the study.
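As a rough illustration of this sampling step, here is a minimal Python sketch. It assumes the population can be represented by ID numbers 1 through 20783 (the population size is taken from the optional SPSS steps further down); the names `population_ids` and `srs_ids` and the seed are hypothetical choices made only for this example.

```python
import random

POPULATION_SIZE = 20783  # assumed from the SPSS step "first 20783 cases"
SAMPLE_SIZE = 450

# Hypothetical stand-in for the population: one ID number per adult.
population_ids = range(1, POPULATION_SIZE + 1)

rng = random.Random(2012)  # arbitrary seed, used only to make the sketch repeatable

# random.sample draws without replacement, so every set of 450 IDs is
# equally likely to be chosen, which is what defines a simple random sample.
srs_ids = rng.sample(population_ids, SAMPLE_SIZE)

print(len(srs_ids), "subjects selected; five smallest IDs:", sorted(srs_ids)[:5])
```

Because the draw is without replacement, no subject can appear in the sample twice.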

Then we will randomly assign our SRS of 450 subjects to treatment groups, one for each of the three versions of the ISP’s software. Let’s denote the versions “1,” “2,” and “3,” and create a categorical variable to identify the treatment for each subject.
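The assignment step can be sketched in Python as follows. This is only an illustration under assumptions: each subject independently receives 1, 2, or 3 with equal probability, and the variable names and seeds are hypothetical.

```python
import random
from collections import Counter

N_SUBJECTS = 450
rng = random.Random(29)  # arbitrary seed, used only to make the sketch repeatable

# Independent assignment: each subject gets version 1, 2, or 3 with
# probability 1/3, regardless of what the other subjects received.
treatment = [rng.randint(1, 3) for _ in range(N_SUBJECTS)]

group_sizes = Counter(treatment)
for version in (1, 2, 3):
    print(f"version {version}: {group_sizes[version]} subjects")

# Alternative (not the method described in this activity): shuffle a list
# holding exactly 150 copies of each label to force equal group sizes.
balanced = [1, 2, 3] * (N_SUBJECTS // 3)
random.Random(30).shuffle(balanced)
```

With independent assignment the three group sizes vary around 150 rather than equaling it exactly; the shuffled alternative trades that variability for groups of exactly 150.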

Finally, we will examine whether the randomization was successful in making our three treatment groups similar with respect to the variables age, gender, and comp. In other words, we will examine whether the distributions of these variables in the three groups are similar or not.

  • To compare the distributions of age and comp (the hours per week of computer use) among the three treatment groups, we’ll create side-by-side boxplots by treatment.
  • To compare the distribution of gender among the three treatment groups, we’ll look at a two-way table of conditional percents, that is, the percentage of females and males within each treatment group.
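The two comparisons above can be sketched numerically in Python. The data below are synthetic stand-ins generated for this example, not the activity's dataset, and the helper names `five_number_summary` and `gender_percents` are hypothetical. The summary reports the quartiles that a boxplot is built around; statistical packages may compute quartiles slightly differently and may draw whiskers short of the minimum and maximum when there are outliers.

```python
import random
import statistics
from collections import Counter

rng = random.Random(7)  # arbitrary seed, used only to make the sketch repeatable

# Synthetic subjects as (treatment, gender, age). NOT the course data.
subjects = [
    (rng.randint(1, 3), rng.choice(["female", "male"]), rng.randint(18, 80))
    for _ in range(450)
]

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a list of numbers."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)

def gender_percents(rows):
    """Percent of each gender within one group (conditional percents)."""
    counts = Counter(gender for _, gender, _ in rows)
    total = sum(counts.values())
    return {g: 100 * counts[g] / total for g in ("female", "male")}

for trt in (1, 2, 3):
    group = [s for s in subjects if s[0] == trt]
    ages = [age for _, _, age in group]
    print("treatment", trt, five_number_summary(ages), gender_percents(group))
```

If randomization worked, the three printed summaries should look broadly alike, which is the same judgment the boxplots and the two-way table support visually.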


Learn By Doing

Answer the following questions using the output.



Suppose that instead of randomly assigning treatments, the software users could choose one of the three themselves (thus making this an observational study instead of an experiment).


(Optional) SPSS Steps:

  • Create Random Sample: DATA > SELECT CASES > Random Sample, choose “Exactly 450 cases from the first 20783 cases”. Choose to copy the selected cases to a new data set and enter a name (SPSS does not actually use this name, but you must choose one)
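  • Create New Variable: TRANSFORM > COMPUTE VARIABLE, name the new variable and type RND(RV.UNIF(0.5, 3.49)) into the Numeric Expression box. This draws a uniform random number between 0.5 and 3.49 and rounds it to the nearest integer, so each subject is assigned 1, 2, or 3 with approximately equal probability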
  • Edit Data: DATA > DEFINE VARIABLE PROPERTIES, set the new variable’s measurement level to nominal and its number of decimal places to 0
  • Save the new data: FILE > SAVE AS, choose location and file name and continue
  • Side-by-Side Boxplots for Age by Treatment: GRAPHS > CHART BUILDER
  • Side-by-Side Boxplots for Comp by Treatment: GRAPHS > CHART BUILDER
  • Two-Way Table for Gender by Treatment: ANALYZE > DESCRIPTIVE STATISTICS > CROSSTABS

(Optional) SAS Steps:

  • Open SAS and Create Random Sample: Use PROC SURVEYSELECT to create a simple random sample of 450 observations from the current population. Name the output dataset computer_srs.
  • Create New Variable: Use a DATA step to create a new variable called TRT using the code: TRT = floor(2.99*ranuni(0)+1); You do not need to understand this code in detail: it generates a uniform random number between 1 and 3.99 and then truncates it to an integer (so 1.2 becomes 1, 2.7 becomes 2, and so on), assigning each subject to treatment 1, 2, or 3 with approximately equal probability.
  • Side-by-Side Boxplots for Age by Treatment: Using PROC SGPLOT
  • Side-by-Side Boxplots for Comp by Treatment: Using PROC SGPLOT
  • Two-Way Table for Gender by Treatment: Using PROC FREQ
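Both assignment expressions in the optional steps above, RND(RV.UNIF(0.5, 3.49)) in SPSS and floor(2.99*ranuni(0)+1) in SAS, turn a uniform random number into a treatment label. The following Python sketch works out the probability of each label as a ratio of interval lengths. It assumes the underlying draws are exactly uniform on the stated ranges; the dictionary names are hypothetical.

```python
from fractions import Fraction

# SAS: x = 2.99*u + 1 with u uniform on (0, 1), so x is uniform on (1, 3.99).
# floor(x) equals k when x lies between k and k + 1.
sas_total = Fraction("3.99") - Fraction("1")
sas_probs = {
    1: (Fraction("2") - Fraction("1")) / sas_total,
    2: (Fraction("3") - Fraction("2")) / sas_total,
    3: (Fraction("3.99") - Fraction("3")) / sas_total,
}

# SPSS: x uniform on (0.5, 3.49). Rounding gives k when x lies
# between k - 0.5 and k + 0.5.
spss_total = Fraction("3.49") - Fraction("0.5")
spss_probs = {
    1: (Fraction("1.5") - Fraction("0.5")) / spss_total,
    2: (Fraction("2.5") - Fraction("1.5")) / spss_total,
    3: (Fraction("3.49") - Fraction("2.5")) / spss_total,
}

for k in (1, 2, 3):
    print(k, sas_probs[k], float(sas_probs[k]), spss_probs[k], float(spss_probs[k]))
```

Under these assumptions both expressions give treatments 1 and 2 a probability of 100/299 each and treatment 3 a probability of 99/299, so the three treatments are nearly, but not exactly, equally likely.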

This document is linked from Causation and Experiments.