More About Experiments

CO-3: Describe the strengths and limitations of designed experiments and observational studies.
LO 3.2: Explain how the study design impacts the types of conclusions that can be drawn.
LO 3.3: Identify and define key features of experimental design (randomized, blind, etc.).

Experiments With More Than One Explanatory Variable

It is not uncommon for experiments to feature two or more explanatory variables (called factors). In this course, we focus on exploratory data analysis and statistical inference in situations that involve only one explanatory variable. Nevertheless, we will now consider the design of experiments involving several explanatory variables, to familiarize students with their basic structure.


Suppose researchers are interested not only in the effect of diet on blood pressure, but also in the effect of two new drugs. Subjects are assigned to either Control Diet (no restrictions), Diet #1, or Diet #2 (so the variable Diet has 3 possible values), and are also assigned to receive either Placebo, Drug #1, or Drug #2 (so the variable Drug also has 3 values). This is an example of an experiment with two explanatory variables and one response variable. To set up such an experiment, there must be one treatment group for every combination of categories of the two explanatory variables. Thus, in this case there are 3 × 3 = 9 combinations of the two variables to which the subjects are assigned. The treatment groups are illustrated and labeled in the following table:

Treatment groups (columns: Diet variable; rows: Drug variable):

            No-diet    Special diet 1    Special diet 2
Placebo     ttt1       ttt2              ttt3
Drug 1      ttt4       ttt5              ttt6
Drug 2      ttt7       ttt8              ttt9
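The 3 × 3 structure of the table can be generated mechanically. The Python sketch below is illustrative only: it lists every combination of the two factors with `itertools.product` and labels the combinations ttt1 to ttt9 row by row. That ordering (placebo row first, then Drug 1, then Drug 2) is an assumption consistent with the labels given in the text, and the variable names are hypothetical.

```python
from itertools import product

# Levels of each factor in the blood pressure example.
diets = ["No-diet", "Special diet 1", "Special diet 2"]
drugs = ["Placebo", "Drug 1", "Drug 2"]

# One treatment group per (drug, diet) combination. Putting drugs in the outer
# position numbers the groups row by row, as assumed for ttt1..ttt9 above.
groups = {
    f"ttt{i}": (diet, drug)
    for i, (drug, diet) in enumerate(product(drugs, diets), start=1)
}

for label, (diet, drug) in groups.items():
    print(f"{label}: {diet} + {drug}")
```

With 3 levels of each factor, `product` yields exactly 3 × 3 = 9 pairs, one per cell of the table.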

Subjects would be randomly assigned to one of the nine treatment groups. If we find differences in the proportions of subjects who achieve the lower “moderate zone” blood pressure among the nine treatment groups, then we have evidence that the diets and/or drugs may be effective for reducing blood pressure.

In summary: from the population we generate a sample, and the sampled individuals are divided by randomly assigning each of them to one of the 9 treatment groups: "ttt1: no-diet and placebo," "ttt2: diet 1 and placebo," "ttt3: diet 2 and placebo," and so on, up to "ttt9: diet 2 and drug 2." The responses from these treatment groups are then compared.


  • Recall that randomization may be employed at two stages of an experiment: in the selection of subjects, and in the assignment of treatments. The former may be helpful in allowing us to generalize what occurs among our subjects to what would occur in the general population, but the reality of most experimental settings is that a convenience or volunteer sample is used. Most likely the blood pressure study described above would use volunteer subjects. The important thing is to make sure these subjects are randomly assigned to one of the nine treatment combinations.
  • In order to gain optimal information about individuals in all the various treatment groups, we would like to make assignments not just randomly, but also evenly. If there are 90 subjects in the blood pressure study described above, and 9 possible treatment groups, then each group should be filled randomly with 10 individuals. A simple random sample of 10 could be taken from the larger group of 90, and those individuals would be assigned to the first treatment group. Next, the second treatment group would be filled by a simple random sample of 10 taken from the remaining 80 subjects. This process would be repeated until all 9 groups are filled with 10 individuals each.
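The even, sequential assignment described above can be sketched in Python as follows. The helper `assign_evenly` and the subject IDs 1 to 90 are hypothetical; each group is a simple random sample (drawn with `random.Random.sample`) of the subjects not yet assigned, and the fixed `seed` is used only so the run is reproducible.

```python
import random

def assign_evenly(subjects, n_groups, group_size, seed=None):
    """Fill groups one at a time, each with a simple random sample drawn
    from the subjects not yet assigned. Assumes subject IDs are unique."""
    if len(subjects) != n_groups * group_size:
        raise ValueError("number of subjects must equal n_groups * group_size")
    rng = random.Random(seed)
    remaining = list(subjects)
    groups = []
    for _ in range(n_groups):
        chosen = rng.sample(remaining, group_size)  # SRS from the remaining pool
        groups.append(chosen)
        chosen_set = set(chosen)
        remaining = [s for s in remaining if s not in chosen_set]
    return groups

subjects = list(range(1, 91))  # 90 hypothetical subject IDs
groups = assign_evenly(subjects, n_groups=9, group_size=10, seed=1)
for i, g in enumerate(groups, start=1):
    print(f"ttt{i}: {sorted(g)}")
```

Shuffling the full list once and cutting it into consecutive runs of 10 would give the same kind of balanced random assignment; the loop simply mirrors the step-by-step description.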

Modifications to Randomization

In some cases, an experiment’s design may be enhanced by relaxing the requirement of complete randomization and instead blocking the subjects first: dividing them into groups of individuals who are similar with respect to an outside variable that may be important in the relationship being studied. This can help ensure that the effects of the treatments, as well as those of background variables, are measured as precisely as possible. In blocking, we simply split the sampled subjects into blocks based on the different values of the background variable, and then randomly allocate treatments within each block. Thus, blocking in the assignment of subjects is analogous to stratification in sampling.

For example, consider again our experiment from the last Learn By Doing activity, which examined the differences between three versions of software. If we suspected that gender might affect individuals’ software preferences, we might allocate subjects to separate blocks, one for males and one for females. Within each block, subjects are randomly assigned to treatments, and the experiment proceeds as usual. A diagram of blocking in this situation is below:

We have 2 blocks with 3 treatment groups each (formed by random assignment). From the population we generate a sample. This sample is then split into two blocks, Males and Females. Each block is then randomly split further into the three treatment groups: "ttt1: existing software," "ttt2: new software 1," and "ttt3: new software 2," giving 6 groups in total. Within each block, the responses of the treatment groups are compared with each other, generating results separately for each block.
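A randomized block design for the software example might be sketched like this. The function `block_randomize`, the 12 subjects, and their genders are hypothetical. Subjects are first grouped by the background variable (here gender), and only then shuffled and dealt out to the three treatments inside each block, so every block contains every treatment.

```python
import random
from collections import defaultdict

def block_randomize(subjects, block_of, treatments, seed=None):
    """Randomized block design: split subjects into blocks by a background
    variable, then randomly assign treatments separately within each block."""
    rng = random.Random(seed)
    blocks = defaultdict(list)
    for s in subjects:
        blocks[block_of(s)].append(s)
    assignment = {}
    for block, members in blocks.items():
        shuffled = members[:]
        rng.shuffle(shuffled)  # randomization happens inside the block
        # Deal shuffled members out in turn, so group sizes within a block
        # differ by at most one.
        for i, s in enumerate(shuffled):
            assignment[s] = (block, treatments[i % len(treatments)])
    return assignment

treatments = ["ttt1: existing software", "ttt2: new software 1", "ttt3: new software 2"]
# Hypothetical subjects as (ID, gender) pairs: 6 males and 6 females.
subjects = [(i, "Male") for i in range(1, 7)] + [(i, "Female") for i in range(7, 13)]
assignment = block_randomize(subjects, block_of=lambda s: s[1], treatments=treatments, seed=3)
for subject, (block, treatment) in sorted(assignment.items()):
    print(subject, block, treatment)
```

Here each of the 6 block-by-treatment groups receives 2 subjects.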


Suppose producers of gasoline want to compare which of two types of gas results in better mileage for automobiles. In case the size of the vehicle plays a role in the effectiveness of different types of gasoline, they could first block by vehicle size, then randomly assign some cars within each block to Gasoline A and others to Gasoline B:

This example consists of 2 blocks with 2 treatment groups each (formed by random assignment). From the population we generate a sample, then separate it into two blocks, "Small" and "Large," according to vehicle size. Within each block, we randomly assign vehicles to use either Gasoline A or Gasoline B, so each block is split into two treatment groups, "ttt1: Gasoline A" and "ttt2: Gasoline B," resulting in 4 groups in total. Then, within each block, we compare the responses, so we obtain results for each block individually.
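Because results are obtained block by block, the comparison step for the gasoline example can be sketched as computing, within each block, the mean mileage under Gasoline A minus the mean mileage under Gasoline B. The function name `compare_within_blocks` and the mileage figures are made up purely to show the computation; they are not real measurements. The sketch assumes every block received both gasolines.

```python
from collections import defaultdict
from statistics import mean

def compare_within_blocks(records, treatment_a, treatment_b):
    """records: iterable of (block, treatment, response) tuples.
    Returns {block: mean response under A minus mean response under B}.
    Assumes every block contains both treatments."""
    by_cell = defaultdict(list)
    for block, treatment, response in records:
        by_cell[(block, treatment)].append(response)
    blocks = {block for block, _ in by_cell}
    return {
        block: mean(by_cell[(block, treatment_a)]) - mean(by_cell[(block, treatment_b)])
        for block in sorted(blocks)
    }

# Hypothetical mileage readings (miles per gallon), for illustration only.
records = [
    ("Small", "Gasoline A", 34.0), ("Small", "Gasoline A", 36.0),
    ("Small", "Gasoline B", 33.0), ("Small", "Gasoline B", 35.0),
    ("Large", "Gasoline A", 22.0), ("Large", "Gasoline A", 24.0),
    ("Large", "Gasoline B", 22.0), ("Large", "Gasoline B", 22.0),
]
result = compare_within_blocks(records, "Gasoline A", "Gasoline B")
print(result)
```

Comparing within blocks keeps differences between small and large vehicles from being mixed into the comparison of the two gasolines.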

In the extreme, researchers may examine a relationship using a sample of blocks of just two individuals who are similar in many important respects, or even blocks consisting of a single individual whose responses are compared under two values of the explanatory variable.


For example, researchers could compare the effects of Gasoline A and Gasoline B when both are used on the same car, for a sample of many cars of various sizes and models.

In this matched pairs design we have n blocks, one for each individual car, and each block receives both treatments by random assignment. From the population we generate the sample, which is then split into n blocks, one per car. Each block is subjected to the two treatments, "ttt1: Gasoline A" and "ttt2: Gasoline B," in a randomly assigned order. For each car, the responses to the two treatments are compared.

Such a study design, called matched pairs, may enable us to pinpoint the effects of the explanatory variable by comparing responses for the same individual under two values of the explanatory variable, or for two individuals who are as similar as possible except that the first gets one treatment and the second gets another (or serves as the control). Treatments should usually be assigned at random within each pair, or the order of treatments should be randomized for each individual. In our gasoline example, for each car the order of testing (Gasoline A first, or Gasoline B first) should be randomized.
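Randomizing the order of testing for each car can be sketched as an independent shuffle of the two treatments per car. The helper `randomize_order` and the car IDs are hypothetical.

```python
import random

def randomize_order(units, treatments=("Gasoline A", "Gasoline B"), seed=None):
    """For each unit (e.g., each car), choose at random the order in which
    it receives the treatments; every order is equally likely."""
    rng = random.Random(seed)
    orders = {}
    for unit in units:
        order = list(treatments)
        rng.shuffle(order)  # independent randomization for each unit
        orders[unit] = tuple(order)
    return orders

cars = ["car01", "car02", "car03", "car04"]  # hypothetical car IDs
orders = randomize_order(cars, seed=7)
for car, order in orders.items():
    print(car, "->", " then ".join(order))
```

The same idea covers random assignment within a pair of similar individuals, such as twins: read each randomized order as (treatment for the first member, treatment for the second member).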


Suppose researchers want to compare the relative merits of toothpastes with and without tartar control ingredients. To make the comparison between individuals who are as similar as possible with respect to background and diet, they could obtain a sample of identical twins. One twin in each pair would be randomly assigned to brush with the tartar control toothpaste, while the other would brush with regular toothpaste of the same brand. The toothpastes would be provided in unmarked tubes, so that the subjects would be blind. To make the experiment double-blind, the dentists who evaluate the results would not know who used which toothpaste.

Paired design: there are n blocks, each represented by a circle containing a pair of identical twins. Within each block, the treatment (tartar control or regular toothpaste) is assigned at random, so each twin gets one type of toothpaste and the two twins in a pair get different types.

“Before-and-after” studies are another common type of matched pairs design. For each individual, the response variable of interest is measured twice: first before the treatment, then again after the treatment. The categorical explanatory variable is which treatment was applied, or whether a treatment was applied, to that participant.
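In a before-and-after study (as in any matched pairs design), each individual serves as their own comparison, so the data reduce naturally to one difference per individual. The sketch below computes those differences and their mean; the function name `paired_differences` and the blood pressure readings are hypothetical, chosen only to illustrate the arithmetic.

```python
from statistics import mean

def paired_differences(before, after):
    """One difference (after minus before) per individual, in the same order."""
    if len(before) != len(after):
        raise ValueError("each individual needs both a before and an after measurement")
    return [a - b for b, a in zip(before, after)]

# Hypothetical systolic blood pressure readings for 4 participants.
before = [140, 152, 138, 147]
after = [132, 150, 135, 139]
diffs = paired_differences(before, after)
print(diffs, mean(diffs))
```

Working with the differences removes person-to-person variation in baseline blood pressure from the comparison.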


  • We have explained data production as a two-stage process: first obtain the sample, then evaluate the variables of interest via an appropriate study design. Even though the steps are carried out in this order chronologically, it is generally best for researchers to decide on a study design before they actually obtain the sample. For the toothpaste example above, researchers would first decide to use the matched pairs design, then obtain a sample of identical twins, then carry out the experiment and assess the results.

These examples should convince you that, depending on the variables of interest, researching their relationship via an experiment may be unrealistic, unethical, or impractical. Observational studies are subject to flaws, but they are often the only recourse.
