Unit 3B: Random Variables
In the remaining sections in Unit 3 we will begin to make the connection between probability and statistics so that we can apply these concepts in the final Unit on statistical inference.
These concepts bridge the gap between the mathematics of descriptive statistics and probability on one side and true inferential statistics on the other, where we will formalize statistical hypothesis tests.
In other words, the topics in Unit 3B provide the mathematical background and concepts that will be needed for our study of inferential statistics.
In the previous sections we learned principles and tools that help us find probabilities of events in general.
Now that we’ve become proficient at doing that, we’ll talk about random variables.
Just like any other variable, random variables can take on multiple values.
The probabilities for the values can be determined by theoretical or observational means.
Such probabilities play a vital role in the theory behind statistical inference, our ultimate goal in this course.
We first discussed variables in the Exploratory Data Analysis portion of the course. A variable is a characteristic of an individual.
We also made an important distinction between categorical variables, whose values are groups or categories (and an individual can be placed into one of them), and quantitative variables, which have numerical values for which arithmetic operations make sense.
In the previous sections, we focused mostly on events that arise when there is a categorical variable in the background: blood type, pierced ears (yes/no), gender, on-time delivery (yes/no), side effect (yes/no), etc.
Now we will begin to consider quantitative variables that arise when a random experiment is performed. We will need to define this new type of variable.
A random variable can be thought of as a function that assigns exactly one numerical value to the outcome of each trial of a random experiment. That number can, however, be the same for many of the trials.
Before we go any further, here are some simple examples:
Note that if we had tossed a coin three times, the possible values for the number of tails would be 0, 1, 2, or 3. In general, if we toss a coin “n” times, the possible number of tails would be 0, 1, 2, 3, … , or n.
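The idea of a random variable as a function can be made concrete with a short Python sketch. It lists every outcome of n coin tosses and applies the random variable X (the number of tails) to each one. The function names `sample_space` and `number_of_tails` are illustrative choices, not part of the course material.

```python
from itertools import product

def sample_space(n):
    """All 2**n outcomes of n coin tosses, each a tuple such as ('H', 'T')."""
    return list(product("HT", repeat=n))

def number_of_tails(outcome):
    """The random variable X: assigns exactly one number to each outcome."""
    return outcome.count("T")

for n in (2, 3):
    outcomes = sample_space(n)
    # Different outcomes can share a value, so collect the distinct values.
    values = sorted({number_of_tails(o) for o in outcomes})
    print(f"n = {n}: {len(outcomes)} outcomes, possible values of X: {values}")
```

For two tosses, the outcomes ('H', 'T') and ('T', 'H') are different but both give X = 1, which is why the list of possible values is shorter than the list of outcomes.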
Note: We identified the first example as theoretical and the second as observational.
Let’s discuss the distinction.
- To answer probability questions about a theoretical situation, we only need the principles of probability.
- However, if we have an observational situation, the only way to answer probability questions is to use the relative frequency we obtain from a random sample.
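As a rough illustration of the two approaches, the sketch below finds the distribution of X (the number of tails in two coin tosses) twice: once from the principles of probability, by counting equally likely outcomes, and once as relative frequencies in a sample. The sample here is simulated, which is only a stand-in for real observational data; the fair coin, the sample size of 10,000, and the seed are all assumptions made for the example.

```python
import random
from collections import Counter
from fractions import Fraction
from itertools import product

def theoretical_distribution(n_tosses=2):
    """Exact probabilities, obtained by counting equally likely outcomes."""
    outcomes = list(product("HT", repeat=n_tosses))
    counts = Counter(o.count("T") for o in outcomes)
    return {x: Fraction(c, len(outcomes)) for x, c in sorted(counts.items())}

def relative_frequencies(n_tosses=2, n_trials=10_000, seed=0):
    """Approximate probabilities: the share of trials that gave each value."""
    rng = random.Random(seed)
    counts = Counter(
        sum(rng.choice("HT") == "T" for _ in range(n_tosses))
        for _ in range(n_trials)
    )
    return {x: counts[x] / n_trials for x in sorted(counts)}

print(theoretical_distribution())  # exact: P(0) = 1/4, P(1) = 1/2, P(2) = 1/4
print(relative_frequencies())      # close to the exact values, not identical
```

The relative frequencies change from sample to sample, whereas the theoretical values do not. In a truly observational situation, only the second kind of calculation is available.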
Here is a different type of example:
What is the difference between the random variables in these examples? Let’s see:
- They all arise from a random experiment (tossing a coin twice, choosing a person at random, choosing a lightweight boxer at random).
- They are all quantitative (number of tails, number of ears, weight).
Where they differ is in the type of possible values they can take:
- In the first two examples, X has three distinct possible values: 0, 1, and 2. You can list them. A random variable whose possible values can be listed in this way is called discrete.
- In contrast, in the third example, X takes any value in the interval 130-135. An interval contains infinitely many values, and they cannot be listed. A random variable of this kind is called continuous.
Just as the distinction between categorical and quantitative variables was important in Exploratory Data Analysis, the distinction between discrete and continuous random variables is important here, as each one gets a different treatment when it comes to calculating probabilities and other quantities of interest.
Before we go any further, a few observations about the nature of discrete and continuous random variables should be mentioned.
- Sometimes, continuous random variables are “rounded” and are therefore “in a discrete disguise.” For example:
- time spent watching TV in a week, rounded to the nearest hour (or minute)
- outside temperature, to the nearest degree
- a person’s weight, to the nearest pound.
Even though they “look like” discrete variables, these are still continuous random variables, and we will in most cases treat them as such.
- On the other hand, some variables are discrete in nature but take so many distinct possible values that it is much easier to treat them as continuous rather than discrete. For example:
- the IQ of a randomly chosen person
- the SAT score of a randomly chosen student
- the annual salary of a randomly chosen CEO, whether rounded to the nearest dollar or the nearest cent
- Sometimes we have a discrete random variable but do not know the extent of its possible values.
- For example: How many accidents will occur in a particular intersection this month?
- We may know from previously collected data that this number has ranged from 0 to 5. However, 6, 7, or more accidents are still possible.
- A good rule of thumb is that discrete random variables are things we count, while continuous random variables are things we measure.
- We counted the number of tails and the number of ears with earrings. These were discrete random variables.
- We measured the weight of the lightweight boxer. This was a continuous random variable.
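The "discrete disguise" of rounded measurements can be seen in a small simulation. The sketch below draws weights from the interval 130-135 used in the boxer example and compares how many distinct values appear before and after rounding to the nearest whole number. The uniform distribution, the sample size, and the seed are assumptions made purely for illustration; real weights need not behave this way.

```python
import random

rng = random.Random(1)  # fixed seed so the run is reproducible

# Assumption: weights are spread uniformly over the interval [130, 135].
weights = [rng.uniform(130, 135) for _ in range(1_000)]
rounded = [round(w) for w in weights]

# Measured values: almost every draw is a different number.
print("distinct measured values:", len(set(weights)))

# Rounded values: only a few whole numbers remain, and they can be listed.
print("distinct rounded values: ", sorted(set(rounded)))
```

Rounding changes how the values are recorded, not the nature of the underlying quantity, which is why such variables are still usually treated as continuous.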
Often, the same subject matter can give rise to either a discrete or a continuous random variable, depending on the information we wish to know.
We devote a great deal of attention to random variables, since random variables and the probabilities that are associated with them play a vital role in the theory behind statistical inference, our ultimate goal in this course.
We’ll start with discrete random variables, including a discussion of binomial random variables, and then move on to continuous random variables, where we will formalize our understanding of the normal distribution.