Discrete Random Variables
- Section Plan
- Probability Distributions
- Probability Histograms
- Area of a Probability Histogram
- Finding Probabilities
- Key Words
- Mean of a Discrete Random Variable
- Variance and Standard Deviation of a Discrete Random Variable
We begin with discrete random variables: variables whose possible values are a list of distinct values. In order to decide on some notation, let’s look at the coin toss example again:
A fair coin is tossed twice.
- Let the random variable X be the number of tails we get in this random experiment.
- In this case, the possible values that X can assume are
- 0 (if we get HH),
- 1 (if we get HT or TH),
- and 2 (if we get TT).
If we want to find the probability of the event “getting 1 tail,” we’ll write: P(X = 1)
If we want to find the probability of the event “getting 0 tails,” we’ll write: P(X = 0)
Keeping track of the difference between X and x can cause confusion, so let's be explicit about the convention:
- Here X represents the random variable, while x or k denotes the value of interest in the current problem (0, 1, etc.).
- Note that we'll use a capital letter for the random variable, and a lowercase letter for the value.
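To make the notation concrete, here is a small sketch (assuming the two-toss coin example above) that finds P(X = 1) by listing every equally likely outcome and counting tails:

```python
from itertools import product

# Enumerate all outcomes of two fair coin tosses; each is equally likely.
outcomes = list(product("HT", repeat=2))  # ('H','H'), ('H','T'), ('T','H'), ('T','T')

# X = number of tails in an outcome.
def x(outcome):
    return outcome.count("T")

# P(X = 1): the fraction of outcomes with exactly one tail.
p_x_equals_1 = sum(1 for o in outcomes if x(o) == 1) / len(outcomes)
print(p_x_equals_1)  # 0.5
```

The capital X in P(X = 1) is the rule "count the tails"; the lowercase 1 is the particular value we ask about.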
The way this section on discrete random variables is organized is very similar to the way we organized our discussion about one quantitative variable in the Exploratory Data Analysis unit.
It will be organized as follows.
- We’ll first discuss the probability distribution of a discrete random variable, ways to display it, and how to use it in order to find probabilities of interest.
- We’ll then move on to talk about the mean and standard deviation of a discrete random variable, which are measures of the center and spread of its distribution.
- We’ll conclude this part by discussing a special and very common class of discrete random variable: the binomial random variable.
When we learned how to find probabilities by applying the basic principles, we generally focused on just one particular outcome or event, like the probability of getting exactly one tail when a coin is tossed twice, or the probability of getting a 5 when a die is rolled.
Now that we have mastered the solution of individual probability problems, we’ll proceed to look at the big picture by considering all the possible values of a discrete random variable, along with their associated probabilities.
This list of possible values and probabilities is called the probability distribution of the random variable.
- In the Exploratory Data Analysis unit of this course, we often looked at the distribution of sample values in a quantitative data set. We would display the values with a histogram, and summarize them by reporting their mean.
- In this section, when we look at the probability distribution of a random variable, we consider all its possible values and their overall probabilities of occurrence.
- Thus, we have in mind an entire population of values for a variable. When we display them with a histogram or summarize them with a mean, these are representing a population of values, not a sample.
- The distinction between sample and population is an essential concept in statistics, because an ultimate goal is to draw conclusions about unknown values for a population, based on what is observed in the sample.
In the examples which follow we will sometimes illustrate how the probability distribution is created.
We do this to demonstrate the usefulness of the probability rules we previously discussed and to illustrate clearly how probability distributions can be created.
As we are more focused on data-driven methods, you will often be given a probability distribution based upon data, as opposed to constructing the theoretical probability distribution from flipping coins or similar classical probability experiments.
Recall our first example, when we introduced the idea of a random variable. In this example we tossed a coin twice.
The way to interpret this table is:
- X takes the values 0, 1, 2 and P(X = 0) = 1/4, P(X = 1) = 1/2, P(X = 2) = 1/4.
Note that events of the type (X = x) are subject to the principles of probability established earlier, and will provide us with a way of systematically exploring the behavior of random variables.
In particular, we will now state the first two principles in the context of probability distributions of random variables.
The probability distribution for two flips of a coin was simple enough to construct at once.
For more complicated random experiments, it is common to first construct a table of all the outcomes and their probabilities, then use the addition principle to condense that information into the actual probability distribution table.
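The two-step process described above can be sketched in code (using the two-coin-toss example, so the table of outcomes stays small): first list every outcome with its probability, then use the addition principle to combine outcomes that give the same value of X.

```python
from itertools import product
from collections import defaultdict
from fractions import Fraction

# Step 1: a table of all outcomes and their probabilities
# (two tosses of a fair coin, so each outcome has probability 1/4).
outcome_probs = {o: Fraction(1, 4) for o in product("HT", repeat=2)}

# Step 2: addition principle — add up the probabilities of all outcomes
# that produce the same value of X (the number of tails).
dist = defaultdict(Fraction)
for outcome, p in outcome_probs.items():
    dist[outcome.count("T")] += p

print(dict(sorted(dist.items())))  # {0: 1/4, 1: 1/2, 2: 1/4}
```

The condensed dictionary is exactly the probability distribution table for X.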
In the previous two examples, we needed to specify the probability distributions ourselves, based on the physical circumstances of the situation.
In some situations, the probability distribution may be specified with a formula.
Such a formula must be consistent with the constraints imposed by the laws of probability, so that the probability of each outcome must be between 0 and 1, and the probabilities of all possible outcomes together must sum to 1.
We will see this with the binomial distribution.
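As a preview, here is a sketch of a formula-specified distribution (the binomial formula, with illustrative values n = 4 and p = 0.3) together with a check that it satisfies both constraints just mentioned:

```python
from math import comb

# A distribution specified by a formula: binomial with n trials,
# success probability p (values chosen purely for illustration).
n, p = 4, 0.3
pmf = {k: comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)}

# The laws of probability require each probability to be between 0 and 1...
assert all(0 <= prob <= 1 for prob in pmf.values())
# ...and all probabilities together to sum to 1.
assert abs(sum(pmf.values()) - 1) < 1e-12
```

Any formula proposed as a probability distribution must pass both checks.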
We learned to display the distribution of sample values for a quantitative variable with a histogram in which the horizontal axis represented the range of values in the sample.
- The vertical axis represented the frequency or relative frequency (sometimes given as a percentage) of sample values occurring in that interval.
- The width of each rectangle in the histogram was an interval, or part of the possible values for the quantitative variable.
- The height of each rectangle was the frequency (or relative frequency) for that interval.
Similarly, we can display the probability distribution of a random variable with a probability histogram.
- The horizontal axis represents the range of all possible values of the random variable
- The vertical axis represents the probabilities of those values.
Here is an example of a probability histogram.
(Such probabilities are not always increasing; they just happen to be so in this example).
Notice that each rectangle in the histogram has a width of 1 unit, and the height of each rectangle is the probability of the corresponding value of X.
Thus, the area of each rectangle is base times height, which here is 1 times the probability, so each rectangle's area equals the probability of its value.
It follows that the total area of a probability histogram is 1; for probability distributions of discrete random variables, this is equivalent to the property that the sum of all of the probabilities must equal 1.
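A quick numerical check of this property, using the two-coin-toss distribution from earlier (each rectangle has base 1, so its area is just its probability):

```python
# The two-coin-toss distribution: P(X = x) for x = 0, 1, 2.
dist = {0: 0.25, 1: 0.50, 2: 0.25}

# Each rectangle's area is base * height = 1 * P(X = x).
areas = [1 * p for p in dist.values()]
total_area = sum(areas)
print(total_area)  # 1.0
```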
We’ve seen how probability distributions are created. Now it’s time to use them to find probabilities.
As you just saw in this example, we need to pay attention to the wording of the probability question.
The key words that told us which values to use for X were "more than."
The following will clarify and reinforce the key words and their meanings.
Let’s begin with some everyday situations using at least and at most.
Suppose someone said to you, “I need you to write at least 10 pages for a term paper.”
- What does this mean?
- It means that 10 pages is the smallest amount you are going to write.
- In other words, you will write 10 or more pages for the term paper.
- This would be the same as saying, "not less than 10 pages."
- So, for example, writing 9 pages would be unacceptable.
On the other hand, suppose you are considering the number of children you will have. You want at most 3 children.
- This means that 3 children is the most that you wish to have.
- In other words, you will have 3 or fewer children.
- This would be the same as saying, "not more than 3 children."
- So, for example, you would not want to have 4 children.
The following table gives a list of some key words to know.
Suppose a random variable X had possible values of 0 through 5.
| Key Words | Meaning | Symbols | Values for X |
|---|---|---|---|
| more than 2 | strictly larger than 2 | X > 2 | 3, 4, 5 |
| no more than 2 | 2 or fewer | X ≤ 2 | 0, 1, 2 |
| fewer than 2 | strictly smaller than 2 | X < 2 | 0, 1 |
| no less than 2 | 2 or more | X ≥ 2 | 2, 3, 4, 5 |
| at least 2 | 2 or more | X ≥ 2 | 2, 3, 4, 5 |
| at most 2 | 2 or fewer | X ≤ 2 | 0, 1, 2 |
| exactly 2 | 2, no more or no less, only 2 | X = 2 | 2 |
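The table above translates directly into code. Here is a sketch using a made-up distribution for X with values 0 through 5 (the probabilities are hypothetical, chosen only for illustration):

```python
# A hypothetical probability distribution for X with possible values 0-5
# (these probabilities are made up for illustration).
dist = {0: 0.05, 1: 0.15, 2: 0.30, 3: 0.25, 4: 0.15, 5: 0.10}

def prob(dist, condition):
    """Add up P(X = x) over all values x that satisfy the condition."""
    return sum(p for x, p in dist.items() if condition(x))

# "more than 2"  ->  X > 2   ->  values 3, 4, 5
print(round(prob(dist, lambda x: x > 2), 2))   # 0.5
# "at most 2"    ->  X <= 2  ->  values 0, 1, 2
print(round(prob(dist, lambda x: x <= 2), 2))  # 0.5
# "at least 2"   ->  X >= 2  ->  values 2, 3, 4, 5
print(round(prob(dist, lambda x: x >= 2), 2))  # 0.8
```

Note how "at least 2" includes the value 2 itself, while "more than 2" does not.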
Before we move on to the next section on the means and variances of a probability distribution, let’s revisit the changing majors example:
After we learn about means and standard deviations, we will have another way to answer these types of questions.
In the Exploratory Data Analysis (EDA) section, we displayed the distribution of one quantitative variable with a histogram, and supplemented it with numerical measures of center and spread.
This section will be devoted to introducing these measures. As before, we’ll start with the numerical measure of center, the mean. Let’s begin by revisiting an example we saw in EDA.
In Exploratory Data Analysis, we used the mean of a sample of quantitative values—their arithmetic average—to tell the center of their distribution. We also saw how a weighted mean was used when we had a frequency table. These frequencies can be changed to relative frequencies.
Let’s see how this is done by looking at a specific example.
Here is the general definition of the mean of a discrete random variable:
Although “expected value” is a common, and even preferred term in the field of statistics, this expression may be somewhat misleading, because in many cases it is impossible for a random variable to actually equal its expected value.
For example, the mean number of goals for a World Cup soccer game is 2.36. But we can never expect any single game to result in 2.36 goals, since it is not possible to score a fraction of a goal. Rather, 2.36 is the long-run average of all World Cup soccer games.
In the case of Xavier’s production line, the mean number of defective parts produced in an hour is 1.8. But the actual number of defective parts produced in any given hour can never equal 1.8, since it must take whole number values.
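The mean is computed as a weighted average: each value of X, weighted by its probability. Here is a sketch with a hypothetical hourly defective-parts distribution (not Xavier's actual distribution, which the text does not give, but one chosen so the mean works out to 1.8 as in the example):

```python
# Hypothetical distribution for defective parts per hour (illustrative only;
# chosen so the mean comes out to 1.8, matching the production-line example).
dist = {0: 0.1, 1: 0.3, 2: 0.4, 3: 0.1, 4: 0.1}

# mu = sum over all x of x * P(X = x)
mu = sum(x * p for x, p in dist.items())
print(round(mu, 2))  # 1.8
```

Note that 1.8 is not itself a possible value of X; it is the long-run average over many hours.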
To get a better feel for the mean of a random variable, let’s extend the defective parts example:
In Exploratory Data Analysis, we used the mean of a sample of quantitative values (their arithmetic average, x-bar) to tell the center of their distribution, and the standard deviation (s) to tell the typical distance of sample values from their mean.
We described the center of a probability distribution for a random variable by reporting its mean, which we denoted by the Greek letter mu (μ).
Now we would like to establish an accompanying measure of spread.
Our measure of spread will still report the typical distance of values from their mean. However, to distinguish the spread of the population of all of a random variable's values from the spread (s) of sample values, we will denote the standard deviation of the random variable X with the Greek lowercase letter sigma (σ), and use a subscript to remind us which variable is of interest (there may be more than one in later problems):
We will also focus more frequently than before on the squared standard deviation, called the variance, because some important rules we need to invoke are in terms of variance rather than standard deviation.
Here is the formal definition:
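In symbols, the variance of a discrete random variable is σ² = Σ (x − μ)² · P(X = x), and the standard deviation σ is its square root. A sketch, reusing the hypothetical defective-parts distribution (made up for illustration, with mean 1.8):

```python
from math import sqrt

# Hypothetical distribution (illustrative only), with mean mu = 1.8.
dist = {0: 0.1, 1: 0.3, 2: 0.4, 3: 0.1, 4: 0.1}

mu = sum(x * p for x, p in dist.items())

# Variance: the probability-weighted average squared distance from the mean.
variance = sum((x - mu) ** 2 * p for x, p in dist.items())
sigma = sqrt(variance)

print(round(variance, 2), round(sigma, 2))  # 1.16 1.08
```

As with the mean, the weights are the probabilities, so values that are both far from the mean and likely to occur contribute the most to the spread.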
The purpose of the next activity is to give you better intuition about the mean and standard deviation of a random variable.
When we compare distributions, the distribution in which it is more likely to find values that are further from the mean will have a larger standard deviation.
Likewise, the distribution in which it is less likely to find values that are further from the mean will have the smaller standard deviation.
As we have stated before, using the mean and standard deviation gives us another way to assess which values of a random variable are unusual.
For reasonably symmetric distributions, any values of a random variable that fall within 2 or 3 standard deviations of the mean would be considered ordinary (not unusual).
For any distribution, it is unusual for values to fall outside of 3 or 4 standard deviations of the mean, depending on your definition of "unusual."
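This rule of thumb is easy to apply in code. Here is a sketch using a cutoff of 3 standard deviations, with μ and σ taken from the hypothetical defective-parts example above:

```python
# Flag a value as unusual when it lies more than k standard deviations
# from the mean (k = 3 is one common cutoff; some use 2 for roughly
# symmetric distributions).
def is_unusual(x, mu, sigma, k=3):
    return abs(x - mu) > k * sigma

mu, sigma = 1.8, 1.08  # from the hypothetical defective-parts distribution

print(is_unusual(6, mu, sigma))  # True: 6 is about 3.9 SDs above the mean
print(is_unusual(3, mu, sigma))  # False: 3 is only about 1.1 SDs above
```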
The following activity will reinforce this idea.