Binomial Random Variables – Part A (4:53)

- Slides 1-5
- Binomial Properties
- Binomial Formula
- Examples – Are these Binomial?

Binomial Random Variables – Part B (4:01)

- Slides 6-11
- How the Binomial Formula Works
- Calculations and Probability Distribution in Example A

Binomial Random Variables – Part C (1:56)

- Slides 12-14
- Online Binomial Calculators

Binomial Random Variables – Part D (2:02)

- Slides 15-17
- Formulas for the Mean, Variance, and Standard Deviation of a Binomial Distribution
- Calculations of Mean, Variance, and Standard Deviation in Example A

This document linked from Binomial Random Variables

We have almost reached the end of our discussion of probability. We were introduced to the important concept of **random variables**, which are quantitative variables whose value is determined by the outcome of a random experiment.

We discussed discrete and continuous random variables.

We saw that all the information about a **discrete random variable** is packed into its probability distribution. Using that, we can answer probability questions about the random variable and find its **mean and standard deviation**. We ended the part on discrete random variables by presenting a special class of discrete random variables – **binomial random variables.**

As we dove into **continuous random variables**, we saw how calculations can get complicated very quickly, when probabilities associated with a continuous random variable are found by calculating **areas under its density curve**.

As an example of a continuous random variable, we presented the **normal random variable**, and discussed it at length. The normal distribution is extremely important, not just because many variables in real life follow the normal distribution, but mainly because of the important role it plays in statistical inference, the ultimate goal of this course.

We learned how we can avoid calculus by using the **standard normal calculator or table** to find probabilities associated with the normal distribution, and learned how it can be used as an **approximation to the binomial** distribution under certain conditions.

A random variable is a variable whose values are numerical results of a random experiment.

- A **discrete random variable** is summarized by its probability distribution: a list of its possible values and their corresponding probabilities.

The sum of the probabilities of all possible values must be 1.

The probability distribution can be represented by a table, histogram, or sometimes a formula.

- The **probability distribution** of a random variable can be supplemented with numerical measures of the center and spread of the random variable.

**Center:** The center of a random variable is measured by its mean (which is sometimes also referred to as the **expected value**).

The mean of a random variable can be interpreted as its long run average.

The mean is a weighted average of the possible values of the random variable weighted by their corresponding probabilities.

**Spread:** The spread of a random variable is measured by its variance, or more typically by its standard deviation (the square root of the variance).

The standard deviation of a random variable can be interpreted as the typical (or long-run average) distance between the value that the random variable assumes and the mean of X.

- The binomial random variable is a type of discrete random variable that is quite common.

- The binomial random variable is defined in a random experiment that consists of n independent trials, each having two possible outcomes (called “success” and “failure”), and each having the same probability of success: p. Such a random experiment is called a binomial experiment.

- The binomial random variable represents the number of successes (out of n) in a binomial experiment. It can therefore have values as low as 0 (if none of the n trials was a success) and as high as n (if all n trials were successes).

- There are “many” binomial random variables, depending on the number of trials (n) and the probability of success (p).

- The probability distribution of the binomial random variable is given in the form of a formula and can be used to find probabilities. Technology can be used as well.

- The mean and standard deviation of a binomial random variable can be easily found using short-cut formulas.

The probability distribution of a continuous random variable is represented by a probability density curve. The probability that the random variable takes a value in any interval of interest is the area above this interval and below the density curve.

An important example of a continuous random variable is the **normal random variable**, whose probability density curve is symmetric (bell-shaped), bulging in the middle and tapering at the ends.

- There are “many” normal random variables, each determined by its mean *μ* (mu) (which determines where the density curve is centered) and standard deviation σ (sigma) (which determines how spread out (wide) the normal density curve is).

- Any normal random variable follows the Standard Deviation Rule, which can help us find probabilities associated with the normal random variable.

- Another way to find probabilities associated with the normal random variable is using the standard normal table. This process involves finding the z-score of values, which tells us how many standard deviations below or above the mean the value is.

- An important application of the normal random variable is that it can be used as an approximation of the binomial random variable (under certain conditions). A continuity correction can improve this approximation.
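As a rough illustration of the last point, the sketch below compares an exact binomial probability with its normal approximation, with and without a continuity correction. The values n = 20, p = 0.5 and the cutoff 8 are arbitrary choices made for this illustration, and the function names are hypothetical.

```python
from math import comb, sqrt
from statistics import NormalDist


def binomial_cdf(x, n, p):
    """Exact P(X <= x) for X binomial with n trials and success probability p."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))


def normal_approx_cdf(x, n, p, continuity_correction=True):
    """Approximate P(X <= x) with a normal curve that has the same mean and SD."""
    mean = n * p
    sd = sqrt(n * p * (1 - p))
    # The correction moves the cutoff half a unit, to the edge of the bar for x.
    cutoff = x + 0.5 if continuity_correction else x
    return NormalDist(mean, sd).cdf(cutoff)


# Illustrative values only (not taken from the text).
n, p, x = 20, 0.5, 8
exact = binomial_cdf(x, n, p)
with_cc = normal_approx_cdf(x, n, p)
without_cc = normal_approx_cdf(x, n, p, continuity_correction=False)

print(f"exact P(X <= {x})             = {exact:.4f}")
print(f"normal, continuity correction = {with_cc:.4f}")
print(f"normal, no correction         = {without_cc:.4f}")
```

In this particular case the corrected approximation lands much closer to the exact value than the uncorrected one.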

A Gallup Poll in May 2004 estimated that roughly 70% of U.S. adults are in favor of the death penalty for a person convicted of murder. A random sample of 750 U.S. adults is chosen. Let X be the number of adults (out of 750) who favor the death penalty.

In vitro fertilization is becoming more and more common these days. Suppose each embryo that is implanted has a 20% chance of resulting in a pregnancy that results in delivering a baby. Also, assume that each embryo’s chance of surviving and resulting in a baby is independent of the others.

It is an expensive procedure, so we want to do it only once. We wish to try to find the optimum number of embryos to implant so that the probability of at least 1 child being born is high, but the probability of more than 2 children being born is low. In other words, we want a baby, and we’re willing to have twins, but we don’t want triplets, quadruplets, etc.

Note that unlike the airline flight example, where we needed to control only one probability, in this case there are two probabilities that we wish to control.

The two conditions we’ve outlined mean that we’ll need two probabilities. These are:

- the probability of having at least one child is P(X ≥ 1) which can be found directly or by calculating 1 – P(X = 0)

- the probability of having more than two children is P(X > 2) which can be found directly or by calculating 1 – P(X ≤ 2)

We will let X represent the number of implanted embryos resulting in a baby. It is a binomial random variable with n = number of implanted embryos and p = .20 (the probability that an implanted embryo results in a baby).

It is customary to implant between n = 1 and n = 7 embryos. The table below contains the two probabilities described above, for values of n ranging from 1 to 7.

**Note: This table does NOT represent ONE probability distribution**, it is simply a convenient way to summarize the information needed to answer this question.

**We actually needed to use 7 different probability distributions based upon n = the number of embryos.** For each of these distributions, we calculated two probabilities. Check some of these answers for yourself using the online calculator.

Number of embryos (n) | P(X ≥ 1) | P(X > 2) |
---|---|---|
1 | 0.20 | 0 |
2 | 0.36 | 0 |
3 | 0.488 | 0.008 |
4 | 0.590 | 0.027 |
5 | 0.672 | 0.058 |
6 | 0.738 | 0.099 |
7 | 0.790 | 0.148 |
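One way to check the table without an online calculator is a short script. This is only a sketch: the helper name `binomial_pmf` is made up for this illustration, and p = 0.20 is the per-embryo success probability stated in the example.

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x) for X binomial with n trials and success probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


p = 0.20  # chance that one implanted embryo results in a baby (from the example)

rows = []
for n in range(1, 8):  # a separate binomial distribution for each n = 1, ..., 7
    # P(X >= 1) found as 1 - P(X = 0)
    at_least_one = 1 - binomial_pmf(0, n, p)
    # P(X > 2) found directly as P(X = 3) + ... + P(X = n); empty sum when n < 3
    more_than_two = sum(binomial_pmf(x, n, p) for x in range(3, n + 1))
    rows.append((n, at_least_one, more_than_two))
    print(f"{n} | {at_least_one:.3f} | {more_than_two:.3f}")
```

Each pass through the loop uses a different value of n, which mirrors the point above that the table summarizes seven distributions rather than one.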

A multiple choice test has 10 questions, each with 5 possible answers, only one of which is correct. A student who did not study is absolutely clueless, and therefore uses an independent random guess to answer each of the 10 questions.

**Let X be the number of questions the student gets right.**

Give the formula expression for the probability distribution of X. In other words, apply the general formula for the probability distribution of a binomial random variable to the case in which n = 10 and p = 0.2.

What is the probability that the student gets exactly 4 questions right, P(X = 4)?

Applying the binomial formula is a good way for “first-timers” to understand the mechanics of binomial probabilities. Once you have mastered the technique, however, it may still be tedious to perform the necessary calculations.

For example, suppose we want the probability that the student gets at most 4 questions right. To find P(X ≤ 4), we would need to add P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3) + P(X = 4).

For each of these 5 probabilities, we would need to use the formula, and then add the probabilities. This is very tedious. When calculations involve large n values, calculations become tedious as well. Luckily, any statistical software will do binomial calculations for us. In our course, we will use an online calculator for this purpose.

As practice, follow these steps to find P(X = 4) for our example (where n = 10 and p = 0.2), and verify that you get the same answer as you did in the last question, where you did it “by hand.”

Now use the online calculator to find the probability that the student gets no more than 4 questions right: P(X ≤ 4).

Use the online calculator to find the probability that the student gets more than 2 questions right, P(X > 2).
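If no online calculator is at hand, the same three probabilities can be computed with a few lines of code. The sketch below assumes only the values from the example (n = 10, p = 0.2); the function names are hypothetical.

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x) for X binomial with n trials and success probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def binomial_cdf(x, n, p):
    """P(X <= x), obtained by adding P(X = 0) + ... + P(X = x)."""
    return sum(binomial_pmf(k, n, p) for k in range(x + 1))


n, p = 10, 0.2  # 10 questions, each guessed correctly with probability 1/5

p_exactly_4 = binomial_pmf(4, n, p)        # P(X = 4)
p_at_most_4 = binomial_cdf(4, n, p)        # P(X <= 4)
p_more_than_2 = 1 - binomial_cdf(2, n, p)  # P(X > 2) = 1 - P(X <= 2)

print(f"P(X = 4)  = {p_exactly_4:.4f}")
print(f"P(X <= 4) = {p_at_most_4:.4f}")
print(f"P(X > 2)  = {p_more_than_2:.4f}")
```

The three results come out to roughly 0.088, 0.967 and 0.322.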

In the following random experiment, decide whether the random variable X is binomial or not.

So far, in our discussion about discrete random variables, we have been introduced to:

- The probability distribution, which tells us which values a variable takes, and how often it takes them.
- The mean of the random variable, which tells us the long-run average value that the random variable takes.
- The standard deviation of the random variable, which tells us a typical (or long-run average) distance between the mean of the random variable and the values it takes.

We will now introduce a special class of discrete random variables that are very common, because as you’ll see, they will come up in many situations – **binomial random variables.**

Here’s how we’ll present this material.

- First, we’ll explain what kind of random experiments give rise to a binomial random variable, and how the binomial random variable is defined in those types of experiments.
- We’ll then present the probability distribution of the binomial random variable, which will be presented as a formula, and explain why the formula makes sense.
- We’ll conclude our discussion by presenting the mean and standard deviation of the binomial random variable.

As we just mentioned, we’ll start by describing what kind of random experiments give rise to a binomial random variable. We’ll call this type of random experiment a “binomial experiment.”

Binomial experiments are random experiments that consist of a fixed number of repeated trials, like tossing a coin 10 times, randomly choosing 10 people, rolling a die 5 times, etc.

These trials, however, need to be **independent** in the sense that the outcome in one trial has no effect on the outcome in other trials.

In each of these repeated trials there is one outcome that is of interest to us (we call this outcome “success”), and each of the trials is identical in the sense that the probability that the trial will end in a “success” is the same in each of the trials.

So for example, if our experiment is tossing a coin 10 times, and we are interested in the outcome “heads” (our “success”), then this will be a binomial experiment, since the 10 trials are independent, and the probability of success is 1/2 in each of the 10 trials.

Let’s summarize and give more examples.

The **requirements** for a random experiment to be a **binomial experiment** are:

- a fixed number (n) of trials
- each trial must be independent of the others
- each trial has just two possible outcomes, called “**success**” (the outcome of interest) and “**failure**”
- there is a constant **probability (p) of success** for each trial, the complement of which is the **probability (1 – p) of failure**, sometimes denoted as q = (1 – p)

In binomial random experiments, the number of successes in n trials is random.

It can be as low as 0, if all the trials end up in failure, or as high as n, if all n trials end in success.

The random variable X that represents the number of successes in those n trials is called a **binomial** random variable, and is determined by the values of n and p. We say, “X is binomial with n = … and p = …”
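To get a feel for why the number of successes is random, a binomial experiment can be imitated on a computer. The following is a minimal sketch using the coin example above (n = 10 tosses, p = 1/2); the function name, the seed, and the 10,000 repetitions are arbitrary choices made for this illustration.

```python
import random


def simulate_binomial(n, p, rng):
    """Run one binomial experiment: n independent trials, count the successes."""
    return sum(1 for _ in range(n) if rng.random() < p)


rng = random.Random(0)  # fixed seed so the run is repeatable
n, p = 10, 0.5          # ten tosses of a fair coin, "heads" is the success

# Repeat the whole experiment many times; each repetition gives one value of X.
counts = [simulate_binomial(n, p, rng) for _ in range(10_000)]

print("first ten values of X:", counts[:10])
print("smallest and largest value seen:", min(counts), max(counts))
print("average number of heads:", sum(counts) / len(counts))
```

Every simulated value of X falls between 0 and n, and the values vary from one repetition to the next.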

Let’s consider a few random experiments.

In each of them, we’ll decide whether the random variable is binomial. If it is, we’ll determine the values for n and p. If it isn’t, we’ll explain why not.

**Example A:**

A fair coin is flipped 20 times; X represents the number of heads.

**X is binomial with n = 20 and p = 0.5**.

**Example B:**

You roll a fair die 50 times; X is the number of times you get a six.

**X is binomial with n = 50 and p = 1/6**.

**Example C:**

Roll a fair die repeatedly; X is the number of rolls it takes to get a six.

**X is not binomial, because the number of trials is not fixed**.

**Example D:**

Draw 3 cards at random, one after the other, **without replacement**, from a set of 4 cards consisting of one club, one diamond, one heart, and one spade; X is the number of diamonds selected.

**X is not binomial, because the selections are not independent**. (The probability that a given card is a diamond depends on the previous selections.)

**Example E:**

Draw 3 cards at random, one after the other, **with replacement**, from a set of 4 cards consisting of one club, one diamond, one heart, and one spade; X is the number of diamonds selected. Sampling with replacement ensures independence.

**X is binomial with n = 3 and p = 1/4**.

**Example F:**

Approximately 1 in every 20 children has a certain disease. Let X be the number of children with the disease out of a random sample of 100 children. Although the children are sampled without replacement, it is assumed that we are sampling from such a vast population that the selections are virtually independent.

**X is binomial with n = 100 and p = 1/20 = 0.05**.

**Example G:**

The probability of having blood type B is 0.1. Choose 4 people at random; X is the number with blood type B.

**X is binomial with n = 4 and p = 0.1**.

**Example H:**

A student answers 10 quiz questions completely at random; the first five are true/false, the second five are multiple choice, with four options each. X represents the number of correct answers.

**X is not binomial, because p changes from 1/2 to 1/4**.

**Comments:**

- **Example D** above was not binomial because sampling without replacement resulted in dependent selections.
  - In particular, the probability of the second card being a diamond is very dependent on whether or not the first card was a diamond:
  - the probability is 0 if the first card was a diamond, 1/3 if the first card was not a diamond.

- In contrast, **Example E** was binomial because sampling with replacement resulted in independent selections:
  - the probability of any of the 3 cards being a diamond is 1/4 no matter what the previous selections have been.

- On the other hand, when you take a relatively small random sample of subjects from a large population, even though the sampling is without replacement, we can assume independence because the mathematical effect of removing one individual from a very large population on the next selection is negligible.
  - For example, in **Example F**, we sampled 100 children out of the population of all children.
  - Even though we sampled the children without replacement, whether one child has the disease or not really has no effect on whether another child has the disease or not.
  - The same is true for **Example G**.
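The effect of population size can be made concrete by comparing exact without-replacement probabilities with binomial ones. The sketch below does this for the 4-card setting of Example D and for Example F. The population of 1,000,000 children (50,000 of them with the disease) is a made-up figure chosen only to stand in for "a very large population", and the function names are hypothetical.

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x) when the n selections are independent (binomial)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def without_replacement_pmf(x, n, population, successes):
    """P(X = x) when n individuals are drawn without replacement
    from a population that contains the given number of successes."""
    ways = comb(successes, x) * comb(population - successes, n - x)
    return ways / comb(population, n)


# Example D setting: 3 cards drawn from only 4 cards, one of them a diamond.
small_gap = max(
    abs(without_replacement_pmf(x, 3, 4, 1) - binomial_pmf(x, 3, 0.25))
    for x in range(4)
)

# Example F setting, with a hypothetical population of 1,000,000 children.
large_gap = max(
    abs(without_replacement_pmf(x, 100, 1_000_000, 50_000) - binomial_pmf(x, 100, 0.05))
    for x in range(101)
)

print(f"largest difference, tiny population:  {small_gap:.4f}")
print(f"largest difference, large population: {large_gap:.6f}")
```

With only 4 cards the two sets of probabilities disagree badly, while for the large population they are nearly indistinguishable, which is why the binomial model is acceptable in Example F.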

Now that we understand what a binomial random variable is, and when it arises, it’s time to discuss its probability distribution. We’ll start with a simple example and then generalize to a formula.

Consider a regular deck of 52 cards, in which there are 13 cards of each suit: hearts, diamonds, clubs and spades. We select 3 cards at random **with replacement**. Let X be the number of diamond cards we got (out of the 3).

We have 3 trials here, and they are independent (since the selection is with replacement). The outcome of each trial can be either success (diamond) or failure (not diamond), and the probability of success is 1/4 in each of the trials.

X, then, is binomial with n = 3 and p = 1/4.

Let’s build the probability distribution of X as we did in the chapter on probability distributions. Recall that we begin with a table in which we:

- record all possible outcomes in 3 selections, where each selection may result in success (a diamond, D) or failure (a non-diamond, N).
- find the value of X that corresponds to each outcome.
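- use simple probability principles to find the probability of each outcome.

Outcome | X | Probability |
---|---|---|
DDD | 3 | (1/4)(1/4)(1/4) = 1/64 |
DDN | 2 | (1/4)(1/4)(3/4) = 3/64 |
DND | 2 | (1/4)(3/4)(1/4) = 3/64 |
NDD | 2 | (3/4)(1/4)(1/4) = 3/64 |
DNN | 1 | (1/4)(3/4)(3/4) = 9/64 |
NDN | 1 | (3/4)(1/4)(3/4) = 9/64 |
NND | 1 | (3/4)(3/4)(1/4) = 9/64 |
NNN | 0 | (3/4)(3/4)(3/4) = 27/64 |

With the help of the addition principle, we condense the information in this table to construct the actual probability distribution table:

x | P(X = x) |
---|---|
0 | 27/64 |
1 | 3(9/64) = 27/64 |
2 | 3(3/64) = 9/64 |
3 | 1/64 |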

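In order to establish a general formula for the probability that a binomial random variable X takes any given value x, we will look for patterns in the above distribution. From the way we constructed this probability distribution, we know that, in general:

$$P(X = x) = (\text{number of possible outcomes with } x \text{ successes out of } 3) \times (\text{probability of each outcome with } x \text{ successes out of } 3)$$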

Let’s start with the second part, the probability that there will be x successes out of 3, where the probability of success is 1/4.

Notice that the fractions multiplied in each case are for the probability of x successes (where each success has a probability of p = 1/4) and the remaining (3 – x) failures (where each failure has probability of 1 – p = 3/4).

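So in general:

$$\text{probability of each outcome with } x \text{ successes out of } 3 = \left(\tfrac{1}{4}\right)^{x} \left(\tfrac{3}{4}\right)^{3 - x}$$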

Let’s move on to talk about the number of possible outcomes with x successes out of three. Here it is harder to see the pattern, so we’ll give the following mathematical result.

Consider a random experiment that consists of n trials, each one ending up in either success or failure. The number of possible outcomes in the sample space that have exactly k successes out of n is:

$$\binom{n}{k} = \frac{n!}{k!\,(n - k)!}$$

The notation on the left is often read as “n choose k.” Note that n! is read “n factorial” and is defined to be the product 1 * 2 * 3 * … * n. 0! is defined to be 1.
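For small n, the “n choose k” count can be checked against a brute-force listing of every possible outcome. The sketch below is only an illustration; the helper names are made up, and `math.comb` is Python's built-in version of the same count.

```python
from itertools import product
from math import comb, factorial


def n_choose_k(n, k):
    """Number of outcomes with exactly k successes in n trials: n! / (k! (n - k)!)."""
    return factorial(n) // (factorial(k) * factorial(n - k))


def count_by_listing(n, k):
    """Count the same outcomes the slow way, by listing all 2**n sequences."""
    outcomes = product("SF", repeat=n)  # S = success, F = failure
    return sum(1 for outcome in outcomes if outcome.count("S") == k)


# Small illustrative cases; listing is only practical when n is small.
for n, k in [(3, 2), (4, 2), (5, 3)]:
    print(f"n={n}, k={k}: formula {n_choose_k(n, k)}, "
          f"listing {count_by_listing(n, k)}, math.comb {comb(n, k)}")
```

The listing approach doubles in size with every extra trial, which is why the formula is the practical choice for larger n.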

You choose 12 male college students at random and record whether they have any ear piercings (success) or not. There are many possible outcomes to this experiment (actually, 4,096 of them!).

In how many of the possible outcomes of this experiment are there exactly 8 successes (students who have at least one ear pierced)?

There is no way that we would start listing all these possible outcomes. The result above comes to our rescue.

The result says that in an experiment like this, where you repeat a trial n times (in our case, we repeat it n = 12 times, once for each student we choose), the number of possible outcomes with exactly 8 successes (out of 12) is:

$$\binom{12}{8} = \frac{12!}{8!\,(12 - 8)!} = \frac{12!}{8!\,4!} = 495$$

Let’s go back to our example, in which we have n = 3 trials (selecting 3 cards). We saw that there were 3 possible outcomes with exactly 2 successes out of 3. The result confirms this since:

$$\binom{3}{2} = \frac{3!}{2!\,(3 - 2)!} = \frac{6}{2 \cdot 1} = 3$$

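In general, then:

$$\text{number of possible outcomes with } x \text{ successes out of } 3 = \binom{3}{x} = \frac{3!}{x!\,(3 - x)!}$$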

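Putting it all together, we get that the probability distribution of X, which is binomial with n = 3 and p = 1/4, is:

$$P(X = x) = \binom{3}{x} \left(\tfrac{1}{4}\right)^{x} \left(\tfrac{3}{4}\right)^{3 - x}, \qquad x = 0, 1, 2, 3$$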

In general, the number of ways to get x successes (and n – x failures) in n trials is

$$\binom{n}{x} = \frac{n!}{x!\,(n - x)!}$$

Therefore, the probability of x successes (and n – x failures) in n trials, where the probability of success in each trial is p (and the probability of failure is 1 – p) is equal to the number of outcomes in which there are x successes out of n trials, times the probability of x successes, times the probability of n – x failures:

**Binomial Probability Formula for P(X = x)**

$$P(X = x) = \binom{n}{x}\, p^{x} (1 - p)^{n - x}$$

where x may take any value 0, 1, … , n.
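The formula can be checked against the long way of building a distribution (list every outcome, multiply along each outcome, then add). The sketch below does this for the card example with n = 3 and p = 1/4, using exact fractions so that no rounding is involved. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import comb


def binomial_pmf(x, n, p):
    """Binomial formula: (n choose x) * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def pmf_by_listing(n, p):
    """Build the distribution the long way: list outcomes, multiply, then add."""
    dist = {x: 0 for x in range(n + 1)}
    for outcome in product([True, False], repeat=n):  # True marks a success
        prob = 1
        for success in outcome:
            prob *= p if success else 1 - p  # multiply along the outcome
        dist[sum(outcome)] += prob           # add outcomes with the same X
    return dist


n, p = 3, Fraction(1, 4)  # three cards drawn with replacement, success = diamond
long_way = pmf_by_listing(n, p)
formula = {x: binomial_pmf(x, n, p) for x in range(n + 1)}

for x in range(n + 1):
    print(f"P(X = {x}): long way {long_way[x]}, formula {formula[x]}")
```

Both approaches give 27/64, 27/64, 9/64 and 1/64 for x = 0, 1, 2, 3.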

Let’s look at another example:

The probability of having blood type A is 0.4. Choose 4 people at random and let X be the number with blood type A.

X is a binomial random variable with n = 4 and p = 0.4.

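As a review, let’s first find the probability distribution of X the long way: construct an interim table of all possible outcomes in S, the corresponding values of X, and probabilities. Then construct the probability distribution table for X. In the table below, A stands for a person with blood type A and N for a person without it, and outcomes with the same value of X are grouped together because they have the same probability.

Outcomes | X | Probability of each outcome |
---|---|---|
AAAA | 4 | (0.4)(0.4)(0.4)(0.4) = 0.0256 |
AAAN, AANA, ANAA, NAAA | 3 | (0.4)(0.4)(0.4)(0.6) = 0.0384 |
AANN, ANAN, ANNA, NAAN, NANA, NNAA | 2 | (0.4)(0.4)(0.6)(0.6) = 0.0576 |
ANNN, NANN, NNAN, NNNA | 1 | (0.4)(0.6)(0.6)(0.6) = 0.0864 |
NNNN | 0 | (0.6)(0.6)(0.6)(0.6) = 0.1296 |

As usual, the addition rule lets us combine probabilities for each possible value of X:

x | P(X = x) |
---|---|
0 | 0.1296 |
1 | 4(0.0864) = 0.3456 |
2 | 6(0.0576) = 0.3456 |
3 | 4(0.0384) = 0.1536 |
4 | 0.0256 |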

Now let’s apply the formula for the probability distribution of a binomial random variable, and see that by using it, we get exactly what we got the long way.

Recall that the general formula for the probability distribution of a binomial random variable with n trials and probability of success p is:

$$P(X = x) = \binom{n}{x}\, p^{x} (1 - p)^{n - x}, \qquad x = 0, 1, \ldots, n$$

In our case, X is a binomial random variable with n = 4 and p = 0.4, so its probability distribution is:

$$P(X = x) = \binom{4}{x}\, (0.4)^{x} (0.6)^{4 - x}, \qquad x = 0, 1, 2, 3, 4$$

Let’s use this formula to find P(X = 2) and see that we get exactly what we got before.

$$P(X = 2) = \binom{4}{2}\, (0.4)^{2} (0.6)^{2} = 6 \times 0.16 \times 0.36 = 0.3456$$

Now let’s look at some truly practical applications of binomial random variables.

Past studies have shown that 90% of the booked passengers actually arrive for a flight. Suppose that a small shuttle plane has 45 seats. We will assume that passengers arrive independently of each other. (This assumption is not really accurate, since not all people travel alone, but we’ll use it for the purposes of our experiment.)

Airlines often “*overbook*” flights. This means that the airline sells more tickets than there are seats on the plane. They do this because some passengers don’t show up, and the plane would otherwise fly with empty seats. However, if they do overbook, they run the risk of having more passengers than seats. So, some passengers may be unhappy. They also have the extra expense of putting those passengers on another flight and possibly supplying lodging.

With these risks in mind, the airline decides to sell more than 45 tickets. If they wish to keep the probability of having more than 45 passengers show up to get on the flight to less than 0.05, how many tickets should they sell?

Let X be the number of passengers who show up for the flight. X is a binomial random variable with p = 0.90 and n (the number of tickets sold) to be determined.

Suppose the airline sells 50 tickets. Now we have n = 50 and p = 0.90. We want to know P(X > 45), which is 1 – P(X ≤ 45) = 1 – 0.57, or 0.43. The details of this calculation are not shown, since statistical software was used to calculate the answer. This is certainly more than 0.05, so the airline must sell fewer tickets.

If we reduce the number of tickets sold, we should be able to reduce this probability. We have calculated the probabilities in the following table:

Tickets sold (n) | P(X > 45) |
---|---|
50 | 0.43 |
49 | 0.26 |
48 | 0.13 |
47 | 0.04 |
46 | 0.008 |

From this table, we can see that by selling 47 tickets, the airline can reduce the probability that it will have more passengers show up than there are seats to less than 5%.

Note: For practice in finding binomial probabilities, you may wish to verify one or more of the results from the table above.
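For readers who prefer to verify the table with code rather than a calculator, here is a minimal sketch. It assumes only the figures from the example (45 seats, a 0.90 show-up probability, a 0.05 limit); the function and variable names are made up for this illustration.

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x) for X binomial with n trials and success probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def prob_more_than(seats, tickets, p_show):
    """P(X > seats) when X is binomial with n = tickets and p = p_show."""
    return sum(binomial_pmf(x, tickets, p_show) for x in range(seats + 1, tickets + 1))


seats, p_show, limit = 45, 0.90, 0.05

# One binomial distribution per candidate number of tickets sold.
table = {tickets: prob_more_than(seats, tickets, p_show) for tickets in range(46, 51)}
for tickets in sorted(table, reverse=True):
    print(f"{tickets} | {table[tickets]:.3f}")

# Largest number of tickets that keeps the overbooking risk under the limit.
best = max(tickets for tickets, prob in table.items() if prob < limit)
print("tickets to sell:", best)
```

The search is limited to 46 through 50 tickets, the same range as the table; a wider range could be scanned the same way.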

Now that we understand how to find probabilities associated with a random variable X which is binomial, using either its probability distribution formula or software, we are ready to talk about the mean and standard deviation of a binomial random variable. Let’s start with an example:

Overall, the proportion of people with blood type B is 0.1. In other words, roughly 10% of the population has blood type B.

Suppose we sample 120 people at random. On average, how many would you expect to have blood type B?

The answer, 12, seems obvious; automatically, you’d multiply the number of people, 120, by the probability of blood type B, 0.1.

This suggests the general formula for finding the mean of a binomial random variable:

**Claim:**

If X is binomial with parameters n and p, then the **mean** or **expected value** of X is:

$$\mu_X = np$$

Although the formula for mean is quite intuitive, it is not at all obvious what the variance and standard deviation should be. It turns out that:

**Claim:**

If X is binomial with parameters n and p, then the **variance** and **standard deviation** of X are:

$$\sigma_X^2 = np(1 - p) \qquad \text{and} \qquad \sigma_X = \sqrt{np(1 - p)}$$

**Comments:**

- The binomial mean and variance are special cases of our general formulas for the mean and variance of any random variable. Clearly it is much simpler to use the “shortcut” formulas presented above than it would be to calculate the mean and variance or standard deviation from scratch.

- Remember, these “shortcut” formulas only hold in cases where you have a binomial random variable.

Suppose we sample 120 people at random. The number with blood type B should be about 12, give or take how many? In other words, what is the standard deviation of the number X who have blood type B?

Since n = 120 and p = 0.1,

$$\sigma_X = \sqrt{np(1 - p)} = \sqrt{120(0.1)(0.9)} = \sqrt{10.8} \approx 3.3$$

In a random sample of 120 people, we should expect there to be about 12 with blood type B, give or take about 3.3.
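The comment that the shortcut formulas are special cases of the general ones can be checked numerically. The sketch below computes the mean and standard deviation for the blood type B example (n = 120, p = 0.1) both ways: with the shortcuts, and from scratch as a weighted average over the full probability distribution. The variable names are hypothetical.

```python
from math import comb, sqrt


def binomial_pmf(x, n, p):
    """P(X = x) for X binomial with n trials and success probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


n, p = 120, 0.1  # 120 people sampled, 10% chance of blood type B

# Shortcut formulas, valid only for binomial random variables.
mean_shortcut = n * p
sd_shortcut = sqrt(n * p * (1 - p))

# General definitions, valid for any discrete random variable.
pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]
mean_general = sum(x * prob for x, prob in enumerate(pmf))
var_general = sum((x - mean_general) ** 2 * prob for x, prob in enumerate(pmf))
sd_general = sqrt(var_general)

print(f"mean: shortcut {mean_shortcut:.4f}, from scratch {mean_general:.4f}")
print(f"SD:   shortcut {sd_shortcut:.4f}, from scratch {sd_general:.4f}")
```

Both routes agree on a mean of 12 and a standard deviation of about 3.3; the shortcut simply avoids summing over all 121 possible values.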

Before we move on to continuous random variables, let’s investigate the shape of binomial distributions.