View Lecture Slides with Transcript – Population Means – Part 2

This document linked from Population Means – Part 2

View Lecture Slides with Transcript – Population Means – Part 1

This document linked from Population Means – Part 1

Transcript – Normal Random Variables

This document linked from Normal Random Variables

]]>We have almost reached the end our discussion of probability. We were introduced to the important concept of **random variables**, which are quantitative variables whose value is determined by the outcome of a random experiment.

We discussed discrete and continuous random variables.

We saw that all the information about a **discrete random variable** is packed into its probability distribution. Using that, we can answer probability questions about the random variable and find its **mean and standard deviation**. We ended the part on discrete random variables by presenting a special class of discrete random variables – **binomial random variables.**

As we dove into **continuous random variables**, we saw how calculations can get complicated very quickly, when probabilities associated with a continuous random variable are found by calculating **areas under its density curve**.

As an example for a continuous random variable, we presented the **normal random variable**, and discussed it at length. The normal distribution is extremely important, not just because many variables in real life follow the normal distribution, but mainly because of the important role it plays in statistical inference, our ultimate goal of this course.

We learned how we can avoid calculus by using the **standard normal calculator or table** to find probabilities associated with the normal distribution, and learned how it can be used as an **approximation to the binomial** distribution under certain conditions.

A random variable is a variable whose values are numerical results of a random experiment.

- A
**discrete random variable**is summarized by its probability distribution — a list of its possible values and their corresponding probabilities.

The sum of the probabilities of all possible values must be 1.

The probability distribution can be represented by a table, histogram, or sometimes a formula.

- The
**probability distribution**of a random variable can be supplemented with numerical measures of the center and spread of the random variable.

**Center:** The center of a random variable is measured by its mean (which is sometimes also referred to as the **expected value**).

The mean of a random variable can be interpreted as its long run average.

The mean is a weighted average of the possible values of the random variable weighted by their corresponding probabilities.

**Spread:** The spread of a random variable is measured by its variance, or more typically by its standard deviation (the square root of the variance).

The standard deviation of a random variable can be interpreted as the typical (or long-run average) distance between the value that the random variable assumes and the mean of X.

- The binomial random variable is a type of discrete random variable that is quite common.

- The binomial random variable is defined in a random experiment that consists of n independent trials, each having two possible outcomes (called “success” and “failure”), and each having the same probability of success: p. Such a random experiment is called the binomial random experiment.

- The binomial random variable represents the number of successes (out of n) in a binomial experiment. It can therefore have values as low as 0 (if none of the n trials was a success) and as high as n (if all n trials were successes).

- There are “many” binomial random variables, depending on the number of trials (n) and the probability of success (p).

- The probability distribution of the binomial random variable is given in the form of a formula and can be used to find probabilities. Technology can be used as well.

- The mean and standard deviation of a binomial random variable can be easily found using short-cut formulas.

The probability distribution of a continuous random variable is represented by a probability density curve. The probability that the random variable takes a value in any interval of interest is the area above this interval and below the density curve.

An important example of a continuous random variable is the **normal random variable**, whose probability density curve is symmetric (bell-shaped), bulging in the middle and tapering at the ends.

- There are “many” normal random variables, each determined by its mean
*μ*(mu) (which determines where the density curve is centered) and standard deviation σ (sigma) (which determines how spread out (wide) the normal density curve is).

- Any normal random variable follows the Standard Deviation Rule, which can help us find probabilities associated with the normal random variable.

- Another way to find probabilities associated with the normal random variable is using the standard normal table. This process involves finding the z-score of values, which tells us how many standard deviations below or above the mean the value is.

- An important application of the normal random variable is that it can be used as an approximation of the binomial random variable (under certain conditions). A continuity correction can improve this approximation.

The applet used in this video is no longer available.

Work to understand the idea – we are now looking at x-bar and p-hat as our “data” and in order to get multiple measurements, we need to repeat the entire sampling process exactly. We need to repeat this process of sampling and recording our statistic until we have as many values as we require.

In practice we don’t do this, we only look at one sample – but the THEORY of frequentist statistics relies on the statistician understanding what happens if we repeat the sampling process.

- Slides 1-4

- Slides 5-8

- Slides 9-12

- Slides 13-17

- Slides 18-26: Applet: Sampling Distribution for p-hat, the sample proportion

- Slides 27-34: Applet: Sampling Distribution for x-bar, the sample mean

- Slide 35 – Summary

This document is linked from Sampling Distributions.

]]>This document is linked from The Normal Shape.

]]>95% is the most commonly used level of confidence. However, we may wish to increase our level of confidence and produce an interval that’s almost certain to contain μ (mu). Specifically, we may want to report an interval for which we are 99% confident that it contains the unknown population mean, rather than only 95%.

Using the same reasoning as in the last comment, in order to create a 99% confidence interval for μ (mu), we should ask: There is a probability of 0.99 that any normal random variable takes values within how many standard deviations of its mean? The precise answer is 2.576, and therefore, a 99% confidence interval for μ (mu) is:

Another commonly used level of confidence is a 90% level of confidence. Since there is a probability of 0.90 that any normal random variable takes values within 1.645 standard deviations of its mean, the 90% confidence interval for μ (mu) is:

Let’s go back to our first example, the IQ example:

The IQ level of students at a particular university has an unknown mean (μ, mu) and known standard deviation σ (sigma) =15. A simple random sample of 100 students is found to have a sample mean IQ of 115 (x-bar). Estimate μ (mu) with a 90%, 95%, and 99% confidence interval.

A 90% confidence interval for μ (mu) is:

A 95% confidence interval for μ (mu) is:

A 99% confidence interval for μ (mu) is:

The purpose of this next activity is to give you guided practice at calculating and interpreting confidence intervals, and drawing conclusions from them.

Note from the previous example and the previous “Did I Get This?” activity, that the more confidence I require, the wider the confidence interval for μ (mu). The 99% confidence interval is wider than the 95% confidence interval, which is wider than the 90% confidence interval.

This is not very surprising, given that in the 99% interval we multiply the standard deviation of the statistic by 2.576, in the 95% by 2, and in the 90% only by 1.645. Beyond this numerical explanation, there is a very clear intuitive explanation and an important implication of this result.

Let’s start with the intuitive explanation. The more certain I want to be that the interval contains the value of μ (mu), the more plausible values the interval needs to include in order to account for that extra certainty. I am 95% certain that the value of μ (mu) is one of the values in the interval (112.1, 117.9). In order to be 99% certain that one of the values in the interval is the value of μ (mu), I need to include more values, and thus provide a wider confidence interval.

In our example, the **wider** 99% confidence interval (111, 119) gives us a **less precise** estimation about the value of μ (mu) than the narrower 90% confidence interval (112.5, 117.5), because the smaller interval ‘narrows-in’ on the plausible values of μ (mu).

The important practical implication here is that researchers must decide whether they prefer to state their results with a higher level of confidence or produce a more precise interval. In other words,

The price we have to pay for a higher level of confidence is that the unknown population mean will be estimated with less precision (i.e., with a wider confidence interval). If we would like to estimate μ (mu) with more precision (i.e. a narrower confidence interval), we will need to sacrifice and report an interval with a lower level of confidence.

So far we’ve developed the confidence interval for the population mean “from scratch” based on results from probability, and discussed the trade-off between the level of confidence and the precision of the interval. The price you pay for a higher level of confidence is a lower level of precision of the interval (i.e., a wider interval).

Is there a way to bypass this trade-off? In other words, is there a way to increase the precision of the interval (i.e., make it narrower) **without **compromising on the level of confidence? We will answer this question shortly, but first we’ll need to get a deeper understanding of the different components of the confidence interval and its structure.

We explored the confidence interval for μ (mu) for different levels of confidence, and found that in general, it has the following form:

where z* is a general notation for the multiplier that depends on the level of confidence. As we discussed before:

- For a 90% level of confidence, z* = 1.645

- For a 95% level of confidence, z* = 1.96

- For a 99% level of confidence, z* = 2.576

To start our discussion about the structure of the confidence interval, let’s denote

The confidence interval, then, has the form:

To summarize, we have

X-bar is the sample mean, the point estimator for the unknown population mean (μ, mu).

**m** is called the **margin of error**, since it represents the maximum estimation error for a given level of confidence.

For example, for a 95% confidence interval, we are 95% confident that our estimate will not depart from the true population mean by more than m, the margin of error and m is further made up of the product of two components:

Here is a summary of the different components of the confidence interval and its structure:

This structure: **estimate ± margin of error**, where the margin of error is further composed of the product of a confidence multiplier and the standard deviation of the statistic (or, as we’ll see, the standard error) is the general structure of all confidence intervals that we will encounter in this course.

Obviously, even though each confidence interval has the same components, the formula for these components is different from confidence interval to confidence interval, depending on what unknown parameter the confidence interval aims to estimate.

Since the structure of the confidence interval is such that it has a margin of error on either side of the estimate, it is centered at the estimate (in our current case, x-bar), and its width (or length) is exactly twice the margin of error:

The margin of error, m, is therefore “in charge” of the width (or precision) of the confidence interval, and the estimate is in charge of its location (and has no effect on the width).

Let us now go back to the confidence interval for the mean, and more specifically, to the question that we posed at the beginning of the previous page:

Is there a way to increase the precision of the confidence interval (i.e., make it narrower) **without** compromising on the level of confidence?

Since the width of the confidence interval is a function of its margin of error, let’s look closely at the margin of error of the confidence interval for the mean and see how it can be reduced:

Since z* controls the level of confidence, we can rephrase our question above in the following way:

Is there a way to reduce this margin of error other than by reducing z*?

If you look closely at the margin of error, you’ll see that the answer is **yes.** We can do that by increasing the sample size n (since it appears in the denominator).

**Question :** Isn’t it true that another way to reduce the margin of error (for a fixed z*) is to reduce σ (sigma)?

**Answer: **While it is true that strictly mathematically speaking the smaller the value of σ (sigma), the smaller the margin of error, practically speaking we have absolutely no control over the value of σ (sigma) (i.e., we cannot make it larger or smaller). σ (sigma) is the population standard deviation; it is a fixed value (which here we assume is known) that has an effect on the width of the confidence interval (since it appears in the margin of error), but is definitely not a value we can change.

Let’s look at an example first and then explain why increasing the sample size is a way to increase the precision of the confidence interval **without **compromising on the level of confidence.

Recall the IQ example:

The IQ level of students at a particular university has an unknown mean (μ, mu) and a known standard deviation of σ (sigma) =15. A simple random sample of 100 students is found to have the sample mean IQ of 115 (x-bar).

For simplicity, in this question, we will round z* = 1.96 to 2. You should use z* = 1.96 in all problems unless you are specifically instructed to do otherwise.

A 95% confidence interval for μ (mu) in this case is:

Note that the margin of error is m = 3, and therefore the width of the confidence interval is 6.

Now, what if we change the problem slightly by increasing the sample size, and assume that it was 400 instead of 100?

In this case, a 95% confidence interval for μ (mu) is:

The margin of error here is only m = 1.5, and thus the width is only 3.

Note that for the same level of confidence (95%) we now have a narrower, and thus more precise, confidence interval.

Let’s try to understand why is it that a larger sample size will reduce the margin of error for a fixed level of confidence. There are three ways to explain this: mathematically, using probability theory, and intuitively.

We’ve already alluded to the **mathematical** explanation; the margin of error is

and since n, the sample size, appears in the denominator, increasing n will reduce the margin of error.

As we saw in our discussion about point estimates, **probability theory** tells us that:

This explains why with a larger sample size the margin of error (which represents how far apart we believe x-bar might be from μ (mu) for a given level of confidence) is smaller.

On an intuitive level, if our estimate x-bar is based on a larger sample (i.e., a larger fraction of the population), we have more faith in it, or it is more reliable, and therefore we need to account for less error around it.

**Comment:**

- While it is true that for a given level of confidence, increasing the sample size increases the precision of our interval estimation, in practice, increasing the sample size is not always possible.
- Consider a study in which there is a non-negligible cost involved for collecting data from each participant (an expensive medical procedure, for example). If the study has some budgetary constraints, which is usually the case, increasing the sample size from 100 to 400 is just not possible in terms of cost-effectiveness.
- Another instance in which increasing the sample size is impossible is when a larger sample is simply not available, even if we had the money to afford it. For example, consider a study on the effectiveness of a drug on curing a very rare disease among children. Since the disease is rare, there are a limited number of children who could be participants.

- This is the reality of statistics. Sometimes theory collides with reality, and you simply do the best you can.

As you will remember from a previous activity, the applet shows a normal-shaped distribution, which represents the **sampling distribution of the mean** (x-bar) for random samples of a particular fixed sample size, from a population with a fixed standard deviation (σ, sigma). The green line marks the value of the population mean (μ, mu).

To begin the simulation, click **“sample 25”** button. You have used the simulation to select 25 samples from the population; the applet has automatically computed the sample means and the corresponding confidence intervals.

Notice, along the left of the applet, that you can change the confidence level. Do this, and watch what happens to the intervals as the confidence is changed among all the levels available.

This document is linked from Population Means (Part 2).

]]>As the introduction mentioned, we’ll start our discussion on interval estimation with interval estimation for the population mean μ (mu). We’ll start by showing how a 95% confidence interval is constructed, and later generalize to other levels of confidence. We’ll also discuss practical issues related to interval estimation.

Recall the IQ example:

Suppose that we are interested in studying the IQ levels of students at Smart University (SU). In particular (since IQ level is a quantitative variable), we are interested in estimating μ (mu), the mean IQ level of all the students at SU.

We will assume that from past research on IQ scores in different universities, it is known that the IQ standard deviation in such populations is σ (sigma) = 15. In order to estimate μ (mu), a random sample of 100 SU students was chosen, and their (sample) mean IQ level is calculated (let’s assume, for now, that we have not yet found the sample mean).

We will now show the rationale behind constructing a 95% confidence interval for the population mean μ (mu).

- We learned in the “Sampling Distributions” section of probability that according to the central limit theorem, the sampling distribution of the sample mean x-bar is approximately normal with a mean of μ (mu) and standard deviation of σ/sqrt(n) = sigma/sqrt(n). In our example, then, (where σ (sigma) = 15 and n = 100), the possible values of x-bar, the sample mean IQ level of 100 randomly chosen students, is approximately normal, with mean μ (mu) and standard deviation 15/sqrt(100) = 1.5.
- Next, we recall and apply the Standard Deviation Rule for the normal distribution, and in particular its second part: There is a 95% chance that the sample mean we will find in our sample falls within 2 * 1.5 = 3 of μ (mu).

Obviously, if there is a certain distance between the sample mean and the population mean, we can describe that distance by starting at either value. So, if the sample mean (x-bar) falls within a certain distance of the population mean μ (mu), then the population mean μ (mu) falls within the same distance of the sample mean.

Therefore, the statement, “There is a 95% **chance** that the **sample** mean x-bar falls within 3 units of μ (mu)” can be rephrased as: “We are 95% **confident **that the **population** mean μ (mu) falls within 3 units of the x-bar we found in our sample.”

So, if we happen to get a sample mean of x-bar = 115, then we are 95% confident that μ (mu) falls within 3 units of 115, or in other words that μ (mu) is covered by the interval (115 – 3, 115 + 3) = (112,118).

(On later pages, we will use similar reasoning to develop a general formula for a confidence interval.)

**Comment:**

- Note that the first phrasing is about x-bar, which is a random variable; that’s why it makes sense to use probability language. But the second phrasing is about μ (mu), which is a parameter, and thus is a “fixed” value that does not change, and that’s why we should not use probability language to discuss it. In these problems, it is our x-bar that will change when we repeat the process, not μ (mu). This point will become clearer after you do the activities which follow.

Let’s generalize the IQ example. Suppose that we are interested in estimating the unknown population mean (μ, mu) based on a random sample of size n. Further, we assume that the population standard deviation (σ, sigma) is known.

The values of x-bar follow a normal distribution with (unknown) mean μ (mu) and standard deviation σ/sqrt(n)=sigma/sqrt(n) (known, since both σ, sigma, and n are known). In the standard deviation rule, we stated that approximately 95% of values fall within 2 standard deviations of μ (mu). From now on, we will be a little more precise and use the standard normal table to find the exact value for 95%.

Our picture is as follows:

Try using the applet in the post for **Learn by Doing – Normal Random Variables **to find the cutoff illustrated above.

We can also verify the z-score using a calculator or table by finding the z-score with the area of 0.025 to the left (which would give us -1.96) or with the area to the left of 0.975 = 0.95 + 0.025 (which would give us +1.96).

Thus, there is a 95% chance that our sample mean x-bar will fall within 1.96*σ/sqrt(n) = 1.96*sigma/sqrt(n) of μ (mu).

Which means we are 95% confident that μ (mu) falls within 1.96*σ/sqrt(n) = 1.96*sigma/sqrt(n) of our sample mean x-bar.

Here, then, is the **general result:**

Suppose a random sample of size n is taken from a normal population of values for a quantitative variable whose mean (μ, mu) is unknown, when the standard deviation (σ, sigma) is given.

A **95% confidence interval (CI) for μ (mu) **is:

**Comment:**

- Note that for now we require the population standard deviation (σ, sigma) to be known. Practically, σ (sigma) is rarely known, but for some cases, especially when a lot of research has been done on the quantitative variable whose mean we are estimating (such as IQ, height, weight, scores on standardized tests), it is reasonable to assume that σ (sigma) is known. Eventually, we will see how to proceed when σ (sigma) is unknown, and must be estimated with sample standard deviation (s).

Let’s look at another example.

An educational researcher was interested in estimating μ (mu), the mean score on the math part of the SAT (SAT-M) of all community college students in his state. To this end, the researcher has chosen a random sample of 650 community college students from his state, and found that their average SAT-M score is 475. Based on a large body of research that was done on the SAT, it is known that the scores roughly follow a normal distribution with the standard deviation σ (sigma) =100.

Here is a visual representation of this story, which summarizes the information provided:

Based on this information, let’s estimate μ (mu) with a 95% confidence interval.

Using the formula we developed earlier

the 95% confidence interval for μ (mu) is:

We will usually provide information on how to round your final answer. In this case, one decimal place is enough precision for this scenario. You could also round to the nearest whole number without much loss of information here.

We are not done yet. An equally important part is to **interpret what this means in the context of the problem.**

We are 95% confident that the mean SAT-M score of all community college students in the researcher’s state is covered by the interval (467.3, 482.7). Note that the confidence interval was obtained by taking 475 ± 7.7. This means that we are 95% confident that by using the sample mean (x-bar = 475) to estimate μ (mu), our error is no more than 7.7 points.

You just gained practice computing and interpreting a confidence interval for a population mean. Note that the way a confidence interval is used is that we hope the interval contains the population mean μ (mu). This is why we call it an “interval **for the population mean**.”

The following activity is designed to help give you a better understanding of the underlying **reasoning** behind the interpretation of confidence intervals. In particular, you will gain a deeper understanding of why we say that we are “**95% confident** that the population mean is **covered** by the interval.”

We just saw that one interpretation of a 95% confidence interval is that we are 95% confident that the population mean (μ, mu) is contained in the interval. Another useful interpretation in practice is that, given the data, the confidence interval represents the set of plausible values for the population mean μ (mu).

As an illustration, let’s return to the example of mean SAT-Math score of community college students. Recall that we had constructed the confidence interval (467.3, 482.7) for the unknown mean SAT-M score for all community college students.

Here is a way that we can use the confidence interval:

Do the results of this study provide evidence that μ (mu), the mean SAT-M score of community college students, is lower than the mean SAT-M score in the general population of college students in that state (which is 480)?

The 95% confidence interval for μ (mu) was found to be (467.3, 482.7). Note that 480, the mean SAT-M score in the general population of college students in that state, falls inside the interval, which means that it is one of the plausible values for μ (mu).

This means that μ (mu) could be 480 (or even higher, up to 483), and therefore we cannot conclude that the mean SAT-M score among community college students in the state is lower than the mean in the general population of college students in that state. (Note that the fact that most of the plausible values for μ (mu) fall below 480 is not a consideration here.)

the 95% confidence interval for μ (mu) is:

We will usually provide information on how to round your final answer. In this case, one decimal place is enough precision for this scenario. You could also round to the nearest whole number without much loss of information here.

We are not done yet. An equally important part is to **interpret what this means in the context of the problem.**

We are 95% confident that the mean SAT-M score of all community college students in the researcher’s state is covered by the interval (467.3, 482.7). Note that the confidence interval was obtained by taking 475 ± 7.7. This means that we are 95% confident that by using the sample mean (x-bar = 475) to estimate μ (mu), our error is no more than 7.7 points.

As mentioned in the introduction, this last concept in probability is the bridge between the probability section and inference. It focuses on the relationship between sample values (**statistics**) and population values (**parameters**). Statistics vary from sample to sample due to **sampling variability**, and therefore can be regarded as **random variables** whose distribution we call the **sampling distribution**.

In our discussion of sampling distributions, we focused on two statistics, the **sample proportion**, p-hat and the **sample mean**, x-bar. Our goal was to explore the sampling distribution of these two statistics relative to their respective population parameters, p and μ (mu), and we found in **both** cases that under certain conditions the **sampling distribution is approximately normal**. This result is known as the **Central Limit Theorem.** As we’ll see in the next section, the Central Limit Theorem is the foundation for statistical inference.

A **parameter** is a number that describes the population, and a **statistic** is a number that describes the sample.

- Parameters are fixed, and in practice, usually unknown.

- Statistics change from sample to sample due to sampling variability.

- The behavior of the possible values the statistic can take in repeated samples is called the
**sampling distribution**of that statistic.

- The following table summarizes the important information about the two sampling distributions we covered. Both of these results follow from the
**central limit theorem**which basically states that as the sample size increases, the distribution of the average from a sample of size n becomes increasingly normally distributed.