# Unit 2: Simple Linear Regression: Inference in Case Q→Q

## Learning Objectives

Develop the theoretical simple linear regression model and discuss how it is estimated.

• Distinguish between a deterministic relationship and a statistical relationship.
• Explain the concept of the least squares criterion.
• Interpret the intercept $b_0$ and slope $b_1$ of an estimated regression equation.
• Be able to obtain the estimates $b_0$ and $b_1$ from output.
• Recognize the distinction between a population regression line and the estimated regression line.
• Write regression models for the population in two forms: for individual values of the response or for the mean response.
• Write estimated regression models using output.
• Summarize the four assumptions (conditions) that comprise the simple linear regression model.
• Explain what the unknown population variance $\sigma^2$ quantifies in the regression setting.
• Be able to obtain the estimate MSE of the unknown population variance $\sigma^2$ from output.
• Explain that the coefficient of determination ($r^2$) and the correlation coefficient ($r$) are measures of linear association. That is, they can be 0 even if there is a perfect nonlinear association.
• Be able to interpret the $r^2$ value.
• Explain the cautions necessary in using the $r^2$ value as a way of assessing the strength of the linear association.
• Be able to calculate the correlation coefficient $r$ and the $r^2$ value from one another, using the sign of the estimated slope $b_1$ to determine the sign of $r$.
• Explain what various correlation coefficient values mean. (Unlike the $r^2$ value, the correlation coefficient has no specific interpretation.)
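These estimation objectives are usually met by reading software output, but it can help to see the arithmetic behind that output. The following is a minimal Python sketch, assuming paired numeric data stored in two lists; `fit_simple_regression` is a hypothetical helper, not part of any library. It computes the least squares estimates $b_0$ and $b_1$, the MSE (with $n - 2$ degrees of freedom) as the estimate of $\sigma^2$, and $r^2$, then recovers $r$ from $r^2$ by attaching the sign of $b_1$.

```python
from math import sqrt


def fit_simple_regression(x, y):
    """Least squares fit of y = b0 + b1 * x (hypothetical helper, stdlib only)."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need paired data with at least 3 observations")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sums of squares and cross-products about the means
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    syy = sum((yi - y_bar) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    # Least squares estimates: the line minimizing the sum of squared residuals
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    # Error sum of squares and its mean square, the estimate of sigma^2
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    mse = sse / (n - 2)
    # Proportion of the variation in y explained by the linear fit
    r_squared = 1 - sse / syy
    # r has magnitude sqrt(r^2) and the same sign as the slope b1
    r = sqrt(max(r_squared, 0.0))
    if b1 < 0:
        r = -r
    return {"b0": b0, "b1": b1, "mse": mse, "r_squared": r_squared, "r": r}


# Perfect nonlinear (quadratic) association, yet no linear association
result = fit_simple_regression([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])
print(result["b1"], result["r_squared"])  # prints 0.0 0.0
```

The usage example illustrates the caution about $r^2$ and $r$: every point lies exactly on $y = x^2$, yet the fitted slope and $r^2$ are both 0 because the association is entirely nonlinear.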

Discuss how the simple linear regression model can be used including inferential methods.

• Be able to determine confidence intervals and conduct hypothesis tests for the population intercept $\beta_0$ and population slope $\beta_1$ using output.
• Be able to draw research conclusions about the population intercept $\beta_0$ (if appropriate) and population slope $\beta_1$ using the above confidence intervals and hypothesis tests.
• Explain the six possible outcomes about the slope $\beta_1$ whenever we test whether there is a linear relationship between a predictor $x$ and a response $y$.
• Explain the “derivation” of the analysis of variance F-test for testing $H_0: \beta_1 = 0$. That is, explain how the total variation in a response $y$ (SSTO) is broken down into two parts: a component due to the predictor $x$ (SSR) and a component due only to random error (SSE). Also explain how the expected mean squares tell us to use the ratio MSR/MSE to conduct the test.
• Explain how each element of the analysis of variance table is calculated and be able to find the values of components given a partially complete ANOVA table.
• Explain what scientific questions can be answered with the analysis of variance F-test.
• Conduct the analysis of variance F-test to test $H_0: \beta_1 = 0$ versus $H_A: \beta_1 \neq 0$.
• Explain the similarities and distinctions of the t-test and F-test for testing $H_0: \beta_1 = 0$.
• Explain that, in simple linear regression, the t-test for testing that $\beta_1 = 0$, the F-test for testing that $\beta_1 = 0$, and the t-test for testing that $\rho = 0$ yield equivalent results (the same p-value, with the F statistic equal to the square of the t statistic). Explain when it makes sense to report the results of each one.
• Distinguish between estimating a mean response (confidence interval) and predicting a new observation (prediction interval).
• Explain the various factors that affect the width of a confidence interval for a mean response.
• Explain why a prediction interval for a new response is wider than the corresponding confidence interval for a mean response.
• Explain that the formula for a prediction interval depends strongly on the condition that the error terms are normally distributed, while the formula for the confidence interval is not so dependent on this condition for large samples.
• Explain the types of research questions that can be answered using the materials and methods of this lesson.
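To make the inference objectives concrete, here is a minimal Python sketch of the quantities that regression output reports, assuming paired numeric data in two lists. `regression_inference` is a hypothetical helper, and because the Python standard library has no t distribution, the caller supplies the t multiplier `t_crit` (with $n - 2$ degrees of freedom) from a table or software; p-values are not computed. The sketch shows the ANOVA decomposition SSTO = SSR + SSE, the F statistic MSR/MSE, the slope t statistic (whose square equals F), and the two intervals at a chosen $x_h$.

```python
from math import sqrt


def regression_inference(x, y, x_h, t_crit):
    """Slope t-test, ANOVA F-test, and intervals at x = x_h (hypothetical helper).

    t_crit is the t multiplier with n - 2 degrees of freedom (for example,
    t(0.975; n - 2) for 95% intervals), looked up by the caller.
    """
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * xi for xi in x]

    # ANOVA decomposition: SSTO = SSR + SSE
    ssto = sum((yi - y_bar) ** 2 for yi in y)
    ssr = sum((fi - y_bar) ** 2 for fi in fitted)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    msr = ssr / 1            # regression: 1 degree of freedom
    mse = sse / (n - 2)      # error: n - 2 degrees of freedom
    f_stat = msr / mse

    # t-test of H0: beta1 = 0; in simple linear regression t^2 equals F
    se_b1 = sqrt(mse / sxx)
    t_stat = b1 / se_b1
    slope_ci = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)

    # Mean response vs. a new observation at x_h: the prediction standard
    # error adds MSE for the new error term, so the PI is always wider
    y_hat_h = b0 + b1 * x_h
    se_fit = sqrt(mse * (1 / n + (x_h - x_bar) ** 2 / sxx))
    se_pred = sqrt(mse * (1 + 1 / n + (x_h - x_bar) ** 2 / sxx))
    return {
        "ssto": ssto, "ssr": ssr, "sse": sse, "mse": mse,
        "f_stat": f_stat, "t_stat": t_stat, "slope_ci": slope_ci,
        "mean_ci": (y_hat_h - t_crit * se_fit, y_hat_h + t_crit * se_fit),
        "pred_int": (y_hat_h - t_crit * se_pred, y_hat_h + t_crit * se_pred),
    }


# Illustrative, made-up data; 3.182 is the table value of t(0.975; 3)
out = regression_inference([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1],
                           x_h=3.5, t_crit=3.182)
print(out["t_stat"] ** 2, out["f_stat"])
print(out["mean_ci"], out["pred_int"])
```

The term $(x_h - \bar{x})^2 / S_{xx}$ in both standard errors is why intervals widen as $x_h$ moves away from the mean of the predictor, and the extra 1 inside `se_pred` is why a prediction interval is wider than the matching confidence interval for the mean response.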

Discuss diagnostics for validating a simple linear regression model.

• Explain why we need to check the assumptions of our model.
• Explain the things that can go wrong with the linear regression model.
• Be able to detect various problems with the model using a residuals vs. fits plot.
• Be able to detect various problems with the model using a residuals vs. predictor plot.
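• Be able to detect a certain kind of dependence (serial correlation) among the error terms using a residuals vs. order plot. (It is somewhat rare that we can do this analysis in practice, because it requires knowing the order in which the data were collected.)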
• Be able to detect non-normal error terms using a normal probability plot.
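The diagnostic plots above are produced by software, but the points they display are simple to compute. Below is a minimal Python sketch, assuming paired numeric data in two lists; `residual_diagnostics` is a hypothetical helper that returns point pairs rather than drawing anything, so any plotting tool can be used. For the normal probability plot it uses the plotting position $(i - 0.5)/n$, which is one common convention; statistical packages may use a slightly different one.

```python
from statistics import NormalDist, fmean


def residual_diagnostics(x, y):
    """Points for a residuals vs. fits plot and a normal probability plot."""
    n = len(x)
    x_bar, y_bar = fmean(x), fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * xi for xi in x]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]

    # Residuals vs. fits: look for curvature, a funnel shape
    # (non-constant variance), or points far from the zero line (outliers)
    resid_vs_fits = list(zip(fitted, residuals))

    # Normal probability plot: ordered residuals against standard normal
    # quantiles at plotting positions (i - 0.5) / n; a roughly straight
    # pattern is consistent with normally distributed errors
    std_normal = NormalDist()
    normal_scores = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    normal_plot = list(zip(normal_scores, sorted(residuals)))
    return resid_vs_fits, normal_plot


# Illustrative, made-up data
fits_plot, normal_plot = residual_diagnostics([1, 2, 3, 4, 5, 6],
                                              [1.2, 1.9, 3.2, 3.8, 5.1, 6.3])
for score, resid in normal_plot:
    print(f"{score:6.3f}  {resid:6.3f}")
```

For simple linear regression, plotting the residuals against $x$ instead of the fitted values shows the same pattern (mirrored when $b_1 < 0$), because each fitted value is a linear function of $x$.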

The following lessons from Penn State STAT 501 are linked in the course materials and serve as your textbook reading for Unit 2.