SLR – Inference & More Advanced Concepts

NOTE: Except in cases of complex calculations, we use brackets [ ] to indicate “functions of” and parentheses ( ) to indicate “multiplication.”

SE[Beta_1-hat] = the standard error of the estimated slope Beta_1-hat. There is no multiplication!

Beta_1-hat(AGE) = the multiplication of estimated slope Beta_1-hat and the variable AGE.


Introduction and Links to Materials

In this unit, we discuss simple linear regression in more detail than we did in the prerequisite course, and we begin using PROC GLM instead of PROC REG.

The output and code for the two procedures are extremely similar, so please continue to review the following materials and tutorials from PHC 6052 as needed.

Review from 6052 Materials: 

SAS Tutorials:

Useful SAS Procedures

  • PROC GLM
  • PROC SGPLOT
  • PROC SGSCATTER
  • PROC REG
  • PROC CORR

Consider the following materials from Penn State STAT 501 as your textbook content for this material.

Review Penn State materials with a focus on the definitions, concepts, and interpretations. You do not need to understand the mathematical details or be able to calculate regression models by hand (although you are expected to be able to work with models and ANOVA tables requiring simple mathematical calculations).

PENN STATE STAT 501 Materials – required textbook reading for this material

Note: If you click on “Printer Friendly Version” in the main lesson page it will show all pages in that lesson. Here are the links for LESSON 2 (skip 2.9-2.11) and LESSON 3. The only downside is that interactive applets will not work in the printer friendly version. We will provide these next to the main lesson link in the future.

PHC 6053 Video (6:51)


Examples and Learn by Doing Activities

Now look at our examples and try to answer the questions we pose by reviewing the output provided.

These examples return to our random sample of 500 observations from the NHANES data.

Here, our primary goal is to understand the predictors of Systolic Blood Pressure, which is a broad and difficult task! Secondarily, we would like to find a model to predict Systolic Blood Pressure, but our focus is on interpreting the parameter estimates, identifying potential confounding variables, and so on as we begin working with these data for regression modeling.

EXAMPLE: NHANES DATA – Simple Linear Regression – SBP vs. AGE

  • Dataset: nh_500c.sas7bdat – To use the dataset, save the file in a folder on your computer that is associated with a SAS library. Once you do this, open SAS and you should be able to access the file immediately using that library and the file name.
  • SAS Code and Output: Unit2-SLR-01-NHANES-SBP-AGE.pdf

LEARN BY DOING

Answer the following using the regression results on pages 2 and 3 of the output.

  • For the model using AGE
    • Verify the calculation of R-squared from the sum of squares.
    • Verify the calculation of the F-value in the overall ANOVA table from the sum of squares and/or mean square values.
    • Verify the calculation of the t-statistic (for the test for the slope) from other values in the output.
  • For the model using AGE_C50, our CENTERED AGE variable (recall that we centered at 50 by creating a new variable equal to AGE-50)
    • Verify the calculation of R-squared from the sum of squares.
    • Verify the calculation of the F-value in the overall ANOVA table from the sum of squares and/or mean square values.
    • Verify the calculation of the t-statistic (for the test for the slope) from other values in the output.
    • Interpret the slope of the estimated regression model in context, along with its confidence interval.
    • Is the intercept meaningful here? If so, interpret it in context, along with its confidence interval.
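
The verification items above rest on three identities for simple linear regression: R-squared = SS(Model) / SS(Corrected Total), F = MS(Model) / MS(Error), and t = estimate / standard error. The following Python sketch illustrates only the arithmetic. Every number in it is a made-up placeholder, not a value from the NHANES output, and the variable names are our own; substitute the values you read from the ANOVA and parameter estimate tables.

```python
# Hypothetical ANOVA-table values (placeholders, NOT from the NHANES output).
n = 500                  # number of observations
ss_model = 40000.0       # Sum of Squares for Model
ss_error = 160000.0      # Sum of Squares for Error
ss_total = ss_model + ss_error   # Sum of Squares for Corrected Total

df_model = 1             # one predictor in simple linear regression
df_error = n - 2         # n minus the two estimated parameters

# R-squared: proportion of the total variation explained by the model.
r_squared = ss_model / ss_total

# F-value: ratio of the two mean squares (each SS divided by its df).
ms_model = ss_model / df_model
ms_error = ss_error / df_error
f_value = ms_model / ms_error

# t-statistic for the slope: estimate divided by its standard error.
slope_estimate = 0.50    # hypothetical Beta_1-hat
slope_se = 0.05          # hypothetical SE[Beta_1-hat]
t_value = slope_estimate / slope_se

print(f"R-squared = {r_squared:.4f}")
print(f"F = {f_value:.2f}")
print(f"t = {t_value:.2f}")
```

Expect small differences from the printed output, because the values shown in the tables are rounded.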

Solution: Unit2-SLR-01-NHANES-SBP-AGE-SOLUTION2.pdf

Now we will look at the regression lines by sex.

EXAMPLE: NHANES DATA – Simple Linear Regression – SBP vs. AGE within each SEX.

  • Dataset: nh_500c.sas7bdat – To use the dataset, save the file in a folder on your computer that is associated with a SAS library. Once you do this, open SAS and you should be able to access the file immediately using that library and the file name.
  • SAS Code and Output: Unit2-SLR-02-NHANES-SBP-AGE-BY-SEX.pdf

LEARN BY DOING

NOTE: We used AGE_C50 as the predictor, which is the AGE variable CENTERED at 50 (a new variable = AGE-50).
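
Centering changes only the intercept. Subtracting 50 from AGE shifts the x-axis, so the slope stays the same while the intercept becomes the estimated mean response at AGE = 50 instead of at AGE = 0. A minimal Python sketch illustrates this with a tiny made-up data set (not NHANES values) and a hypothetical helper, `fit_line`:

```python
def fit_line(x, y):
    """Return the least-squares (intercept, slope) for y regressed on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Made-up illustrative values, NOT taken from the NHANES sample.
age = [25, 38, 47, 56, 64, 72]
sbp = [112, 118, 121, 130, 134, 141]

# Centered predictor: AGE - 50.
age_c50 = [a - 50 for a in age]

b0, b1 = fit_line(age, sbp)          # model using AGE
b0_c, b1_c = fit_line(age_c50, sbp)  # model using AGE_C50

print(f"AGE:     intercept = {b0:.2f}, slope = {b1:.4f}")
print(f"AGE_C50: intercept = {b0_c:.2f}, slope = {b1_c:.4f}")
# The centered intercept matches the uncentered prediction at AGE = 50.
print(f"Uncentered prediction at AGE = 50: {b0 + b1 * 50:.2f}")
```

This is why the intercept is usually easier to interpret in the centered model: it refers to a 50-year-old rather than to someone of age 0.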

Answer the following using the regression results on pages 2 and 3 of the output.

  • For the model for FEMALES
    • Verify the calculation of R-squared from the sum of squares.
    • Verify the calculation of the F-value in the overall ANOVA table from the sum of squares and/or mean square values.
    • Verify the calculation of the t-statistic (for the test for the slope) from other values in the output.
    • Interpret the slope of the estimated regression model in context, along with its confidence interval.
    • Is the intercept meaningful here? If so, interpret it in context, along with its confidence interval.
  • For the model for MALES
    • Verify the calculation of R-squared from the sum of squares.
    • Verify the calculation of the F-value in the overall ANOVA table from the sum of squares and/or mean square values.
    • Verify the calculation of the t-statistic (for the test for the slope) from other values in the output.
    • Interpret the slope of the estimated regression model in context, along with its confidence interval.
    • Is the intercept meaningful here? If so, interpret it in context, along with its confidence interval.
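
Fitting the model within each SEX means estimating a separate intercept and slope from each group's observations alone. The Python sketch below shows that idea with made-up records (not NHANES values). The record layout, the group labels, and the helper `fit_line` are assumptions for illustration only; the actual analysis is in the SAS code linked above.

```python
from collections import defaultdict

def fit_line(x, y):
    """Return the least-squares (intercept, slope) for y regressed on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Made-up (sex, age_c50, sbp) records, NOT taken from the NHANES sample.
records = [
    ("Female", -20, 108), ("Female", -5, 117),
    ("Female", 10, 128), ("Female", 25, 140),
    ("Male", -22, 118), ("Male", -8, 122),
    ("Male", 6, 127), ("Male", 20, 133),
]

# Split the predictor and response values by group.
groups = defaultdict(lambda: ([], []))
for sex, x, y in records:
    groups[sex][0].append(x)
    groups[sex][1].append(y)

# Fit one simple linear regression per group.
fits = {sex: fit_line(xs, ys) for sex, (xs, ys) in groups.items()}

for sex, (b0, b1) in fits.items():
    print(f"{sex}: intercept = {b0:.2f}, slope = {b1:.4f}")
```

Because AGE_C50 is centered at 50, each group's intercept is that group's estimated mean SBP at age 50, which makes the two fitted lines directly comparable at that age.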

Solution: Unit2-SLR-02-NHANES-SBP-AGE-BY-SEX-SOLUTION2.pdf