# Categorical Predictors

**NOTE:** Except in cases of complex calculations, we use brackets [ ] to indicate “functions of” and parentheses ( ) to indicate “multiplication.”

**SE[Beta_1-hat]** = the standard error of the estimated slope Beta_1-hat. There is no multiplication!

**Beta_1-hat(AGE)** = the multiplication of estimated slope Beta_1-hat and the variable AGE.

- Introduction and Links to Materials
- PROC GLM
- PHC 6053 Videos (43:30)
- Examples and Learn by Doing Activities
  - **LEARN BY DOING:** One Binary Predictor
  - **LEARN BY DOING:** Working With Models – Part 1

## Introduction and Links to Materials

In Unit 3 we will spend a significant amount of time covering the details of **multiple linear regression**. Many of the basic ideas are direct extensions of simple linear regression, and many skills in interpretation and modeling will translate, at least partially, to other regression models we will cover and that you may learn in the future.

We will be using **PROC GLM** instead of PROC REG in this course, although many analyses can be completed using PROC REG if you are willing to code all categorical variables manually. In this unit we will learn how PROC GLM can code categorical variables for us in our analyses, which is the main reason we prefer PROC GLM for multiple linear regression in this course.
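To make "coding categorical variables manually" concrete, here is a minimal sketch in Python (not SAS) of indicator coding with a reference group. The function name `indicator_code`, the variable `group`, and its levels are hypothetical and chosen only for illustration. The sketch shows the general idea of manual indicator coding, not the internals of PROC GLM.

```python
def indicator_code(values, reference):
    """Code a categorical variable as 0/1 indicator columns.

    One column is created for every level except the reference level,
    so the reference group is the one with 0 in every column.
    """
    levels = sorted(set(values))
    if reference not in levels:
        raise ValueError(f"reference level {reference!r} not found in data")
    non_reference = [level for level in levels if level != reference]
    return {
        level: [1 if value == level else 0 for value in values]
        for level in non_reference
    }


# Hypothetical three-level predictor with "Low" as the reference group.
group = ["Low", "High", "Medium", "Low", "High"]
coded = indicator_code(group, reference="Low")

for level, column in coded.items():
    print(level, column)
```

Each non-reference level gets its own 0/1 column, and an observation in the reference group has 0 in every column. With PROC REG you would need to create columns like these yourself before fitting the model.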

Useful **SAS Procedures**

- PROC GLM
- CLASS statement in PROC GLM
- Other procedures as needed for exploratory data analysis

Consider the following materials from **Penn State STAT 501** as support for our lecture materials for this content.

Review the Penn State materials with a **focus on the definitions, concepts, and interpretations. You do not need to understand the mathematical details or be able to calculate regression models by hand** (although you are expected to be able to work with models and ANOVA tables requiring simple mathematical calculations). **They don’t cover multiple linear regression in exactly the same way we do, so focus on using their materials to SUPPORT our lectures.**

**NOTE: The material provided in our lectures and Learn by Doing activities is the most important content to understand.**

**One important difference is that their materials let p = the number of parameters in the model (including the intercept)**, whereas **we let p = the number of x-variables in our model**. You should convince yourself that these approaches are equivalent and stick with whichever you like best for your own work. We will not ask any questions that require one method over the other.
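As a quick check that the two conventions agree, consider a hypothetical model with three x-variables fit to n observations (the choice of three is only for illustration):

$$
Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \varepsilon
$$

$$
\begin{aligned}
\text{Our convention: } & p = 3 \text{ x-variables}, & \text{error df} &= n - p - 1 = n - 4 \\
\text{Penn State convention: } & p = 4 \text{ parameters}, & \text{error df} &= n - p = n - 4
\end{aligned}
$$

Both conventions count the same four estimated coefficients, so the error degrees of freedom are identical; only the meaning of the letter p changes.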

**PENN STATE STAT 501 Materials – required textbook reading for this material**

**Note:** If you click on “Printer Friendly Version” in the main lesson page it will show all pages in that lesson. These are provided here after the main lesson link.

## PROC GLM

We will be using PROC GLM for simple and multiple linear regression, so let’s look at some PROC GLM documentation, including examples.

**SAS Documentation:** As we continue into multiple linear regression, the examples we asked you to review should start to make more sense. One is linked here again, along with more content about PROC GLM and its common statements.

- PROC GLM – CLASS statement options (used to specify CATEGORICAL PREDICTORS in our model and their REFERENCE GROUPS)
- Example 47.4 Analysis of Covariance
- PROC GLM Syntax (notice how many statements PROC GLM has, and investigate them as you learn them by clicking on their links)
- PROC GLM – MODEL statement options (review carefully what each option does, and investigate common options by looking into the details)

## PHC 6053 Videos (43:30)

## Part 1 – Understanding Binary Predictors (6:50)

View Lecture Slides with Transcript

## Part 2 – Model for Binary Predictors (4:27)

View Lecture Slides with Transcript

## Part 3 – First Example for Binary Predictors (11:53)

View Lecture Slides with Transcript

## Part 4 – Additional Example for Binary Predictors (8:03)

View Lecture Slides with Transcript

## Part 5 – Basics for Multi-Level Categorical Variables (12:17)

View Lecture Slides with Transcript

## Examples and Learn by Doing Activities

Here is an activity to help you understand binary predictors.

## EXAMPLE: One Binary Predictor

**Worksheet:** Unit3-02-01-OneBinaryPredictor.pdf

**Solutions:** Unit3-02-01-OneBinaryPredictorSolution.pdf

## LEARN BY DOING

**Situation:**

- Y = Response
- X = Binary Predictor (Yes or No)
- Sample mean Y for YES = 50
- Sample mean Y for NO = 35

**Investigate the linear regression equation through these two sample means using several different ways of coding X. For each coding:**

- Sketch the line through the two means, plotted against the current coding of X (accurately, to scale)
- Find the slope and y-intercept of the equation (using logic or algebra)
- Use your equation to predict Y for X = No and X = Yes

**Additionally:**

- What does a 1-unit change in X mean?
- What does the intercept represent?
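One way to check your work after you have tried the activity by hand is a short calculation like the sketch below. It is written in Python rather than SAS, and the function name `line_through_means` and the particular codings listed are assumptions chosen for illustration; the worksheet may use different codings. It relies on the fact that, with a single binary predictor, the least-squares line passes through the two group means.

```python
def line_through_means(x_no, x_yes, mean_no=35.0, mean_yes=50.0):
    """Return (intercept, slope) of the line through the two group means.

    x_no and x_yes are the numeric codes assigned to X = No and X = Yes.
    """
    slope = (mean_yes - mean_no) / (x_yes - x_no)
    intercept = mean_no - slope * x_no
    return intercept, slope


# Hypothetical codings of the binary predictor: (code for No, code for Yes).
codings = {
    "No = 0, Yes = 1": (0, 1),
    "No = 1, Yes = 0": (1, 0),
    "No = 1, Yes = 2": (1, 2),
    "No = -1, Yes = +1": (-1, 1),
}

for label, (x_no, x_yes) in codings.items():
    b0, b1 = line_through_means(x_no, x_yes)
    pred_no = b0 + b1 * x_no    # predicted Y when X = No
    pred_yes = b0 + b1 * x_yes  # predicted Y when X = Yes
    print(f"{label}: intercept = {b0}, slope = {b1}, "
          f"predicted No = {pred_no}, predicted Yes = {pred_yes}")
```

Under every coding the predictions for No and Yes come back as 35 and 50; only the intercept and slope change. For example, with No = 0 and Yes = 1 the intercept is 35 (the mean for No) and the slope is 15 (the difference in means).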

Now let’s look at some exercises on working with models.

## EXAMPLE: Working With Models – Part 1

In the first activity we will review one continuous predictor and look at one binary predictor.

**Worksheet:** Unit3-02-02-WorkingWithModels-Part1.pdf

**Solutions:** Unit3-02-02-WorkingWithModels-Part1-Solution.pdf