# Confounding, Mediation, and Multicollinearity

NOTE: Except in cases of complex calculations, we use brackets [ ] to indicate “functions of” and parentheses ( ) to indicate “multiplication.”

SE[Beta_1-hat] = the standard error of the estimated slope Beta_1-hat. There is no multiplication!

Beta_1-hat(AGE) = the multiplication of estimated slope Beta_1-hat and the variable AGE.

## Introduction

Now we will shift from interpreting each type of predictor to the issues that arise when we include multiple predictors in our model. In this section we will discuss confounding, mediation, and multicollinearity.

You might begin by reviewing the Wikipedia pages for these topics with attention to the concepts discussed (but not the mathematical details).

Wikipedia Articles:

## Confounding (15:17)

View Lecture Slides with Transcript

We will review this topic again briefly when we discuss model selection.

## Mediation (18:12; Optional 15:47)

Mediators behave statistically the same as confounders, which leads to this important principle:

PRINCIPLE: We must distinguish between a confounder and a mediator on entirely NON-STATISTICAL grounds!
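To make the principle concrete, here is a minimal simulation sketch, not part of the course materials. It assumes NumPy is available, and every variable name, effect size, and sample size is made up for illustration. A third variable Z is generated once as a confounder (Z causes both X and Y) and once as a mediator (X causes Z, which causes Y). In both scenarios the slope for X changes when Z is added to the model, so the change in the estimate alone cannot tell us which role Z plays.

```python
import numpy as np


def slope_on_x(y, *predictors):
    """Least-squares slope for the first predictor (an intercept is included)."""
    design = np.column_stack([np.ones_like(y), *predictors])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]


rng = np.random.default_rng(0)
n = 20_000  # hypothetical sample size, large so the estimates are stable

# Scenario A: Z is a confounder (Z causes both X and Y).
z_a = rng.normal(size=n)
x_a = 0.8 * z_a + rng.normal(size=n)
y_a = 0.5 * x_a + 1.0 * z_a + rng.normal(size=n)

# Scenario B: Z is a mediator (X causes Z, and Z causes Y).
x_b = rng.normal(size=n)
z_b = 0.8 * x_b + rng.normal(size=n)
y_b = 0.5 * x_b + 1.0 * z_b + rng.normal(size=n)

results = {}
for label, x, z, y in [("confounder", x_a, z_a, y_a),
                       ("mediator", x_b, z_b, y_b)]:
    crude = slope_on_x(y, x)        # model with X only
    adjusted = slope_on_x(y, x, z)  # model with X and Z
    results[label] = (crude, adjusted)
    print(f"{label}: crude slope = {crude:.2f}, adjusted slope = {adjusted:.2f}")
```

In both scenarios the adjusted slope should land near the 0.5 used to generate the data, while the crude slope is noticeably larger. Whether the crude or the adjusted slope is the one we want depends on the causal story, which the data cannot supply.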

Review this short presentation prepared for this course.

Presentation: Mediation

• You might reasonably argue that, in our example with EXERCISE as a predictor of GLUCOSE, BMI might instead be a mediator.
• I would argue that it is likely more complicated than either a confounder or a mediator, as the causal relationship between them likely runs in both directions.
• In this course we will not distinguish between mediators and confounders, and we will generally avoid making any causal interpretations unless our data come from a controlled experiment and we are discussing our randomized treatment variable.

Watch the following video discussing the concept of mediators and mediation. It also introduces “moderation,” which is another name for “interaction” (as is “effect modification”) and will be the topic of our next section.

If this topic is of interest to you in your work, consider watching the second part of the previous video:

(OPTIONAL) Video (15:47): Path Analysis Method for Mediation

## Multicollinearity

Watch the following playlist of 2 videos discussing the concept of multicollinearity.

Video (11:30): Multicollinearity
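As a rough illustration of why multicollinearity matters, the sketch below simulates two predictors at a low and a high correlation and compares SE[Beta_1-hat] in each case. It is an illustrative sketch and not taken from the videos or the Penn State materials. It assumes NumPy is available, and the function names, coefficients, correlations, and sample size are all hypothetical.

```python
import numpy as np


def ols_with_se(y, X):
    """Return (estimates, standard errors) for an OLS fit with an intercept."""
    n = len(y)
    design = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    df_error = n - design.shape[1]        # n minus the number of parameters
    mse = resid @ resid / df_error
    cov = mse * np.linalg.inv(design.T @ design)
    return coef, np.sqrt(np.diag(cov))


def se_of_first_slope(rho, n=500, seed=1):
    """Simulate x1 and x2 with correlation about rho; return SE[Beta_1-hat]."""
    rng = np.random.default_rng(seed)
    x1 = rng.normal(size=n)
    x2 = rho * x1 + np.sqrt(1 - rho**2) * rng.normal(size=n)
    y = 1.0 + 2.0 * x1 + 2.0 * x2 + rng.normal(size=n)
    _, se = ols_with_se(y, np.column_stack([x1, x2]))
    return se[1]  # index 0 is the intercept, index 1 is the slope for x1


se_low = se_of_first_slope(rho=0.0)
se_high = se_of_first_slope(rho=0.95)
print(f"SE[Beta_1-hat] with nearly uncorrelated predictors: {se_low:.3f}")
print(f"SE[Beta_1-hat] with highly correlated predictors:  {se_high:.3f}")
```

The data-generating model is the same in both runs. Only the correlation between the two predictors changes, yet the standard error of the first slope should come out several times larger in the highly correlated case.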

Consider the following materials from Penn State STAT 501 as your primary textbook material for the topic of multicollinearity.

Review the Penn State materials with a focus on the definitions, concepts, and interpretations. You do not need to understand the mathematical details or be able to calculate regression models by hand (although you are expected to be able to work with models and ANOVA tables requiring simple mathematical calculations). They don’t cover multiple linear regression in exactly the same way we do, so focus on using their materials to SUPPORT our lectures.

One important difference in their materials is that they let p = the number of parameters in the model (including the intercept), whereas we let p = the number of x-variables in our model. You should convince yourself that these approaches are equivalent and stick with whichever you like best for your own work. We will not ask any questions that require one method over the other.
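One quick way to convince yourself is to compute the error degrees of freedom under both conventions. The sketch below is illustrative only: the function names and the values of n and k are made up, and it relies on the standard fact that the error degrees of freedom equal the sample size minus the number of estimated parameters.

```python
def error_df_counting_x_variables(n, p):
    """Error df when p = number of x-variables (the convention in this course)."""
    return n - p - 1  # subtract one more for the intercept


def error_df_counting_parameters(n, p):
    """Error df when p = number of parameters, intercept included (Penn State)."""
    return n - p


n = 100  # hypothetical sample size
k = 3    # hypothetical number of x-variables

ours = error_df_counting_x_variables(n, p=k)
theirs = error_df_counting_parameters(n, p=k + 1)  # 3 slopes + 1 intercept
print(ours, theirs)  # both are 96
```

The two formulas differ only in whether the intercept is counted inside p or subtracted separately, so any quantity built from the error degrees of freedom comes out the same either way.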

PENN STATE STAT 501 Materials – required textbook reading for this material

Note: If you click on “Printer Friendly Version” in the main lesson page it will show all pages in that lesson. These are provided here after the main lesson link.

We will review this topic again briefly when we discuss model selection.