Learn By Doing – Linear Regression (Software)

Published: December 25th, 2012

Category: Activity 1: Learn By Doing, Learning Statistical Software

Use the solutions provided and complete the questions for practice with Case Q-Q.

Optional: Create your own solutions using your software for extra practice.

Objectives:

  • Find a regression line and plot it on the scatterplot
  • Examine the effect of outliers on the regression line

Solutions:

Use the following output to answer the questions that follow.

Background Information for Dataset

The modern Olympic Games have changed dramatically since their inception in 1896. For example, many commentators have remarked on the change in the quality of athletic performances from year to year. Regression will allow us to investigate the change in winning times for one event — the 1,500 meter race.

Here is a summary of the variables in our dataset:

  • Year: the year of the Olympic Games, from 1896 to 2000.
  • Time: the winning time for the 1,500 meter race, in seconds.

Data

Learn By Doing

Answer the following questions using the output. In this exercise you will:

  • use the regression line to make predictions
  • evaluate how reliable these predictions are

Use the linear regression on the full data to answer the following question.

http://phhp-faculty-cantrell.sites.medinfo.ufl.edu/files/2012/07/qz-LBD02019.swf

Use the linear regression after removing the outlier to answer the next two questions.

http://phhp-faculty-cantrell.sites.medinfo.ufl.edu/files/2012/07/qz-LBD02020.swf

http://phhp-faculty-cantrell.sites.medinfo.ufl.edu/files/2012/07/qz-LBD02021.swf


(Optional) SPSS Steps:

  • Import Data: FILE > OPEN > DATA, choose Excel file from the pull-down, find the file, continue
  • Edit Data: DATA > DEFINE VARIABLE PROPERTIES
  • Scatterplot: GRAPHS > CHART BUILDER, create a simple scatterplot relating X = Year to Y = Time, then double-click the resulting scatterplot to add a trend line
  • Regression Equation: ANALYZE > REGRESSION > LINEAR
  • Remove Outlier and Save New Data: select the row containing the outlier, right-click on the row number and choose CUT
  • Scatterplot: GRAPHS > CHART BUILDER, create a simple scatterplot relating X = Year to Y = Time using the new dataset, then double-click the resulting scatterplot to add a trend line
  • Regression Equation: ANALYZE > REGRESSION > LINEAR

(Optional) SAS Steps:

  • View Dataset Information in SAS: Use PROC CONTENTS to view the information about the dataset.
  • Create Regression Analysis with Fit Plot: Use PROC REG to obtain the simple linear regression analysis for Y = time using X = year as the predictor. In SAS 9.3, if ODS GRAPHICS is enabled, the fit plot appears by default in your HTML output. In SAS 9.2, you must submit ODS GRAPHICS ON to obtain these results. Note: In SAS 9.2, I tend to submit ODS GRAPHICS OFF immediately after the procedure. This is not necessary; however, you will keep receiving ODS GRAPHICS output until you turn it off with this command or exit SAS 9.2. In SAS 9.3, ODS GRAPHICS is enabled by default but can be enabled or disabled under TOOLS > OPTIONS > PREFERENCES in the RESULTS tab.
  • Delete Outlier: Using a DATA step, create a new dataset (olympics2) and use an IF-THEN statement to delete the observation corresponding to the outlier. The outlier is the first observation, for year=1896.
  • Create Regression Analysis with Fit Plot: Use PROC REG to obtain the simple linear regression analysis for Y = time using X = year as the predictor, this time using your dataset with the outlier removed. In SAS 9.3, if ODS GRAPHICS is enabled, the fit plot appears by default in your HTML output. In SAS 9.2, you must submit ODS GRAPHICS ON to obtain these results.
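
For readers working outside SPSS or SAS, the same workflow (fit, remove the outlier, refit) can be sketched in Python with no extra libraries. This is a minimal sketch under stated assumptions: the times below are toy values invented for illustration, not the real Olympic winning times, and fit_line is a hypothetical helper that applies the usual least-squares formulas. Only the location of the outlier (year 1896) comes from the steps above.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# ASSUMPTION: toy data, NOT the real winning times.
# The first point is deliberately far above the trend of the others.
years = [1896, 1900, 1904, 1908, 1912]
times = [280.0, 250.0, 248.0, 246.0, 244.0]

# Fit on the full data, outlier included.
intercept_full, slope_full = fit_line(years, times)

# Drop the observation for 1896, mirroring the "delete outlier" step.
kept = [(y, t) for y, t in zip(years, times) if y != 1896]
years2 = [y for y, _ in kept]
times2 = [t for _, t in kept]

# Refit on the reduced data.
intercept_trim, slope_trim = fit_line(years2, times2)

print(f"With outlier:    time = {intercept_full:.1f} + ({slope_full:.2f}) * year")
print(f"Without outlier: time = {intercept_trim:.1f} + ({slope_trim:.2f}) * year")
```

Comparing the two slopes shows how much a single unusual observation at the edge of the data can pull the regression line.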

This document is linked from Linear Relationships – Linear Regression.