Old Regression Materials

These materials were previously part of the PHC 6053 course. They have been archived here as a resource on regression methods.

Prerequisites

Although most statistical analyses will be conducted using software, students should be comfortable

  • working with equations
  • performing basic mathematical calculations including
    • order of operations
    • fractions
    • square roots
    • logarithms (base e)
    • exponentials (e^x).

Note: in statistics, the notation log refers to the natural logarithm, ln. This can be confusing because in algebra a log written with no base is usually assumed to be base 10.

  • In this course, a log with no base is assumed to be the natural logarithm, log = ln = log_e.
  • We may still occasionally use ln notation for the natural logarithm as well.
  • We will never use any base other than e regardless of the notation used.
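
The convention above can be checked numerically. The following is a minimal sketch in Python, used here only for illustration since the course itself works in SAS; the value 20 and the variable names are arbitrary choices, not part of the course materials.

```python
import math

x = 20.0  # arbitrary positive value chosen for illustration

natural = math.log(x)    # log with no base given: natural logarithm (base e)
common = math.log10(x)   # base-10 logarithm, shown only for contrast

# The exponential e^x undoes the natural logarithm, not the base-10 one.
back = math.exp(natural)

print(f"log({x}) = ln({x}) = {natural:.4f}")
print(f"log10({x}) = {common:.4f}")
print(f"exp(log({x})) = {back:.4f}")
```

The SAS LOG function follows the same convention and returns the natural logarithm; base 10 is a separate function, LOG10.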

Main Course Goal

This course introduces graduate students in fields other than statistics to a wide range of modern regression methods. Emphasis is on modeling driven by actual data from studies in a variety of areas, primarily from health, biology, and ecology.

The primary topics are multiple linear regression, logistic regression, and Poisson regression. A main goal is to learn what approach to use among the linear and nonlinear models, and how to determine if the fit is adequate.
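
For orientation, the three model families named above can be written side by side. This is a sketch in generic textbook notation, which is an assumption here rather than the course's own: Y is the outcome, x_1, ..., x_p are the predictors, the β's are regression coefficients, and log is the natural logarithm.

```latex
\begin{align*}
\text{Linear:}   \quad & E[Y \mid x] = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p \\
\text{Logistic:} \quad & \log\!\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p,
                         \qquad p = P(Y = 1 \mid x) \\
\text{Poisson:}  \quad & \log(\mu) = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p,
                         \qquad \mu = E[Y \mid x]
\end{align*}
```

In all three, the right-hand side is the same linear combination of predictors; what changes is the quantity on the left, which is the mean itself for a continuous outcome, the log odds for a binary outcome, and the log mean for a count.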

By the end of the course, students will achieve competency in carrying out the analyses in SAS.

Course Objectives

The following objectives will be addressed.

Study Tip: During the course, contemplate the course objectives and consider what you have learned that applies to each.

 

  • CO-1: Select appropriate methods for a scenario; determine if a linear or a nonlinear approach is appropriate
  • CO-2: Use statistical software for performing regression analysis in the SAS language
  • CO-3: Test and interpret linear models for continuous outcome data (normal linear model)
  • CO-4: Test and interpret models for categorical outcome data (logistic and Poisson regression)
  • CO-5: Draw appropriate conclusions for both randomized designed experiments and observational studies
  • CO-6: Communicate clearly to subject matter experts the purposes and results of complex statistical analysis, both orally and in writing.

Software

Topic List

The following broad topics will be covered; those given in bold will be our primary focus for most of the semester.

  • Unit 1: Exploratory Methods and Inference in Case CQ
  • Unit 2: Inference in Case QQ – Simple Linear Regression 
  • Unit 3: Multiple Linear Regression
  • Unit 4: Inference in Case CC and QC – Contingency Tables and Simple Logistic Regression
  • Unit 5: Multiple Logistic Regression
  • Unit 6: Model Selection
  • Unit 7: GLM and Poisson Regression

References and Suggested Textbooks

Penn State has two courses with excellent sets of online materials:

The course materials were originally developed using the following book, which is freely available through the UF library. The textbook is not required reading for this course, but it is a good reference for regression methods at the applied level, and the mathematical detail is kept to a minimum. The links may work properly only when you are connected to the UF network, either directly or via VPN.

  • Regression Methods in Biostatistics – Linear, Logistic, Survival, and Repeated Measures Models.
    Authors: Eric Vittinghoff, David V. Glidden, Stephen C. Shiboski, Charles E. McCulloch
    ISBN: 978-1-4614-1352-3 (Print) 978-1-4614-1353-0 (Online)
    http://link.springer.com/book/10.1007%2F978-1-4614-1353-0

I appreciate suggestions for good textbooks from students and urge you to spend some time in the campus libraries browsing the available books on applied regression to find ones that resonate with you.

The following are a few other applied regression textbooks which are freely available through the UF library system.