This page contains links to useful resources for learning R.

- CRAN Home Page: Download R for Windows or Mac here.
- RStudio: An interface for using R that many students seem to like.
- An Introduction to R: Most Recent Version
- Short R Reference Card: Compact and useful information about common R commands
- UCLA Institute for Digital Research and Education: Excellent resources for a variety of packages, including R.
- R Project Home Page: A broader site associated with R. Contains links to CRAN materials plus more.
- Datasets in R: A list of datasets available in R or through specific common R packages.
- Working With Functions: A short tutorial on writing functions.
- Using R for Introductory Statistics: A manual on using R for basic statistics.
- Graphics in R: Examples of basic and advanced graphics using R.
- R-Graph-Example: An example showing the PDF, CDF, and survival function for one discrete and one continuous random variable.
- Quick R: A site containing some useful information about learning R for data analysis. Associated with the book “R in Action.”
- R by Example: A set of examples on specific topics to illustrate common uses of R.
- A few illustrative examples in R: A short introduction to R.
- WESSA.net: This site offers R-Modules online. Viewing the underlying R-code can be useful.
- Carnegie Mellon Open Learning Statistics: This course has activities which give directions using R along with other software.
- http://notepad-plus-plus.org/download/: A programming editor which gives some SAS-style color coding to R scripts.
- https://www.programiz.com/r-programming – A tutorial site for beginning programming

- On the site home page under SPSS Resources you will find many useful links. In particular, you should learn to use the UCLA Institute for Digital Research and Education site if you plan to use SPSS in the future. If you have suggested sites, please let us know!

You will be directed to the needed tutorials for each assignment from inside E-Learning. The files needed for or produced from these tutorials are provided below the tutorials as well as where appropriate in each tutorial.

- A – (4:00) Introduction To SPSS
- B – (2:43) Dataset Information
- C – (2:12) Tips for Watching

- A – (2:23) Importing Data from an EXCEL file (XLS or XLSX)
- B – (3:44) Importing Data from a Comma Separated (CSV) file
- C – (1:43) Opening an SPSS Dataset (SAV)
- D – (0:58) Importing Data from a SAS Dataset (SAS7BDAT)

- A – (6:00) Basic Data Settings
- B – (3:25) Labeling Variables in an SPSS Dataset
- C – (3:19) Translating Coded Categorical Variables
- D – (4:30) Computing New Variables
- E – (8:46) Categorize a Quantitative Variable

- A – (5:40) Opening and Editing SPV Files
- B – (6:56) Editing Graphs in SPSS Output
- C – (3:31) Editing Tables and Page Breaks
- D – (5:23) Exporting and Copying Output

- A – (7:00) Frequency Distributions
- B – (3:57) Creating Bar Charts and Pie Charts

- A – (8:00) Numeric Measures using EXPLORE
- B – (2:29) Creating Histograms and Boxplots
- C – (2:31) Creating QQ-Plots and PP-Plots
- D – (6:41) Numeric Measures: Other Methods

- A – (7:57) Two-Way (Contingency) Tables – EDA
- B – (9:19) Two-Way (Contingency) Tables – Inference

- A – (3:29) Numeric Summaries by Groups
- B – (1:59) Side-By-Side Boxplots
- C – (5:30) Two Sample T-Test
- D – (4:22) One Way ANOVA
- E – (3:57) Non Parametric Tests

- A – (3:19) Data Preparation
- B – (2:00) EDA of Differences
- C – (3:11) Paired T-Test
- D – (3:32) Non Parametric (Paired)

- A – (2:38) Basic Scatterplots
- B – (2:54) Grouped Scatterplots
- C – (3:35) Pearson’s Correlation Coefficient
- D – (2:53) Simple Linear Regression – EDA
- E – (7:07) Simple Linear Regression (Inference)

**Don’t forget your semicolons! And definitely don’t forget to look at your LOG file!!**

General Information

**Online SAS Resources** from our home page. In particular, you should learn to use the UCLA Institute for Digital Research and Education site and the SAS documentation for your version of SAS.

**“The Little SAS Book”** is an excellent resource for beginning SAS programmers.

- Always look for the latest edition (currently the fifth edition).
- There is a version for standard SAS as well as one for SAS Enterprise Guide (a “point and click” version of SAS).
- This book is available via books 24/7 through the UF library system. You must be logged in through the network (on-campus computer, VPN connection, etc.) to access the text.

- Paul D. Allison has numerous books on specific types of analyses in SAS.

PHC 6052: Introduction to Biostatistical Methods

**PHC 6052: SAS Tutorials** (moved from this main page, which was growing too large). Here we provide links to our tutorials and lectures on using SAS statistical software for PHC 6052.

- SAS Skills Document for Material Covered in an older version of the PHC 6052 course

A good resource originally written for SAS version 9.2 and a previous version of PHC 6052.

**Video Lectures on More Advanced Topics**

- UCLA IDRE Seminars: https://stats.idre.ucla.edu/other/mult-pkg/seminars/#SAS
- Penn State’s online materials for their three 1-credit SAS courses
- The Little SAS Book: A Primer 5th ed., by Lora Delwiche and Susan Slaughter. You can read for free via the UF library.
- SAS 9.4 Documentation: Procedures by Name

- SAS is used in PHC 6052, PHC 6053, PHC 6080, and PHC 6081.
- SPSS is used in PHC 6050 and PHC 6053.
- R is used in PHC 6055.

Click on the appropriate link to the left to browse resources for a particular software package.

Optional: Create your own solutions using your software for extra practice.

- Find a regression line and plot it on the scatterplot
- Examine the effect of outliers on the regression line

Use the following output to answer the questions that follow.

The modern Olympic Games have changed dramatically since their inception in 1896. For example, many commentators have remarked on the change in the quality of athletic performances from year to year. Regression will allow us to investigate the change in winning times for one event — the 1,500 meter race.

Here is a summary of the variables in our dataset:

**Year:** the year of the Olympic Games, from 1896 to 2000.

**Time:** the winning time for the 1,500 meter race, in seconds.

Answer the following questions using the output. In this exercise you will:

- use the regression line to make predictions
- evaluate how reliable these predictions are

Use the linear regression on the full data to answer the following question.

Use the linear regression after removing the outlier to answer the next two questions.

- **Import Data:** FILE > OPEN > DATA, choose Excel file from the pull-down, find the file, continue
- **Edit Data:** DATA > DEFINE VARIABLE PROPERTIES
- **Scatterplot:** GRAPHS > CHART BUILDER, create a simple scatterplot relating X = Year to Y = Time, double-click on the created scatterplot to add a trend line
- **Regression Equation:** ANALYZE > REGRESSION > LINEAR
- **Remove Outlier and Save New Data:** select the row containing the outlier, right-click on the row number and choose CUT
- **Scatterplot:** GRAPHS > CHART BUILDER, create a simple scatterplot relating X = Year to Y = Time using the new dataset, double-click on the created scatterplot to add a trend line
- **Regression Equation:** ANALYZE > REGRESSION > LINEAR

- **View Dataset Information in SAS:** Use PROC CONTENTS to view the information about the dataset.
- **Create Regression Analysis with Fit Plot:** Use PROC REG to obtain the simple linear regression analysis for Y = time using X = year as the predictor. In SAS 9.3 (if you have ODS GRAPHICS enabled) you should obtain the fit plot by default in your HTML output. In SAS 9.2 you must use ODS GRAPHICS ON to obtain these results. Note: In SAS 9.2, I tend to use ODS GRAPHICS OFF immediately following the procedure. This is not necessary; however, you will receive ODS GRAPHICS until you turn it off with this command or exit SAS 9.2. In SAS 9.3, ODS GRAPHICS are enabled by default but can be enabled/disabled under TOOLS > OPTIONS > PREFERENCES in the RESULTS tab.
- **Delete Outlier:** Using a DATA step, create a new dataset (olympics2) and use an IF-THEN statement to delete the observation corresponding to the outlier. This outlier is the first observation, in year=1896.
- **Create Regression Analysis with Fit Plot:** Use PROC REG to obtain the simple linear regression analysis for Y = time using X = year as the predictor, using your dataset with the outlier removed. In SAS 9.3 (if you have ODS GRAPHICS enabled) you should obtain the fit plot by default in your HTML output. In SAS 9.2 you must use ODS GRAPHICS ON to obtain these results.
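For readers who want to see the arithmetic behind the two regression fits, here is a minimal sketch in Python (not one of the course packages). It assumes made-up (year, time) pairs in which the first observation, year 1896, is the outlier; the values are NOT the Olympic dataset, and the helper name `fit_line` is hypothetical.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sum of squared deviations in x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical illustrative values, NOT the course dataset.
years = [1896, 1900, 1904, 1908, 1912]
times = [270.0, 246.0, 244.0, 242.0, 240.0]

# Fit on all points, then on the data with the first observation (1896) dropped.
full_intercept, full_slope = fit_line(years, times)
trim_intercept, trim_slope = fit_line(years[1:], times[1:])

print(f"all points:      slope = {full_slope:.3f} seconds per year")
print(f"outlier removed: slope = {trim_slope:.3f} seconds per year")
```

With these made-up numbers the slope is much steeper when the 1896 point is included, which is the kind of change the two PROC REG (or SPSS) runs let you compare on the real data.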

This document is linked from Linear Relationships – Linear Regression.

Optional: Create your own solutions using your software for extra practice.

- Observe how an outlier can affect the correlation coefficient by comparing the value using data with and without an outlier.

Use the following output to answer the questions that follow.

The average gestation period, or time of pregnancy, of an animal is closely related to its longevity — the length of its lifespan. Data on the average gestation period and longevity (in captivity) of 40 different species of animals have been recorded.

Here is a summary of the variables in our dataset:

**animal:**the name of the animal species.**gestation:**the average gestation period of the species, in days.**longevity:**the average longevity of the species, in years.

Remember that the correlation is only an appropriate measure of the **linear** relationship between two quantitative variables. First produce a scatterplot to verify that gestation and longevity are nearly linear in their relationship.

Answer the following questions using the output obtained. In this exercise we will:

- use the scatterplot to examine the relationship between two quantitative variables.
- use the labeled scatterplot to better understand the form of a relationship.

(Optional) SPSS Steps:

- **Label Variables and Define Variable Properties**
- **Create Scatterplot:** GRAPHS > CHART BUILDER, create a simple scatterplot relating X = longevity to Y = gestation
- **Calculate Correlation:** ANALYZE > CORRELATE > BIVARIATE, calculate the correlation between longevity and gestation as illustrated
- **Remove Outlier and Save New Data:** select the row containing the outlier, right-click on the row number and choose CUT
- **Re-create Scatterplot:** GRAPHS > CHART BUILDER, create a simple scatterplot relating X = longevity to Y = gestation using the new dataset
- **Re-calculate Correlation:** ANALYZE > CORRELATE > BIVARIATE, calculate the correlation on the new dataset

- **Label Variables:** Using a DATA step, create a new dataset (animals2) where you label the variables longevity and gestation as Longevity (years) and Gestation (days) using a LABEL statement.
- **View Dataset Information in SAS:** Use PROC CONTENTS to view the information about the new dataset.
- **Create Basic Scatterplot:** Use PROC SGPLOT and the SCATTER statement to create a scatterplot of X=longevity by Y=gestation.
- **Calculate Correlation Coefficient:** Use PROC CORR to calculate the correlation coefficient between X=longevity and Y=gestation. In SAS 9.3 you will likely get the scatterplot matrix automatically; in SAS 9.2 you must request this by using ODS GRAPHICS ON before the procedure and ODS GRAPHICS OFF to stop producing this output after the procedure (or whenever you wish to stop producing ODS GRAPHICS).
- **Delete Outlier:** Using a DATA step, create a new dataset (animals3) and use an IF-THEN statement to delete the observation corresponding to the outlier. This outlier is an elephant with average longevity of 40 years and average gestation of 645 days.
- **View Dataset Information in SAS:** Use PROC CONTENTS to view the information about the new dataset where you have removed the outlier.
- **Create Basic Scatterplot:** Use PROC SGPLOT and the SCATTER statement to create a scatterplot of X=longevity by Y=gestation on the dataset with the outlier removed.
- **Calculate Correlation Coefficient:** Use PROC CORR to calculate the correlation coefficient between X=longevity and Y=gestation on the dataset with the outlier removed.
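The with-and-without-outlier comparison can also be sketched in Python (not one of the course packages). This is only an illustration under stated assumptions: the elephant's values (40 years, 645 days) come from the description above, while the other rows, the species labels, and the helper name `pearson_r` are hypothetical and are NOT the course dataset.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# (animal, longevity in years, gestation in days).
# Only the elephant row is taken from the text; the rest are made up.
animals = [
    ("species A", 5, 60),
    ("species B", 8, 150),
    ("species C", 12, 100),
    ("species D", 15, 250),
    ("species E", 20, 200),
    ("elephant", 40, 645),
]

# Same idea as the IF-THEN delete: keep every row except the outlier.
animals_no_outlier = [row for row in animals if row[0] != "elephant"]

r_all = pearson_r([a[1] for a in animals], [a[2] for a in animals])
r_trimmed = pearson_r(
    [a[1] for a in animals_no_outlier], [a[2] for a in animals_no_outlier]
)

print(f"r with outlier:    {r_all:.3f}")
print(f"r without outlier: {r_trimmed:.3f}")
```

How much the coefficient moves depends entirely on the data, so run the comparison on the actual dataset before drawing any conclusion.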

This document is linked from Linear Relationships – Correlation.

Optional: Create your own solutions using your software for extra practice.

- Create and interpret a simple scatterplot
- Create and interpret a labeled scatterplot

Use the following output to answer the questions that follow.

In this activity we will look at height and weight data that were collected from 57 males and 24 females, and use the data to explore how the weight of a person is related to (or affected by) his or her height. This implies that height will be our explanatory variable and weight will be our response variable. We will then look at gender, and see how labeling this third variable contributes to our understanding of the form of the relationship.

Our dataset contains the following variables:

**gender:** 0 = male, 1 = female.

**height:** in inches.

**weight:** in pounds.

Answer the following questions using the output obtained. In this exercise we will:

- use the scatterplot to examine the relationship between two quantitative variables.
- use the labeled scatterplot to better understand the form of a relationship.

(Optional) SPSS Steps:

- **Import Data:** FILE > OPEN > DATA, choose Excel file from the pull-down, find the file, continue
- **Define Variable Properties:** Provide labels for 0 and 1 as Male and Female, and label
  - Height: label as “Height (inches)”
  - Weight: label as “Weight (pounds)”
- **Scatterplots:** GRAPHS > CHART BUILDER, complete the wizard for each of the two requested graphs. Edit colors for males and females on the labeled scatterplot.

- **Create FORMATS for Gender:** Use PROC FORMAT to create a format to translate 1 into “Female” and 0 into “Male” (we will associate it with the variable in the next step).
- **Label and Format Variables:** Using a DATA step, create a new dataset named height2 where you label the variables height and weight as Height (inches) and Weight (pounds) using a LABEL statement. Use a FORMAT statement to format the variable gender with the format created in the previous step.
- **View Dataset Information in SAS:** Use PROC CONTENTS to view the information about the new dataset.
- **Create Basic Scatterplot:** Use PROC SGPLOT and the SCATTER statement to create a scatterplot of X=height by Y=weight.
- **Create Labeled Scatterplot:** Use PROC SGPLOT and the SCATTER statement to create a scatterplot of X=height by Y=weight with the GROUP= option to label the points by gender.
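The two ideas in these steps, translating a numeric code into a label and then grouping the points by that label, can be sketched in Python (not one of the course packages). The coding 0 = male, 1 = female comes from the variable list above; the records and the names `GENDER_FORMAT` and `points_by_gender` are hypothetical, and no plot is drawn here.

```python
from collections import defaultdict

# Mirrors the coding given in the activity: 0 = male, 1 = female.
GENDER_FORMAT = {0: "Male", 1: "Female"}

# Hypothetical (gender, height in inches, weight in pounds) records,
# NOT the course dataset.
records = [
    (0, 70, 180),
    (1, 64, 130),
    (0, 72, 195),
    (1, 66, 140),
]

# Group the (height, weight) points by the translated gender label.
# A labeled scatterplot draws each of these groups with its own marker or color.
points_by_gender = defaultdict(list)
for gender, height, weight in records:
    points_by_gender[GENDER_FORMAT[gender]].append((height, weight))

for label, points in sorted(points_by_gender.items()):
    print(label, points)
```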

This document is linked from Scatterplots.

Optional: Create your own solutions using your software for extra practice.

In this activity, we will use the collected data to:

- build a two-way table and compute conditional percentages.
- interpret the data in terms of the relationship between a young child’s nighttime exposure to light and later nearsightedness.

An Associated Press article captured the attention of readers with the headline “Night lights bad for kids?” The article was based on a 1999 study at the University of Pennsylvania and Children’s Hospital of Philadelphia, in which parents were surveyed about the lighting conditions under which their children slept between birth and age 2 (lamp, night-light, or no light) and whether or not their children developed nearsightedness (myopia). The purpose of the study was to explore the effect of a young child’s nighttime exposure to light on later nearsightedness.

nightlight.xls or nightlight.csv

**Create Two-Way tables:** ANALYZE > DESCRIPTIVE STATISTICS > CROSSTABS, complete the wizard (4 times) to obtain

- Two-way table with the count (frequency) and percent (out of total)
- Two-way table with the count (frequency) and row percents
- Two-way table with the count (frequency) and column percents
- Two-way table with the count (frequency), row and column percents

**Create Two-Way tables:** Use PROC FREQ and the TABLES statement to create a two-way table with light and nearsightedness.
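To make the idea of conditional (row) percentages concrete, here is a minimal sketch in Python (not one of the course packages). The three lighting categories come from the study description above; the counts are made up and are NOT the study's data, and the names `records`, `counts`, and `row_percents` are hypothetical.

```python
from collections import Counter

# Hypothetical (lighting condition, nearsighted?) records, NOT the study data.
records = (
    [("no light", "Yes")] * 1 + [("no light", "No")] * 4
    + [("night light", "Yes")] * 1 + [("night light", "No")] * 3
    + [("lamp", "Yes")] * 1 + [("lamp", "No")] * 1
)

# Cell counts of the two-way table and the total for each row (lighting condition).
counts = Counter(records)
row_totals = Counter(light for light, _ in records)

# Row percentages: the distribution of the outcome within each lighting condition.
row_percents = {
    light: {
        outcome: 100 * counts[(light, outcome)] / row_totals[light]
        for outcome in ("Yes", "No")
    }
    for light in row_totals
}

for light in ("no light", "night light", "lamp"):
    yes_pct = row_percents[light]["Yes"]
    no_pct = row_percents[light]["No"]
    print(f"{light:12s} n={row_totals[light]}  Yes={yes_pct:.1f}%  No={no_pct:.1f}%")
```

Conditioning on the lighting condition (the explanatory variable) is what lets the percentages be compared across rows even when the row totals differ.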

This document is linked from Case C-C.

This document is linked from Types of Variables.

For this first activity with data you will need EXCEL (or Open Office) to view the .xls file. You can view the .csv file in EXCEL or any text editor.

Very soon you will need SAS or SPSS (depending upon which course you are taking) and you will learn to import data from .xls and/or .csv files.

It is not necessary to have EXCEL to import .xls files; however, if you wish to view the original file you will need EXCEL or another program which can open or view these files.

In this course, when you are given data in .xls/.csv format, it is extremely important to check the raw dataset against your imported data as different versions of programs and different computer settings can cause issues with the data import process.

Clinical depression is the most common mental illness in the United States, affecting 19 million adults each year (Source: NIMH, 1999). Nearly 50% of individuals who experience a major episode will have a recurrence within 2-3 years. Researchers are interested in comparing therapeutic solutions that could delay or reduce the incidence of recurrence.

In a study conducted by the National Institutes of Health, 109 clinically depressed patients were separated into three groups, and each group was given one of two active drugs (imipramine or lithium) or no drug at all. For each patient, the dataset contains the treatment used, the outcome of the treatment, and several other interesting characteristics.

Here is a summary of the variables in our dataset:

**Hospt:** The patient’s hospital, represented by a code for each of the 5 hospitals (1, 2, 3, 5, or 6)

**Treat:** The treatment received by the patient (Lithium, Imipramine, or Placebo)

**Outcome:** Whether or not a recurrence occurred during the patient’s treatment (Recurrence or No Recurrence)

**Time:** Either the time (days) till recurrence, or if no recurrence, the length (days) of the patient’s participation in the study.

**AcuteT:** The time (days) that the patient was depressed prior to the study.

**Age:** The age of the patient in years, when the patient entered the study.

**Gender:** The patient’s gender (1 = Female, 2 = Male)

**Note:** In this dataset some of the categorical variables use numeric codes and others use a text description. Often, if numeric codes are to be used, then ALL of the categorical variables will be coded. When we work in software, we will learn how to have the program translate these codes for us in our analyses.

To open the data, right-click on the file name, depression.xls, and choose “Save Link As” (or “Save Target As”) to download the file to your computer. Then find the downloaded file and double-click it to open it in Excel (or Open Office, etc.).

This dataset is also available as a comma separated file (CSV), depression.csv, which can be opened in any text editor, although the data are not as visually organized in this type of file.

In future assignments you will need to download datasets in this manner in order to import them, i.e. you will need to have the file saved to your computer.

In Excel, the dataset is in tabular form. Each row contains the values of the variables associated with a single individual, and the different variables are separated into columns. It is helpful if the columns are labeled with the variable names, as we have in this case.
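The row-and-column layout, the advice to check imported data against the raw file, and the translation of numeric codes can all be sketched in a few lines of Python (not one of the course packages). The column names and the Gender coding (1 = Female, 2 = Male) come from the variable summary above; the two data rows are made up and are NOT taken from depression.csv, and the in-memory text stands in for a downloaded file.

```python
import csv
import io

# Hypothetical CSV text with the dataset's column names; the values are made up.
raw_text = """Hospt,Treat,Outcome,Time,AcuteT,Age,Gender
1,Lithium,Recurrence,30,200,40,1
2,Placebo,No Recurrence,100,150,50,2
"""

# Coding from the variable summary: 1 = Female, 2 = Male.
GENDER_LABELS = {"1": "Female", "2": "Male"}

# Each row becomes a dict keyed by the column names in the header line.
rows = list(csv.DictReader(io.StringIO(raw_text)))

# Basic import check: number of imported rows vs. data lines in the raw text.
raw_data_lines = len(raw_text.strip().splitlines()) - 1  # minus the header
print("rows imported:", len(rows), "| data lines in raw file:", raw_data_lines)
print("columns:", list(rows[0].keys()))

# Translate the numeric Gender code into a readable label.
for row in rows:
    row["GenderLabel"] = GENDER_LABELS[row["Gender"]]
    print(row["Treat"], row["Outcome"], row["GenderLabel"])
```

Note that `csv.DictReader` reads every value as text, so numeric columns would still need converting before any calculation.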

Which variables are categorical and which are quantitative?

This document is linked from Types of Variables.
