This lab walks you through the fundamental R commands for fitting linear regression models, assessing their fit, and producing diagnostic plots. Before starting this lab, you should have completed the week 14 lecture.

Remember to change the `author:` field on this Rmd file to your own name.

We’ll begin by loading some packages.

library(tidyverse)

## ── Attaching packages ─────────────────────────────────────────────────────────────── tidyverse 1.3.0 ──

## ✓ ggplot2 3.3.2     ✓ purrr   0.3.4
## ✓ tibble  3.0.3     ✓ dplyr   1.0.2
## ✓ tidyr   1.1.2     ✓ stringr 1.4.0
## ✓ readr   1.3.1     ✓ forcats 0.5.0

## ── Conflicts ────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

Cars93 <- as_tibble(MASS::Cars93)

Part One

Linear regression with Cars93 data

Below is a figure showing how Price varies with EngineSize in the Cars93 data, with accompanying regression lines. There are two panels, one for USA cars and one for non-USA cars.

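# Scatterplot of Price against EngineSize, one panel per Origin,
# with a fitted least-squares line in each panel
ggplot(Cars93, aes(x = EngineSize, y = Price, colour = Origin)) +
  geom_point() +
  facet_wrap(~ Origin) +
  stat_smooth(method = "lm") +
  theme(legend.position = "none")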

## `geom_smooth()` using formula 'y ~ x'

(a) Use the lm() function to regress Price on EngineSize and Origin.

# Edit me
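As a hedged illustration of the syntax only (not the answer to this exercise), the sketch below fits a regression with one numeric and one categorical predictor. It uses the built-in mtcars data, which is not part of this lab, and the names `cars` and `fit` are hypothetical.

```r
# Illustration on the built-in mtcars data (an assumption; not the lab data)
# am is coded 0 = automatic, 1 = manual, so convert it to a labelled factor
cars <- transform(mtcars, am = factor(am, labels = c("automatic", "manual")))

# Regress mpg on a numeric predictor (wt) and a factor (am)
fit <- lm(mpg ~ wt + am, data = cars)

# One slope for wt, plus one offset for the non-baseline level of am
coef(fit)
summary(fit)
```

The factor enters the model as an indicator for each non-baseline level, which is why the coefficient is named after the factor and the level together (here `ammanual`).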

(b) Run plot() on your lm object. Do you see any problems?

par(mfrow = c(2,2))
# Edit me
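For reference, a minimal sketch of how plot() behaves on an lm object, again on the built-in mtcars data rather than the lab data (`fit` and `old_par` are hypothetical names). By default it draws four diagnostic plots: Residuals vs Fitted, Normal Q-Q, Scale-Location, and Residuals vs Leverage.

```r
# Illustration on the built-in mtcars data (an assumption; not the lab data)
fit <- lm(mpg ~ wt, data = mtcars)

# Arrange the four default diagnostic plots in a 2 x 2 grid,
# saving the previous graphics settings so they can be restored
old_par <- par(mfrow = c(2, 2))
plot(fit)
par(old_par)
```

Curvature in the Residuals vs Fitted panel or a fanning pattern in the Scale-Location panel are the usual signs that a linear model with constant variance may not be adequate.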

(c) Try running a linear regression with log(Price) as your outcome.

# Edit me
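A transformation can be written directly inside the model formula. The sketch below shows this on the built-in mtcars data (not the lab data), with `fit_log` as a hypothetical name.

```r
# Illustration on the built-in mtcars data (an assumption; not the lab data)
# The outcome is transformed inside the formula; no new column is needed
fit_log <- lm(log(mpg) ~ wt, data = mtcars)
coef(fit_log)

# With a logged outcome, exp(slope) is the multiplicative change in the
# back-transformed fitted value per one-unit increase in the predictor.
# Expressed as a percentage change:
100 * (exp(coef(fit_log)["wt"]) - 1)
```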

(d) Run plot() on your new lm object. Do you see any problems?

par(mfrow = c(2,2))
# Edit me

Part Two

Learning objectives

We’ll begin by loading some packages.

#library(tidyverse)
library(knitr)

#Cars93 <- as_tibble(MASS::Cars93)
# If you want to experiment with the ggpairs command,
# you'll want to run the following code:
# install.packages("GGally")
# library(GGally)

Linear regression with Cars93 data

(a) Use the lm() function to regress Price on: EngineSize, Origin, MPG.highway, MPG.city and Horsepower.

# Edit me

(b) Use the kable() command to produce a nicely formatted coefficients table. Ensure that values are rounded to an appropriate number of decimal places.

# Edit me
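One possible way to build such a table, sketched on the built-in mtcars data rather than the lab data: extract the coefficient matrix from summary() and pass it to kable() with a `digits` argument. The names `fit` and `coef_table` are hypothetical, and the digit choices are only an example.

```r
library(knitr)

# Illustration on the built-in mtcars data (an assumption; not the lab data)
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Matrix with columns: Estimate, Std. Error, t value, Pr(>|t|)
coef_table <- summary(fit)$coefficients

# digits may be a single number or one value per column
kable(coef_table, digits = c(3, 3, 2, 4))
```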

(c) Interpret the coefficient of Originnon-USA. Is it statistically significant?

# Edit me

(d) Interpret the coefficient of MPG.highway. Is it statistically significant?

# Edit me

(d) Use the “2 standard error rule” to construct an approximate 95% confidence interval for the coefficient of MPG.highway. Compare this to the 95% CI obtained by using the confint command.

# Edit me
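The "2 standard error rule" replaces the exact t quantile with the constant 2, so the interval is estimate ± 2 × SE. A minimal sketch of the comparison on the built-in mtcars data (not the lab data; `fit`, `est`, `se`, `approx_ci` and `exact_ci` are hypothetical names):

```r
# Illustration on the built-in mtcars data (an assumption; not the lab data)
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Pull the estimate and standard error for one coefficient
est <- coef(summary(fit))["wt", "Estimate"]
se  <- coef(summary(fit))["wt", "Std. Error"]

# Approximate 95% CI: estimate plus or minus 2 standard errors
approx_ci <- c(lower = est - 2 * se, upper = est + 2 * se)

# confint() uses the t quantile with the residual degrees of freedom
exact_ci <- confint(fit, "wt", level = 0.95)

approx_ci
exact_ci
```

Because the t quantile for a modest number of residual degrees of freedom is slightly larger than 2, the confint() interval is slightly wider than the approximate one.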

(e) [Advanced topic, not required] Run the pairs command on the following set of variables: EngineSize, MPG.highway, MPG.city and Horsepower. Display correlations in one of the off-diagonal panels, using the panel.cor function below. Do you observe any collinearities?

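# Panel function for pairs(): prints the absolute correlation of x and y,
# using larger text for stronger correlations
panel.cor <- function(x, y, digits = 2, prefix = "", cex.cor, ...)
{
    # Save the panel's coordinate system and restore it on exit
    usr <- par("usr"); on.exit(par(usr = usr))
    # Switch to unit-square coordinates so text can be centred at (0.5, 0.5)
    par(usr = c(0, 1, 0, 1))
    r <- abs(cor(x, y))
    # Format r for display (formatted jointly with a reference value)
    txt <- format(c(r, 0.123456789), digits = digits)[1]
    txt <- paste0(prefix, txt)
    if(missing(cex.cor)) cex.cor <- 0.4/strwidth(txt)
    text(0.5, 0.5, txt, cex = pmax(1, cex.cor * r))
}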


# Edit me
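As a rough sketch of how a custom panel function plugs into pairs(), the example below uses the built-in mtcars data (not the lab data) and a simplified, hypothetical helper named `panel_abs_cor`. A plain correlation matrix from cor() gives the same information numerically.

```r
# Illustration on the built-in mtcars data (an assumption; not the lab data)
vars <- mtcars[, c("mpg", "disp", "hp", "wt")]

# Hypothetical helper: print the absolute correlation in the panel centre
panel_abs_cor <- function(x, y, ...) {
  usr <- par("usr"); on.exit(par(usr = usr))  # restore coordinates on exit
  par(usr = c(0, 1, 0, 1))                    # unit-square coordinates
  text(0.5, 0.5, format(abs(cor(x, y)), digits = 2))
}

# Scatterplots with smoothers below the diagonal, correlations above it
pairs(vars, lower.panel = panel.smooth, upper.panel = panel_abs_cor)

# Numeric check of the same pairwise correlations
round(cor(vars), 2)
```

Pairs of predictors with correlations close to 1 in absolute value are the ones to look at when checking for collinearity.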

(f) Use the update command to update your regression model to exclude EngineSize and MPG.city. Display the resulting coefficients table nicely using the kable() command.

# Edit me
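For reference, update() refits an existing model with a modified formula, where `.` stands for "what was there before". A minimal sketch on the built-in mtcars data (not the lab data; `full` and `reduced` are hypothetical names):

```r
# Illustration on the built-in mtcars data (an assumption; not the lab data)
full <- lm(mpg ~ wt + hp + disp + cyl, data = mtcars)

# Keep the same outcome (. on the left) and the same predictors
# (. on the right), then remove disp and cyl
reduced <- update(full, . ~ . - disp - cyl)

formula(reduced)
coef(reduced)
```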

(g) Does the coefficient of MPG.highway change much from the original model? Calculate a 95% confidence interval and compare your answer to part (d). Does the CI change much from before? Explain.

# Edit me

(h) Run the plot command on the linear model you constructed in part (f). Do you notice any issues?

# Edit me

Part Three

Learning objectives

#library(tidyverse)

Interaction terms in regression

(a) Run a linear regression to better understand how birthweight varies with the mother’s age and smoking status (do not include interaction terms).

# Edit me

(b) What is the coefficient of mother.age in your regression? How do you interpret this coefficient?

# Edit me

(c) How many coefficients are estimated for the mother’s smoking status variable? How do you interpret these coefficients?

# Edit me
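As background, under R's default treatment contrasts a factor with k levels contributes k - 1 coefficients, each measured relative to the baseline (first) level. The sketch below checks this on the built-in iris data, which is not the lab data; `fit` and `species_coefs` are hypothetical names.

```r
# Illustration on the built-in iris data (an assumption; not the lab data)
fit <- lm(Sepal.Length ~ Sepal.Width + Species, data = iris)

# The first level is the baseline and gets no coefficient of its own
levels(iris$Species)

# Coefficients belonging to the factor
species_coefs <- grep("^Species", names(coef(fit)), value = TRUE)
species_coefs

# Number of factor coefficients equals number of levels minus one
length(species_coefs) == nlevels(iris$Species) - 1
```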

(d) What does the intercept mean in this model?

(e) [Advanced topic, not required.] Using ggplot, construct a scatterplot with birthweight on the y-axis and mother’s age on the x-axis. Color the points by mother’s smoking status, and add smoking status-specific linear regression lines using the stat_smooth layer.

# Edit me
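A hedged sketch of the plotting pattern, using the built-in mtcars data instead of the lab data (`cars` and `p` are hypothetical names). Because colour is mapped to a factor inside aes(), stat_smooth() fits one line per group.

```r
library(ggplot2)

# Illustration on the built-in mtcars data (an assumption; not the lab data)
cars <- transform(mtcars, am = factor(am, labels = c("automatic", "manual")))

# Colour is mapped to the grouping factor, so points and fitted lines
# are both drawn separately for each group
p <- ggplot(cars, aes(x = wt, y = mpg, colour = am)) +
  geom_point() +
  stat_smooth(method = "lm", formula = y ~ x, se = FALSE)

p
```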

(f) [Advanced topic, not required.] Do the regression lines plotted in part (e) correspond to the model you fit in part (a)? How can you tell?

(g) Fit a linear regression model that now models potential interactions between mother’s age and smoking status in their effect on birthweight.

# Edit me
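In R's formula syntax, `x * g` expands to `x + g + x:g`, that is, both main effects plus their interaction. A minimal sketch on the built-in mtcars data (not the lab data; `cars` and `fit_int` are hypothetical names):

```r
# Illustration on the built-in mtcars data (an assumption; not the lab data)
cars <- transform(mtcars, am = factor(am, labels = c("automatic", "manual")))

# wt * am is shorthand for wt + am + wt:am
fit_int <- lm(mpg ~ wt * am, data = cars)

# The wt:ammanual row is the difference in the wt slope between
# manual cars and the baseline group (automatic)
coef(summary(fit_int))
```

The p-value in the interaction row tests whether the slope differs between the two groups.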

(h) Interpret your model. Is the interaction term statistically significant? What does it mean?

# Edit me

Lab: Linear regression in R

Your Name Here

Prepared for ITMD/ITMS/STAT 514, Spring 2021
