AE 07: Exam 01 review

Trail users

Published

February 12, 2026

Important

Go to the course GitHub organization and locate your ae-07 repo to get started.

One person from each group: Put your group’s response on your slide: https://docs.google.com/presentation/d/15nF9ADDlQwiDiRG55TCMOkqQgCy6_JB5hjPKE_kZE90/edit?usp=sharing

Packages

library(tidyverse)
library(tidymodels)
library(knitr)
library(patchwork)

rail_trail <- read_csv("data/rail-trail.csv")

Trail users

The Pioneer Valley Planning Commission (PVPC) collected data on ninety days between April 5, 2005 and November 15, 2005. Data collectors set up a laser sensor, and each break in the laser beam was recorded as a rail-trail user passing the data collection station.

We will use regression analysis to predict the number of trail users based on weather and other features describing the day.

We’ll use the following variables in this analysis:

  • volume estimated number of trail users that day (number of breaks recorded)

  • hightemp daily high temperature (in degrees Fahrenheit)

  • daytype one of “weekday” or “weekend”

View the data set (see footnote 1) to see the remaining variables.

Bivariate EDA

p1 <- ggplot(data = rail_trail, aes(x = hightemp, y = volume)) + 
  geom_point() + 
  labs(x = "High temperature (in Fahrenheit)", 
       y = "Number of users", 
       title = "Volume versus high temperature")

p2 <- ggplot(data = rail_trail, aes(x = daytype, y = volume)) + 
  geom_boxplot() + 
  labs(x = "", 
       y = "Number of users", 
       title = "Volume versus day type")

p1 + p2

Models

Below is the code for the two models used in this analysis.

# main effects model 
main_effects_model <- lm(volume ~ hightemp + daytype, data = rail_trail)


# interaction model 
interaction_model <- lm(volume ~ hightemp + daytype + hightemp * daytype,
                         data = rail_trail)
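
In R's formula syntax, `hightemp * daytype` already expands to both main effects plus their interaction, so listing `hightemp + daytype` alongside it is redundant but harmless. A minimal check of that claim, using a small made-up data frame (`toy` is hypothetical and is not the trail data):

```r
# Hypothetical toy data with the same column names as rail_trail
toy <- data.frame(
  hightemp = c(60, 72, 85, 90),
  daytype  = factor(c("weekday", "weekend", "weekday", "weekend"))
)

# Design-matrix columns implied by each way of writing the formula
explicit_cols <- colnames(
  model.matrix(~ hightemp + daytype + hightemp * daytype, data = toy)
)
compact_cols <- colnames(model.matrix(~ hightemp * daytype, data = toy))

# Both formulas describe the same set of model terms
identical(explicit_cols, compact_cols)
```

Either spelling therefore fits the same interaction model; the explicit version simply makes the main effects visible to the reader.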

Exercises

Exercise 1

Consider the main effects model.

  • Display the model along with the 90% confidence intervals for the coefficients.
  • Interpret the coefficient of hightemp in the context of the data.

  • Interpret the coefficient of daytypeWeekend in the context of the data.

  • Does the intercept have a meaningful interpretation? Explain your reasoning.

  • If not, what can we do to make the interpretation meaningful?
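
One common way to give the intercept a usable interpretation is to center the quantitative predictor. The sketch below is only an illustration under stated assumptions: `sim_trail` is simulated stand-in data with arbitrary coefficients (not the real trail data), and `hightemp_cent` is a hypothetical column name.

```r
# Simulated stand-in for rail_trail (hypothetical values, arbitrary coefficients)
set.seed(1)
n <- 90
sim_trail <- data.frame(
  hightemp = round(runif(n, 40, 95)),
  daytype  = factor(sample(c("weekday", "weekend"), n, replace = TRUE))
)
sim_trail$volume <- 5 * sim_trail$hightemp +
  50 * (sim_trail$daytype == "weekend") + rnorm(n, sd = 80)

# Center temperature at its mean, so 0 corresponds to an average-temperature day
sim_trail$hightemp_cent <- sim_trail$hightemp - mean(sim_trail$hightemp)

fit_raw  <- lm(volume ~ hightemp + daytype, data = sim_trail)
fit_cent <- lm(volume ~ hightemp_cent + daytype, data = sim_trail)

# 90% confidence intervals for the coefficients of the centered model
ci_cent <- confint(fit_cent, level = 0.90)
ci_cent

# Centering changes only the intercept; the other coefficients are unchanged
coef(fit_raw)
coef(fit_cent)
```

In the AE itself, `tidy()` with `conf.int = TRUE` and `conf.level = 0.90` should give the same intervals in a tibble that can be passed to `kable()`.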

Exercise 2

Consider the interaction effects model.

  • Display the model along with the 90% confidence intervals for the coefficients.
  • Interpret the main effect of hightemp.

  • Interpret the coefficient of hightemp:daytypeWeekend.

  • According to this model, does the effect of hightemp differ for weekends vs. weekdays? Explain.
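
To see how the interaction coefficient relates to the two slopes, here is a minimal sketch on simulated stand-in data (`sim_trail` is hypothetical, with arbitrary coefficients). Because the simulated factor levels are lowercase, the coefficient is named `hightemp:daytypeweekend` here; the name in the real data depends on how the levels are spelled.

```r
# Simulated stand-in for rail_trail (hypothetical values, arbitrary coefficients)
set.seed(1)
n <- 90
sim_trail <- data.frame(
  hightemp = round(runif(n, 40, 95)),
  daytype  = factor(sample(c("weekday", "weekend"), n, replace = TRUE))
)
is_weekend <- sim_trail$daytype == "weekend"
sim_trail$volume <- 5 * sim_trail$hightemp + 50 * is_weekend +
  2 * sim_trail$hightemp * is_weekend + rnorm(n, sd = 80)

fit_int <- lm(volume ~ hightemp * daytype, data = sim_trail)
b <- coef(fit_int)

# Slope of hightemp for the baseline level (weekday)
slope_weekday <- b[["hightemp"]]

# Slope of hightemp for weekends: main effect plus interaction coefficient
slope_weekend <- b[["hightemp"]] + b[["hightemp:daytypeweekend"]]

# Check: fitting the weekend days on their own gives the same slope
fit_weekend <- lm(volume ~ hightemp,
                  data = subset(sim_trail, daytype == "weekend"))
all.equal(slope_weekend, coef(fit_weekend)[["hightemp"]])
```

The interaction coefficient is therefore the difference between the weekend and weekday slopes, which is the quantity the last question in this exercise asks about.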

Exercise 3

The following code can be used to create a bootstrap distribution for the model coefficients in the main effects model.

  • Describe what each line of code does.
set.seed(210)

boot_dist <- rail_trail |>                       # (1)
  specify(volume ~ hightemp + daytype) |>        # (2)
  generate(reps = 100, type = "bootstrap") |>    # (3)
  fit()                                          # (4)

(1) ___
(2) ___
(3) ___
(4) ___

  • How many observations are in each bootstrap sample?

  • Explain why we use bootstrap sampling, instead of permutation sampling, to construct confidence intervals.
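
As a rough illustration of what a single bootstrap resample looks like, the sketch below draws row indices with replacement in base R. It is not the `generate()` implementation, only the same idea written by hand; `n` is taken from the ninety days described above, and `idx` is a hypothetical name.

```r
set.seed(210)

n <- 90  # number of days in the data, per the description above

# One bootstrap resample: draw n row indices with replacement
idx <- sample(n, size = n, replace = TRUE)

# The resample has as many observations as the original data
length(idx)

# Some days appear more than once, so fewer than n distinct days are included
length(unique(idx))
```

Indexing the data frame with `idx` keeps each day's response together with its own predictors, whereas a permutation shuffles the response and breaks that pairing.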

Exercise 4

Consider the bootstrap distribution for the coefficient of hightemp in the main effects model.

  • How many estimated coefficients of hightemp make up the bootstrap distribution?

  • Run the code from the previous exercise to construct the bootstrap distribution. Use the bootstrap distribution to compute \(SE_{\hat{\beta}_\text{hightemp}}\). Explain what this value means in the context of the data.

  • Construct a 90% bootstrap confidence interval for the coefficient of hightemp. Interpret the interval in the context of the data.
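
The calculations in this exercise can be sketched by hand in base R. This is a minimal version under stated assumptions: `sim_trail` is simulated stand-in data (not the real trail data), and the percentile method is used for the interval, which is one of several bootstrap interval types.

```r
# Simulated stand-in for rail_trail (hypothetical values, arbitrary coefficients)
set.seed(1)
n <- 90
sim_trail <- data.frame(
  hightemp = round(runif(n, 40, 95)),
  daytype  = factor(sample(c("weekday", "weekend"), n, replace = TRUE))
)
sim_trail$volume <- 5 * sim_trail$hightemp +
  50 * (sim_trail$daytype == "weekend") + rnorm(n, sd = 80)

# Refit the main effects model on 100 bootstrap resamples,
# keeping the hightemp coefficient from each fit
set.seed(210)
boot_slopes <- replicate(100, {
  idx <- sample(nrow(sim_trail), replace = TRUE)
  coef(lm(volume ~ hightemp + daytype, data = sim_trail[idx, ]))[["hightemp"]]
})

# One estimated coefficient per resample
length(boot_slopes)

# Bootstrap standard error: standard deviation of the bootstrap estimates
se_boot <- sd(boot_slopes)

# 90% percentile interval: middle 90% of the bootstrap estimates
ci_90 <- quantile(boot_slopes, probs = c(0.05, 0.95))

se_boot
ci_90
```

With `boot_dist` from the previous exercise, the same summaries could be computed by filtering to the `hightemp` term and summarizing its `estimate` column.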

Exercise 5

We would like to conduct the following hypothesis test for the main effects model.

\[ \begin{aligned} &H_0: \beta_\text{hightemp} = 0 \\ &H_a: \beta_\text{hightemp} \neq 0 \end{aligned} \]

  • State the hypotheses in words.

  • Conduct a permutation test by permuting the response variable. Use 100 iterations, a seed of 210, and a significance level of \(\alpha = 0.05\).
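
The logic of the permutation test can be sketched by hand in base R. The example uses simulated stand-in data (`sim_trail` is hypothetical, with arbitrary coefficients), and it computes the two-sided p-value as the share of permuted slopes at least as far from zero as the observed slope, which is one common definition and may differ slightly from what a given package reports.

```r
# Simulated stand-in for rail_trail (hypothetical values, arbitrary coefficients)
set.seed(1)
n <- 90
sim_trail <- data.frame(
  hightemp = round(runif(n, 40, 95)),
  daytype  = factor(sample(c("weekday", "weekend"), n, replace = TRUE))
)
sim_trail$volume <- 5 * sim_trail$hightemp +
  50 * (sim_trail$daytype == "weekend") + rnorm(n, sd = 80)

# Observed coefficient of hightemp in the main effects model
obs_slope <- coef(lm(volume ~ hightemp + daytype, data = sim_trail))[["hightemp"]]

# Null distribution: shuffle the response, refit, keep the hightemp coefficient
set.seed(210)
null_slopes <- replicate(100, {
  shuffled <- sim_trail
  shuffled$volume <- sample(shuffled$volume)  # permute the response
  coef(lm(volume ~ hightemp + daytype, data = shuffled))[["hightemp"]]
})

# Two-sided p-value: proportion of permuted slopes at least as extreme
p_value <- mean(abs(null_slopes) >= abs(obs_slope))
p_value

# Decision at the 0.05 significance level
p_value < 0.05
```

With only 100 permutations the smallest nonzero p-value is 0.01, so a result of 0 is better reported as "less than 0.01".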

Exercise 6

  • Use the interaction effects model to predict the volume for a Saturday that has a high temperature of 75 degrees Fahrenheit.

  • Suppose you’re asked to construct a confidence interval and a prediction interval for a Saturday with a high temperature of 75 degrees Fahrenheit. Which interval would you expect to be wider, and why? In your answer, clearly state the difference between these intervals.

  • Construct the 90% intervals and interpret them in the context of the data.
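
Both kinds of interval can be requested from `predict()` in base R. The sketch below uses simulated stand-in data (`sim_trail` and `new_day` are hypothetical), so its numbers say nothing about the real trail; note that the `daytype` value in the new data must match a level in the fitted data exactly, including capitalization.

```r
# Simulated stand-in for rail_trail (hypothetical values, arbitrary coefficients)
set.seed(1)
n <- 90
sim_trail <- data.frame(
  hightemp = round(runif(n, 40, 95)),
  daytype  = factor(sample(c("weekday", "weekend"), n, replace = TRUE))
)
sim_trail$volume <- 5 * sim_trail$hightemp +
  50 * (sim_trail$daytype == "weekend") + rnorm(n, sd = 80)

fit_int <- lm(volume ~ hightemp * daytype, data = sim_trail)

# A Saturday is a weekend day with a high temperature of 75 degrees
new_day <- data.frame(
  hightemp = 75,
  daytype  = factor("weekend", levels = levels(sim_trail$daytype))
)

# 90% interval for the mean volume on all such days
conf_int <- predict(fit_int, newdata = new_day,
                    interval = "confidence", level = 0.90)

# 90% interval for the volume on one individual such day
pred_int <- predict(fit_int, newdata = new_day,
                    interval = "prediction", level = 0.90)

conf_int
pred_int

# Interval widths: same center, different spread
conf_width <- conf_int[1, "upr"] - conf_int[1, "lwr"]
pred_width <- pred_int[1, "upr"] - pred_int[1, "lwr"]
pred_width > conf_width
```

Both intervals are centered at the same fitted value; the prediction interval also accounts for the day-to-day variability around the mean, which is why it comes out wider.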

Wrapping up

Important

Once you’ve completed the AE:

  • Render the document to produce the PDF with all of your work from today’s class.

  • Push all your work to your AE repo on GitHub. You’re done! 🎉

Footnotes

  1. Source: Pioneer Valley Planning Commission via the mosaicData package.