Multiple linear regression
Feb 09, 2026
Today’s data come from Equity in Athletics Data Analysis and include information about sports expenditures and revenues for colleges and universities in the United States. This data set was featured in a March 2022 Tidy Tuesday.
We will focus on the 2019-2020 season expenditures on football for 127 institutions in the NCAA Division I FBS. The variables are:
total_exp_m: Total expenditures on football in the 2019-2020 academic year (in millions USD)
enrollment_th: Total student enrollment in the 2019-2020 academic year (in thousands)
type: Institution type (Public or Private)
library(ggplot2)    # ggplot(), geom_*(), labs(), theme_bw()
library(patchwork)  # assumed here: lets p1 + p2 place the two plots side by side

p1 <- ggplot(data = football, aes(x = enrollment_th, y = total_exp_m)) +
  geom_point() +
  labs(x = "Enrollment (in thousands)",
       y = "Total expenditures (in millions)",
       title = "Football expenditures vs. enrollment")

p2 <- ggplot(data = football, aes(x = type, y = total_exp_m)) +
  geom_boxplot() +
  labs(x = "Institution type",
       y = "Total expenditures (in millions)",
       title = "Football expenditures vs. institution type")

p1 + p2

ggplot(data = football, aes(x = enrollment_th, y = total_exp_m, color = type,
                            linetype = type)) +
  geom_point(alpha = 0.7) +
  geom_smooth(method = "lm", se = FALSE) +
  labs(x = "Enrollment (in thousands)",
       y = "Total expenditures (in millions)",
       title = "Football expenditures vs. enrollment",
       subtitle = "by institution type",
       color = "Type",
       linetype = "Type") +
  theme_bw()

| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | 28.487 | 4.856 | 5.866 | 0.00 |
| enrollment_th | -0.056 | 0.370 | -0.151 | 0.88 |
| typePublic | -24.038 | 5.523 | -4.353 | 0.00 |
| enrollment_th:typePublic | 0.915 | 0.387 | 2.364 | 0.02 |
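
Written out as an equation, the estimates in the table give the fitted model below. This is a sketch that assumes Private is the baseline level of type (which the term name typePublic suggests), so typePublic is treated as an indicator equal to 1 for public institutions and 0 for private ones.

$$
\begin{aligned}
\widehat{\text{total\_exp\_m}} &= 28.487 - 0.056 \times \text{enrollment\_th} - 24.038 \times \text{typePublic} \\
&\quad + 0.915 \times \text{enrollment\_th} \times \text{typePublic}
\end{aligned}
$$

Plugging in each value of the indicator gives one line per institution type:

$$
\begin{aligned}
\text{Private:} \quad \widehat{\text{total\_exp\_m}} &= 28.487 - 0.056 \times \text{enrollment\_th} \\
\text{Public:} \quad \widehat{\text{total\_exp\_m}} &= (28.487 - 24.038) + (-0.056 + 0.915) \times \text{enrollment\_th} \\
&= 4.449 + 0.859 \times \text{enrollment\_th}
\end{aligned}
$$

Under this reading, the interaction estimate 0.915 is the difference between the public and private slopes of enrollment.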
Let’s look at the test for enrollment_th:typePublic.
State the null and alternative hypotheses being tested in the model output.
Explain what std.error = 0.387 quantifies.
Explain how the test statistic 2.364 is computed and what it means in the context of the data.
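
One way to work through these questions is sketched below. The symbol $\beta_{3}$ is notation introduced here for the population coefficient of enrollment_th:typePublic; it does not appear in the model output. The test in the output compares

$$
H_0: \beta_{3} = 0 \quad \text{vs.} \quad H_a: \beta_{3} \neq 0,
$$

that is, whether the slope of enrollment differs between public and private institutions, given the other terms in the model. The standard error estimates how much the interaction estimate would vary from sample to sample, and the test statistic divides the distance between the estimate and the hypothesized value by that standard error:

$$
t = \frac{\hat{\beta}_{3} - 0}{SE(\hat{\beta}_{3})} = \frac{0.915 - 0}{0.387} \approx 2.364
$$

Read this way, the estimated difference in slopes is about 2.36 standard errors above the value of 0 assumed under the null hypothesis. The table values are rounded, so a hand calculation may differ slightly in the last digit.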
State the distribution used to compute the p-value.
The coefficient of enrollment_th:typePublic has a low p-value, but the coefficient of enrollment_th has a high p-value. What does this mean about the relationship between football expenditures and enrollment?
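
The p-value calculation can be sketched in base R. This is a minimal illustration, not the code used to produce the table: it assumes that all 127 institutions were used in the fit and that the p-value comes from a t distribution with n minus the number of coefficients (127 - 4 = 123) degrees of freedom. The variable names are made up for this example.

```r
# Values copied from the enrollment_th:typePublic row of the model output
estimate  <- 0.915
std_error <- 0.387

# Assumption: all 127 institutions were used and the model has 4 coefficients
n_obs    <- 127
n_coef   <- 4
df_resid <- n_obs - n_coef

# Test statistic: distance of the estimate from the null value 0, in standard errors
t_stat <- (estimate - 0) / std_error

# Two-sided p-value: area in both tails of the t distribution beyond |t_stat|
p_value <- 2 * pt(abs(t_stat), df = df_resid, lower.tail = FALSE)

print(round(t_stat, 3))
print(round(p_value, 2))
```

Under these assumptions the result should land close to the p.value reported for the interaction term. The same calculation with the enrollment_th row (estimate -0.056, std.error 0.370) gives a large p-value. With Private as the baseline, that pattern suggests little evidence of a linear relationship between enrollment and expenditures among private institutions, and evidence that the slope for public institutions differs from the private one.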
Click here to find your team for this week’s lab.
Sit with your team.
Only one team member should type at a time. There are markers in today’s lab to help you determine whose turn it is to type.
Don’t forget to pull to get your teammates’ updates before making changes to the .qmd file.
Important
Only one submission per team on Gradescope. Read the submission instructions carefully!
Do not pressure each other to finish early; use the time wisely to really learn the material and produce a quality report.
The labs are structured to help you learn the steps of a data analysis. Do not split up the lab among the team members; work on it together in its entirety.
Everyone has something to contribute! Use the lab groups as an opportunity to share ideas and learn from each other.
In today’s lab, your team will compare multiple models to find the best one to describe what makes a good candy.