library(tidyverse)
library(tidymodels)
library(knitr)
Exam 02 practice
This page contains practice problems to help prepare for Exam 02. This set of practice problems is not comprehensive.
You are encouraged to ask questions during office hours or on Ed Discussion.
Multiple linear regression
Exercise 1
Consider the model on this slide using education and income inequality to understand variability in healthcare expenditure. Note that the log-transformed response was used to fit the model.
Describe the predicted difference in healthcare expenditure between a country with income_inequality = 30 and education = Low and a country with income_inequality = 50 and education = High.
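One way to organize the comparison, offered as a sketch only. The fitted model appears on the slide rather than on this page, so the following are assumptions: the model has main effects only, Low is the reference level for education, $\hat\beta_1$ is the slope for income_inequality, and $\hat\beta_2$ is the coefficient for education = High. Under those assumptions, the difference in predicted log expenditure and the corresponding ratio of back-transformed predictions would be:

$$
\begin{aligned}
\Delta &= \widehat{\log(\text{healthcare})}_{(50,\ \text{High})} - \widehat{\log(\text{healthcare})}_{(30,\ \text{Low})} \\
       &= \hat\beta_1 (50 - 30) + \hat\beta_2 = 20\,\hat\beta_1 + \hat\beta_2, \\
\frac{\widehat{\text{healthcare}}_{(50,\ \text{High})}}{\widehat{\text{healthcare}}_{(30,\ \text{Low})}} &= e^{\Delta}.
\end{aligned}
$$

If the model on the slide includes an interaction or uses a different reference level, the expression for $\Delta$ gains or changes terms accordingly, but the back-transformation step $e^{\Delta}$ stays the same.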
Exercise 2
Consider the model on this slide that uses log-transformed healthcare expenditure, income inequality, and education to explain variability in life expectancy.
Describe the predicted difference in life expectancy between a country with healthcare_expend = 2500, income_inequality = 30, education = Low and a country with healthcare_expend = 5000, income_inequality = 20, education = Low.
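A sketch of the arithmetic, assuming a main-effects model and the natural log of healthcare expenditure. The symbols $\hat\beta_1$ (coefficient of log healthcare expenditure) and $\hat\beta_2$ (coefficient of income inequality) are placeholders for the estimates on the slide. Call the (2500, 30, Low) country A and the (5000, 20, Low) country B. Because education is Low for both countries, its term cancels:

$$
\begin{aligned}
\widehat{\text{life}}_B - \widehat{\text{life}}_A
  &= \hat\beta_1\left[\log(5000) - \log(2500)\right] + \hat\beta_2\,(20 - 30) \\
  &= \hat\beta_1 \log(2) - 10\,\hat\beta_2 .
\end{aligned}
$$

The first term shows why a log-transformed predictor is usually interpreted in terms of a multiplicative change: doubling healthcare expenditure shifts the prediction by $\hat\beta_1 \log(2)$, whatever the starting value.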
Exercise 3
Explain how the presence of multicollinearity impacts the model coefficients, predictions from the model, and inferences that can be made from a model.
Describe strategies for dealing with multicollinearity.
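To make the effect of multicollinearity concrete, here is a minimal sketch on simulated data (the data, variable names, and the helper `vif_manual()` are hypothetical and not part of the course materials). It computes variance inflation factors by hand and compares the standard error of one coefficient with and without a nearly redundant predictor.

```r
# Simulated data (hypothetical): x2 is almost a copy of x1, x3 is unrelated.
set.seed(210)
n  <- 200
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)
x3 <- rnorm(n)
y  <- 1 + 2 * x1 + 3 * x3 + rnorm(n)
sim <- data.frame(y, x1, x2, x3)

# VIF for one predictor: regress it on the other predictors and
# compute 1 / (1 - R^2) from that auxiliary regression.
vif_manual <- function(data, predictor, others) {
  aux_fit <- lm(reformulate(others, response = predictor), data = data)
  1 / (1 - summary(aux_fit)$r.squared)
}

predictors <- c("x1", "x2", "x3")
vifs <- sapply(predictors, function(p) {
  vif_manual(sim, p, setdiff(predictors, p))
})
round(vifs, 1)

# Standard error of the x1 coefficient with and without the
# near-duplicate predictor x2 in the model.
se_with    <- summary(lm(y ~ x1 + x2 + x3, data = sim))$coefficients["x1", "Std. Error"]
se_without <- summary(lm(y ~ x1 + x3, data = sim))$coefficients["x1", "Std. Error"]
c(with_x2 = se_with, without_x2 = se_without)
```

In this simulation, dropping the redundant predictor is one possible strategy. Others, such as combining correlated predictors, may suit a real analysis better.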
Exercise 4
Use the loans_full_schema data frame from the openintro R package. Run the code below to load the data set and fix an underlying issue with the levels.
loans_full_schema <- droplevels(openintro::loans_full_schema)
Split the data into training (80%) and testing (20%) sets.
Use the training data to run 10-fold cross validation on the model that predicts interest_rate from debt to income ratio (debt_to_income), the term of loan (term), the number of inquiries into the applicant’s credit during the last 12 months (inquiries_last_12m), and the type of application (application_type). Fit the model such that the effect of debt to income ratio differs by application type.
Explain in words how cross validation is conducted.
Explain why we’re focusing on R-squared and RMSE instead of adjusted R-squared, AIC, and BIC.
Explain why we don’t use the testing data to evaluate the model during cross validation.
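The workflow described in this exercise could be sketched as follows. This is one possible tidymodels approach, not necessarily the one used in class: the seed value and all object names (`loans_split`, `loans_folds`, `loans_wflow`, and so on) are arbitrary choices, and the interaction is specified through a workflow formula rather than a recipe.

```r
library(tidymodels)

loans_full_schema <- droplevels(openintro::loans_full_schema)

# 80% training / 20% testing split (the seed is an arbitrary choice)
set.seed(210)
loans_split <- initial_split(loans_full_schema, prop = 0.8)
loans_train <- training(loans_split)
loans_test  <- testing(loans_split)

# 10 folds built from the training data only
loans_folds <- vfold_cv(loans_train, v = 10)

# debt_to_income * application_type lets the slope of debt_to_income
# differ by application type
loans_wflow <- workflow() |>
  add_formula(interest_rate ~ debt_to_income * application_type +
                term + inquiries_last_12m) |>
  add_model(linear_reg())

# Fit on 9 folds, assess on the held-out fold, repeat for each fold
loans_cv <- fit_resamples(loans_wflow, resamples = loans_folds)

# RMSE and R-squared averaged over the 10 assessment folds
cv_metrics <- collect_metrics(loans_cv)
cv_metrics
```

Note that `loans_test` is created but never used here. It stays untouched until a final model has been chosen.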
Logistic regression
Consider the following scenario:
As part of a study of the effects of predatory intertidal crab species on snail populations, researchers measured the mean closing forces and the propodus heights of the claws on several crabs of three species.
We will use the following variables:
- Force: Closing force of claw (newtons)
- Height: Propodus height (mm)
- Species: Crab species - Cp (Cancer productus), Hn (Hemigrapsus nudus), Lb (Lophopanopeus bellus)
- lb: 1 if Lophopanopeus bellus species, 0 otherwise
You can use the code below to load and prepare the data.
claws <- Sleuth3::ex0722 |>
  mutate(lb = factor(if_else(Species == "Lophopanopeus bellus", 1, 0)))
mean_force <- claws |>
  summarise(mean(Force)) |> pull()
mean_height <- claws |>
  summarise(mean(Height)) |> pull()
Exercise 5
Below is the model using Force and Height to predict whether a crab is from the lb species.
lb_model <- glm(lb ~ Force + Height, data = claws,
                family = "binomial")
tidy(lb_model) |>
  kable(digits = 3)

| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | 4.204 | 2.389 | 1.760 | 0.078 |
| Force | 0.211 | 0.092 | 2.279 | 0.023 |
| Height | -0.895 | 0.398 | -2.249 | 0.025 |
Why do we use a logistic regression model for this analysis?
Exercise 6
Interpret the effect of Force in the context of the data.
What is the predicted probability that a crab with Force = 12.134 and Height = 8.813 is from the Lophopanopeus bellus species?
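As a check on hand calculations, the arithmetic can be sketched in base R using the rounded estimates from the Exercise 5 table. The object names below are illustrative, and because the coefficients are rounded to three decimals the result may differ slightly from what `predict()` would return for `lb_model`.

```r
# Rounded estimates copied from the Exercise 5 table
b0       <- 4.204
b_force  <- 0.211
b_height <- -0.895

# Predicted log-odds at Force = 12.134 and Height = 8.813
log_odds <- b0 + b_force * 12.134 + b_height * 8.813

# Convert log-odds to a probability: exp(log_odds) / (1 + exp(log_odds))
prob <- plogis(log_odds)

# Multiplicative change in the odds for a one-newton increase in Force,
# holding Height constant
odds_ratio_force <- exp(b_force)

round(c(log_odds = log_odds, prob = prob,
        odds_ratio_force = odds_ratio_force), 3)
```

With these rounded inputs the log-odds come out to about -1.123, the predicted probability to about 0.245, and the odds ratio for Force to about 1.235.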
Exercise 7
Consider a classification threshold of 0.25.
Construct the confusion matrix.
Compute the following, and explain what they mean in the context of the data.
- Sensitivity
- Specificity
- False positive rate
- False negative rate
- Accuracy
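The quantities listed above can be computed from any vector of predicted probabilities once a threshold is fixed. The sketch below uses ten made-up observations (the `truth` and `prob` vectors are hypothetical, not taken from the claws data) so that each count is easy to verify by hand. The same steps would apply to the predicted probabilities from `lb_model`.

```r
# Hypothetical true classes and predicted probabilities (not the claws data)
truth <- c(1, 1, 1, 0, 0, 0, 0, 0, 1, 0)
prob  <- c(0.80, 0.30, 0.10, 0.05, 0.40, 0.20, 0.15, 0.60, 0.55, 0.22)

# Classify as 1 when the predicted probability exceeds the threshold
threshold <- 0.25
pred <- as.integer(prob > threshold)

# Confusion matrix: rows are predicted classes, columns are actual classes
conf_mat <- table(predicted = pred, actual = truth)
conf_mat

tp <- sum(pred == 1 & truth == 1)  # true positives
fn <- sum(pred == 0 & truth == 1)  # false negatives
fp <- sum(pred == 1 & truth == 0)  # false positives
tn <- sum(pred == 0 & truth == 0)  # true negatives

metrics <- c(
  sensitivity         = tp / (tp + fn),
  specificity         = tn / (tn + fp),
  false_positive_rate = fp / (fp + tn),
  false_negative_rate = fn / (fn + tp),
  accuracy            = (tp + tn) / length(truth)
)
round(metrics, 3)
```

Sensitivity and the false negative rate sum to 1, as do specificity and the false positive rate, which is a quick check on any hand-built confusion matrix.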
Exercise 8
Use a drop-in-deviance test to evaluate whether the interaction between Force and Height would be useful to add to the model.
State the null and alternative hypotheses in words and mathematical notation.
Explain what the values in the drop-in-deviance test output mean.
State your conclusion in the context of the data.
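One way to carry out the test is sketched below, using base R's `anova()` on two nested `glm` fits. The object names `reduced_model` and `full_model` are illustrative, and the data preparation repeats the code given earlier on this page so that the block runs on its own.

```r
library(dplyr)

claws <- Sleuth3::ex0722 |>
  mutate(lb = factor(if_else(Species == "Lophopanopeus bellus", 1, 0)))

# Reduced model: main effects only
reduced_model <- glm(lb ~ Force + Height, data = claws, family = "binomial")

# Full model: adds the Force x Height interaction
full_model <- glm(lb ~ Force * Height, data = claws, family = "binomial")

# Drop-in-deviance (likelihood ratio) test comparing the nested models
dev_test <- anova(reduced_model, full_model, test = "Chisq")
dev_test

# The same test statistic and p-value computed by hand
g_stat  <- deviance(reduced_model) - deviance(full_model)
df_diff <- df.residual(reduced_model) - df.residual(full_model)
p_value <- pchisq(g_stat, df = df_diff, lower.tail = FALSE)
c(g_stat = g_stat, df = df_diff, p_value = p_value)
```

The hand computation makes the pieces of the `anova()` output explicit: the test statistic is the reduction in residual deviance, and its degrees of freedom equal the number of added parameters.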
Exercise 9
What is an advantage of using a drop-in-deviance test instead of AIC (or BIC) to compare models?
What is an advantage of using AIC (or BIC) instead of a drop-in-deviance test to compare models?
Exercise 10
Explain how the slope of the logistic regression model is related to the Adjusted Odds Ratio (or just Odds Ratio if there is one predictor).
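The relationship can be written out in a short derivation. The notation here is generic rather than taken from the course slides: $\beta_j$ is the coefficient of predictor $x_j$ in a logistic model with $p$ predictors, and the other predictors are held constant.

$$
\frac{\text{odds}(x_j + 1)}{\text{odds}(x_j)}
= \frac{e^{\beta_0 + \dots + \beta_j (x_j + 1) + \dots + \beta_p x_p}}
       {e^{\beta_0 + \dots + \beta_j x_j + \dots + \beta_p x_p}}
= e^{\beta_j}
$$

Every term except $\beta_j$ cancels, so the exponentiated slope is the odds ratio for a one-unit increase in $x_j$. It is "adjusted" in the sense that the other predictors in the model are held fixed.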
Exercise 11
Draw an example of an ROC curve such that the AUC is about 0.55.
Draw an example of an ROC curve such that the AUC is about 0.9.
Explain what each point on an ROC curve represents.
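To see what such curves look like, the sketch below builds two ROC curves from simulated scores in base R. Everything here is hypothetical: the scores are normal draws whose means differ between classes by 0.2 and 1.8, chosen because the theoretical AUC for a mean shift of d between two unit-variance normals is Φ(d / √2), which gives roughly 0.56 and 0.90. The helpers `roc_points()` and `auc_trapezoid()` are illustrative, not functions from a package.

```r
# Simulated classes and scores (hypothetical data)
set.seed(11)
n <- 1000
truth <- rep(c(0, 1), each = n / 2)
score_weak   <- rnorm(n, mean = 0.2 * truth)  # classes barely separated
score_strong <- rnorm(n, mean = 1.8 * truth)  # classes well separated

# Each threshold gives one (false positive rate, true positive rate) point
roc_points <- function(truth, score) {
  thresholds <- c(Inf, sort(unique(score), decreasing = TRUE))
  tpr <- sapply(thresholds, function(t) mean(score[truth == 1] >= t))
  fpr <- sapply(thresholds, function(t) mean(score[truth == 0] >= t))
  data.frame(threshold = thresholds, fpr = fpr, tpr = tpr)
}

# Area under the curve by the trapezoid rule
auc_trapezoid <- function(roc) {
  sum(diff(roc$fpr) * (head(roc$tpr, -1) + tail(roc$tpr, -1)) / 2)
}

roc_weak   <- roc_points(truth, score_weak)
roc_strong <- roc_points(truth, score_strong)

auc_weak   <- auc_trapezoid(roc_weak)
auc_strong <- auc_trapezoid(roc_strong)
round(c(weak = auc_weak, strong = auc_strong), 2)

# Plot both curves with the diagonal (AUC = 0.5) for reference
plot(roc_strong$fpr, roc_strong$tpr, type = "l",
     xlab = "False positive rate", ylab = "True positive rate")
lines(roc_weak$fpr, roc_weak$tpr, lty = 3)
abline(0, 1, lty = 2)
```

The weak curve hugs the diagonal and the strong curve bows toward the top-left corner. Each row of `roc_weak` or `roc_strong` corresponds to one classification threshold.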
Relevant assignments and AEs
The following assignments and AEs cover Exam 02 content. Ask yourself “why” questions as you review your answers, process, and derivations on these assignments. It may also be helpful to explain your process to others.
HW 03, HW 04
Lab 05 - Lab 07
AE 08 - AE 11