library(tidyverse)
library(tidymodels)
library(openintro)
library(knitr)
AE 05: Permutation test for the slope
Houses in Duke Forest
Go to the course GitHub organization and locate your ae-05 repo to get started.
Data
The data are on houses that were sold in the Duke Forest neighborhood of Durham, NC around November 2020. They were originally scraped from Zillow and can be found in the duke_forest data set in the openintro R package.
Goal: Use statistical inference to evaluate whether there is a relationship between the age of the house at time of sale and its price.
Exploratory data analysis
Let’s begin by creating a new variable that is the age of the house in 2020.
duke_forest <- duke_forest |>
  mutate(age_2020 = 2020 - year_built)
Now let’s visualize the relationship between the age of the house in 2020 and the sales price.
ggplot(duke_forest, aes(x = age_2020, y = price)) +
  geom_point(alpha = 0.7) +
  labs(
    x = "Age in 2020 (years)",
    y = "Sale price (USD)",
    title = "Price and age of houses in Duke Forest"
  ) +
  scale_y_continuous(labels = label_dollar())
Model
df_fit <- lm(price ~ age_2020, data = duke_forest)
tidy(df_fit) |>
  kable(digits = 3)
| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | 690891.015 | 68637.793 | 10.066 | 0.000 |
| age_2020 | -2473.935 | 1225.191 | -2.019 | 0.046 |
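Written as an equation, the fitted line in the table above is shown below. The coefficients are copied directly from the output; the hat on price marks it as the predicted sale price.

$$
\widehat{\text{price}} = 690891.015 - 2473.935 \times \text{age\_2020}
$$

Read this way, each additional year of age is associated with a predicted sale price that is about 2,474 USD lower, on average.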
Hypothesis test
For code chunks with fill-in-the-blank code, change the chunk option to #| eval: true once you’ve filled in the code.
State the null and alternative hypotheses
Write the null and alternative hypotheses in words and mathematical notation.
Generate null distribution using permutation
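As background, the logic of a permutation test for a slope can be sketched in base R. This is a minimal illustration on simulated data, not the duke_forest data and not the code for the exercise: the variables x and y, the helper permute_slope(), and the values of n_obs and reps are all hypothetical choices made for this sketch.

```r
# Simulated stand-in data (hypothetical; NOT the duke_forest data)
set.seed(1)
n_obs <- 50
x <- runif(n_obs, min = 0, max = 100)
y <- 500 - 2 * x + rnorm(n_obs, sd = 40)

# Observed slope from the least-squares fit
obs_slope <- unname(coef(lm(y ~ x))["x"])

# One permutation: shuffle the response so that any link between
# x and y is broken, then refit and keep the slope
permute_slope <- function(x, y) {
  unname(coef(lm(sample(y) ~ x))["x"])
}

# Repeat many times to build the null distribution of the slope
reps <- 1000
null_slopes <- replicate(reps, permute_slope(x, y))

# Under the null of independence, the permuted slopes center near 0
mean(null_slopes)
```

Shuffling the response while keeping the predictor fixed is what makes this a simulation of "no relationship": every permuted data set has the same x values and the same y values, only paired at random.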
Fill in the code, then set the chunk option to #| eval: true.
n = 100
set.seed(012226)
null_dist <- _____ |>
  specify(______) |>
  hypothesize(null = "independence") |>
  generate(reps = _____, type = "permute") |>
  fit()
Visualize distribution
# Code for histogram of null distribution
Compute the p-value
# get observed fit
observed_fit <- duke_forest |>
  specify(price ~ age_2020) |>
  fit()
# calculate p-value
get_p_value(
  ____,
  obs_stat = ____,
  direction = "two-sided"
)
State conclusion
Write your conclusion in the context of the data. You can use 0.05 as the decision-making threshold.
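To make concrete what is being compared with the 0.05 threshold, here is a hedged base R sketch of a two-sided permutation p-value on simulated data. The data, the seed, and the names null_slopes, p_value, alpha, and reject_null are assumptions for illustration only. The definition used here (the share of permuted slopes at least as far from 0 as the observed slope) is one common convention; a package function may use a slightly different one, such as doubling the smaller tail proportion.

```r
# Simulated stand-in data (hypothetical; NOT the duke_forest data)
set.seed(2)
x <- runif(40, min = 0, max = 100)
y <- 300 - 0.5 * x + rnorm(40, sd = 60)

# Observed slope and a permutation null distribution of slopes
obs_slope <- unname(coef(lm(y ~ x))["x"])
null_slopes <- replicate(1000, unname(coef(lm(sample(y) ~ x))["x"]))

# Two-sided p-value: proportion of permuted slopes that are at least
# as extreme (in absolute value) as the observed slope
p_value <- mean(abs(null_slopes) >= abs(obs_slope))

# Compare with the decision-making threshold
alpha <- 0.05
reject_null <- p_value < alpha
```

A p-value below the threshold means slopes as extreme as the observed one were rare among the permuted data sets; a larger p-value means they were not unusual.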
Bootstrap CI
Construct the bootstrap CI
Construct a 95% bootstrap confidence interval.
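As a reference for what a bootstrap interval does under the hood, the following base R sketch builds a 95% percentile interval for a slope on simulated data. It is an illustration under stated assumptions, not the code for this exercise: the data frame dat, the helper boot_slope(), and the choice of 1000 resamples are all hypothetical, and the percentile method is only one of several ways to form a bootstrap interval.

```r
# Simulated stand-in data (hypothetical; NOT the duke_forest data)
set.seed(3)
dat <- data.frame(x = runif(60, min = 0, max = 100))
dat$y <- 400 - 1.5 * dat$x + rnorm(60, sd = 50)

# One bootstrap resample: draw rows with replacement (same sample size),
# refit the model, and keep the slope
boot_slope <- function(d) {
  rows <- sample(nrow(d), replace = TRUE)
  unname(coef(lm(y ~ x, data = d[rows, ]))["x"])
}

# Bootstrap distribution of the slope
boot_slopes <- replicate(1000, boot_slope(dat))

# 95% percentile interval: the middle 95% of the bootstrap slopes
ci_95 <- quantile(boot_slopes, probs = c(0.025, 0.975))
ci_95
```

Unlike the permutation step, resampling whole rows keeps each x paired with its own y, so the bootstrap slopes are centered near the observed slope rather than near 0.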
# Code for ci
Draw conclusion
Interpret the interval in the context of the data.
Is the interval consistent with the conclusion from your hypothesis test? Briefly explain why or why not.
Wrapping up
Once you’ve completed the AE:
Render the document to produce the PDF with all of your work from today’s class.
Push all your work to your AE repo on GitHub. You’re done! 🎉