library(tidyverse)
library(janitor)
library(tidymodels)
Instructions
For this homework, you should create a new folder in your homework directory. Call it HW5 or something similar that you can keep track of. Download the homework markdown template file Student_HW_template.Rmd from the course webpage, and put a copy in this folder. Rename it something like HW5_YourName.Rmd. This markdown document will be where you will answer each of the questions below.
The Assignment
We’re going to work again with the Titanic data from your in-class activity.
titanic <- read_csv("titanic.csv") %>%
  clean_names() %>%
  mutate(survived = as.factor(survived))
EDA
You did this already in your activity, but I’m leaving this section here because it’s an important step in the process: Always visualize before modeling!
Modeling
Perform a train/test split and create a cross-validation split. Build a recipe to preprocess the data for use in a logistic regression model and a KNN model.
Specify a Logistic Regression model for classification, using the glmnet engine. Set the penalty and mixture arguments to tune().
Specify a KNN model for classification, using the kknn engine. Set the neighbors and dist_power arguments to tune().
Create a workflow set containing your recipe(s) and two model specifications. You will need to look up in the TMWR book how to do that (Section 11.1 and Chapter 15 will be the most helpful).
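The modeling steps above could be sketched roughly as follows. This is one possible setup, not the expected answer. The helper name build_titanic_setup, the 80/20 split, the 5 folds, the seed, and the predictor columns pclass, sex, age, and fare are all assumptions made for illustration; check names(titanic) and adjust the formula to the columns your file actually contains.

```r
library(tidymodels)

# Hypothetical helper: bundles the split, folds, recipe, model specs,
# and workflow set so they can be built in one call.
build_titanic_setup <- function(data, seed = 123) {
  set.seed(seed)

  # Train/test split and cross-validation folds, stratified on the outcome.
  # prop = 0.8 and v = 5 are arbitrary choices, not course requirements.
  split <- initial_split(data, prop = 0.8, strata = survived)
  train <- training(split)
  folds <- vfold_cv(train, v = 5, strata = survived)

  # Recipe: the predictor columns here are ASSUMED to exist after clean_names().
  rec <- recipe(survived ~ pclass + sex + age + fare, data = train) %>%
    step_impute_median(all_numeric_predictors()) %>%  # fill missing numeric values
    step_dummy(all_nominal_predictors()) %>%          # glmnet and kknn need numeric inputs
    step_zv(all_predictors()) %>%                     # drop constant columns
    step_normalize(all_numeric_predictors())          # KNN distances are scale-sensitive

  # Logistic regression with both regularization arguments marked for tuning.
  logreg_spec <- logistic_reg(penalty = tune(), mixture = tune()) %>%
    set_engine("glmnet") %>%
    set_mode("classification")

  # KNN with neighbors and dist_power marked for tuning.
  knn_spec <- nearest_neighbor(neighbors = tune(), dist_power = tune()) %>%
    set_engine("kknn") %>%
    set_mode("classification")

  # Cross every recipe with every model specification.
  wf_set <- workflow_set(
    preproc = list(basic = rec),
    models = list(logreg = logreg_spec, knn = knn_spec),
    cross = TRUE
  )

  list(split = split, folds = folds, wf_set = wf_set)
}
```

Calling build_titanic_setup(titanic) would return a list whose wf_set element has two rows, with the workflow ids basic_logreg and basic_knn. Normalizing matters mostly for KNN, which compares raw distances, but it also keeps the glmnet penalty on a comparable scale across predictors.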
Tuning
Using the workflow_map function, run the tune_grid function on your workflow set and choose the best parameters for each model.
Use collect_metrics to look at the accuracy estimates for each model. Which family of model (KNN or LogReg) is performing better, if either?
Finalize your model by choosing the best performing one. Fit the model to the training set and report your test set accuracy.
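The tuning steps above might be sketched along these lines. The function name tune_and_finalize, its argument names, the grid size of 10, and the seed are assumptions made for illustration; it expects a workflow set, a set of cross-validation folds, and the initial train/test split that you have already created. Fitting requires the glmnet and kknn packages to be installed.

```r
library(tidymodels)

# Hypothetical helper: tune every workflow in a workflow set, compare
# cross-validated accuracy, then refit the winner and score it on the test set.
tune_and_finalize <- function(wf_set, folds, split, grid = 10, seed = 123) {
  # Run tune_grid() on each workflow in the set.
  tuned <- wf_set %>%
    workflow_map(
      "tune_grid",
      resamples = folds,
      grid = grid,
      metrics = metric_set(accuracy),
      seed = seed
    )

  # Cross-validated accuracy for every candidate, best first.
  cv_accuracy <- collect_metrics(tuned) %>%
    filter(.metric == "accuracy") %>%
    arrange(desc(mean))

  # The workflow id of the top-ranked candidate.
  best_id <- cv_accuracy$wflow_id[1]

  # Best tuning parameters for that workflow.
  best_params <- tuned %>%
    extract_workflow_set_result(best_id) %>%
    select_best(metric = "accuracy")

  # Plug the parameters in, fit on the training set, evaluate on the test set.
  final_fit <- tuned %>%
    extract_workflow(best_id) %>%
    finalize_workflow(best_params) %>%
    last_fit(split, metrics = metric_set(accuracy))

  list(
    cv_accuracy = cv_accuracy,
    best_id = best_id,
    best_params = best_params,
    test_metrics = collect_metrics(final_fit)
  )
}
```

The cv_accuracy element of the result can be used to compare the two model families, and test_metrics holds the test set accuracy. Note that last_fit() fits on the training portion of the split and evaluates on the test portion, so the test set is touched only once, after the model has been chosen.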
Submitting HW
When you’ve successfully answered all the questions, knit your document to a PDF file. Look through it to make sure everything worked the way you expect it to. You will submit both your .Rmd and .pdf files to Schoology.