library(tidyverse)
library(tidymodels)
library(janitor)
tidymodels_prefer() # Resolves conflicts, prefers tidymodels functions
We will again load the breast cancer classification data set:
patients <- read.csv("breast-cancer.csv") %>%
  clean_names() %>%
  mutate(class = factor(class))
Exercise 1. Split the data into training and test sets. Create a model specification of a linear support vector classifier using svm_linear. You will need to install and load the LiblineaR package.
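As a hint for the general pattern, here is a minimal sketch on made-up data rather than the patients data. The toy tibble, its column names, the seed, and the 75/25 split proportion are all assumptions chosen for illustration; adapt them to your own setup.

```r
library(tidymodels)

set.seed(123)

# Hypothetical toy data: two numeric predictors and a two-level factor outcome
toy <- tibble(
  x1 = rnorm(100),
  x2 = rnorm(100),
  label = factor(ifelse(x1 + x2 > 0, "a", "b"))
)

# Split into training and test sets, stratifying on the outcome
toy_split <- initial_split(toy, prop = 0.75, strata = label)
toy_train <- training(toy_split)
toy_test  <- testing(toy_split)

# Specification of a linear support vector classifier (not fitted yet)
svm_spec <- svm_linear() %>%
  set_mode("classification") %>%
  set_engine("LiblineaR")
```

The specification only describes the model; nothing is estimated until it is fitted, for example inside a workflow.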
Exercise 2. Create a recipe that predicts class by bland_chromatin and single_epithelial_cell_size. Make sure to normalize the data. Name your recipe svm_rec.
Exercise 3. Add your recipe and model to a workflow. Fit the workflow and name it svm_fit.
Exercise 4. Plot the test set for the two variables bland_chromatin and single_epithelial_cell_size. Color by predicted class. Use geom_abline to plot the decision boundary (the slope and intercept for the decision boundary can be computed using the code below, as long as you named everything the same way I did).
Exercise 5. Based on the plot, how did your model do? What drawbacks of this linear SVM do you see?
# Calculates the slope and intercept of the linear decision boundary
means <- tidy(prep(svm_rec), 1)$value[1:2]
sds <- tidy(prep(svm_rec), 1)$value[3:4]
coeff <- tidy(svm_fit)$estimate
slope <- -coeff[1] * sds[2] / ( coeff[2] * sds[1] )
intercept <- -coeff[3]*sds[2]/coeff[2] + means[2] + coeff[1]*sds[2]*means[1]/(coeff[2]*sds[1])
Side Note: If you’re wondering what’s going on in these expressions, remember that we normalized our data, so we created new variables
\[ x_{new} = \frac{x - \bar{x}}{s_x} \qquad \mbox{and} \qquad y_{new} = \frac{y - \bar{y}}{s_y} \]
Subsequently, we found a separating line in the new variables whose coefficients are stored in the fitted model:
\[ a x_{new} + b y_{new} + B = 0 \]
Run tidy(svm_fit) to see these (\(B\) is the Bias)! If you plug the expressions for \(x_{new}\) and \(y_{new}\) into this equation and solve for \(y\) in terms of \(x\), you get a linear expression with the slope and intercept given above.
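Carrying out that substitution and solving for \(y\) gives
\[ y = -\frac{a\, s_y}{b\, s_x}\, x \;-\; \frac{B\, s_y}{b} \;+\; \bar{y} \;+\; \frac{a\, s_y\, \bar{x}}{b\, s_x} \]
The following sketch checks this numerically in base R. The values of means, sds, and coeff are made up for illustration (they do not come from the breast cancer data), and the vector ordering \((a, b, B)\) is assumed to match the code above.

```r
# Made-up values for illustration only
means <- c(4.0, 3.0)        # x-bar, y-bar
sds   <- c(2.5, 2.0)        # s_x, s_y
coeff <- c(1.2, 0.8, -0.5)  # a, b, B (assumed ordering)

# Same formulas as in the handout's code
slope <- -coeff[1] * sds[2] / (coeff[2] * sds[1])
intercept <- -coeff[3] * sds[2] / coeff[2] + means[2] +
  coeff[1] * sds[2] * means[1] / (coeff[2] * sds[1])

# Pick points on the line in the original units, then normalize them
x <- c(0, 1, 5, 10)
y <- slope * x + intercept
x_new <- (x - means[1]) / sds[1]
y_new <- (y - means[2]) / sds[2]

# Left-hand side of a * x_new + b * y_new + B = 0; should vanish up to rounding
residual <- coeff[1] * x_new + coeff[2] * y_new + coeff[3]
max(abs(residual))
```

If the largest residual is zero up to floating-point rounding, the points on the line in the original units do lie on the separating line in the normalized units.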