Linear Support Vector Classification

library(tidyverse)
library(tidymodels)
library(janitor)
tidymodels_prefer() # Resolves naming conflicts in favor of tidymodels functions

We’re going to load the breast cancer classification data set again:

patients <- read.csv("breast-cancer.csv") %>% 
  clean_names() %>% 
  mutate(class = factor(class))

Exercise 1. Split the data into training and test sets. Create a model specification of a linear support vector classifier using svm_linear. You will need to install and load the LiblineaR package.

Exercise 2. Create a recipe that predicts class from bland_chromatin and single_epithelial_cell_size, in that order. Make sure to normalize the data. Name your recipe svm_rec.

Exercise 3. Add your recipe and model to a workflow. Fit the workflow to the training set and name the result svm_fit.

Exercise 4. Plot the test set using the two variables bland_chromatin (x-axis) and single_epithelial_cell_size (y-axis). Color by predicted class. Use geom_abline to plot the decision boundary. The slope and intercept of the decision boundary can be computed using the code below, as long as you named everything the same way I did.

Exercise 5. Based on the plot, how well did your model do? What drawbacks of this linear SVM do you see?

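# Calculates the slope and intercept of the linear decision boundary.
# Assumes the normalization step is step 1 of svm_rec and that the recipe has
# exactly two predictors; the first predictor is treated as x, the second as y.
norm_stats <- tidy(prep(svm_rec), 1)$value
means <- norm_stats[1:2]        # means of the two predictors
sds <- norm_stats[3:4]          # standard deviations of the two predictors
coeff <- tidy(svm_fit)$estimate # a, b, and the bias B, in that order

slope <- -coeff[1] * sds[2] / (coeff[2] * sds[1])
intercept <- -coeff[3] * sds[2] / coeff[2] + means[2] +
  coeff[1] * sds[2] * means[1] / (coeff[2] * sds[1])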
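
As a hint for how a slope and intercept feed into geom_abline, here is a minimal sketch on invented data. The data frame toy, the class labels, and the values toy_slope and toy_intercept are all hypothetical and stand in for your test-set predictions and the slope and intercept computed above; only the column name .pred_class mirrors what tidymodels prediction functions return.

```r
library(ggplot2)

# Hypothetical predictions, invented for illustration only
toy <- data.frame(
  bland_chromatin = c(1, 2, 3, 7, 8, 9),
  single_epithelial_cell_size = c(2, 1, 3, 6, 8, 7),
  .pred_class = factor(c("class_1", "class_1", "class_1",
                         "class_2", "class_2", "class_2"))
)

# Hypothetical boundary; in the lab, use the slope and intercept computed above
toy_slope <- -1
toy_intercept <- 9

# Points colored by predicted class, with the boundary drawn as a straight line
p <- ggplot(toy, aes(x = bland_chromatin,
                     y = single_epithelial_cell_size,
                     color = .pred_class)) +
  geom_point() +
  geom_abline(slope = toy_slope, intercept = toy_intercept)
p
```

Note that the variable on the x-axis must be the one whose coefficient is coeff[1], otherwise the line will not match the fitted boundary.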

Side Note: If you’re wondering what’s going on in these expressions, remember that we normalized our data, so we created new variables
\[ x_{new} = \frac{x - \bar{x}}{s_x} \qquad \mbox{and} \qquad y_{new} = \frac{y - \bar{y}}{s_y} \]
Subsequently, we found a separating line in the new variables whose coefficients are stored in the fitted model:
\[ a x_{new} + b y_{new} + B = 0 \]
Run tidy(svm_fit) to see these (\(B\) is the Bias)! If you plug the expressions for \(x_{new}\) and \(y_{new}\) into this equation and solve for \(y\) in terms of \(x\), you get
\[ y = -\frac{a \, s_y}{b \, s_x} \, x + \left( \bar{y} - \frac{B \, s_y}{b} + \frac{a \, s_y \, \bar{x}}{b \, s_x} \right) \]
which is a linear expression with the slope and intercept given above.
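
If you would like to convince yourself of the algebra in the side note without working through it by hand, the following sketch checks it numerically in base R. All numbers are made up for illustration (they are not taken from the breast cancer data or from a fitted model). It picks points on the line in the original units, normalizes them, and confirms that they satisfy \(a x_{new} + b y_{new} + B = 0\).

```r
# Hypothetical values, chosen only for illustration
means <- c(3.4, 3.2)        # means of x and y
sds   <- c(2.4, 2.2)        # standard deviations of x and y
coeff <- c(0.8, 0.5, -0.3)  # a, b, and the bias B

# Same formulas as in the lab code
slope <- -coeff[1] * sds[2] / (coeff[2] * sds[1])
intercept <- -coeff[3] * sds[2] / coeff[2] + means[2] +
  coeff[1] * sds[2] * means[1] / (coeff[2] * sds[1])

# Points on the line, in the original (unnormalized) units
x <- c(1, 5, 10)
y <- slope * x + intercept

# Normalize the points and plug them into a * x_new + b * y_new + B
x_new <- (x - means[1]) / sds[1]
y_new <- (y - means[2]) / sds[2]
residual <- coeff[1] * x_new + coeff[2] * y_new + coeff[3]

# TRUE if every point lies on the boundary, up to rounding error
all(abs(residual) < 1e-10)
```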