Regularization and Penalization with NNs

Load the MNIST data, pre-split into training and test sets.

library(keras)

mnist <- dataset_mnist()
X_train <- mnist$train$x
X_test <- mnist$test$x
y_train <- mnist$train$y
y_test <- mnist$test$y

# Flatten each 28x28 image into a length-784 vector and rescale pixels to [0, 1]
X_train <- array_reshape(X_train, c(nrow(X_train), 784)) / 255
X_test <- array_reshape(X_test, c(nrow(X_test), 784)) / 255

# One-hot encode the digit labels into 10 classes
y_train <- to_categorical(y_train, num_classes = 10)
y_test <- to_categorical(y_test, num_classes = 10)

Penalization

In keras, we can add ridge, LASSO, or elastic net penalties to the parameters of each layer. For instance, to add an elastic net penalty to the weights of the first layer of a NN, we would add the argument kernel_regularizer = regularizer_l1_l2(l1 = 1e-4, l2 = 1e-4) inside layer_dense(). To penalize the biases instead, the corresponding argument is bias_regularizer.

The available options are regularizer_l1(), regularizer_l2(), and regularizer_l1_l2().
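
As a minimal sketch, a single penalized hidden layer could look like the following (the model name and the penalty strengths of 1e-4 are illustrative values, not tuned choices):

penalized_example <- keras_model_sequential() %>%
  layer_dense(units = 256, activation = "relu",
              input_shape = c(784),
              # elastic net penalty on this layer's weight matrix
              kernel_regularizer = regularizer_l1_l2(l1 = 1e-4, l2 = 1e-4))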

1. Modify the following code to create a NN that uses elastic net regularization in the two hidden layers to penalize the weights. Do not penalize the output layer.

nn_model <- keras_model_sequential() %>%
  layer_dense(units = 256, activation = "relu", 
              input_shape = c(784)
              ) %>%
  layer_dense(units = 128, activation = "relu") %>%
  layer_dense(units = 10, activation = "softmax") # This is the output layer

2. Fit the model and report the test set accuracy. You may want to refer to the previous class demo to get the code for this; a sketch is also given below. How does your loss/accuracy curve compare to the one below, which uses the same model architecture but no regularization?
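
If you don't have the demo handy, a typical compile/fit/evaluate sequence looks like this sketch; the optimizer, epoch count, batch size, and validation split are illustrative choices, not required settings.

nn_model %>% compile(
  loss = "categorical_crossentropy",  # matches the one-hot labels
  optimizer = "rmsprop",
  metrics = "accuracy"
)

history <- nn_model %>% fit(
  X_train, y_train,
  epochs = 20, batch_size = 128,      # illustrative settings
  validation_split = 0.2
)

plot(history)                          # loss/accuracy curves
nn_model %>% evaluate(X_test, y_test)  # test set loss and accuracy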

3. Which number or numbers was the model most likely to misclassify?
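
One way to answer this (a sketch, reusing the object names above) is a confusion matrix of true versus predicted digits:

# Predicted digit = column with the largest softmax probability,
# minus 1 so the classes run 0-9
pred_class <- max.col(predict(nn_model, X_test)) - 1
true_class <- max.col(y_test) - 1

# Rows are true digits, columns are predictions; large
# off-diagonal counts mark the digits the model confuses
table(true = true_class, pred = pred_class)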

Dropout Regularization

1. Describe in your own words what dropout regularization does.

2. Following ISL p. 446, create a new NN with two hidden layers and a dropout layer after each, using a dropout rate of 0.4 for the first dropout layer and 0.3 for the second. A sketch of the layer arrangement appears below.
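
As a sketch of the arrangement (the model name and hidden-layer sizes are assumptions; the sizes simply match the first model):

dropout_model <- keras_model_sequential() %>%
  layer_dense(units = 256, activation = "relu",
              input_shape = c(784)) %>%
  layer_dropout(rate = 0.4) %>%   # randomly zeroes 40% of units each update
  layer_dense(units = 128, activation = "relu") %>%
  layer_dropout(rate = 0.3) %>%   # randomly zeroes 30% of units each update
  layer_dense(units = 10, activation = "softmax")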

3. Fit the model and report the test set accuracy. How does it compare to your first model?

4. Which number or numbers was the model most likely to misclassify? Compare to your answer for the first model.

Turning in

When you’re done, turn in your completed activity on Schoology!