Homework 04

Due Friday, Feb 10

Instructions

For this homework, create a new folder in your homework directory. Call it HW4 or something similar that you can keep track of. Download the homework markdown template file Student_HW_template.Rmd from the course webpage, and put a copy in this folder. Rename it something like HW4_YourName.Rmd. You will answer each of the questions below in this markdown document.

The Assignment

Background

For this assignment, we will use a data set containing the daily percentage returns of the S&P 500 stock market index for the years 2001-2005. The data are contained in the ISLR2 package associated with our textbook. Once you’ve loaded the library, load the data into your environment using data("Smarket"). Take a look at the data and make sure you know what each variable means (using ?Smarket can help).

Exploratory Data Analysis

  1. We want to predict the value of the Direction variable, a factor with levels Down and Up indicating whether the market had a negative or positive return on a given day. What percentage of observations are in each level?

  2. Does there seem to be any relationship between any of the Lag variables and Direction? Does the Volume seem to have any effect?
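As a reminder of the mechanics for the first question, here is a minimal base R sketch of turning a factor into level percentages. The `direction` vector is an invented toy stand-in, not the Smarket data, so the numbers it produces say nothing about the real answer.

```r
# Toy stand-in for a two-level outcome; the values are made up for illustration.
direction <- factor(c("Up", "Down", "Up", "Up", "Down", "Up", "Up", "Down"),
                    levels = c("Down", "Up"))

counts <- table(direction)        # number of observations in each level
pct <- 100 * prop.table(counts)   # the same counts expressed as percentages

print(round(pct, 1))
```

The same two calls work on a column of a data frame, for example a hypothetical `my_data$outcome`.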

Model Definition

  1. Write a code chunk to specify your k-nearest neighbors model, using \(k=5\) neighbors and the epanechnikov weight function. (This weight function looks like a \(\mathrm{Beta}(2,2)\) density on the distances.)
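To see why the Epanechnikov weight resembles a Beta(2,2) density, the sketch below compares the textbook kernel \(\tfrac{3}{4}(1-d^2)\) on \([-1,1]\) with a Beta(2,2) density rescaled to the same interval. The helper name `epanechnikov` is made up for this illustration, and the fitting engine may scale distances differently before applying its weights, so treat this as a picture of the shape only.

```r
# Textbook Epanechnikov kernel: 0.75 * (1 - d^2) for |d| <= 1, and 0 otherwise.
# (Hypothetical helper for illustration; not a function from any package.)
epanechnikov <- function(d) {
  ifelse(abs(d) <= 1, 0.75 * (1 - d^2), 0)
}

d <- seq(-1, 1, by = 0.25)

# Beta(2, 2) density moved from [0, 1] onto [-1, 1].
# Dividing by 2 is the change-of-variable factor for x = (d + 1) / 2.
beta_rescaled <- dbeta((d + 1) / 2, shape1 = 2, shape2 = 2) / 2

print(cbind(d, kernel = epanechnikov(d), beta = beta_rescaled))
```

The two columns agree, so nearer neighbors get larger weights and the weight falls smoothly to zero at the edge of the neighborhood.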

Data Splitting and Recipe Creation

  1. Split the data into training and test sets. Since these data have a time component, you should reserve the older data as training data and the newer data as testing data. (See the last paragraph of Section 5.1 in Tidy Modeling with R, TMWR.)

  2. Build a recipe to preprocess the data for K nearest neighbors.

    1. Remove Today and Year, or assign them a new role so they are not used as predictors.

    2. Normalize all the numeric variables so that they are on the same scale.

    3. Bake the recipe so you can check that the preprocessed data look like what you expect. The glimpse command is nice here since there are so many variables.
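The two ideas in this section, splitting by time and normalizing with training statistics, can be sketched in base R on invented data. This is not the tidymodels workflow you are asked to write (there, rsample's initial_time_split() and recipes' step_normalize() play these roles); the data frame `toy` and its columns are assumptions made up for the example.

```r
# Toy data already sorted from oldest to newest; all values are invented.
toy <- data.frame(
  day    = 1:8,
  lag1   = c(0.4, -1.2, 0.9, 0.1, -0.5, 1.3, -0.8, 0.2),
  volume = c(1.1, 1.4, 1.2, 1.6, 1.3, 1.8, 1.5, 1.7)
)

# Oldest 75% of rows for training, newest 25% for testing (no shuffling).
n_train <- floor(0.75 * nrow(toy))
train <- toy[seq_len(n_train), ]
test  <- toy[(n_train + 1):nrow(toy), ]

# Centering and scaling constants come from the training rows only,
# then the same constants are applied to the test rows.
predictors <- c("lag1", "volume")
centers <- colMeans(train[predictors])
scales  <- sapply(train[predictors], sd)

train_scaled <- scale(train[predictors], center = centers, scale = scales)
test_scaled  <- scale(test[predictors],  center = centers, scale = scales)

str(as.data.frame(train_scaled))
```

Normalizing matters for nearest neighbors because distances are otherwise dominated by whichever variable has the largest raw scale.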

Model Fitting

  1. Fit the model and generate the class predictions.

  2. Using the predictions on the test data, calculate:

    1. Specificity

    2. Sensitivity

    3. Accuracy

  3. Explain the meaning of each of the metrics above, and comment on their values for this particular dataset. Which class (Up or Down) is treated as “positive,” and which as “negative”?

  4. Does it seem that k-nearest neighbors does a satisfactory job? Think about this question in terms of a predictor that picks randomly, and a predictor that always chooses Up (see question 1 under Exploratory Data Analysis).
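The metric definitions can be checked by hand on a small confusion matrix. The sketch below uses invented `truth` and `pred` vectors and assumes the first factor level is the “positive” class; confirm how your own metric functions define the positive class before interpreting their output.

```r
# Invented truth and predictions; values are for illustration only.
truth <- factor(c("Down", "Down", "Down", "Up", "Up", "Up", "Up", "Up", "Up", "Up"),
                levels = c("Down", "Up"))
pred  <- factor(c("Down", "Up", "Up", "Up", "Up", "Up", "Up", "Down", "Down", "Up"),
                levels = c("Down", "Up"))

# Rows are predictions, columns are true classes.
cm <- table(predicted = pred, truth = truth)

# Assumption: the first factor level is treated as "positive".
positive <- levels(truth)[1]
tp <- cm[positive, positive]        # positives predicted as positive
fn <- sum(cm[, positive]) - tp      # positives predicted as negative
tn <- sum(diag(cm)) - tp            # negatives predicted as negative
fp <- sum(cm[positive, ]) - tp      # negatives predicted as positive

sensitivity <- tp / (tp + fn)       # share of true positives that were found
specificity <- tn / (tn + fp)       # share of true negatives that were found
accuracy    <- (tp + tn) / sum(cm)  # share of all predictions that were correct

# Baseline: accuracy of a rule that always predicts "Up".
baseline_up <- mean(truth == "Up")

print(c(sensitivity = sensitivity, specificity = specificity,
        accuracy = accuracy, baseline_up = baseline_up))
```

Comparing `accuracy` with `baseline_up` is the kind of check the last question asks for: a classifier is only useful if it beats the trivial rule.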

Submitting HW

When you’ve successfully answered all the questions, knit your document to a PDF file. Look through it to make sure everything worked the way you expect it to. You will submit both your .Rmd and .pdf files to Schoology.