Instructions
For this homework, you should create a new folder in your homework directory. Call it HW4 or something similar that you can keep track of. Download the homework markdown template file Student_HW_template.Rmd from the course webpage, and put a copy in this folder. Rename it something like HW4_YourName.Rmd. This markdown document is where you will answer each of the questions below.
The Assignment
Background
For this assignment, we will use a data set containing information about the daily percentage returns of the S&P 500 stock market index for dates in the years 2001-2005. The data are contained in the ISLR2 package associated with our textbook. Once you’ve loaded the library, load the data into your environment using data("Smarket"). Take a look at the data and make sure you know what each variable means (using ?Smarket can help).
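As a minimal sketch of the loading step described above (assuming the ISLR2 package is already installed), the following loads the data and prints a compact overview of its variables:

```r
library(ISLR2)    # package that ships the Smarket data set

data("Smarket")   # load the data frame into the environment

str(Smarket)      # variable names, types, and the first few values
summary(Smarket)  # quick numeric summaries for each column
```

Running ?Smarket in the console afterwards opens the help page that describes each variable.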
Exploratory Data Analysis
We want to predict the value of the Direction variable, a factor with levels Down and Up indicating whether the market had a negative or positive return on a given day. What percentage of observations are in each level?
Does there seem to be any relationship between any of the Lag variables and Direction? Does Volume seem to have any effect?
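One possible starting point for these questions, using only base R, is sketched below. The object names dir_pct and group_means are illustrative, and comparing group means is just one of several reasonable ways to look for a relationship; plots such as boxplots by Direction would serve equally well.

```r
library(ISLR2)
data("Smarket")

# Percentage of days in each level of Direction
dir_pct <- round(100 * prop.table(table(Smarket$Direction)), 1)
dir_pct

# Mean of each lag variable and of Volume, separately for Down and Up days
group_means <- aggregate(
  cbind(Lag1, Lag2, Lag3, Lag4, Lag5, Volume) ~ Direction,
  data = Smarket,
  FUN = mean
)
group_means
```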
Model Definition
- Write a code chunk to specify your model, using \(k=5\) neighbors and the epanechnikov weight function. (This weight function looks like a \(\mathrm{Beta}(2,2)\) density on the distances.)
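A model specification along these lines might look like the sketch below. It assumes the tidymodels framework with the kknn engine (the kknn package has to be installed before the model can be fit); the name knn_spec is illustrative.

```r
library(tidymodels)

# k-nearest neighbors classifier with k = 5 and Epanechnikov distance weights
knn_spec <- nearest_neighbor(neighbors = 5, weight_func = "epanechnikov") %>%
  set_engine("kknn") %>%
  set_mode("classification")

knn_spec
```

The specification only records what to fit; nothing is estimated until it is passed to fit() together with data.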
Data Splitting and Recipe Creation
Split the data into training and test sets. Since this data has a time component, you should reserve the older data as training data and the newer data as testing data. (See the last paragraph of TMWR, Section 5.1)
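A time-ordered split can be sketched with initial_time_split() from rsample, which puts the first prop share of rows in the training set and the rest in the test set. This relies on the rows of Smarket already being in chronological order; the value prop = 0.8 and the object names are assumptions for illustration, not part of the assignment.

```r
library(tidymodels)
library(ISLR2)
data("Smarket")

# Earliest 80% of rows become training data, the most recent 20% test data
# (prop = 0.8 is an assumed value; rows are taken in their existing order)
smarket_split <- initial_time_split(Smarket, prop = 0.8)
smarket_train <- training(smarket_split)
smarket_test  <- testing(smarket_split)

# Check that the training years come before the test years
range(smarket_train$Year)
range(smarket_test$Year)
```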
Build a recipe to preprocess the data for K nearest neighbors.
- Remove Today and Year, or assign them a new role.
- You need to normalize all the numeric variables so that they are on the same scale.
bake the recipe just so you can see if the preprocessed data looks like what you expect. The glimpse command is nice here since there are so many variables.
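The recipe steps above can be sketched as follows. This version removes Today and Year with step_rm() rather than reassigning their roles, and it rebuilds a chronological split with an assumed prop = 0.8 so that the example runs on its own; all object names are illustrative.

```r
library(tidymodels)
library(ISLR2)
data("Smarket")

# Chronological split (prop = 0.8 is an assumed value)
smarket_split <- initial_time_split(Smarket, prop = 0.8)
smarket_train <- training(smarket_split)

# Drop Today and Year, then center and scale the remaining numeric predictors
knn_recipe <- recipe(Direction ~ ., data = smarket_train) %>%
  step_rm(Today, Year) %>%
  step_normalize(all_numeric_predictors())

# prep() estimates the means and standard deviations on the training data;
# bake(new_data = NULL) returns the preprocessed training set
baked_train <- knn_recipe %>%
  prep() %>%
  bake(new_data = NULL)

glimpse(baked_train)
```

The alternative mentioned in the list, keeping the columns but giving them a non-predictor role, would use update_role() in place of step_rm().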
Model Fitting
Fit the model and generate the class predictions.
Using the predicted test data, calculate
Specificity
Sensitivity
Accuracy
Explain the meaning of each of the metrics above, and comment on their values for this particular dataset. Which class (Up or Down) gets assigned to “positive” and which to “negative”?
Does it seem that k-nearest neighbors does a satisfactory job? Think about this question in terms of a predictor that picks randomly, and a predictor that always chooses Up (see exercise 1).
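Putting the pieces together, one way to fit the model and compute the three metrics is sketched below. It assumes tidymodels with the kknn engine installed, an assumed split proportion of 0.8, and illustrative object names; it is a sketch of the workflow, not the only valid solution.

```r
library(tidymodels)
library(ISLR2)
data("Smarket")

# Chronological split (prop = 0.8 is an assumed value)
smarket_split <- initial_time_split(Smarket, prop = 0.8)
smarket_train <- training(smarket_split)
smarket_test  <- testing(smarket_split)

# Model: 5 nearest neighbors with Epanechnikov weights
knn_spec <- nearest_neighbor(neighbors = 5, weight_func = "epanechnikov") %>%
  set_engine("kknn") %>%
  set_mode("classification")

# Preprocessing: drop Today and Year, normalize numeric predictors
knn_recipe <- recipe(Direction ~ ., data = smarket_train) %>%
  step_rm(Today, Year) %>%
  step_normalize(all_numeric_predictors())

# Bundle recipe and model, then fit on the training data
knn_fit <- workflow() %>%
  add_recipe(knn_recipe) %>%
  add_model(knn_spec) %>%
  fit(data = smarket_train)

# Add class predictions (.pred_class) to the test set
test_preds <- augment(knn_fit, new_data = smarket_test)

# Specificity, sensitivity, and accuracy in a single call
class_metrics <- metric_set(specificity, sensitivity, accuracy)
knn_results <- class_metrics(test_preds, truth = Direction, estimate = .pred_class)
knn_results

# yardstick treats the first factor level as the "positive" event by default
# (see the event_level argument); this shows the level order
levels(smarket_test$Direction)

# Share of Up days in the test set, i.e. the accuracy of always guessing Up
mean(smarket_test$Direction == "Up")
```

Comparing the accuracy row of knn_results with the last line gives a direct check against the always-Up baseline.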
Submitting HW
When you’ve successfully answered all the questions, knit your
document to a PDF file. Look through it to make sure everything worked
the way you expect it to. You will submit both your .Rmd and .pdf files to Schoology.