Instructions
For this homework, you should create a new folder in your homework directory. Call it HW4 or something similar that you can keep track of. Download the homework markdown template file Student_HW_template.Rmd from the course webpage, and put a copy in this folder. Rename it something like HW4_YourName.Rmd. This markdown document is where you will answer each of the questions below.
The Assignment
Background
For this assignment, we will use a data set containing information about the daily percentage returns of the S&P 500 stock market index for dates in the years 2001-2005. The data are contained in the ISLR2 package associated with our textbook. Once you’ve installed and loaded the library, load the data into your environment using data("Smarket"). Take a look at the data and make sure you know what each variable means (using ?Smarket can help).
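The loading step described above might look like the following minimal sketch. It assumes the ISLR2 package is already installed; the inspection calls are just one reasonable way to take a first look.

```r
# Assumes ISLR2 is installed, e.g. via install.packages("ISLR2")
library(ISLR2)

# Load the Smarket data frame into the global environment
data("Smarket")

# Inspect dimensions, variable types, and basic summaries
dim(Smarket)
str(Smarket)
summary(Smarket)

# In an interactive session, open the help page describing each variable:
# ?Smarket
```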
Exploratory Data Analysis
We want to predict the value of the Direction variable, a factor with levels Down and Up indicating whether the market had a positive or negative return on a given day.
- What percentage of observations are in each level?
- Does there seem to be any relationship between any of the Lag variables and Direction? Does Volume seem to have any effect?
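One possible starting point for these questions, using only base R, is sketched below. The object names direction_pct and predictors are illustrative choices, and the group means and boxplot are only one of many ways to look for a relationship.

```r
library(ISLR2)
data("Smarket")

# Percentage of observations in each level of Direction
direction_pct <- round(100 * prop.table(table(Smarket$Direction)), 1)
direction_pct

# Mean of each Lag variable and of Volume within each Direction level
predictors <- c("Lag1", "Lag2", "Lag3", "Lag4", "Lag5", "Volume")
aggregate(Smarket[predictors],
          by = list(Direction = Smarket$Direction),
          FUN = mean)

# Side-by-side boxplot for one predictor; repeat for the others as needed
boxplot(Volume ~ Direction, data = Smarket, main = "Volume by Direction")
```

Group means that are nearly identical across Down and Up, or boxplots that overlap almost completely, would suggest that a predictor carries little information about Direction on its own.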
Model Definition
- Write a code chunk to specify your model, using \(k=5\) neighbors and the epanechnikov weight function. (This weight function looks like a \(\mathrm{Beta}(2,2)\) density on the distances.)
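A model specification along these lines could be written with parsnip as in the sketch below. It assumes the tidymodels and kknn packages are installed and that the kknn engine is the intended one; the name knn_spec is an illustrative choice. The native pipe `|>` requires R 4.1 or later.

```r
library(tidymodels)  # attaches parsnip, recipes, rsample, yardstick, ...

# k-nearest neighbors classifier with k = 5 and Epanechnikov distance weights.
# The "kknn" engine needs the kknn package to be installed.
knn_spec <- nearest_neighbor(neighbors = 5, weight_func = "epanechnikov") |>
  set_engine("kknn") |>
  set_mode("classification")

# Printing the specification shows the chosen arguments and engine
knn_spec
```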
Data Splitting and Recipe Creation
Note that these data have a time component. Check out TMWR Section 5.1 for some ideas on how to split data of this type, then split your data into training and test sets.
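One way to respect the time ordering is rsample's initial_time_split(), which uses the first portion of the rows for training and the remainder for testing. The sketch below assumes the rows of Smarket are already stored in chronological order (the code checks that Year never decreases), and prop = 0.8 is an arbitrary illustrative choice, not a required value.

```r
library(tidymodels)
library(ISLR2)
data("Smarket")

# initial_time_split() does not sort the data, so the rows must already be
# in time order. Smarket has no exact date column; as a sanity check we
# confirm that Year is non-decreasing.
stopifnot(!is.unsorted(Smarket$Year))

# First 80% of rows for training, last 20% for testing (assumed proportion)
smarket_split <- initial_time_split(Smarket, prop = 0.8)
smarket_train <- training(smarket_split)
smarket_test  <- testing(smarket_split)

# The test set should cover the most recent period
range(smarket_train$Year)
range(smarket_test$Year)
```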
Build a recipe to preprocess the data for K nearest neighbors.
- Remove Today and Year, or assign them a new role.
- You need to normalize all the numeric variables so that they are on the same scale.
- bake the recipe just so you can see if the preprocessed data looks like what you expect. The glimpse command is nice here since there are so many variables.
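The recipe steps above can be sketched as follows. The split is repeated here so the example runs on its own; prop = 0.8 and the object names are assumptions made for illustration, and update_role() is shown only as a commented-out alternative to removing the columns.

```r
library(tidymodels)
library(ISLR2)
data("Smarket")

# Training portion from a time-ordered split (prop = 0.8 is an assumption)
smarket_train <- training(initial_time_split(Smarket, prop = 0.8))

knn_recipe <- recipe(Direction ~ ., data = smarket_train) |>
  # Today is the same-day return that Direction is derived from,
  # and Year is only a time index, so neither should be a predictor
  step_rm(Today, Year) |>
  # Alternative: keep the columns but give them a non-predictor role
  # update_role(Today, Year, new_role = "id") |>
  # Center and scale every numeric predictor using training-set statistics
  step_normalize(all_numeric_predictors())

# prep() estimates the means and standard deviations;
# bake(new_data = NULL) returns the preprocessed training set
baked_train <- knn_recipe |>
  prep() |>
  bake(new_data = NULL)

glimpse(baked_train)
```

After baking, each remaining numeric column should have mean close to 0 and standard deviation close to 1, which is a quick check that normalization behaved as expected.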
Model Fitting
Fit the model and generate the class predictions.
Using the predicted test data, calculate:
- Specificity
- Sensitivity
- Accuracy
Explain the meaning of each of the metrics above, and comment on their values for this particular dataset. Which class (Up or Down) gets assigned to “positive” and “negative”?
Does it seem that k-nearest neighbors does a satisfactory job? Think about this question in terms of a predictor that picked randomly, and a predictor that always chooses Up (see exercise 1).
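The fitting and evaluation steps could be combined in a workflow, as in the sketch below. It repeats the earlier pieces so that it runs on its own; the split proportion, the kknn engine, and all object names are assumptions for illustration rather than the required solution.

```r
library(tidymodels)
library(ISLR2)
data("Smarket")

# Time-ordered split (prop = 0.8 is an assumed value)
smarket_split <- initial_time_split(Smarket, prop = 0.8)
smarket_train <- training(smarket_split)
smarket_test  <- testing(smarket_split)

knn_spec <- nearest_neighbor(neighbors = 5, weight_func = "epanechnikov") |>
  set_engine("kknn") |>
  set_mode("classification")

knn_recipe <- recipe(Direction ~ ., data = smarket_train) |>
  step_rm(Today, Year) |>
  step_normalize(all_numeric_predictors())

# Bundle preprocessing and model, then fit on the training set only
knn_fit <- workflow() |>
  add_recipe(knn_recipe) |>
  add_model(knn_spec) |>
  fit(data = smarket_train)

# augment() appends .pred_class and class probabilities to the test set
test_preds <- augment(knn_fit, new_data = smarket_test)

# yardstick treats the FIRST factor level as the "positive" event by default
levels(smarket_test$Direction)

class_metrics <- metric_set(sensitivity, specificity, accuracy)
knn_metrics <- class_metrics(test_preds,
                             truth = Direction,
                             estimate = .pred_class)
knn_metrics

# Baseline for comparison: accuracy of always predicting Up on the test set
mean(smarket_test$Direction == "Up")
```

Comparing the accuracy in knn_metrics with the always-Up baseline, and with the roughly 50% expected from random guessing, gives a concrete way to judge whether the model is doing a satisfactory job.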
Submitting HW
When you’ve successfully answered all the questions, knit your document to an HTML file. Look through it to make sure everything worked the way you expect it to.