Step 1: File Structure
There will be a lot of files in this class, so a good file structure to keep track of everything will be essential. You will be downloading files from the website, modifying them, and creating and uploading files of your own.
Identify the folder where you keep your coursework. Is it Documents, My_Documents, etc.? In general, you should store files locally on your hard drive rather than in the cloud, since cloud storage can cause some problems.
Create a new folder in this directory called Math_465. It is generally not advisable to have spaces in your folder or file names, so we will typically replace them with underscores.
Inside this folder create a series of folders for this class. Have them match the diagram below. Make the Penguins folder, but don’t worry about the files inside it quite yet.
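Once R is installed (Step 2), the same folders can also be created from the console instead of by hand. The following is only a minimal sketch: the function name make_course_folders is hypothetical, and it builds just the Math_465, Class_Activities, and Penguins folders named on this page, not any other folders shown in the diagram.

```r
# Hypothetical helper: create Math_465/Class_Activities/Penguins under
# `base_dir` and return the full path to the Penguins folder.
make_course_folders <- function(base_dir) {
  penguins_dir <- file.path(base_dir, "Math_465", "Class_Activities", "Penguins")
  # recursive = TRUE also creates the parent folders that do not exist yet;
  # showWarnings = FALSE keeps it quiet if the folders are already there.
  dir.create(penguins_dir, recursive = TRUE, showWarnings = FALSE)
  penguins_dir
}

# Example call (assumption: your coursework folder is ~/Documents):
# make_course_folders("~/Documents")
```

Replace the example path with wherever you actually keep your coursework before running the call.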
Step 2: Installing R
In this step you will be installing what is known as “base” R: R with only the basic packages. We will be adding many extra packages later. R is the engine that runs the code. In RStudio, it lives in the lower left-hand pane.
In a nutshell…
R is an open-source statistical programming language
R is also an environment for statistical computing and graphics
It’s easily extendable with packages
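If you want to confirm that the installation worked, one quick check is to ask R for its own version. This is just a sanity check to type into the R console, not a required step.

```r
# The installed R version as a human-readable string.
R.version.string

# The same version as a value that can be compared or printed.
getRversion()
```

If both lines print a version number without an error, base R is installed and running.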
Step 3: Installing RStudio
Now install RStudio. RStudio is a free and convenient interface for R. This kind of interface is called an IDE (integrated development environment), as in “I write R code in the RStudio IDE”. RStudio is not a requirement for programming with R, but it’s very commonly used by R programmers and data scientists.
Download and install the latest version of RStudio (you want the free desktop version).
Step 4: Downloading Packages
Packages are the fundamental units of reproducible R code. They include reusable R functions, the documentation that describes how to use them, and sample data. There are over 18,000 R packages available on CRAN (the Comprehensive R Archive Network). We’re going to work with a small (but important) subset of these.
Update the packages that were already installed. In the RStudio menu bar, go to Tools|Check for Package Updates.
Install the following R packages if you haven’t already (we will install a lot more as needed). Go to Tools|Install Packages. If you are ever given a message that you don’t have a package installed, you can always go to this menu and install it. It’s fast and easy.
tidyverse
tidymodels
rmarkdown
A TeX distribution for knitting to PDF: in the console, type tinytex::install_tinytex()
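The menu route above also has a console equivalent, which can be handy later in the course. Below is a minimal sketch; the helper name missing_packages is made up for illustration, and the interactive() guard is an assumption so that nothing is downloaded unless you are typing at the console yourself.

```r
# The packages named in this step.
course_packages <- c("tidyverse", "tidymodels", "rmarkdown")

# Hypothetical helper: return the names in `pkgs` that are not installed yet.
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(utils::installed.packages()))
}

# Only install when run by hand in the console.
if (interactive()) {
  to_install <- missing_packages(course_packages)
  if (length(to_install) > 0) {
    utils::install.packages(to_install)
  }
}
```

Running this a second time should install nothing, because every package is already present.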
Step 5: Penguins.Rmd
On the main course webpage you will see a tab called “Class Activities”. There you will find a file called Penguins.Rmd. The extension .Rmd stands for R Markdown, which is a filetype that enables you to make professional documents that include R code and outputs (like this webpage!). Download this file and put it in the Penguins subfolder that you created in the Class_Activities folder.
If you double-click on this file, it should automatically open in RStudio. In the menu for the file (not the menu for RStudio), there is a Run button with a drop-down menu. Choose Run All.
If you Run All the code and see a nice-looking plot, your installation is good to go. Please let me know when you have completed this step.
At this point you have installed powerful statistical and data science software, and you have created your first visualization. This is the setup professional data scientists all over the world are using.
Step 6: Your Turn!
There is a lot to learn, and it will take some time to get comfortable with R and RStudio. Let’s start with a few things.
RStudio is very customizable. Go to RStudio|Preferences|Appearance (on Windows, Tools|Global Options|Appearance) and look through all the colors you can choose. Play around with it if you’d like!
Hit the Knit button. A webpage should appear. You just knitted an .Rmd file to HTML! You now have a .html file in your Penguins folder. You can use the Gear icon to have your webpage show in a browser instead of the Viewer tab in the lower right-hand pane.
Use the drop-down menu for the Knit button to knit to PDF. Make sure this works! You’ll need to knit your homework and other assignments to PDF before uploading to Schoology.
Take a look at the code in the First Plot section. That code creates the graph. See if you can do the following (you can run your code using the green “play” arrow).
First, those warnings! Warnings give important information (in this case that there are some missing values in the plot!) but don’t look good in the final product. Can you figure out how to turn these off in the knitted document? Try googling something like “R markdown suppress warnings”.
Make a new section in your document called “Second Plot”.
Change the variable on the x-axis to the penguins’ bill length.
Change the color of the points to the sex variable. What’s wrong?
Change the x-axis label to “Bill length (mm)”.
Give the plot a new main title.
Knit the file again and see your changes!
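If you are unsure which parts of a plot the steps above refer to, the sketch below labels the same kinds of pieces on a different dataset. It assumes the plot in Penguins.Rmd is built with ggplot2 (installed as part of tidyverse), which this page does not confirm, and it uses the built-in mtcars data so the penguin exercise is still yours to work out.

```r
library(ggplot2)

# mtcars ships with R; it stands in for the penguins data here.
p <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point() +
  labs(
    x = "Weight (1000 lbs)",          # x-axis label
    y = "Miles per gallon",           # y-axis label
    color = "Cylinders",              # legend title for the color variable
    title = "Fuel economy by weight"  # main title
  )

# Printing the object draws the plot.
p
```

The variable on each axis and the variable used for color are set inside aes(), while the labels and title are set inside labs().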
Now let’s practice some basic formatting in R Markdown. Using this formatting tips page, figure out how to put the following into your lab report. All of these can be typed into the white section, where text goes. Hint: To put each of these on its own line, add an extra hard return (a blank line) between each line of text.
- *Italicize like this*
- **Bold like this**
- A superscript: R^2^
Bonus: If you know LaTeX (a mathematical typesetting language), you can type math formulas directly into the text using `$` or `$$`. E.g., $f(x) = \frac{1}{x}$.
You can also find R Markdown help by going to Help -> Cheat Sheets -> R Markdown Cheat Sheet in the menu bar.
Create a third section called “Third Plot”. Using the cheat sheet, see if you can do the following: make the same graph as the previous section, but hide all the code that creates it (in other words, only the chart should appear). Then change the size of the plot in your output to make it look nicer.