We have been learning about how to make inferences about a population mean using hypothesis testing and confidence intervals. In your last lab, you considered data from 2013 flights that departed from New York City. We will consider the same dataset for this lab.

Remember to start a new R Markdown document and name it appropriately. You will use a number of functions from your previous labs, so you may want to have them open. You will not have as much direction as you did in the last lab, so you should find a group of 2 or 3 to work with. Only one person in the group needs to submit a lab report. You will want to search the Help pages often (lower right window in RStudio). Google will also turn up many helpful resources quickly once you learn how to search well.

Exploratory Data Analysis (EDA)

You will focus today on the variables arr_delay, origin, and air_time.

  1. Create a new dataset called nyc.sub that contains only the columns listed above. Use the select function with piping. (If you don’t remember the select function, type ?select into the console to open the help page.)

  2. Use a new function called ggpairs to plot this new dataset. You’ll need to install and load the appropriate package (Google to find out which package you need). Type up a summary of all the information you are able to get from a ggpairs plot. (Note: the message about bins always shows up. Suppress it in your knitted document; you may need to search for how to suppress messages in an R Markdown code chunk.)

  3. Using the plot from #2, describe the distributions of arr_delay and air_time.

  4. Use the summary function to get some basic statistics for your variables. Which key statistic is missing?

  5. Another descriptive function is the skim function in the skimr package. Try this and comment on any differences with summary.

  6. The ggpairs function can be nice when there are a small number of variables for which you want to quickly see bivariate relationships. But when you want to investigate an individual variable more deeply, you should create separate plots. Plot histograms of air_time separately for the different origins by faceting. (Note: Remove all the messages AND warnings for this chunk.)
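
As a syntax reminder only, here is a minimal sketch of column selection with piping and of a faceted histogram. It uses R's built-in mtcars data rather than the flights data, and the name cars.sub and the choice of bins = 10 are illustrative assumptions, not part of the lab.

```r
library(dplyr)    # select() and the %>% pipe
library(ggplot2)  # ggplot(), geom_histogram(), facet_wrap()

# Keep only three columns of the built-in mtcars data (illustrative names).
cars.sub <- mtcars %>%
  select(mpg, cyl, wt)

# One histogram of mpg per value of cyl.
# bins = 10 is an arbitrary choice for this small dataset.
p <- ggplot(cars.sub, aes(x = mpg)) +
  geom_histogram(bins = 10) +
  facet_wrap(~ cyl)

p
```

The same pattern should carry over to nyc.sub by swapping in the lab's variables, but check the help pages for each function before relying on it.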

Inference

Now that we have explored and summarized our data, we will turn to performing inference. Again, there will be a number of functions that you haven’t yet seen, and you’ll have to figure out how to use them. This will help prepare you for the class project (coming after spring break!). Feel free to ask a lot of questions!

  1. Run a \(t\)-test to determine if there is evidence that the mean arrival delay is greater than zero. Use the function t.test, and also provide a 99% (two-sided) confidence interval for the mean. Give a conclusion for the above analysis in complete sentences. Comment on what you think about the assumptions of the \(t\)-test. (Note: You will have to run the function twice to get a one-sided \(p\)-value and a two-sided confidence interval. I haven’t given you example code for t.test. See the examples at the bottom of the help file, or Google something like “how to use t.test in R”.)

  2. Do the same type of analysis for air time. Conduct a two-sided hypothesis test to determine if there is evidence that the mean air time is different from 150 minutes. Provide a 95% confidence interval for the mean. Note the defaults of the function! Also comment on the assumptions of the test for this variable.

  3. Now run the same test with a different null hypothesis to determine if the true mean is different from 149. Why is the \(p\)-value so different? In other words, why do we fail to show a difference from 150 (not even close), but we have fairly strong evidence that it’s different from 149?

  4. Is there evidence of a difference in mean air time between LaGuardia and JFK airports, using these 2013 data? Include all necessary code and write your conclusions in complete sentences with a reported confidence interval. Also include an appropriate visualization to accompany your inference.
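
For reference, the one-sample tests and intervals in items 1 to 3 rest on the standard one-sample \(t\) statistic, sketched below. The symbols are notation introduced for this note and do not come from the lab itself: \(\bar{x}\) is the sample mean, \(s\) the sample standard deviation, \(n\) the number of non-missing observations, \(\mu_0\) the null value, and \(1-\alpha\) the confidence level.

\[
t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}, \qquad \text{df} = n - 1
\]

\[
\bar{x} \;\pm\; t^{*}_{n-1,\,1-\alpha/2}\,\frac{s}{\sqrt{n}}
\]

Here \(t^{*}_{n-1,\,1-\alpha/2}\) is the \(1-\alpha/2\) quantile of the \(t\) distribution with \(n-1\) degrees of freedom.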

When you are finished with the lab, one person from your group will need to upload your .html file to Canvas. Please make sure everyone’s name is on the lab to get credit! Look over your report to make sure it is rendering properly. Also remember that if you needed output (graphs, numeric output, etc.) to answer a question, the code to generate that output needs to be in the lab report.