Homework 02

Due Friday, Jan 27th

Instructions

For this homework, create a new folder in your homework directory. Name it HW2 or something similar that you can easily keep track of. Download the homework R Markdown template file Student_HW_template.Rmd from the course webpage, and put a copy in this folder. Rename it to something like HW2_YourName.Rmd. You will answer each of the questions below in this R Markdown document.

If you downloaded Student_HW_Template.Rmd for an earlier homework, download it again: the current version includes headers for the exercises. Answers to Exercise 1 should go beneath the header “Exercise 1”, and so on.

The Assignment

This homework assignment was created by Dr. Darcie Delzell and slightly modified by me.

Start by reading the FiveThirtyEight story “The Dollar-And-Cents Case Against Hollywood’s Exclusion of Women”, which gives the background for the dataset. The dataset has been curated for you in the fivethirtyeight package, where it is called bechdel. Install and load this package before beginning, and don’t forget to read the help file (?bechdel) first!

You’ll focus your analysis on movies released between 1990 and 2013.

  1. Create a filtered dataset called bechdel90.13 that includes only movies released between these years. Use the between function in your filter command.
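As a minimal sketch of how between() behaves, the code below uses a hypothetical toy data frame rather than the bechdel data, and assumes the dplyr version of between() (the dplyr package must be installed). between(x, left, right) keeps values where left <= x <= right, so both endpoint years are included.

```r
library(dplyr)

# Hypothetical toy data (NOT the bechdel dataset), just to show how between() works
toy <- data.frame(
  title = c("A", "B", "C", "D"),
  year  = c(1989, 1990, 2013, 2014),
  stringsAsFactors = FALSE
)

# between(year, 1990, 2013) is TRUE when 1990 <= year & year <= 2013,
# so both endpoint years are kept
toy_90_13 <- filter(toy, between(year, 1990, 2013))
toy_90_13$title
#> [1] "B" "C"
```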

The financial variables you’ll focus on are the following:

  • budget_2013: Budget in 2013 inflation adjusted dollars
  • domgross_2013: Domestic gross (US) in 2013 inflation adjusted dollars
  • intgross_2013: Total international (i.e., worldwide) gross in 2013 inflation adjusted dollars

You’ll also use the binary and clean_test variables to group the movies in different ways.

  2. Print a summary of the median values of the 3 financial variables listed above, grouped by whether or not a movie passed the Bechdel test (PASS or FAIL in the binary variable). Also include a column that gives the number of movies in each group. Use the kable function to print a nicely formatted table, with the column names “med_budget”, “med_domgross”, “med_intgross”, and “num_movies”. What do you learn from the data?

  3. Next, look at how median budget and gross vary with a more detailed indicator of the Bechdel test result, the clean_test variable (ok = passes test, dubious, men = women only talk about men, notalk = women don’t talk to each other, nowomen = fewer than two women). What does “dubious” mean? (Refer to the article.) Print the same table as in #2, but grouped by clean_test instead, and sorted by median budget. What do you notice now?

  4. To evaluate how return on investment varies between movies that pass and fail the Bechdel test, create a new variable called roi, defined as the ratio of domestic gross to budget (domgross_2013 / budget_2013). Use the round function to round it to 2 decimal places, then sort the data by roi.

  5. Create a histogram of roi to see its distribution across the entire dataset. Make it look nice (colors, titles, etc.). You’ll notice some very extreme values that make the histogram hard to read. One compromise is to limit the x-axis to, say, 30, and then print a table of all movies with roi > 30 (use the kable function again). You probably don’t need to display all the columns, so use the select function to show only the relevant ones.

  6. Comment on what you observe in both the table and the histogram. Come up with at least two other questions suggested by this plot or data, and create a graph or table to explore each of them.
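For the roi variable described above, here is a minimal sketch of the mutate/round/arrange pattern on a hypothetical three-movie data frame (the dollar amounts are made up, and it assumes dplyr is installed). It sorts from largest to smallest roi; whether to sort ascending or descending is a choice, not something the assignment specifies.

```r
library(dplyr)

# Hypothetical toy data (NOT the bechdel dataset); dollar amounts are made up
toy <- data.frame(
  title         = c("A", "B", "C"),
  budget_2013   = c(100, 40, 30),
  domgross_2013 = c(150, 100, 50),
  stringsAsFactors = FALSE
)

toy_roi <- toy %>%
  # roi = domestic gross divided by budget, rounded to 2 decimal places
  mutate(roi = round(domgross_2013 / budget_2013, 2)) %>%
  # largest roi first; drop desc() to sort from smallest to largest
  arrange(desc(roi))

toy_roi
```

If any movies in the real data are missing domgross_2013, their roi will be NA, and dplyr's arrange() places NA values at the end even when desc() is used.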

Submitting HW

When you’ve answered all the questions, knit your document to a PDF file. Look through it to make sure everything worked as you expected. Submit both your .Rmd and .pdf files to Schoology.