Instructions
For this homework, you should create a new folder in your homework directory. Call it `HW2` or something similar that you can keep track of. Download the homework markdown template file `Student_HW_template.Rmd` from the course webpage, and put a copy in this folder. Rename it something like `HW2_YourName.Rmd`. This markdown document is where you will answer each of the questions below.
You should download `Student_HW_Template.Rmd` again; this time there are some headers for exercises. Answers to “1.” should go beneath a header called “Exercise 1”, and so on.
The Assignment
This homework assignment was created by Dr. Darcie Delzell and slightly modified by me.
In this homework you should start off by reading a FiveThirtyEight story titled “The Dollar-And-Cents Case Against Hollywood’s Exclusion of Women”. It will give you the background for the dataset. The dataset has been nicely curated for you in the `fivethirtyeight` package and is called `bechdel`. You should install and load this package before beginning. Don’t forget to look at the help file first!
You’ll focus your analysis on movies released between 1990 and 2013.
1. Create a filtered dataset called `bechdel90.13` that only includes movies released between these years. Use the `between` function in your `filter` command.
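As a minimal sketch of the `filter()` plus `between()` pattern, the example below uses a small made-up data frame (`toy`, with hypothetical `title` and `year` columns) rather than the real `bechdel` data, and assumes the `dplyr` package is installed. Check the help file for the actual column names before adapting it.

```r
library(dplyr)

# Hypothetical toy data, NOT the real bechdel dataset
toy <- tibble(
  title = c("A", "B", "C", "D"),
  year  = c(1985, 1990, 2001, 2013)
)

# between(x, left, right) keeps rows where left <= x <= right,
# so both endpoints (1990 and 2013) are included
toy_90_13 <- toy %>%
  filter(between(year, 1990, 2013))

toy_90_13
```

Because `between()` is inclusive on both ends, there is no need to write two separate comparisons joined with `&`.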
The financial variables you’ll focus on are the following:
- `budget_2013`: Budget in 2013 inflation-adjusted dollars
- `domgross_2013`: Domestic gross (US) in 2013 inflation-adjusted dollars
- `intgross_2013`: Total international (i.e., worldwide) gross in 2013 inflation-adjusted dollars
And you’ll also use the `binary` and `clean_test` variables for different groupings.
2. Print out a summary of the median values of the three financial variables listed above, grouped by whether or not a movie passed the Bechdel test (PASS or FAIL in the `binary` variable). Also include a column that gives the number of observations in each group. Use the `kable` function to print out a nice table, changing the column names to the titles “med_budget”, “med_domgross”, “med_intgross”, and “num_movies”. What do you learn from the data?
3. Next, take a look at how median budget and gross vary by a more detailed indicator of the Bechdel test result (`ok` = passes test, `dubious`, `men` = women only talk about men, `notalk` = women don’t talk to each other, `nowomen` = fewer than two women). What does “dubious” mean? (Refer to the article.) Print out the same table as #2, but with the different grouping variable, and sort by budget. What do you notice now?
4. In order to evaluate how return on investment varies among movies that pass and fail the Bechdel test, create a new variable called `roi` as the ratio of the domestic gross to budget. Use the `round` function to round to 2 decimal places. Sort by `roi`.
5. Create a histogram of `roi` to see its distribution across the entire dataset. Make it look nice (colors, titles, etc.). You’ll notice some very extreme values that make looking at your histogram difficult. One compromise is to limit the x-axis to, say, 30, and then print a table of all movies with `roi > 30` (use the `kable` function again). You probably don’t need to display all the columns, so use the `select` function to only show relevant columns.
6. Comment on your observations of both the table and the histogram. Come up with at least two other questions from this plot/data and create a graph or table to explore your questions.
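The grouped-summary and ratio steps in the list above follow a common `dplyr` pattern, sketched here on a made-up data frame. The data frame `toy` and its columns (`title`, `group`, `budget`, `gross`) are hypothetical stand-ins, not the real `bechdel` variables, and sorting `roi` in descending order is just one possible choice. The sketch assumes the `dplyr` and `knitr` packages are installed.

```r
library(dplyr)
library(knitr)

# Hypothetical toy data standing in for the real dataset
toy <- tibble(
  title  = c("A", "B", "C", "D", "E"),
  group  = c("PASS", "FAIL", "PASS", "FAIL", "FAIL"),
  budget = c(10, 40, 30, 20, 60),
  gross  = c(25, 30, 90, 10, 120)
)

# Median of each numeric column per group, plus the number of rows per group
toy_summary <- toy %>%
  group_by(group) %>%
  summarise(
    med_budget = median(budget),
    med_gross  = median(gross),
    n          = n()
  ) %>%
  arrange(med_budget)

# col.names relabels the columns in the printed table only;
# the data frame itself keeps its original names
kable(toy_summary,
      col.names = c("group", "med_budget", "med_gross", "num_movies"))

# A rounded ratio column, sorted from highest to lowest,
# keeping only the columns worth displaying
toy_roi <- toy %>%
  mutate(roi = round(gross / budget, 2)) %>%
  arrange(desc(roi)) %>%
  select(title, group, roi)

kable(toy_roi)
```

Switching to a different grouping variable only requires changing the column named in `group_by()`; the rest of the pipeline stays the same.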
Submitting HW
When you’ve successfully answered all the questions, knit your
document to a PDF file. Look through it to make sure everything worked
the way you expect it to. You will submit both your `.Rmd` and `.pdf` files to Schoology.