The raw data behind the story "'Straight Outta Compton' Is The Rare Biopic Not About White Dudes" (https://fivethirtyeight.com/features/straight-outta-compton-is-the-rare-biopic-not-about-white-dudes/). An analysis using this data was contributed by Pradeep Adhokshaja as a package vignette (http://fivethirtyeight-r.netlify.com/articles/biopics.html).

biopics

Format

A data frame with 761 rows representing movies and 14 variables:

title

Title of the film.

site

Text used to construct the film's IMDb URL. Example: http://www.imdb.com/title/tt1711425

country

Country of origin.

year_release

Year of release.

box_office

Gross earnings at U.S. box office.

director

Director of film.

number_of_subjects

The number of subjects featured in the film.

subject

The actual name of the featured subject.

type_of_subject

The occupation of subject or reason for recognition.

race_known

Indicates whether the subject's race was discernible from the background of the subject, a parent, or a grandparent.

subject_race

Race of the subject.

person_of_color

Dummy variable indicating whether the subject is a person of color.

subject_sex

Sex of subject.

lead_actor_actress

The actor or actress who played the subject.
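
The site variable is described as text for constructing an IMDb URL. As a minimal sketch of that construction, the Python helper below joins a title identifier to the base address shown in the example URL. The function name imdb_url and the constant IMDB_TITLE_BASE are hypothetical, and the sketch assumes a value is either a bare identifier such as tt1711425 or an already complete URL. The stored format is not stated here, so check the actual values before relying on it.

```python
# Hypothetical helper: not part of the package, only an illustration.
IMDB_TITLE_BASE = "http://www.imdb.com/title/"


def imdb_url(site_text: str) -> str:
    """Return a full IMDb title URL for a value of the `site` variable."""
    text = site_text.strip()
    # Assumption: values that already look like URLs are returned unchanged.
    if text.startswith(("http://", "https://")):
        return text
    # Assumption: otherwise the value is a bare title identifier.
    return IMDB_TITLE_BASE + text.strip("/")


print(imdb_url("tt1711425"))
```

Passing a full URL through unchanged keeps the helper safe to apply to the whole column whichever of the two assumed formats it holds.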

Source

IMDb http://www.imdb.com/