Problem set 2: MacArthur Award Data
Due by 6:00 PM on Friday, October 4, 2019
In this Problem Set, you will use R, RStudio, and ggplot to look at a freshly collected and not yet cleaned dataset about MacArthur Award winners.
NB: this page was updated around 6pm on Sunday, September 28th 2019 with a better dataset than we had originally.
Create a new RStudio project and place it on your computer somewhere.
Open that new folder in Windows File Explorer or macOS Finder (however you navigate around the files on your computer), and create a folder there named
Create a new R Markdown file and save it in your project.
In RStudio go to File > New File > R Markdown…, choose the default options, and delete all the placeholder text in the new file except for the metadata at the top, which is between
Verify that your project folder is structured like this:
```
your-project-name/
    your-analysis.Rmd
    your-project-name.Rproj
    data/
        <EMPTY>
    figures/
        <EMPTY>
```
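The subfolders can also be created from the R console instead of the file browser. This is a minimal sketch, assuming the working directory is the project root (the default when the `.Rproj` file is open in RStudio) and that the two subfolders are the `data/` and `figures/` directories shown in the layout above.

```r
# Create the data/ and figures/ subfolders inside the project folder.
# Assumption: the working directory is the project root.
for (folder in c("data", "figures")) {
  if (!dir.exists(folder)) {
    dir.create(folder)
  }
}

# List the top-level folders to confirm that both now exist.
list.dirs(".", recursive = FALSE)
```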
Load the libraries required for the analysis
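The loading code itself is not shown on this page. Judging only from the startup messages, which mention tidyverse and socviz, a minimal sketch might look like the following. Treat the package list as an assumption, and install any missing package with `install.packages()` first. Other packages named in the messages seem to have been loaded already in the author's session and are left out.

```r
# Assumed package list, inferred from the startup messages on this page.
library(tidyverse)  # attaches ggplot2, dplyr, readr, tidyr, and others
library(socviz)     # companion package; provides helpers such as int_to_year()
```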
```
## ── Attaching packages ────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.2.1     ✔ purrr   0.3.2
## ✔ tibble  2.1.3     ✔ dplyr   0.8.3
## ✔ tidyr   1.0.0     ✔ stringr 1.4.0
## ✔ readr   1.3.1     ✔ forcats 0.4.0
## ── Conflicts ───────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter()  masks stats::filter()
## ✖ purrr::is_null() masks testthat::is_null()
## ✖ dplyr::lag()     masks stats::lag()
## ✖ dplyr::matches() masks tidyr::matches(), testthat::matches()
##
## Attaching package: 'socviz'
## The following object is masked from 'package:kjhutils':
##
##     %nin%
```
Download the MacArthur Data
The data are at the following URL:
Use the `read_csv()` function to get this data into an object called `macarthur`. You can do this in more than one way: either grab the data directly from the URL, or save it to your project’s `data/` folder first and then load it.
If you download the data directly, also save the data file to your computer, inside your project folder. Use `write_csv()` to do this.
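The read-then-save round trip might be sketched as follows. The data URL is not reproduced here, so a small temporary CSV with made-up rows stands in for it. The file name `data/macarthur.csv` is also a hypothetical choice, not one given in the assignment.

```r
library(readr)

# Stand-in for the real source. In practice, set data_source to the URL
# given on the assignment page. All rows below are made up.
data_source <- tempfile(fileext = ".csv")
writeLines(
  c("name,year,age,sex",
    "Person A,1981,45,female",
    "Person B,1981,NA,male"),
  data_source
)

# read_csv() accepts either a URL or a local file path.
macarthur <- read_csv(data_source)

# Save a local copy in the project's data/ folder (hypothetical file name).
dir.create("data", showWarnings = FALSE)
write_csv(macarthur, "data/macarthur.csv")

# Later sessions can load the local copy instead of downloading again.
macarthur <- read_csv("data/macarthur.csv")
```

Keeping a local copy means the analysis still runs if the remote file moves or changes.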
Examine the MacArthur Data
The variables in the data are as follows:
- `name`: The name of the awardee
- `year`: The year of their award
- `age`: Their age at the time of their award
- `sex`: Our best guess as to the awardee’s gender, automatically inferred from their fellowship page
Questions to answer
- How much missing data is there in this file, and what is missing? (Hint: missing data is designated by
TRUE if an observation is missing and
FALSE if it is not.)
- Let’s say we are interested in whether the age profile of MacArthur award recipients has changed over time. Draw some plots to help you investigate this question.
- Can we use `geom_boxplot()` to look at the age distribution of awardees within each year? (Hint: try using the `int_to_year()` function to change the year variable from an integer to a date. Alternatively, try making `year` a categorical variable.)
- Does the missing data show any structure that you can see, or is it missing at random?
- Can you find any errors in the data? What sort of strategies might we employ for finding errors?
- Can you make a plot that picks out or highlights our very own Professor Tung in this data?
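As a starting point for the questions above, here is a minimal sketch on a made-up toy data frame. Every name and value in it is invented. It assumes missing values are read in as `NA`, uses base R's `is.na()` to count them, and treats `year` as a categorical variable with `factor()` for the boxplot. The `highlight_name` variable is a hypothetical placeholder for whichever awardee you want to pick out. It is one possible approach, not the intended solution.

```r
library(dplyr)
library(ggplot2)

# Toy stand-in for the real data; every value here is made up.
macarthur <- tibble(
  name = c("Person A", "Person B", "Person C",
           "Person D", "Person E", "Person F"),
  year = c(1981L, 1981L, 1981L, 1982L, 1982L, 1982L),
  age  = c(45, NA, 38, 52, 29, NA),
  sex  = c("female", "male", NA, "female", "male", "male")
)

# is.na() returns TRUE for each missing value and FALSE otherwise,
# so column sums of the result count the missing values per variable.
missing_by_column <- colSums(is.na(macarthur))

# Count missing ages within each award year to look for structure.
missing_age_by_year <- macarthur %>%
  group_by(year) %>%
  summarize(n_missing_age = sum(is.na(age)))

# Rows with a known age, used for both plots.
known_age <- filter(macarthur, !is.na(age))

# One box per year: factor() makes year categorical.
p_box <- ggplot(known_age, aes(x = factor(year), y = age)) +
  geom_boxplot() +
  labs(x = "Award year", y = "Age at award")

# Highlight one awardee by drawing their point again in another color.
highlight_name <- "Person D"  # hypothetical placeholder name
p_highlight <- ggplot(known_age, aes(x = year, y = age)) +
  geom_point(color = "gray60") +
  geom_point(data = filter(known_age, name == highlight_name),
             color = "firebrick", size = 3) +
  labs(x = "Award year", y = "Age at award")
```

Printing `p_box` or `p_highlight` at the console draws the plot. With the real data, only the `macarthur` object and `highlight_name` would need to change.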
Knit the completed R Markdown file as a Word or PDF document (use the “Knit” button at the top of the script editor window). Save it with a name of the form `lastname_firstname_ps02` and upload it to the Sakai dropbox.