Problem set 2: MacArthur Award Data

Due by 6:00 PM on Friday, October 4, 2019

In this Problem Set, you will use R, RStudio, and ggplot to look at a freshly-collected data and not yet cleaned dataset about MacArthur Award winners.

NB: this page was updated around 6pm on Sunday, September 28th 2019 with a better dataset than we had originally.

Create a new RStudio project and place it on your computer somewhere.

Open that new folder in Windows File Explorer or macOS Finder (however you navigate around the files on your computer), and create a folder there named figures

Create a new R Markdown file and save it in your project.

In RStudio go to File > New File > R Markdown…, choose the default options, and delete all the placeholder text in the new file except for the metadata at the top, which is between --- and ---.

Verify that your project folder is structured like this:


Load the libraries required for the analysis

## ── Attaching packages ────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.2.1     ✔ purrr   0.3.2
## ✔ tibble  2.1.3     ✔ dplyr   0.8.3
## ✔ tidyr   1.0.0     ✔ stringr 1.4.0
## ✔ readr   1.3.1     ✔ forcats 0.4.0
## ── Conflicts ───────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter()  masks stats::filter()
## ✖ purrr::is_null() masks testthat::is_null()
## ✖ dplyr::lag()     masks stats::lag()
## ✖ dplyr::matches() masks tidyr::matches(), testthat::matches()
## Attaching package: 'socviz'
## The following object is masked from 'package:kjhutils':
##     %nin%

Download the MacArthur Data

The data are at the following URL:

Use the read_csv() function to get this data into an object called macarthur. You can do this more than one way. Either grab the data directly from the url, or save it to your project’s data/ folder first and then load it.

If you download the data directly, save the data file to your computer, in the data/ folder

Use write_csv() to do this.

Examine the MacArthur Data


The variables in the data are as follows:

Questions to answer

  1. How much missing data is there in this file, and what is missing? (Hint: missing data is designated by NA. The function returns TRUE if an observation is missing and FALSE if it is not.)
  2. Let’s say we are interested in whether the age profile of MacArthur award recipients has changed over time. Draw some plots to help you investigate this question.
  3. Can we use geom_boxplot() to look at the age distribution of awardees within each year? (Hint: try using the socviz library’s int_to_year() function to change the year variable from an integer to a date. Alternatively, try using `factor() to make year a categorical variable.)
  4. Does the missing data show any structure that you can see, or is it missing at random?
  5. Can you find any errors in the data? What sort of strategies might we employ for finding errors?
  6. Can you make a plot that picks out or highlights our very own Professor Tung in this data?


Knit the completed R Markdown file as a Word or PDF document (use the “Knit” button at the top of the script editor window). Save it with a name of the form lastname_firstname_ps02 and upload it to the Sakai dropbox.