Problem Set 1: Plotting Gapminder Data

Due by 6:00 PM on Friday, September 20, 2019

In this problem set, you will use R, RStudio, and ggplot2 to look at the Gapminder data again.

Our goal is to create an RStudio project, load the required libraries, and begin looking at our data.

Create a new RStudio project (File > New Project…) and save it somewhere on your computer.

Open the new project folder in Windows File Explorer or macOS Finder (or however you usually browse the files on your computer) and create a folder inside it named figures.
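The same folder can also be made from the R console. This is a minimal sketch, assuming the project is open in RStudio so that the working directory is the project folder:

```r
# Create a figures/ folder inside the current working directory.
# showWarnings = FALSE keeps it quiet if the folder already exists.
dir.create("figures", showWarnings = FALSE)
```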

Create a new R Markdown file and save it in your project.

In RStudio, go to File > New File > R Markdown…, choose the default options, and delete all the placeholder text in the new file except the metadata header at the top (everything between the two --- lines).

Verify that your project folder is structured like this:

```
your-project-name/
  your-analysis.Rmd
  your-project-name.Rproj
  figures/
    <EMPTY>
```
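You can also check the layout from R. This is a sketch meant to be run from the Console with the project open; it matches files by extension because your project and file names will differ from the placeholders above:

```r
# List the project folder's contents, including sub-folders and the files in them.
# Hidden entries such as .Rproj.user are skipped by default.
project_files <- list.files(recursive = TRUE, include.dirs = TRUE)
print(project_files)

# Each of these should be TRUE for the layout above.
any(grepl("\\.Rproj$", project_files))
any(grepl("\\.Rmd$", project_files))
dir.exists("figures")
```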

Load the libraries required for the analysis

Create an R code chunk that runs this code:

```r
library(tidyverse)
library(gapminder)
library(socviz)
```

Loading these packages prints some messages: tidyverse lists the packages it attaches and any conflicts, such as dplyr::filter() masking stats::filter(). These messages are informational, not errors, and the exact versions and conflicts you see will depend on what is installed and loaded on your computer.
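If library() stops with an error such as "there is no package called 'socviz'", that package is not installed yet. Here is a minimal sketch for installing whichever of the three are missing. Run it once in the Console rather than inside your R Markdown file, since a document that installs packages every time it is knitted is slow:

```r
# The three packages this problem set loads.
pkgs <- c("tidyverse", "gapminder", "socviz")

# requireNamespace() quietly returns FALSE for any package that is not installed.
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
to_install <- pkgs[!installed]

# Install only what is missing; this downloads from CRAN, so you need to be online.
if (length(to_install) > 0) {
  install.packages(to_install)
}
```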

Examine the Gapminder Data

```r
gapminder
## # A tibble: 1,704 x 6
##    country     continent  year lifeExp      pop gdpPercap
##    <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
##  1 Afghanistan Asia       1952    28.8  8425333      779.
##  2 Afghanistan Asia       1957    30.3  9240934      821.
##  3 Afghanistan Asia       1962    32.0 10267083      853.
##  4 Afghanistan Asia       1967    34.0 11537966      836.
##  5 Afghanistan Asia       1972    36.1 13079460      740.
##  6 Afghanistan Asia       1977    38.4 14880372      786.
##  7 Afghanistan Asia       1982    39.9 12881816      978.
##  8 Afghanistan Asia       1987    40.8 13867957      852.
##  9 Afghanistan Asia       1992    41.7 16317921      649.
## 10 Afghanistan Asia       1997    41.8 22227415      635.
## # … with 1,694 more rows
```
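Printing the tibble shows only the first ten rows. The sketch below shows a few optional ways to see more. glimpse() (from dplyr, which library(tidyverse) attaches) shows every column at once. Listing the distinct years helps when you choose a year to subset on later.

```r
library(dplyr)      # provides glimpse() and count(); also attached by library(tidyverse)
library(gapminder)

# One line of output per column: its name, type, and first few values.
glimpse(gapminder)

# The years that appear in the data, in order.
years <- sort(unique(gapminder$year))
years

# Number of rows (country-years) per continent.
count(gapminder, continent)
```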

Create the following plots, making a new code chunk for each one:

  1. A scatter plot (using geom_point()) of GDP per capita (on the x-axis) and life expectancy (on the y-axis).
  2. Adjust the plot so that the x-axis uses a log scale rather than raw GDP per capita, and label both axes properly.
  3. Using the subset() function, choose a single year and make a histogram of country populations. Hint: use == and not = when asking subset() to pick a particular value of year.
  4. Experiment with the binwidth argument (or, alternatively, the bins argument) to geom_histogram(), as the message from stat_bin() suggests.
  5. Try adding a log scale to the x-axis of the histogram.
  6. Facet the histogram by continent.
  7. Subset the data so that you pick out the following four years: 1952, 1972, 1992, and 2002. Plot a histogram of life expectancy faceted by year.
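The plots above rely on a few ggplot2 building blocks. The sketch below shows them using illustrative choices that are not part of the assignment: the year 1977, 20 bins, and the axis wording. Treat it as a starting point to adapt rather than a set of answers. It attaches ggplot2 directly, and library(tidyverse) attaches it as well.

```r
library(ggplot2)    # also attached by library(tidyverse)
library(gapminder)

# Scatter plot of GDP per capita against life expectancy,
# with the x-axis on a log10 scale and labelled axes.
p_scatter <- ggplot(data = gapminder,
                    mapping = aes(x = gdpPercap, y = lifeExp)) +
  geom_point() +
  scale_x_log10() +
  labs(x = "GDP per capita (log scale)", y = "Life expectancy (years)")

# subset() keeps the rows where the condition is TRUE.
# `year == 1977` is a comparison; `year = 1977` would be treated as a
# named argument rather than a test, so it would not pick out that year.
gap_1977 <- subset(gapminder, year == 1977)

# Histogram of country populations for that one year.
# bins sets the number of bins; binwidth (the width of each bin) is the alternative.
# After scale_x_log10(), bins are computed on the log10 scale, so a
# binwidth of 0.5 would mean half an order of magnitude.
p_pop <- ggplot(data = gap_1977, mapping = aes(x = pop)) +
  geom_histogram(bins = 20) +
  scale_x_log10() +
  facet_wrap(~ continent) +
  labs(x = "Population (log scale)", y = "Number of countries")

# To keep several years at once, test membership with %in% instead of ==.
gap_years <- subset(gapminder, year %in% c(1952, 1972, 1992, 2002))

p_scatter
p_pop
```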

Briefly answer the following questions about this final plot:

Finish

Knit the completed R Markdown file as a Word or PDF document (use the “Knit” button at the top of the script editor window). Save it with a name of the form lastname_firstname_ps01 and upload it to the Sakai dropbox.
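Knitting can also be done from the Console with rmarkdown::render(). This sketch assumes your file is still called your-analysis.Rmd (the placeholder name from the folder layout earlier) and uses lastname_firstname_ps01 as a stand-in for your actual name. Word output needs only the pandoc that ships with RStudio. PDF output (output_format = "pdf_document") also needs a LaTeX installation.

```r
rmd_file <- "your-analysis.Rmd"   # placeholder: use your own file name

if (file.exists(rmd_file)) {
  # Knit to Word; the extension of output_file should match the format.
  rmarkdown::render(
    rmd_file,
    output_format = "word_document",
    output_file = "lastname_firstname_ps01.docx"   # placeholder name
  )
} else {
  message("Could not find ", rmd_file, "; check the file name and working directory.")
}
```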