Problem set 5: Birth and Death Rates

Due by 6:00 PM on Friday, November 15, 2019

In this problem set, we will examine birth and death rates for various countries.

The data is available as an R package. To install it, do the following.

If you haven’t already, install and load drat:

  1. Install the drat package with install.packages("drat")
  2. Load it with library(drat)
  3. Add the repository where the data is: drat::addRepo("kjhealy")

You can now install demog with

  1. install.packages("demog")

Note that the demog package has been updated since last week! So run install.packages("demog") again even if you’ve done it before.
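Put together, the installation steps above amount to the short script below, run once in the R console. Every call comes from the steps already listed. The only thing to watch is that package names passed to install.packages() must be quoted.

```r
# One-time setup: install drat, register the repository, then install demog.
install.packages("drat")     # package names are passed as quoted strings
library(drat)
drat::addRepo("kjhealy")     # adds the repository that hosts demog
install.packages("demog")    # rerun this even if demog is already installed
```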

Create a project for the assignment

Open the project in RStudio and make an Rmd file for the analysis called something like okboomer.Rmd

Load the required libraries

library(tidyverse)
## ── Attaching packages ────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.2.1     ✔ purrr   0.3.2
## ✔ tibble  2.1.3     ✔ dplyr   0.8.3
## ✔ tidyr   1.0.0     ✔ stringr 1.4.0
## ✔ readr   1.3.1     ✔ forcats 0.4.0
## ── Conflicts ───────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter()  masks stats::filter()
## ✖ purrr::is_null() masks testthat::is_null()
## ✖ dplyr::lag()     masks stats::lag()
## ✖ dplyr::matches() masks tidyr::matches(), testthat::matches()
library(socviz)
## 
## Attaching package: 'socviz'
## The following object is masked from 'package:kjhutils':
## 
##     %nin%
library(demog)

Take a look at the data

There are various data files included in this package. You can get a brief summary of each variable in the dataset by looking at the Help file in RStudio for the demog package, or by looking at the documentation on the package homepage: http://kjhealy.github.io/demog.

Start with birth rates in the US and England/Wales:

okboomer
## # A tibble: 1,644 x 12
##     year month n_days births total_pop births_pct births_pct_day date      
##    <dbl> <dbl>  <dbl>  <dbl>     <dbl>      <dbl>          <dbl> <date>    
##  1  1938     1     31  51820  41215000    0.00126           40.6 1938-01-01
##  2  1938     2     28  47421  41215000    0.00115           41.1 1938-02-01
##  3  1938     3     31  54887  41215000    0.00133           43.0 1938-03-01
##  4  1938     4     30  54623  41215000    0.00133           44.2 1938-04-01
##  5  1938     5     31  56853  41215000    0.00138           44.5 1938-05-01
##  6  1938     6     30  53145  41215000    0.00129           43.0 1938-06-01
##  7  1938     7     31  53214  41215000    0.00129           41.6 1938-07-01
##  8  1938     8     31  50444  41215000    0.00122           39.5 1938-08-01
##  9  1938     9     30  50545  41215000    0.00123           40.9 1938-09-01
## 10  1938    10     31  50079  41215000    0.00122           39.2 1938-10-01
## # … with 1,634 more rows, and 4 more variables: seasonal <dbl>,
## #   trend <dbl>, remainder <dbl>, country <chr>
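The printout above shows only eight of the twelve columns. One way to see all of them, and to check how the countries are labelled before filtering, is sketched below. It assumes demog is installed and uses only standard dplyr functions.

```r
library(dplyr)
library(demog)

glimpse(okboomer)                # lists every column with its type and first values
okboomer %>% distinct(country)   # shows the distinct values of the country column
```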

Questions to answer on the Birth Rates data

  1. What is the unit of observation in this dataset?
  2. Make a plot of birth rates for the United States.
  3. Over the years, which month has the highest average number of births per capita per day?
  4. Draw a plot showing the seasonality of births in the United States for each decade in the dataset. Hint: Consider using coord_polar() to draw a circular chart.
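As a rough illustration of the coord_polar() hint, the sketch below draws twelve monthly values around a circle. The data frame toy and its values are made up for this example and are not taken from okboomer. Adapting it to the real data, and splitting it by decade, is left to you.

```r
library(ggplot2)

# Made-up monthly values for illustration only; NOT taken from okboomer.
toy <- data.frame(
  month = factor(month.abb, levels = month.abb),  # keep months in calendar order
  value = c(10, 9, 10, 11, 12, 12, 13, 13, 12, 11, 10, 10)
)

# A bar chart in Cartesian coordinates becomes a circular chart
# once coord_polar() is added.
p <- ggplot(toy, aes(x = month, y = value)) +
  geom_col() +
  coord_polar() +
  labs(x = NULL, y = "Toy value")

p
```

Adding a facet_wrap() layer is one common way to get one small panel per group.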

Mortality rates

The mortality rate data is in a nested tibble:

mortality
## # A tibble: 47 x 3
## # Groups:   country, ccode [47]
##    country      ccode                  data
##    <chr>        <chr>        <list<df[,5]>>
##  1 Australia    australia      [10,434 × 5]
##  2 Austria      austria         [7,881 × 5]
##  3 Belgium      belgium        [19,425 × 5]
##  4 Bulgaria     bulgaria        [7,104 × 5]
##  5 Belarus      belarus         [6,438 × 5]
##  6 Canada       canada         [10,101 × 5]
##  7 Switzerland  switzerland    [15,651 × 5]
##  8 Chile        chile           [1,887 × 5]
##  9 Czechia      czechia         [7,437 × 5]
## 10 East Germany east_germany    [6,660 × 5]
## # … with 37 more rows

The data column of this tibble contains each country’s estimated male, female, and total mortality rates for each age-year combination.
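If nested tibbles are new to you, the sketch below builds a small stand-in with the same shape and then flattens one country's rows back into an ordinary table. The tibble toy, its inner column names (year, age, total), and its values are made up for illustration. Check the demog documentation for the real column names in mortality.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for `mortality`; names and values are hypothetical.
toy <- tibble(
  country = c("A", "A", "B"),
  year    = c(2000, 2001, 2000),
  age     = c(0, 0, 0),
  total   = c(0.005, 0.004, 0.006)
) %>%
  group_by(country) %>%
  nest()                      # one row per country, with a list-column named `data`

# Keep one country, then expand its list-column into regular rows.
one_country <- toy %>%
  filter(country == "A") %>%
  unnest(cols = data)

one_country
```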

Questions to answer on the Mortality data

Choose a country from the data to focus on.

  1. Plot the combined (total) mortality rate at age 0, age 5, age 50, and age 70 across all years in your country’s data.
  2. Create a Lexis Surface plot or heatmap for that country’s male mortality rate. Hint: Use geom_raster() or geom_tile() to graph the mortality rate mapped to the fill aesthetic.
  3. Calculate the ratio of male to female mortality rates for the country you’ve chosen, and the percentage difference between male and female mortality rates. Make a plot of the results. What sort of color scale should you use for measures like these, and why? Briefly discuss any patterns you can discern in the data you graph.
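The mechanics of the calculation and of the geom_raster() hint can be sketched on made-up numbers, as below. The column names year, age, male, and female are assumptions, the rates are invented, and the percentage difference is taken relative to the female rate, which is only one reasonable definition. The fill scale is left at the ggplot2 default here, so choosing an appropriate one is still up to you.

```r
library(dplyr)
library(ggplot2)

# Made-up rates on a tiny year-by-age grid; NOT real data.
toy <- expand.grid(year = 2000:2002, age = 0:2) %>%
  mutate(
    male   = 0.010 + 0.001 * age,
    female = 0.008 + 0.001 * age
  )

toy_ratios <- toy %>%
  mutate(
    ratio    = male / female,                    # male-to-female ratio
    pct_diff = 100 * (male - female) / female    # assumed definition, relative to female
  )

# One cell per year-age combination, with the ratio mapped to fill.
p <- ggplot(toy_ratios, aes(x = year, y = age, fill = ratio)) +
  geom_raster()

p
```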

Finish

Knit the completed R Markdown file as a Word or PDF document (use the “Knit” button at the top of the script editor window). Save it with a name of the form lastname_firstname_ps05 and upload it to the Sakai dropbox.