Problem set 4: California Vaccination Exemptions
Due by 6:00 PM on Friday, November 1, 2019
In this Problem Set, we’ll be looking at some data on California vaccination exemptions.
Until recently, in the state of California it was possible to obtain a “Personal Belief Exemption” to avoid the requirement of vaccinating your child before they began school. The dataset you’ll examine represents records of exemption rates among kindergarten classes in California schools in 2015.
The data is available as an R package. To install it, do the following.
If you haven’t already, install and load
- Install the
- Load it with
- Add the repository where the data is:
You can now install
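The package names in the steps above did not survive on this page. Here is a minimal sketch of the usual pattern. It assumes the helper package is `drat`, the repository is the `kjhealy` drat repository, and the data package is `cavax` (the package named later in this assignment). Check the package homepage if any of these differ.

```r
# Assumption: the data package is distributed through a drat repository,
# not CRAN. All three names below are assumptions, not confirmed by this page.
install.packages("drat")     # one-time install of the helper package
library(drat)                # load it
drat::addRepo("kjhealy")     # register the (assumed) repository for this session
install.packages("cavax")    # install the (assumed) data package from it
```

You only need to run these lines once per machine, so keep them out of the Rmd file you knit.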
Create a project for the assignment, as before
Open the project in RStudio and make an Rmd file for the analysis called something like
Load the required libraries
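A minimal sketch of a setup chunk consistent with the startup messages shown below. Loading `cavax` here is an assumption based on the package named later in the assignment. The `kjhutils` and `testthat` entries in the messages appear to come from the instructor's own session, so they are not loaded here.

```r
library(tidyverse)  # ggplot2, dplyr, tidyr, and friends
library(socviz)     # attached in the messages shown on this page
library(cavax)      # assumption: the data package for this problem set
```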
## ── Attaching packages ────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.2.1     ✔ purrr   0.3.2
## ✔ tibble  2.1.3     ✔ dplyr   0.8.3
## ✔ tidyr   1.0.0     ✔ stringr 1.4.0
## ✔ readr   1.3.1     ✔ forcats 0.4.0
## ── Conflicts ───────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter()  masks stats::filter()
## ✖ purrr::is_null() masks testthat::is_null()
## ✖ dplyr::lag()     masks stats::lag()
## ✖ dplyr::matches() masks tidyr::matches(), testthat::matches()
##
## Attaching package: 'socviz'
## The following object is masked from 'package:kjhutils':
##
##     %nin%
Take a look at the data
## # A tibble: 7,032 x 13
##      code county name  type  district city  enrollment pbe_pct exempt
##     <dbl> <chr>  <chr> <chr> <chr>    <chr>      <dbl>   <dbl>  <dbl>
##  1 1.10e5 ALAME… FAME… PUBL… ALAMEDA… NEWA…        109      13  12.8
##  2 6.00e6 ALAME… COX … PUBL… ALAMEDA… OAKL…        115       1   0.87
##  3 6.00e6 ALAME… LAZE… PUBL… ALAMEDA… OAKL…         40       0   0
##  4 1.24e5 ALAME… YU M… PUBL… ALAMEDA… OAKL…         52      10   9.62
##  5 6.10e6 ALAME… AMEL… PUBL… ALAMEDA… ALAM…        128       2   1.56
##  6 6.11e6 ALAME… BAY … PUBL… ALAMEDA… ALAM…         70       1   1.43
##  7 6.09e6 ALAME… DONA… PUBL… ALAMEDA… ALAM…        100       3   3
##  8 6.09e6 ALAME… EDIS… PUBL… ALAMEDA… ALAM…         70       1   1.43
##  9 6.09e6 ALAME… FRAN… PUBL… ALAMEDA… ALAM…         95       1   1.05
## 10 6.09e6 ALAME… FRAN… PUBL… ALAMEDA… ALAM…         50       2   2
## # … with 7,022 more rows, and 4 more variables: med_exempt <dbl>,
## #   rel_exempt <dbl>, mwc <fct>, kind <fct>
You can get a brief summary of each variable in the dataset by looking at the Help file in RStudio for the cavax package, or by looking at the documentation on the package homepage: http://kjhealy.github.io/cavax.
Questions to answer
- What is the unit of observation in this dataset?
- What is the average size of kindergarten class enrollment in the state of California? What’s the median class size? What’s the range of variability?
- What percentage of kids have a PBE exemption, on average?
- Explore the structure of variation in PBE exemptions. How does it vary by public and private schools, for instance? Or by county? Or school type? Draw graphs to illustrate the variation you find, and write a sentence or two describing what it looks like to you. Possibly useful geoms you might experiment with include
geom_quasirandom(). The latter two are from the
ggbeeswarm package. Read the help for these geoms to see what they do.
- Can you find any particularly unusual-looking schools, school types, or counties, either with respect to their PBE rates, their size, or both? Why do you think they might be unusual?
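As a starting point for the questions above, here is a minimal sketch of the kind of summaries and plots you might build. It runs on a small, made-up data frame called `toy` (hypothetical values, not real records) that borrows the column names `type`, `enrollment`, and `pbe_pct` from the printed tibble. It assumes `pbe_pct` is a school-level percentage; check the package help to confirm before relying on it.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data: made-up values, real column names only.
toy <- data.frame(
  type       = c("PUBLIC", "PUBLIC", "PUBLIC", "PRIVATE", "PRIVATE", "PRIVATE"),
  enrollment = c(100, 80, 120, 20, 30, 50),
  pbe_pct    = c(1, 2, 0, 10, 20, 5)
)

# Center and spread of enrollment across schools.
enroll_summary <- toy %>%
  summarize(
    mean_enroll   = mean(enrollment),
    median_enroll = median(enrollment),
    min_enroll    = min(enrollment),
    max_enroll    = max(enrollment),
    sd_enroll     = sd(enrollment)
  )

# Two different "averages": the mean of school-level rates treats every
# school equally; weighting by enrollment treats every child equally.
pbe_summary <- toy %>%
  summarize(
    mean_of_schools = mean(pbe_pct),
    pct_of_kids     = weighted.mean(pbe_pct, w = enrollment)
  )

# Variation by group: one row per school type, highest median first.
by_type <- toy %>%
  group_by(type) %>%
  summarize(n_schools = n(), median_pbe = median(pbe_pct)) %>%
  arrange(desc(median_pbe))

# A first look at the distribution within each group.
p <- ggplot(toy, aes(x = type, y = pbe_pct)) +
  geom_boxplot() +
  coord_flip() +
  labs(x = NULL, y = "Percent of kindergarteners with a PBE")

print(enroll_summary)
print(pbe_summary)
print(by_type)
```

On the toy data the two averages differ (about 6.3 for the unweighted school mean, 3.275 when weighted by enrollment), because the small schools have the highest rates. It is worth deciding which one the question "on average" calls for and saying so in your write-up. To apply the same pipeline to the real data, replace `toy` with the data object from the package, and try swapping `geom_boxplot()` for `geom_quasirandom()` from ggbeeswarm to see individual schools.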
Knit the completed R Markdown file as a Word or PDF document (use the “Knit” button at the top of the script editor window). Save it with a name of the form lastname_firstname_ps04 and upload it to the Sakai dropbox.