class: center middle main-title section-title-1

# .kjh-yellow[Extend your]<br /> .kjh-lblue[`ggplot`] <br /> .kjh-yellow[vocabulary]

.class-info[

**Week 06**

.light[Kieran Healy<br>
Duke University, Spring 2023]

]

---
layout: true
class: title title-1

---

# Load our libraries

.SMALL[
```r
library(here)      # manage file paths
library(socviz)    # data and some useful functions
library(tidyverse) # your friend and mine
```
]

---

# Tidyverse components, again

.pull-left.w45[
- .kjh-green[**`library`**]`(tidyverse)`
- `Loading tidyverse: ggplot2`
- `Loading tidyverse: tibble`
- `Loading tidyverse: tidyr`
- `Loading tidyverse: readr`
- `Loading tidyverse: purrr`
- `Loading tidyverse: dplyr`
]

--

.pull-right.w55[
- Call the package and ...
- `<|` **Draw graphs**
- `<|` **Nicer data tables**
- `<|` **Tidy your data**
- `<|` **Get data into R**
- `<|` **Fancy Iteration**
- `<|` **Action verbs for tables**
]

---

# Other tidyverse components

.top[.pull-left.w15[
- `forcats`
- `haven`
- `lubridate`
- `readxl`
- `stringr`
- `reprex`
]]

--

.top[.pull-right.w85[
- `<|` **Deal with factors**
- `<|` **Import Stata, SPSS, etc**
- `<|` **Dates, Durations, Times**
- `<|` **Import from spreadsheets**
- `<|` **Strings and Regular Expressions**
- `<|` **Make reproducible examples**
]]

--

.left.bottom[.footnote[Not all of these are attached when we do `library(tidyverse)`]]

---
layout: false
class: main-title main-title-inv center middle

.center[]

---
layout: true
class: title title-1

---
class: center middle main-title section-title-1

# .huge[.kjh-yellow[Feeding data]<br /> .kjh-lblue[to `ggplot`]]

---
layout: false
class: center middle

## .middle.huge.squish4[.kjh-orange[Transform and summarize first.]<br />.kjh-lblue[Then send your clean tables to ggplot.]]

---
layout: true
class: title title-1

---
class: center middle main-title section-title-1

# .huge[.kjh-lblue[Extend your] .kjh-yellow[`ggplot` vocabulary]]

---

# We'll move forward in three ways

## .kjh-lblue[Learn more geoms]

- .kjh-green[`geom_point()`], .kjh-green[`geom_line()`], .kjh-green[`geom_col()`], .kjh-green[`geom_histogram()`], .kjh-green[`geom_density()`], .kjh-green[`geom_jitter()`], .kjh-green[`geom_boxplot()`], .kjh-green[`geom_pointrange()`],...

--

## .kjh-lblue[Learn more about scales, guides, and themes]

- Functions that control the details of representing data and styling our plots.

--

## .kjh-lblue[Learn more about extensions to ggplot]

- Packages that enhance .kjh-lblue[`ggplot`]'s capabilities, usually by adding support for new kinds of plot (i.e., new geoms), or new functionality (e.g., the .kjh-lblue[`scales`] package).

---
class: right bottom main-title section-title-1

## .huge.right.bottom.squish4[.kjh-yellow[Some data on]<br />.kjh-lblue[Organ Donation]]

---

# .kjh-pink[`organdata`] is in the .kjh-lblue[`socviz`] package

```r
organdata
```

```
## # A tibble: 238 × 21
##    country  year       donors   pop pop_d…¹   gdp gdp_lag health healt…² pubhe…³
##    <chr>    <date>      <dbl> <int>   <dbl> <int>   <int>  <dbl>   <dbl>   <dbl>
##  1 Austral… NA          NA    17065   0.220 16774   16591   1300    1224     4.8
##  2 Austral… 1991-01-01  12.1  17284   0.223 17171   16774   1379    1300     5.4
##  3 Austral… 1992-01-01  12.4  17495   0.226 17914   17171   1455    1379     5.4
##  4 Austral… 1993-01-01  12.5  17667   0.228 18883   17914   1540    1455     5.4
##  5 Austral… 1994-01-01  10.2  17855   0.231 19849   18883   1626    1540     5.4
##  6 Austral… 1995-01-01  10.2  18072   0.233 21079   19849   1737    1626     5.5
##  7 Austral… 1996-01-01  10.6  18311   0.237 21923   21079   1846    1737     5.6
##  8 Austral… 1997-01-01  10.3  18518   0.239 22961   21923   1948    1846     5.7
##  9 Austral… 1998-01-01  10.5  18711   0.242 24148   22961   2077    1948     5.9
## 10 Austral… 1999-01-01   8.67 18926   0.244 25445   24148   2231    2077     6.1
## # … with 228 more rows, 11 more variables: roads <dbl>, cerebvas <int>,
## #   assault <int>, external <int>, txp_pop <dbl>, world <chr>, opt <chr>,
## #   consent_law <chr>, consent_practice <chr>, consistent <chr>, ccode <chr>,
## #   and abbreviated variable names ¹pop_dens, ²health_lag, ³pubhealth
```

---

# First looks

```r
p <- ggplot(data = organdata,
            mapping = aes(x = year, y = donors))
p + geom_point()
```

<img src="06-slides_files/figure-html/05-work-with-dplyr-and-geoms-38-1.png" width="720" style="display: block; margin: auto;" />

---

# First looks

```r
p <- ggplot(data = organdata,
            mapping = aes(x = year, y = donors))
p + geom_line()
```

<img src="06-slides_files/figure-html/05-work-with-dplyr-and-geoms-39-1.png" width="720" style="display: block; margin: auto;" />

---

# First looks

```r
p <- ggplot(data = organdata,
            mapping = aes(x = year, y = donors))
p + geom_line(aes(group = country))
```

<img src="06-slides_files/figure-html/05-work-with-dplyr-and-geoms-40-1.png" width="720" style="display: block; margin: auto;" />

---

# First looks

```r
p <- ggplot(data = organdata,
            mapping = aes(x = year, y = donors))
p + geom_line() +
  facet_wrap(~ country, nrow = 3)
```

<img src="06-slides_files/figure-html/05-work-with-dplyr-and-geoms-41a-1.png" width="1512" style="display: block; margin: auto;" />

---

# First looks

```r
p <- ggplot(data = organdata,
            mapping = aes(x = year, y = donors))
p + geom_line() +
  facet_wrap(~ reorder(country, donors, na.rm = TRUE), nrow = 3)
```

<img src="06-slides_files/figure-html/05-work-with-dplyr-and-geoms-41b-1.png" width="1512" style="display: block; margin: auto;" />

---

# First looks

```r
p <- ggplot(data = organdata,
            mapping = aes(x = year, y = donors))
p + geom_line() +
  facet_wrap(~ reorder(country, -donors, na.rm = TRUE), nrow = 3)
```

<img src="06-slides_files/figure-html/05-work-with-dplyr-and-geoms-41c-1.png" width="1512" style="display: block; margin: auto;" />

---
class: right bottom main-title section-title-1

## .huge.right.bottom.squish4[.kjh-yellow[Showing continuous measures] .kjh-lblue[by category]]

---

# Boxplots: .kjh-green[`geom_boxplot()`]

```r
## Pipeline the data directly; then it's implicitly the first argument to `ggplot()`
organdata |>
  ggplot(mapping = aes(x = country, y = donors)) +
  geom_boxplot()
```

<img src="06-slides_files/figure-html/05-work-with-dplyr-and-geoms-42-1.png" width="1080" style="display: block; margin: auto;" />

---

# Put categories on the y-axis!

```r
organdata |>
* ggplot(mapping = aes(x = donors, y = country)) +
  geom_boxplot() +
  labs(y = NULL)
```

<img src="06-slides_files/figure-html/05-work-with-dplyr-and-geoms-43-1.png" width="720" style="display: block; margin: auto;" />

---

# Reorder y by the mean of x

```r
organdata |>
* ggplot(mapping = aes(x = donors, y = reorder(country, donors, na.rm = TRUE))) +
  geom_boxplot() +
  labs(y = NULL)
```

<img src="06-slides_files/figure-html/05-work-with-dplyr-and-geoms-44-1.png" width="720" style="display: block; margin: auto;" />

---

# (Reorder y by any statistic you like)

```r
organdata |>
* ggplot(mapping = aes(x = donors, y = reorder(country, donors, sd, na.rm = TRUE))) +
  geom_boxplot() +
  labs(y = NULL)
```

<img src="06-slides_files/figure-html/05-work-with-dplyr-and-geoms-45-1.png" width="720" style="display: block; margin: auto;" />

---

# .kjh-green[`geom_boxplot()`] knows .kjh-orange[`color`] and .kjh-orange[`fill`]

```r
organdata |>
* ggplot(mapping = aes(x = donors, y = reorder(country, donors, na.rm = TRUE), fill = world)) +
  geom_boxplot() +
  labs(y = NULL)
```

<img src="06-slides_files/figure-html/05-work-with-dplyr-and-geoms-46-1.png" width="720" style="display: block; margin: auto;" />

---

# These strategies are quite general

```r
organdata |>
  ggplot(mapping = aes(x = donors, y = reorder(country, donors, na.rm = TRUE), color = world)) +
* geom_point(size = rel(3)) +
  labs(y = NULL)
```

<img src="06-slides_files/figure-html/05-work-with-dplyr-and-geoms-47-1.png" width="720" style="display: block; margin: auto;" />

---

# .kjh-green[`geom_jitter()`] can help with overplotting

```r
organdata |>
  ggplot(mapping = aes(x = donors, y =
reorder(country, donors, na.rm = TRUE), color = world)) +
* geom_jitter(size = rel(3)) +
  labs(y = NULL)
```

<img src="06-slides_files/figure-html/05-work-with-dplyr-and-geoms-48-1.png" width="720" style="display: block; margin: auto;" />

---

# Adjust with a .kjh-orange[`position`] argument

```r
organdata |>
  ggplot(mapping = aes(x = donors, y = reorder(country, donors, na.rm = TRUE), color = world)) +
* geom_jitter(size = rel(3), position = position_jitter(height = 0.1)) +
  labs(y = NULL)
```

<img src="06-slides_files/figure-html/05-work-with-dplyr-and-geoms-49-1.png" width="720" style="display: block; margin: auto;" />

---

# Using .kjh-green[`across()`] and .kjh-green[`where()`]

```r
by_country <- organdata |>
  group_by(consent_law, country) |>
  summarize(across(where(is.numeric),
                   list(mean = ~ mean(.x, na.rm = TRUE),
                        sd = ~ sd(.x, na.rm = TRUE))),
*           .groups = "drop")
head(by_country)
```

```
## # A tibble: 6 × 28
##   consen…¹ country donor…² donor…³ pop_m…⁴ pop_sd pop_d…⁵ pop_d…⁶ gdp_m…⁷ gdp_sd
##   <chr>    <chr>     <dbl>   <dbl>   <dbl>  <dbl>   <dbl>   <dbl>   <dbl>  <dbl>
## 1 Informed Austra…    10.6   1.14   18318.  831.    0.237  0.0107  22179.  3959.
## 2 Informed Canada     14.0   0.751  29608. 1193.    0.297  0.0120  23711.  3966.
## 3 Informed Denmark    13.1   1.47    5257.   80.6  12.2    0.187   23722.  3896.
## 4 Informed Germany    13.0   0.611  80255. 5158.   22.5    1.44    22163.  2501.
## 5 Informed Ireland    19.8   2.48    3674.  132.    5.23   0.187   20824.  6670.
## 6 Informed Nether…    13.7   1.55   15548.  373.   37.4    0.898   23013.  3770.
## # … with 18 more variables: gdp_lag_mean <dbl>, gdp_lag_sd <dbl>,
## #   health_mean <dbl>, health_sd <dbl>, health_lag_mean <dbl>,
## #   health_lag_sd <dbl>, pubhealth_mean <dbl>, pubhealth_sd <dbl>,
## #   roads_mean <dbl>, roads_sd <dbl>, cerebvas_mean <dbl>, cerebvas_sd <dbl>,
## #   assault_mean <dbl>, assault_sd <dbl>, external_mean <dbl>,
## #   external_sd <dbl>, txp_pop_mean <dbl>, txp_pop_sd <dbl>, and abbreviated
## #   variable names ¹consent_law, ²donors_mean, ³donors_sd, ⁴pop_mean, …
```

---

# Plot our summary data

.pull-left.w45[
```r
by_country |>
  ggplot(mapping = aes(x = donors_mean,
                       y = reorder(country, donors_mean),
                       color = consent_law)) +
  geom_point(size = 3) +
  labs(x = "Donor Procurement Rate",
       y = NULL,
       color = "Consent Law")
```
]

--

.pull-right.w55[
<img src="06-slides_files/figure-html/codefig-consent1-1.png" width="768" style="display: block; margin: auto;" />
]

---

# What about faceting it instead?

.pull-left.w45[
```r
by_country |>
  ggplot(mapping = aes(x = donors_mean,
                       y = reorder(country, donors_mean),
                       color = consent_law)) +
  geom_point(size = 3) +
  guides(color = "none") +
* facet_wrap(~ consent_law) +
  labs(x = "Donor Procurement Rate",
       y = NULL,
       color = "Consent Law")
```

.pull-left.w80[The problem is that countries can only be in one Consent Law category.]
]

--

.pull-right.w55[
<img src="06-slides_files/figure-html/codefig-consent2-1.png" width="768" style="display: block; margin: auto;" />
]

---

# What about faceting it instead?

.pull-left.w45[
```r
by_country |>
  ggplot(mapping = aes(x = donors_mean,
                       y = reorder(country, donors_mean),
                       color = consent_law)) +
  geom_point(size = 3) +
  guides(color = "none") +
* facet_wrap(~ consent_law, ncol = 1) +
  labs(x = "Donor Procurement Rate",
       y = NULL,
       color = "Consent Law")
```

.pull-left.w80[Restricting to one column doesn't fix it.]
]

--

.pull-right.w55[
<img src="06-slides_files/figure-html/codefig-consent2a-1.png" width="480" style="display: block; margin: auto;" />
]

---

# Allow the y-scale to vary

.pull-left.w45[
```r
by_country |>
  ggplot(mapping = aes(x = donors_mean,
                       y = reorder(country, donors_mean),
                       color = consent_law)) +
  geom_point(size = 3) +
  guides(color = "none") +
  facet_wrap(~ consent_law, ncol = 1,
*            scales = "free_y") +
  labs(x = "Donor Procurement Rate",
       y = NULL,
       color = "Consent Law")
```

.pull-left.w90[Normally the point of a facet is to preserve comparability between panels by not allowing the scales to vary. But for categorical measures it can be useful to allow this.]
]

--

.pull-right.w55[
<img src="06-slides_files/figure-html/codefig-consent3-1.png" width="768" style="display: block; margin: auto;" />
]

---

# Again, these methods are general

.pull-left.w50[
```r
by_country |>
  ggplot(mapping = aes(x = donors_mean,
                       y = reorder(country, donors_mean),
                       color = consent_law)) +
* geom_pointrange(mapping =
*                   aes(xmin = donors_mean - donors_sd,
*                       xmax = donors_mean + donors_sd)) +
  guides(color = "none") +
  facet_wrap(~ consent_law, ncol = 1,
             scales = "free_y") +
  labs(x = "Donor Procurement Rate",
       y = NULL,
       color = "Consent Law")
```
]

--

.pull-right.w50[
<img src="06-slides_files/figure-html/codefig-consent4-1.png" width="768" style="display: block; margin: auto;" />
]

---
class: right bottom main-title section-title-1

## .huge.right.bottom.squish4.kjh-yellow[Plot text directly]

---

# .kjh-green[`geom_text()`] for basic labels

.pull-left.w45[
```r
by_country |>
  ggplot(mapping = aes(x = roads_mean,
                       y = donors_mean)) +
  geom_text(mapping = aes(label = country))
```
]

--

.pull-right.w55[
<img src="06-slides_files/figure-html/codefig-geomtext-1.png" width="460" style="display: block; margin: auto;" />
]

---

# It's not very flexible

.pull-left.w45[
```r
by_country |>
  ggplot(mapping = aes(x = roads_mean,
                       y = donors_mean)) +
  geom_point() +
  geom_text(mapping = aes(label = country),
            hjust =
0)
```
]

--

.pull-right.w55[
<img src="06-slides_files/figure-html/codefig-geomtext2-1.png" width="460" style="display: block; margin: auto;" />
]

---

# There are tricks, but they're limited

.pull-left.w45[
```r
by_country |>
  ggplot(mapping = aes(x = roads_mean,
                       y = donors_mean)) +
  geom_point() +
  geom_text(mapping = aes(x = roads_mean + 2,
                          label = country),
            hjust = 0)
```
]

--

.pull-right.w55[
<img src="06-slides_files/figure-html/codefig-geomtext3-1.png" width="460" style="display: block; margin: auto;" />
]

---

# We'll use .kjh-lblue[`ggrepel`] instead

### The .kjh-lblue[`ggrepel`] package provides .kjh-green[`geom_text_repel()`] and .kjh-green[`geom_label_repel()`]

---
class: right bottom main-title section-title-1

## .huge.right.bottom.squish4.kjh-yellow[U.S. Historic<br/>Presidential Elections]

---

# .kjh-pink[`elections_historic`] is in .kjh-orange[`socviz`]

```r
elections_historic
```

```
## # A tibble: 49 × 19
##    election  year winner    win_p…¹ ec_pct popul…² popul…³  votes margin runne…⁴
##       <int> <int> <chr>     <chr>    <dbl>   <dbl>   <dbl>  <int>  <int> <chr>
##  1       10  1824 John Qui… D.-R.    0.322   0.309 -0.104  1.13e5 -38221 Andrew…
##  2       11  1828 Andrew J… Dem.     0.682   0.559  0.122  6.43e5 140839 John Q…
##  3       12  1832 Andrew J… Dem.     0.766   0.547  0.178  7.03e5 228628 Henry …
##  4       13  1836 Martin V… Dem.     0.578   0.508  0.142  7.63e5 213384 Willia…
##  5       14  1840 William … Whig     0.796   0.529  0.0605 1.28e6 145938 Martin…
##  6       15  1844 James Po… Dem.     0.618   0.495  0.0145 1.34e6  39413 Henry …
##  7       16  1848 Zachary … Whig     0.562   0.473  0.0479 1.36e6 137882 Lewis …
##  8       17  1852 Franklin… Dem.     0.858   0.508  0.0695 1.61e6 219525 Winfie…
##  9       18  1856 James Bu… Dem.     0.588   0.453  0.122  1.84e6 494472 John F…
## 10       19  1860 Abraham … Rep.     0.594   0.396  0.101  1.86e6 474049 John B…
## # … with 39 more rows, 9 more variables: ru_part <chr>, turnout_pct <dbl>,
## #   winner_lname <chr>, winner_label <chr>, ru_lname <chr>, ru_label <chr>,
## #   two_term <lgl>, ec_votes <dbl>, ec_denom <dbl>, and abbreviated variable
## #   names ¹win_party, ²popular_pct, ³popular_margin, ⁴runner_up
```

---

# We'll draw a plot like this

.center[]

---

# Keep things neat

```r
## The packages we'll use in addition to ggplot
*library(ggrepel)
*library(scales)

p_title <- "Presidential Elections: Popular & Electoral College Margins"
p_subtitle <- "1824-2016"
p_caption <- "Data for 2016 are provisional."
x_label <- "Winner's share of Popular Vote"
y_label <- "Winner's share of Electoral College Votes"
```

---

# Base Layer, Lines, Points

.pull-left.w45[
```r
p <- ggplot(data = elections_historic,
            mapping = aes(x = popular_pct,
                          y = ec_pct,
                          label = winner_label))

p + geom_hline(yintercept = 0.5,
               linewidth = 1.4,
               color = "gray80") +
  geom_vline(xintercept = 0.5,
             linewidth = 1.4,
             color = "gray80") +
  geom_point()
```
]

--

.pull-right.w55[
<img src="06-slides_files/figure-html/codefig-presplot1-1.png" width="480" style="display: block; margin: auto;" />
]

---

# Add the labels

.pull-left.w45[
```r
p <- ggplot(data = elections_historic,
            mapping = aes(x = popular_pct,
                          y = ec_pct,
                          label = winner_label))

p + geom_hline(yintercept = 0.5,
               linewidth = 1.4,
               color = "gray80") +
  geom_vline(xintercept = 0.5,
             linewidth = 1.4,
             color = "gray80") +
  geom_point() +
  geom_text_repel()
```

.pull-left.w85[This looks messy because .kjh-green[`geom_text_repel()`] uses the dimensions of the available graphics device to iteratively work out where to place the labels. Let's allow it to draw on the whole slide.]
]

--

.pull-right.w55[
<img src="06-slides_files/figure-html/codefig-presplot2-1.png" width="480" style="display: block; margin: auto;" />
]

---

# The labeling is with respect to the plot size

```r
p <- ggplot(data = elections_historic,
            mapping = aes(x = popular_pct,
                          y = ec_pct,
                          label = winner_label))

p_out <- p +
  geom_hline(yintercept = 0.5, linewidth = 1.4, color = "gray80") +
  geom_vline(xintercept = 0.5, linewidth = 1.4, color = "gray80") +
  geom_point() +
* geom_text_repel()
```

---
layout: false
class: middle center

<img src="06-slides_files/figure-html/05-work-with-dplyr-and-geoms-66-1.png" width="1080" style="display: block; margin: auto;" />

---
layout: true
class: title title-1

---

# Adjust the Scales

```r
p <- ggplot(data = elections_historic,
            mapping = aes(x = popular_pct,
                          y = ec_pct,
                          label = winner_label))

p_out <- p +
  geom_hline(yintercept = 0.5, linewidth = 1.4, color = "gray80") +
  geom_vline(xintercept = 0.5, linewidth = 1.4, color = "gray80") +
  geom_point() +
  geom_text_repel() +
* scale_x_continuous(labels = label_percent()) +
* scale_y_continuous(labels = label_percent())
```

---
layout: false
class: middle center

<img src="06-slides_files/figure-html/05-work-with-dplyr-and-geoms-68-1.png" width="1080" style="display: block; margin: auto;" />

---
layout: true
class: title title-1

---

# Add the labels

```r
p <- ggplot(data = elections_historic,
            mapping = aes(x = popular_pct,
                          y = ec_pct,
                          label = winner_label))

p_out <- p +
  geom_hline(yintercept = 0.5, linewidth = 1.4, color = "gray80") +
  geom_vline(xintercept = 0.5, linewidth = 1.4, color = "gray80") +
  geom_point() +
* geom_text_repel(mapping = aes(family = "Tenso Slide")) +
  scale_x_continuous(labels = label_percent()) +
  scale_y_continuous(labels = label_percent()) +
* labs(x = x_label, y = y_label,
       title = p_title,
       subtitle = p_subtitle,
       caption = p_caption)
```

---
layout: false
class: middle center

<img src="06-slides_files/figure-html/05-work-with-dplyr-and-geoms-70-1.png" width="1080" style="display: block; margin: auto;" />

---
class: right bottom main-title section-title-1

## .huge.right.bottom.squish4[.kjh-yellow[Labeling points<br />of interest]]

---
layout: true
class: title title-1

---

# Option 1: On the fly inside .kjh-lblue[`ggplot`]

.pull-left.w50[
```r
by_country |>
  ggplot(mapping = aes(x = gdp_mean,
                       y = health_mean)) +
  geom_point() +
  geom_text_repel(data = subset(by_country,
                                gdp_mean > 25000),
                  mapping = aes(label = country))
```
]

--

.pull-right.w50[
<img src="06-slides_files/figure-html/codefig-subset1-1.png" width="460" style="display: block; margin: auto;" />
]

---

# Option 1: On the fly inside .kjh-lblue[`ggplot`]

.pull-left.w50[
```r
by_country |>
  ggplot(mapping = aes(x = gdp_mean,
                       y = health_mean)) +
  geom_point() +
  geom_text_repel(data = subset(by_country,
                                gdp_mean > 25000 |
                                  health_mean < 1500 |
                                  country %in% "Belgium"),
                  mapping = aes(label = country))
```

.pull-left.w90[Stuffing everything into the .kjh-green[`subset()`] call might get messy]
]

--

.pull-right.w50[
<img src="06-slides_files/figure-html/codefig-subset2-1.png" width="460" style="display: block; margin: auto;" />
]

---

# Option 2: Use .kjh-lblue[`dplyr`] to subset first

```r
df_hl <- by_country |>
  filter(gdp_mean > 25000 |
           health_mean < 1500 |
           country %in% "Belgium")

df_hl
```

```
## # A tibble: 6 × 28
##   consen…¹ country donor…² donor…³ pop_m…⁴ pop_sd pop_d…⁵ pop_d…⁶ gdp_m…⁷ gdp_sd
##   <chr>    <chr>     <dbl>   <dbl>   <dbl>  <dbl>   <dbl>   <dbl>   <dbl>  <dbl>
## 1 Informed Ireland    19.8    2.48   3674. 1.32e2    5.23  0.187   20824.  6670.
## 2 Informed United…    20.0    1.33 269330. 1.25e4    2.80  0.130   29212.  4571.
## 3 Presumed Belgium    21.9    1.94  10153. 1.09e2   30.7   0.330   22500.  3171.
## 4 Presumed Norway     15.4    1.11   4386. 9.73e1    1.35  0.0300  26448.  6492.
## 5 Presumed Spain      28.1    4.96  39666. 9.51e2    7.84  0.188   16933   2888.
## 6 Presumed Switze…    14.2    1.71   7037. 1.70e2   17.0   0.411   27233   2153.
## # … with 18 more variables: gdp_lag_mean <dbl>, gdp_lag_sd <dbl>,
## #   health_mean <dbl>, health_sd <dbl>, health_lag_mean <dbl>,
## #   health_lag_sd <dbl>, pubhealth_mean <dbl>, pubhealth_sd <dbl>,
## #   roads_mean <dbl>, roads_sd <dbl>, cerebvas_mean <dbl>, cerebvas_sd <dbl>,
## #   assault_mean <dbl>, assault_sd <dbl>, external_mean <dbl>,
## #   external_sd <dbl>, txp_pop_mean <dbl>, txp_pop_sd <dbl>, and abbreviated
## #   variable names ¹consent_law, ²donors_mean, ³donors_sd, ⁴pop_mean, …
```

---

# Option 2: Use .kjh-lblue[`dplyr`] to subset first

.pull-left.w50[
```r
by_country |>
  ggplot(mapping = aes(x = gdp_mean,
                       y = health_mean)) +
  geom_point() +
  geom_text_repel(data = df_hl,
                  mapping = aes(label = country))
```

.pull-left.w90[This makes things a little neater. As you can see, a `geom` can be fully "autonomous". Each one can have its own .kjh-orange[`mapping`] _and_ its own .kjh-orange[`data`] source. This can be very useful when building up plots that overlay several sources or subsets of data.
]
]

--

.pull-right.w50[
<img src="06-slides_files/figure-html/codefig-subset3-1.png" width="460" style="display: block; margin: auto;" />
]

---
class: right bottom main-title section-title-1

## .huge.right.bottom.squish4[.kjh-yellow[Write and draw]<br>.kjh-lblue[inside the plot area]]

---
layout: true
class: title title-1

---

# .kjh-green[`annotate()`] can imitate geoms

.pull-left.w50[
```r
organdata |>
  ggplot(mapping = aes(x = roads, y = donors)) +
  geom_point() +
  annotate(geom = "text",
           family = "Tenso Slide",
           x = 157,
           y = 33,
           label = "A surprisingly high \n recovery rate.",
           hjust = 0)
```
]

--

.pull-right.w50[
<img src="06-slides_files/figure-html/codefig-annotate1-1.png" width="460" style="display: block; margin: auto;" />
]

---

# .kjh-green[`annotate()`] can imitate geoms

.pull-left.w50[
```r
organdata |>
  ggplot(mapping = aes(x = roads, y = donors)) +
  geom_point() +
  annotate(geom = "rect",
           xmin = 125, xmax = 155,
           ymin = 30, ymax = 35,
           fill = "red",
           alpha = 0.2) +
  annotate(geom = "text",
           x = 157, y = 33,
           family = "Tenso Slide",
           label = "A surprisingly high \n recovery rate.",
           hjust = 0)
```
]

--

.pull-right.w50[
<img src="06-slides_files/figure-html/codefig-annotate2-1.png" width="460" style="display: block; margin: auto;" />
]

---
class: center middle main-title section-title-1

# .huge[.kjh-lblue[Scales, Guides, Themes]]

---
layout: true
class: title title-1

---

# Every .kjh-lblue[mapped variable] has a .kjh-orange[scale]

### Aesthetic mappings link quantities or categories in your data to things you can see on the graph. Thus, they have a scale associated with that representation.

### Scale functions manage this relationship. Remember: not just `x` and `y` but also `color`, `fill`, `shape`, `size`, and `alpha` have scales.

- If it can represent your data, it has a scale, and a _scale function_ to manage it.
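
A minimal sketch of this idea, assuming the `ggplot2` and `socviz` packages are installed. The plot maps three variables (`x`, `y`, and `color`), so it has three scales, and each one is adjusted with its own scale function. The particular breaks, limits, and palette are illustrative choices only.

```r
library(ggplot2)
library(socviz)   # provides organdata

## Three mappings (x, y, color) mean three scales to manage.
p <- ggplot(data = organdata,
            mapping = aes(x = roads, y = donors, color = world)) +
  geom_point() +
  scale_x_continuous(breaks = c(50, 100, 150, 200)) + # the x scale
  scale_y_continuous(limits = c(0, 35)) +             # the y scale
  scale_color_brewer(palette = "Dark2")               # the color scale

p
```

Dropping any one of the three scale functions leaves that mapping on its default scale; the mapping itself does not change.
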
### This means you control things like color schemes _for data mappings_ through scale functions

- Because those colors are representing features of your data.

---

# Naming conventions for scale functions

- In general, scale functions are named like this:
- .center.large[.kjh-green[`scale\\\_`].kjh-orange[`<MAPPING>`].kjh-green[`\\\_`].kjh-lblue[`<KIND>`].kjh-green[`()`]]
- .large[We already know there are a lot of .kjh-orange[**mappings**]].
- .right[_.kjh-orange[`x`], .kjh-orange[`y`], .kjh-orange[`color`], .kjh-orange[`size`], .kjh-orange[`shape`], and so on._]
- .large[And there are many .kjh-lblue[**kinds**] of scale as well.]
- .right[_.kjh-lblue[discrete], .kjh-lblue[continuous], .kjh-lblue[log10], .kjh-lblue[date], .kjh-lblue[binned], and many others._]
- .large[So there's a whole zoo of scale functions.]
- .right[_The naming convention helps us keep track._]

---

# Naming conventions for scale functions

- .large.center[.kjh-green[`scale\\\_`].kjh-orange[`mapping`].kjh-green[`\\\_`].kjh-lblue[`kind`].kjh-green[`()`]]
- .center[.kjh-green[`scale\\\_`].kjh-orange[`x`].kjh-green[`\\\_`].kjh-lblue[`continuous`].kjh-green[`()`]]
- .center[.kjh-green[`scale\\\_`].kjh-orange[`y`].kjh-green[`\\\_`].kjh-lblue[`continuous`].kjh-green[`()`]]
- .center[.kjh-green[`scale\\\_`].kjh-orange[`x`].kjh-green[`\\\_`].kjh-lblue[`discrete`].kjh-green[`()`]]
- .center[.kjh-green[`scale\\\_`].kjh-orange[`y`].kjh-green[`\\\_`].kjh-lblue[`discrete`].kjh-green[`()`]]
- .center[.kjh-green[`scale\\\_`].kjh-orange[`x`].kjh-green[`\\\_`].kjh-lblue[`log10`].kjh-green[`()`]]
- .center[.kjh-green[`scale\\\_`].kjh-orange[`x`].kjh-green[`\\\_`].kjh-lblue[`sqrt`].kjh-green[`()`]]

---

# Naming conventions for scale functions

- .large.center[.kjh-green[`scale\\\_`].kjh-orange[`mapping`].kjh-green[`\\\_`].kjh-lblue[`kind`].kjh-green[`()`]]
- .center[.kjh-green[`scale\\\_`].kjh-orange[`color`].kjh-green[`\\\_`].kjh-lblue[`discrete`].kjh-green[`()`]]
- .center[.kjh-green[`scale\\\_`].kjh-orange[`color`].kjh-green[`\\\_`].kjh-lblue[`gradient`].kjh-green[`()`]]
- .center[.kjh-green[`scale\\\_`].kjh-orange[`color`].kjh-green[`\\\_`].kjh-lblue[`gradient2`].kjh-green[`()`]]
- .center[.kjh-green[`scale\\\_`].kjh-orange[`color`].kjh-green[`\\\_`].kjh-lblue[`brewer`].kjh-green[`()`]]
- .center[.kjh-green[`scale\\\_`].kjh-orange[`fill`].kjh-green[`\\\_`].kjh-lblue[`discrete`].kjh-green[`()`]]
- .center[.kjh-green[`scale\\\_`].kjh-orange[`fill`].kjh-green[`\\\_`].kjh-lblue[`gradient`].kjh-green[`()`]]
- .center[.kjh-green[`scale\\\_`].kjh-orange[`fill`].kjh-green[`\\\_`].kjh-lblue[`gradient2`].kjh-green[`()`]]
- .center[.kjh-green[`scale\\\_`].kjh-orange[`fill`].kjh-green[`\\\_`].kjh-lblue[`brewer`].kjh-green[`()`]]

---

# Scale functions in practice

- Scale functions take arguments appropriate to their mapping and kind

.pull-left.w50[
```r
organdata |>
  ggplot(mapping = aes(x = roads,
                       y = donors,
                       color = world)) +
  geom_point() +
  scale_y_continuous(breaks = c(5, 15, 25),
                     labels = c("Five", "Fifteen", "Twenty Five"))
```
]

--

.pull-right.w50[
<img src="06-slides_files/figure-html/codefig-scalefn1-1.png" width="460" style="display: block; margin: auto;" />
]

---

# More usefully ...

.pull-left.w50[
```r
organdata |>
  ggplot(mapping = aes(x = roads,
                       y = donors,
                       color = world)) +
  geom_point() +
  scale_color_discrete(labels = c("Corporatist",
                                  "Liberal",
                                  "Social Democratic",
                                  "Unclassified")) +
  labs(x = "Road Deaths",
       y = "Donor Procurement",
       color = "Welfare State")
```
]

--

.pull-right.w50[
<img src="06-slides_files/figure-html/codefig-scalecolordiscrete-1.png" width="460" style="display: block; margin: auto;" />
]

---

# The .kjh-green[`guides()`] function

.pull-left.w45[
```r
organdata |>
  ggplot(mapping = aes(x = roads,
                       y = donors,
                       color = consent_law)) +
  geom_point() +
  facet_wrap(~ consent_law, ncol = 1) +
  guides(color = "none") +
  labs(x = "Road Deaths",
       y = "Donor Procurement")
```

.pull-left.w90[
- Control overall properties of the guide labels.
- Common use: turning it off.
- We'll see more advanced uses later.
]
]

--

.pull-right.w50[
<img src="06-slides_files/figure-html/codefig-guidesfn-1.png" width="460" style="display: block; margin: auto;" />
]

---

# The .kjh-green[`theme()`] function

.pull-left.w45[
```r
## Using the "classic" ggplot theme here
organdata |>
  ggplot(mapping = aes(x = roads,
                       y = donors,
                       color = consent_law)) +
  geom_point() +
  labs(title = "By Consent Law",
       x = "Road Deaths",
       y = "Donor Procurement",
       color = "Legal Regime:") +
  theme(legend.position = "bottom",
        plot.title = element_text(color = "darkred",
                                  face = "bold"))
```

.pull-left.w95[
.kjh-green[`theme()`] styles parts of your plot that are _not_ directly representing your data. Often the first thing people want to adjust; but logically it's the _last_ thing.
]
]

--

.pull-right.w50[
<img src="06-slides_files/figure-html/codefig-themefn-1.png" width="480" style="display: block; margin: auto;" />
]

---

# Sidenote: Smoothers

.center[]