class: center middle main-title section-title-1 # .kjh-yellow[Getting to know]<br /> .kjh-lblue[R and RStudio] .class-info[ **Week 02** .light[Kieran Healy<br> Duke University, Spring 2023] ] --- layout: false class: center middle main-title main-title-inv # .middle.huge.squish4[We want to<br />.kjh-orange[draw graphs]<br />.kjh-green[reproducibly]] --- layout: false .left[] .right[] --- layout: true class: title title-1 --- # Abstraction in software .pull-left[ ## Less - Easy things are awkward - Hard things are straightforward - Really hard things are possible ] .pull-right[ ## More - Easy things are trivial - Hard things are very awkward - Really hard things are impossible ] .center.large[Compare: D3, Grid, ggplot, Stata, Excel] --- class: center middle main-title section-title-1 # .huge[The .kjh-lblue[RStudio] IDE] --- layout: false class: bottom background-image: url("img/02_ide_control_room.png") background-size: cover ## .huge.right.bottom.squish4.kjh-grey[An IDE for R] --- layout: false class: bottom background-image: url("img/02_ide_kitchen.png") background-size: cover ## .huge.right.bottom.squish4.kjh-grey[An IDE for Meals] --- layout: false .center[] .right.w90.small[RStudio at startup] --- layout: false .center[] .right.w90.small[RStudio schematic overview] --- layout: false .center[] .right.w90.small[RStudio schematic overview] --- layout: false class: center middle ## .middle.huge.squish4[Think in terms of<br />.kjh-orange[Data] + .kjh-green[Transformations], written out as code, rather than a series of point-and-click steps] --- layout: false class: center middle ## .middle.huge.squish4[Our starting .kjh-orange[data] + our .kjh-green[code] is what's "real" in our projects, not the final output or any intermediate objects] --- layout: false .center[] .right.w90.small[RStudio at startup] --- layout: false .center[] .right.w90.small[RStudio at startup] --- layout: false .center[] .right.w90.small[RStudio at startup] --- layout: false .center[] 
.right.w90.small[RStudio at startup] --- layout: false .center[] .right.w90.small[RStudio at startup] --- class: center middle main-title section-title-1 # .large.squish4[Use .kjh-yellow[RMarkdown] to .kjh-orange[produce] and .kjh-green[reproduce] work] --- layout: true class: title title-1 --- # Where we want to end up .center[] --- # Where we want to end up .center[] --- # Where we want to end up .center[] --- # How to get there? .pull-left[] .pull-right[ - We could write an **R script** with some notes inside, using it to create some figures and tables, paste them into our document. - This will work, but we can do better. ] --- # We can .kjh-yellow[make] this ... .pull-left[] .pull-right.large[This is what we want to end up with. Nicely-formatted text, plots, and tables. In .kjh-red[an "Office" approach] we write the document and paste in the figures and tables.] --- # ... by .kjh-green[writing] this .pull-left[] .pull-right.large[In a .kjh-red[literate programming] approach, chunks of code contained in documents are processed and then replaced with their output when the output document is produced.] --- # The .kjh-pink[`code`] gets replaced by its .kjh-green[output] .pull-left[] .pull-right[] --- layout: false class: center  --- .center[] --- .pull-left[] -- .right.large[ - This approach has its limitations, but it's _very_ useful and has many benefits. ] --- layout: true class: title title-1 --- # Basic markdown summary .smaller[ | Desired style | Use the following Markdown annotation | | -------------- | ------------------------------------- | | .large[**Heading 1**] | `# Heading 1` | | .medium[**Heading 2**] | `## Heading 2` | | .small[Heading 3] | `### Heading 3` (Actual heading styles will vary.) 
| | Paragraph | Just start typing | | **Bold** | `**Bold**` | | *Italic* | `*Italic*` | | Images | `![Alternate text for image](path/image.jpg)` | | [Hyperlinks](https://www.visualizingsociety.com) | `[Link text](https://www.visualizingsociety.com/)` | | Unordered Lists | | | - First | `- First` | | - Second. | `- Second` | | - Third |`- Third` | | Ordered Lists | | | 1. First | `1. First` | | 2. Second. | `2. Second` | | 3. Third |`3. Third` | | Footnote.¹ | `Footnote[^notelabel]` | | ¹The note's content. | `[^notelabel]: The note's content.` | ] --- # The right frame of mind - This is like learning how to drive a car, or how to cook in a kitchen ... or learning to speak a language. -- - After some orientation to what's where, you will learn best by _doing_. -- - Software is a pain, but you won't crash the car or burn your house down. ??? - Don't be afraid of the IDE or code. Expect to be frustrated, and don't be surprised when things go wrong. Things will go wrong _constantly_. The software is a very powerful, very obedient, and _very_ dumb robot. - But every time things "don't work", and every time you diagnose and fix them, you will become a little more adept at noticing and fixing these errors. And you will start to accumulate practical knowledge of common failures. - So be like Jacques and keep at it. --- layout: false class: main-title main-title-inv center middle # .huge.squish4[TYPE OUT<br />YOUR CODE<br />.kjh-orange[BY HAND]] --- .center[] --- layout: true class: center middle main-title section-title-1 --- # .huge.middle.squish4[<br />GETTING <br />O.kjh-lblue[R]IENTED] --- layout: true class: title title-1 --- # Loading the tidyverse libraries ```r library(tidyverse) ``` - The tidyverse has several components. - We'll return to this message about Conflicts later. - Again, the code and messages you see here are actual R output, produced at the same time as the slide. 
--- # Tidyverse components .pull-left[ - .kjh-green[**`library`**]`(tidyverse)` - `Loading tidyverse: ggplot2` - `Loading tidyverse: tibble` - `Loading tidyverse: tidyr` - `Loading tidyverse: readr` - `Loading tidyverse: purrr` - `Loading tidyverse: dplyr` ] -- .pull-right[ - Load the package and ... - `<|` **Draw graphs** - `<|` **Nicer data tables** - `<|` **Tidy your data** - `<|` **Get data into R** - `<|` **Fancy Iteration** - `<|` **Action verbs for tables** ] --- # What R looks like Code you can type and run: ```r ## Inside code chunks, lines beginning with a # character are comments ## Comments are ignored by R my_numbers <- c(1, 1, 2, 4, 1, 3, 1, 5) # Anything after a # character is ignored as well ``` Output: .smaller[Equivalent to running the code above, typing `my_numbers` at the console, and hitting enter.] ```r my_numbers ``` ``` ## [1] 1 1 2 4 1 3 1 5 ``` --- # What R looks like By convention, code output in documents is prefixed by `##` ```r my_numbers ``` ``` ## [1] 1 1 2 4 1 3 1 5 ``` -- Also by convention, outputting vectors, etc, gets a counter keeping track of the number of elements. For example, ```r letters ``` ``` ## [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s" ## [20] "t" "u" "v" "w" "x" "y" "z" ``` --- layout: false class: center middle # .center.middle.huge.squish4[SOME THINGS<br />TO KNOW<br />ABOUT .kjh-orange[R]] --- layout: true class: title title-1 --- # 0. .kjh-yellow[It's a calculator] .pull-left[ - Arithmetic ```r (31 * 12) / 2^4 ``` ``` ## [1] 23.25 ``` ```r sqrt(25) ``` ``` ## [1] 5 ``` ```r log(100) ``` ``` ## [1] 4.60517 ``` ```r log10(100) ``` ``` ## [1] 2 ``` ] -- .pull-right[ - Logic ```r 4 < 10 ``` ``` ## [1] TRUE ``` ```r 4 > 2 & 1 > 0.5 # The "&" means "and" ``` ``` ## [1] TRUE ``` ```r 4 < 2 | 1 > 0.5 # The "|" means "or" ``` ``` ## [1] TRUE ``` ```r 4 < 2 | 1 < 0.5 ``` ``` ## [1] FALSE ``` ] --- # 0. 
.kjh-yellow[It's a calculator] Logical equality and inequality (yielding a .kjh-green[`TRUE`] or .kjh-red[`FALSE`] result) are tested with `==` and `!=`. Other logical operators include `<`, `>`, `<=`, `>=`, and `!` for negation. We'll use these in plots to filter data, test conditions, and so on. .medium[ ```r ## A logical test 2 == 2 # Write `=` twice ``` ``` ## [1] TRUE ``` ```r ## This will cause an error, because R will think you are trying to assign a value 2 = 2 ## Error in 2 = 2 : invalid (do_set) left-hand side to assignment ``` ```r 3 != 7 # Write `!` and then `=` to make `!=` ``` ``` ## [1] TRUE ``` ] --- layout: true class: title title-1 --- # 1. .kjh-yellow[Everything in R has a name] ```r my_numbers # We created this a few minutes ago ``` ``` ## [1] 1 1 2 4 1 3 1 5 ``` ```r letters # This one is built-in ``` ``` ## [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s" ## [20] "t" "u" "v" "w" "x" "y" "z" ``` ```r pi # Also built-in ``` ``` ## [1] 3.141593 ``` --- # Some names are forbidden Or it's a _really_ bad idea to try to use them ```r ## Don't name objects with terms for logical values, ## or missing and null-value indicators TRUE FALSE Inf NaN NA NULL ## Don't name objects with terms that are also ## built-in functions for programming and flow-control for if while break function ``` --- # 2. .kjh-yellow[Everything is an object] There are a few built-in objects: ```r letters ``` ``` ## [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s" ## [20] "t" "u" "v" "w" "x" "y" "z" ``` -- ```r pi ``` ``` ## [1] 3.141593 ``` -- ```r LETTERS ``` ``` ## [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q" "R" "S" ## [20] "T" "U" "V" "W" "X" "Y" "Z" ``` --- # 3. .kjh-yellow[You can create objects] -- In fact, this is mostly what we will be doing. -- Objects are created by .kjh-pink[_assigning_] a thing to a name: ```r ## name... gets ... 
this stuff my_numbers <- c(1, 2, 3, 1, 3, 5, 25, 10) ## name ... gets ... the output of the function `c()` your_numbers <- c(5, 31, 71, 1, 3, 21, 6, 52) ``` -- The .kjh-green[**`c()`**] function _combines_ or _concatenates_ things. The **assignment operator**, .kjh-pink[**`<-`**], performs the action of creating objects. ??? The core thing we do in R is _create objects_ by _assigning a thing to a name_. That thing is usually the output of some _function_. There are a lot of built-in functions. We can create an object with the .kjh-green[**`c()`**] function and the *assignment operator*, `<-`. --- # The assignment operator - The .kjh-pink[assignment operator] performs the action of creating objects: -- - Use a keyboard shortcut to write it: - Press .kjh-green[**`option`**] _and_ .kjh-green[**`-`**] on a Mac - Press .kjh-green[**`alt`**] _and_ .kjh-green[**`-`**] on Windows --- # Assignment with .kjh-green[**`=`**] - You can use ".kjh-green[**`=`**]" as well as ".kjh-green[**`<-`**]" for assignment ```r my_numbers = c(1, 2, 3, 1, 3, 5, 25) my_numbers ``` ``` ## [1] 1 2 3 1 3 5 25 ``` -- On the other hand, ".kjh-green[**`=`**]" has a different meaning inside a function call, where it matches a value to a named argument. -- I'm going to use ".kjh-green[**`<-`**]" for assignment throughout. Just be consistent either way. --- # Assignment with .kjh-green[**`=`**] .center[] ??? --- layout: true class: title title-1 --- # 4. Do things to objects with .kjh-green[functions] ```r ## this object... gets ... the output of this function my_numbers <- c(1, 2, 3, 1, 3, 5, 25, 10) your_numbers <- c(5, 31, 71, 1, 3, 21, 6, 52) ``` ```r my_numbers ``` ``` ## [1] 1 2 3 1 3 5 25 10 ``` - Functions can be identified by the parentheses after their names. 
```r my_numbers ``` ``` ## [1] 1 2 3 1 3 5 25 10 ``` ```r ## If you run this you'll get an error mean() ``` --- # What .kjh-green[functions] usually do - They take .kjh-orange[**inputs**] to .kjh-pink[**arguments**] - They perform .kjh-green[**actions**] - They produce, or return, .kjh-lblue[**outputs**] -- .pull-left[ ### .kjh-lblue[`x`] .kjh-green[`<-`] .kjh-green[`c(`].kjh-orange[1, 2, 3, 1, 3, 5, 25, 10].kjh-green[`)`] ### .kjh-blue[`x`] ### .kjh-blue[`[1] 1 2 3 1 3 5 25 10`] ] -- .pull-right[ ### .kjh-green[**`mean`(**].kjh-pink[`x`] `=` .kjh-orange[`my_numbers`].kjh-green[**)**] ### .kjh-lblue[`[1] 6.25`] ] --- # What .kjh-green[functions] usually do .large[ ```r ## Get the mean of what? Of x. ## You need to tell the function what x is mean(x = my_numbers) ``` ``` ## [1] 6.25 ``` ```r mean(x = your_numbers) ``` ``` ## [1] 23.75 ``` ] -- If you don't _name_ the arguments, R assumes you are providing them in the order the function expects. ```r mean(your_numbers) ``` ``` ## [1] 23.75 ``` --- # What .kjh-green[functions] usually do What arguments? Which order? Read the function's help page ```r help(mean) ``` ```r ## quicker ?mean ``` -- How to read an R help page ... --- # What .kjh-green[functions] usually do Arguments often tell the function what to do in specific circumstances ```r missing_numbers <- c(1:10, NA, 20, 32, 50, 104, 32, 147, 99, NA, 45) mean(missing_numbers) ``` ``` ## [1] NA ``` ```r mean(missing_numbers, na.rm = TRUE) ``` ``` ## [1] 32.44444 ``` -- Or select from one of several options ```r ## Look at ?mean to see what `trim` does mean(missing_numbers, na.rm = TRUE, trim = 0.1) ``` ``` ## [1] 27.25 ``` --- # What .kjh-green[functions] usually do .pull-left.w80[ There are all kinds of functions. They return different things. ```r summary(my_numbers) ``` ``` ## Min. 1st Qu. Median Mean 3rd Qu. Max. ## 1.00 1.75 3.00 6.25 6.25 25.00 ``` ] -- .pull-left.w80[You can assign the output of a function to a name, which turns it into an object. 
(Otherwise it'll send its output to the console.) ```r my_summary <- summary(my_numbers) my_summary ``` ``` ## Min. 1st Qu. Median Mean 3rd Qu. Max. ## 1.00 1.75 3.00 6.25 6.25 25.00 ``` ] --- # What .kjh-green[functions] usually do .pull-left.w80[Objects hang around in your work environment until they are overwritten by you, or are deleted. ```r ## rm() function removes objects rm(my_summary) my_summary ## Error: object 'my_summary' not found ``` ] --- # Functions can be .kjh-yellow[nested] .pull-left.w80[ ```r c(1:20) ``` ``` ## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 ``` ] -- .pull-left.w80[ ```r summary(c(1:20)) ``` ``` ## Min. 1st Qu. Median Mean 3rd Qu. Max. ## 1.00 5.75 10.50 10.50 15.25 20.00 ``` ] .pull-left.w80[ ```r names(summary(c(1:20))) ``` ``` ## [1] "Min." "1st Qu." "Median" "Mean" "3rd Qu." "Max." ``` ] -- .pull-left.w80[ ```r length(names(summary(c(1:20)))) ``` ``` ## [1] 6 ``` ] -- .pull-left.w80[Nested functions are evaluated from the inside out.] --- # Use the pipe operator: .kjh-pink[**`|>`**] Instead of nesting functions in parentheses, we can use the .kjh-pink[_pipe operator_] to join them together: ```r c(1:20) |> summary() |> names() |> length() ``` ``` ## [1] 6 ``` -- Read this operator as "_.kjh-pink[**and then**]_" -- Better, vertical space is free in R: ```r c(1:20) |> summary() |> names() |> length() ``` ``` ## [1] 6 ``` --- # Pipelines make code more .kjh-green[readable] Not great, Bob: ```r serve(stir(pour_in_pan(whisk(crack_eggs(get_from_fridge(eggs), into = "bowl"), len = 40), temp = "med-high"))) ``` -- Notice how the first thing you read is the last operation performed. 
-- Really not much better: .medium[ ```r serve( stir( pour_in_pan( whisk( crack_eggs( get_from_fridge(eggs), into = "bowl"), len = 40), temp = "med-high") ) ) ``` ] --- # Pipelines make code more .kjh-green[readable] Much nicer: .medium[ ```r eggs |> get_from_fridge() |> crack_eggs(into = "bowl") |> whisk(len = 40) |> pour_in_pan(temp = "med-high") |> stir() |> serve() ``` ] -- .pull-left.w60[We'll still use nested parentheses quite a bit, often in the context of a function working inside a pipeline. But it's good not to have too many levels of nesting.] --- # The .kjh-yellow[magrittr] pipe: .kjh-pink[`%>%`] - The pipe operator .kjh-pink[**`|>`**] was not part of Base R until version 4.1.0 (2021). Piping was originally introduced in a package called `magrittr`, where it was written .kjh-pink[**`%>%`**] and behaved in very nearly* the same way as the base pipe now does. -- - The magrittr pipe continues to work. A lot of existing code uses it (e.g., my book!). -- - _Sidenote:_ There are a bunch of special operators in R that have the naming convention .kjh-pink[**`%something%`**]. For example .kjh-pink[**`%\*%`**] means "matrix multiply". We'll see more of them as we go. In this context the **`% %`** is sometimes pronounced "grapes". .footnote.tiny[.kjh-darkgrey[\*With the new pipe, you can only pass an object to the _first_ argument in a function. This is fine for most tidyverse pipelines, where the first argument is usually (implicitly) the data. But it does mean that most Base R functions will continue not to be easily piped, as most of them do not follow the convention of passing the current data as the first argument.]] --- # Functions are bundled into .kjh-yellow[packages] -- All programming languages gain power and convenience by having libraries or packages of functions that extend the core abilities of the language. In R, packages are loaded into your working environment using the .kjh-green[**`library()`**] function. 
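
To see which packages are currently attached to your session, base R provides `search()`. A minimal sketch, assuming a default R session (where `stats` is attached at startup); the name `attached` is just illustrative:

```r
## List the packages (and other environments) on the search path
attached <- search()
attached
## Check for one package in particular
"package:stats" %in% attached
```

After a call to `library()`, the newly loaded package shows up near the front of this list.
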
-- ```r ## A package containing a dataset rather than functions library(gapminder) gapminder ``` ``` ## # A tibble: 1,704 × 6 ## country continent year lifeExp pop gdpPercap ## <fct> <fct> <int> <dbl> <int> <dbl> ## 1 Afghanistan Asia 1952 28.8 8425333 779. ## 2 Afghanistan Asia 1957 30.3 9240934 821. ## 3 Afghanistan Asia 1962 32.0 10267083 853. ## 4 Afghanistan Asia 1967 34.0 11537966 836. ## 5 Afghanistan Asia 1972 36.1 13079460 740. ## 6 Afghanistan Asia 1977 38.4 14880372 786. ## 7 Afghanistan Asia 1982 39.9 12881816 978. ## 8 Afghanistan Asia 1987 40.8 13867957 852. ## 9 Afghanistan Asia 1992 41.7 16317921 649. ## 10 Afghanistan Asia 1997 41.8 22227415 635. ## # … with 1,694 more rows ``` --- # Functions are bundled into .kjh-yellow[packages] -- .SMALL.squish2[You need only _install_ a package once (and occasionally update it). But you must _load_ the package in each R session before you can access its contents.] .SMALL[ ```r ## Do at least once for each package. Once done, not needed each time. install.packages("palmerpenguins", repos = "http://cran.rstudio.com") ## Needed sometimes, especially after an R major version upgrade. 
update.packages(repos = "http://cran.rstudio.com") ``` ] .SMALL[ ```r ## To load a package, usually at the start of your RMarkdown document or script file library(palmerpenguins) penguins ``` ``` ## # A tibble: 344 × 8 ## species island bill_length_mm bill_depth_mm flipper_…¹ body_…² sex year ## <fct> <fct> <dbl> <dbl> <int> <int> <fct> <int> ## 1 Adelie Torgersen 39.1 18.7 181 3750 male 2007 ## 2 Adelie Torgersen 39.5 17.4 186 3800 fema… 2007 ## 3 Adelie Torgersen 40.3 18 195 3250 fema… 2007 ## 4 Adelie Torgersen NA NA NA NA <NA> 2007 ## 5 Adelie Torgersen 36.7 19.3 193 3450 fema… 2007 ## 6 Adelie Torgersen 39.3 20.6 190 3650 male 2007 ## 7 Adelie Torgersen 38.9 17.8 181 3625 fema… 2007 ## 8 Adelie Torgersen 39.2 19.6 195 4675 male 2007 ## 9 Adelie Torgersen 34.1 18.1 193 3475 <NA> 2007 ## 10 Adelie Torgersen 42 20.2 190 4250 <NA> 2007 ## # … with 334 more rows, and abbreviated variable names ¹flipper_length_mm, ## # ²body_mass_g ``` ] --- # Grabbing a single function with .kjh-green[**`::`**] .pull-left.w70["Reach in" to an unloaded package and grab a function directly, using .kjh-green[`<package>::<function>`]] -- .pull-left.w70[ .less-medium[ ```r ## A little glimpse of what we'll do soon penguins |> select(species, body_mass_g, sex) |> * gtsummary::tbl_summary(by = species) ```
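
| Characteristic | Adelie, N = 152 | Chinstrap, N = 68 | Gentoo, N = 124 |
| -------------- | --------------- | ----------------- | --------------- |
| body_mass_g, Median (IQR) | 3,700 (3,350 – 4,000) | 3,700 (3,488 – 3,950) | 5,000 (4,700 – 5,500) |
| Unknown | 1 | 0 | 1 |
| sex, n (%) | | | |
| female | 73 (50) | 34 (50) | 58 (49) |
| male | 73 (50) | 34 (50) | 61 (51) |
| Unknown | 6 | 0 | 5 |
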
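
The same idiom works with any installed package. A minimal sketch using `tools`, which ships with R but is not attached by default; the file name is made up for illustration:

```r
## Call file_ext() from the tools package without loading the package
ext <- tools::file_ext("02-slides.Rmd")
ext
```

```
## [1] "Rmd"
```
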
] ] --- # Remember this warning about conflicts?  Notice how some functions in different packages have the same names. -- Related concepts of _namespaces_ and _environments_. --- # Scope of names .small[ ```r x <- c(1:10) y <- c(90:100) x ``` ``` ## [1] 1 2 3 4 5 6 7 8 9 10 ``` ```r y ``` ``` ## [1] 90 91 92 93 94 95 96 97 98 99 100 ``` ] -- .small[ ```r mean() ## Error in mean.default() : argument "x" is missing, with no default ``` ] -- .small[ ```r mean(x) # argument names are internal to functions ``` ``` ## [1] 5.5 ``` ```r mean(x = x) ``` ``` ## [1] 5.5 ``` ```r mean(x = y) ``` ``` ## [1] 95 ``` ```r x ``` ``` ## [1] 1 2 3 4 5 6 7 8 9 10 ``` ```r y ``` ``` ## [1] 90 91 92 93 94 95 96 97 98 99 100 ``` ] --- # 5. Objects come in .kjh-yellow[types] and .kjh-yellow[classes] I'm going to speak somewhat loosely here for now, and gloss over some distinctions between object classes and data structures, as well as kinds of objects and their attributes. -- The object inspector in RStudio is your friend. -- You can ask an object what it is. ```r class(my_numbers) ``` ``` ## [1] "numeric" ``` ```r typeof(my_numbers) ``` ``` ## [1] "double" ``` --- # 5. Objects come in .kjh-yellow[types] and .kjh-yellow[classes] Objects can have more than one (nested) class: -- ```r summary(my_numbers) ``` ``` ## Min. 1st Qu. Median Mean 3rd Qu. Max. ## 1.00 1.75 3.00 6.25 6.25 25.00 ``` ```r my_smry <- summary(my_numbers) # remember, outputs can be assigned to a name, creating an object class(summary(my_numbers)) # functions can be nested, and are evaluated from the inside out ``` ``` ## [1] "summaryDefault" "table" ``` ```r class(my_smry) # equivalent to the previous line ``` ``` ## [1] "summaryDefault" "table" ``` --- # 5. Objects come in .kjh-yellow[types] and .kjh-yellow[classes] ```r typeof(my_smry) ``` ``` ## [1] "double" ``` ```r attributes(my_smry) ``` ``` ## $names ## [1] "Min." "1st Qu." "Median" "Mean" "3rd Qu." "Max." 
## ## $class ## [1] "summaryDefault" "table" ``` ```r ## In this case, the functions extract the corresponding attribute class(my_smry) ``` ``` ## [1] "summaryDefault" "table" ``` ```r names(my_smry) ``` ``` ## [1] "Min." "1st Qu." "Median" "Mean" "3rd Qu." "Max." ``` --- # A .kjh-green[vector] is a fundamental kind of object .pull-left.small.kjh-darkgrey[ [] - From Hadley Wickham, _Advanced R_ ] -- .pull-right[ ```r my_int <- c(1, 3, 5, 6, 10) is.integer(my_int) ``` ``` ## [1] FALSE ``` ```r is.double(my_int) ``` ``` ## [1] TRUE ``` ```r my_int <- as.integer(my_int) is.integer(my_int) ``` ``` ## [1] TRUE ``` ```r my_chr <- c("Mary", "had", "a", "little", "lamb") is.character(my_chr) ``` ``` ## [1] TRUE ``` ```r my_lgl <- c(TRUE, FALSE, TRUE) is.logical(my_lgl) ``` ``` ## [1] TRUE ``` ] --- # The most common types of .kjh-green[vector] .pull-left.tiny.kjh-darkgrey[ [] - From Hadley Wickham, _Advanced R_ ] -- .pull-right.tiny[ ```r ## Factors are for storing categorical variables x <- factor(c("Yes", "No", "No", "Maybe", "Yes", "Yes")) x ``` ``` ## [1] Yes No No Maybe Yes Yes ## Levels: Maybe No Yes ``` ```r summary(x) # Alphabetical order by default ``` ``` ## Maybe No Yes ## 1 2 3 ``` ```r typeof(x) # A factor is a vector of integers ``` ``` ## [1] "integer" ``` ```r attributes(x) # ... with labels for its "levels" ``` ``` ## $levels ## [1] "Maybe" "No" "Yes" ## ## $class ## [1] "factor" ``` ```r levels(x) ``` ``` ## [1] "Maybe" "No" "Yes" ``` ```r is.ordered(x) ``` ``` ## [1] FALSE ``` ] ??? HW: Categorical data, where values come from a fixed set of levels recorded in factor vectors. Dates (with day resolution), which are recorded in Date vectors. Date-times (with second or sub-second resolution), which are stored in POSIXct vectors. Durations, which are stored in difftime vectors. --- # Individual vectors can't be heterogeneous Objects can be manually or automatically coerced from one class to another. Take care! 
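
A minimal sketch of why care is needed, using base R only (`chr_nums`, `as_nums`, and `mixed` are illustrative names). Manual coercion turns anything it cannot convert into `NA`, and automatic coercion happens without any message at all:

```r
## Manual coercion: "three" cannot become a number, so it becomes NA.
## R normally warns about this; the warning is suppressed here for brevity.
chr_nums <- c("1", "2", "three")
as_nums <- suppressWarnings(as.numeric(chr_nums))
as_nums
```

```
## [1]  1  2 NA
```

```r
## Automatic coercion: logical values are silently promoted to numbers
mixed <- c(TRUE, FALSE, 3)
mixed
```

```
## [1] 1 0 3
```
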
-- ```r class(my_numbers) ``` ``` ## [1] "numeric" ``` ```r my_new_vector <- c(my_numbers, "Apple") my_new_vector # vectors are homogeneous/atomic ``` ``` ## [1] "1" "2" "3" "1" "3" "5" "25" "10" "Apple" ``` ```r class(my_new_vector) ``` ``` ## [1] "character" ``` -- ```r my_dbl <- c(2.1, 4.77, 30.111, 3.14519) is.double(my_dbl) ``` ``` ## [1] TRUE ``` ```r my_dbl <- as.integer(my_dbl) my_dbl ``` ``` ## [1] 2 4 30 3 ``` --- # Tibbles are a .kjh-yellow[list] of .kjh-green[vectors] of various .kjh-pink[types] .SMALL[ ```r gapminder # tibbles and data frames can contain vectors of different types ``` ``` ## # A tibble: 1,704 × 6 ## country continent year lifeExp pop gdpPercap ## <fct> <fct> <int> <dbl> <int> <dbl> ## 1 Afghanistan Asia 1952 28.8 8425333 779. ## 2 Afghanistan Asia 1957 30.3 9240934 821. ## 3 Afghanistan Asia 1962 32.0 10267083 853. ## 4 Afghanistan Asia 1967 34.0 11537966 836. ## 5 Afghanistan Asia 1972 36.1 13079460 740. ## 6 Afghanistan Asia 1977 38.4 14880372 786. ## 7 Afghanistan Asia 1982 39.9 12881816 978. ## 8 Afghanistan Asia 1987 40.8 13867957 852. ## 9 Afghanistan Asia 1992 41.7 16317921 649. ## 10 Afghanistan Asia 1997 41.8 22227415 635. ## # … with 1,694 more rows ``` ```r class(gapminder) ``` ``` ## [1] "tbl_df" "tbl" "data.frame" ``` ```r typeof(gapminder) # hmm ``` ``` ## [1] "list" ``` ] Underneath, most complex R objects are some kind of list with different components. ??? - A _data frame_ is a list of vectors of the same length, where the vectors can be of different types (e.g. numeric, character, logical, etc) - A _tibble_ is an enhanced data frame Tibbles have an enhanced print method, never coerce strings to factors, and provide stricter subsetting methods. (HW) Again the object inspector is helpful here --- # Classes can be nested Some classes build on and enhance the properties of simpler classes. 
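
One way to check for this, sketched with base R only: `inherits()` asks whether a given class appears anywhere in an object's class vector. The names `smry`, `has_table`, and `is_df` are illustrative.

```r
smry <- summary(c(1, 2, 3, 1, 3, 5, 25, 10))
class(smry)                               # two classes, as we saw earlier
has_table <- inherits(smry, "table")      # TRUE: "table" is one of them
is_df <- inherits(smry, "data.frame")     # FALSE: it is not a data frame
```

The same check works on data frames and tibbles, since a tibble keeps `"data.frame"` among its classes.
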
.pull-left[ Base R's trusty .kjh-lblue[**`data.frame`**] ```r library(socviz) titanic ``` ``` ## fate sex n percent ## 1 perished male 1364 62.0 ## 2 perished female 126 5.7 ## 3 survived male 367 16.7 ## 4 survived female 344 15.6 ``` ```r class(titanic) ``` ``` ## [1] "data.frame" ``` ```r ## The `$` idiom picks out a named column here; ## more generally, the named element of a list titanic$percent ``` ``` ## [1] 62.0 5.7 16.7 15.6 ``` ] -- .pull-right[ The Tidyverse's enhanced .kjh-lblue[**`tibble`**] ```r ## tibbles are built on data frames class(titanic) ``` ``` ## [1] "data.frame" ``` ```r titanic_tb <- as_tibble(titanic) titanic_tb ``` ``` ## # A tibble: 4 × 4 ## fate sex n percent ## <fct> <fct> <dbl> <dbl> ## 1 perished male 1364 62 ## 2 perished female 126 5.7 ## 3 survived male 367 16.7 ## 4 survived female 344 15.6 ``` ```r class(titanic_tb) ``` ``` ## [1] "tbl_df" "tbl" "data.frame" ``` ] --- # All of this will matter later on ```r gss_sm ``` ``` ## # A tibble: 2,867 × 32 ## year id ballot age childs sibs degree race sex region incom…¹ relig ## <dbl> <dbl> <labe> <dbl> <dbl> <lab> <fct> <fct> <fct> <fct> <fct> <fct> ## 1 2016 1 1 47 3 2 Bache… White Male New E… $17000… None ## 2 2016 2 2 61 0 3 High … White Male New E… $50000… None ## 3 2016 3 3 72 2 3 Bache… White Male New E… $75000… Cath… ## 4 2016 4 1 43 4 3 High … White Fema… New E… $17000… Cath… ## 5 2016 5 3 55 2 2 Gradu… White Fema… New E… $17000… None ## 6 2016 6 2 53 2 2 Junio… White Fema… New E… $60000… None ## 7 2016 7 1 50 2 2 High … White Male New E… $17000… None ## 8 2016 8 3 23 3 6 High … Other Fema… Middl… $30000… Cath… ## 9 2016 9 1 45 3 5 High … Black Male Middl… $60000… Prot… ## 10 2016 10 3 71 4 1 Junio… White Male Middl… $60000… None ## # … with 2,857 more rows, 20 more variables: marital <fct>, padeg <fct>, ## # madeg <fct>, partyid <fct>, polviews <fct>, happy <fct>, partners <fct>, ## # grass <fct>, zodiac <fct>, pres12 <labelled>, wtssall <dbl>, ## # income_rc <fct>, agegrp 
<fct>, ageq <fct>, siblings <fct>, kids <fct>, ## # religion <fct>, bigregion <fct>, partners_rc <fct>, obama <dbl>, and ## # abbreviated variable name ¹income16 ``` -- .pull-left.w80.squish2[Tidyverse tools are generally _type safe_, meaning their functions return the same type of thing every time, or fail if they cannot do this. So it's good to know about the various data types.] --- # 6. .kjh-yellow[Arithmetic on vectors] .pull-left.w60[In R, all numbers are vectors of different sorts. Even single numbers ("scalars") are conceptually vectors of length 1.] -- .pull-left.w60[Arithmetic on vectors\* follows a series of _recycling rules_ that favor ease of expression of vectorized, "elementwise" operations.] .pull-left.w60.footnote.small[*And arrays, too.] --- # 6. .kjh-yellow[Arithmetic on vectors] See if you can predict what the following operations do: ```r my_numbers ``` ``` ## [1] 1 2 3 1 3 5 25 10 ``` ```r result1 <- my_numbers + 1 ``` -- ```r result1 ``` ``` ## [1] 2 3 4 2 4 6 26 11 ``` -- ```r result2 <- my_numbers + my_numbers ``` -- ```r result2 ``` ``` ## [1] 2 4 6 2 6 10 50 20 ``` -- ```r two_nums <- c(5, 10) result3 <- my_numbers + two_nums ``` -- ```r result3 ``` ``` ## [1] 6 12 8 11 8 15 30 20 ``` --- # 6. .kjh-yellow[Arithmetic on vectors] ```r three_nums <- c(1, 5, 10) result4 <- my_numbers + three_nums ``` ``` ## Warning in my_numbers + three_nums: longer object length is not a multiple of ## shorter object length ``` -- ```r result4 ``` ``` ## [1] 2 7 13 2 8 15 26 15 ``` Note that you got a **warning** here. R will still do what you told it to do, though! Don't ignore warnings until you understand what they mean. --- # 7. .kjh-yellow[R will be] .kjh-red[frustrating] -- - The IDE tries its best to help you. Learn to attend to what it is trying to say. .left[] -- .left[] -- .left[] --- class: center middle main-title section-title-1 # .huge.kjh-lblue[Let's Go!] 
--- # Time to make a plot (again) Like before: ```r gapminder ``` ``` ## # A tibble: 1,704 × 6 ## country continent year lifeExp pop gdpPercap ## <fct> <fct> <int> <dbl> <int> <dbl> ## 1 Afghanistan Asia 1952 28.8 8425333 779. ## 2 Afghanistan Asia 1957 30.3 9240934 821. ## 3 Afghanistan Asia 1962 32.0 10267083 853. ## 4 Afghanistan Asia 1967 34.0 11537966 836. ## 5 Afghanistan Asia 1972 36.1 13079460 740. ## 6 Afghanistan Asia 1977 38.4 14880372 786. ## 7 Afghanistan Asia 1982 39.9 12881816 978. ## 8 Afghanistan Asia 1987 40.8 13867957 852. ## 9 Afghanistan Asia 1992 41.7 16317921 649. ## 10 Afghanistan Asia 1997 41.8 22227415 635. ## # … with 1,694 more rows ``` --- # Like before .pull-left.w35[ ```r library(tidyverse) library(gapminder) p <- ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp)) p + geom_point() ``` ] -- .pull-right.w60[ <img src="02-slides_files/figure-html/codefig-plot-2-1.png" width="556" style="display: block; margin: auto;" /> ] --- # What we did .pull-left.w40[ ```r library(tidyverse) library(gapminder) ``` ] .pull-right.w60[ - Load the packages we need: `tidyverse` and `gapminder` ] -- .pull-left.w40[ ```r p <- ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp)) ``` ] .pull-right.w60[ - New object named .kjh-lblue[**`p`**] .kjh-pink[`gets`] the output of the .kjh-green[`ggplot()` _function_], given these .kjh-orange[_arguments_] - Notice how one of the arguments, .kjh-orange[`mapping`], is itself taking the output of a function named .kjh-green[`aes()`] ] -- .pull-left.w40[ ```r p + geom_point() ``` ] .pull-right.w60[ - Show me the output of the .kjh-lblue[**`p`**] object and the .kjh-green[`geom_point()`] function. - The .kjh-pink[`+`] here acts just like the .kjh-pink[`|>`] pipe, but for ggplot functions only. (This is an accident of history.) ] --- # And what is R doing? 
- .huge[R objects are just lists of .kjh-orange[stuff to use] or .kjh-green[things to do]] --- layout: false class: bottom background-image: url("img/02_r_object_bento_box.png") background-size: cover ## .huge.right.bottom.squish4.kjh-grey[Objects are like Bento Boxes] --- layout: false .center[] .right.w90.huge[The .kjh-lblue[`p`] object] --- layout: false .center[] .right.w90.huge[Peek in with the object inspector] --- layout: false .center[] .right.w90.huge[Peek in with the object inspector]
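
---

# Peek in by hand

The bento-box idea can be sketched with an ordinary list. This is an illustration only: `bento`, `rice`, `describe`, and `avg` are made-up names, and a real ggplot object such as `p` holds many more components.

```r
## A list holding some stuff to use (data) and a thing to do (a function)
bento <- list(
  rice = c(1, 2, 3, 1, 3, 5, 25, 10),
  describe = function(x) mean(x)
)
names(bento)                          # what compartments are there?
avg <- bento$describe(bento$rice)     # use `$` to pick out named elements
avg
```

```
## [1] "rice"     "describe"
## [1] 6.25
```

The object inspector in RStudio shows the same kind of structure for `p`, without your having to type anything.
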