library(tidyverse)
January 23, 2024
Touch-based user interface
Foregrounds a single application
Dislikes multi-tasking
Hides the file system
“Laundry Pile” user model of where things are stored
Windows and pointers.
Multi-tasking, multiple windows.
Exposes and leverages the file system.
Many specialized tools in concert.
Underneath, it’s the 1970s, UNIX, and the command-line.
Cabinets, drawers, and files model of where things are stored
This toolset is by now really good!
Free! Open! Powerful!
Friendly communities! Lots of information! Many resources!
But: grounded in a UI paradigm that is increasingly far away from the everyday use of computing devices
So why do we use this stuff?
Everyone knows Word, Excel, or Google Docs.
“Track changes” is powerful and easy.
Hm, I can’t remember how I made this figure
Where did this table of results come from?
Paper_edits_FINAL_kh-1.docx
Plain text is highly portable.
Push button, recreate analysis.
JFC Why can’t I do this simple thing?
Object of type 'closure' is not subsettable
Each approach generates solutions to its own problems
The problem is, you have probably never actually used one of these!
Your computer stores files and does stuff, or “runs commands”
Files are stored in a large hierarchy of folders
The Finder or Window Manager or File Manager is a visual metaphor for representing this hierarchy of files and for running commands on them. But you can also do these things via text-based commands delivered from a prompt, console, or “command line”.
Software like RStudio has a lot of these “old school” computing elements
We want to draw graphs reproducibly
Easy things are awkward
Hard things are straightforward
Really hard things are possible
Easy things are trivial
Hard things are awkward
Really hard things are impossible
Think in terms of Data + Transformations, written out as code, rather than a series of point-and-click steps
Our starting data + our code is what’s “real” in our projects, not the final output or any intermediate objects
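A minimal sketch of what this looks like in practice, using R's built-in mtcars data as a stand-in for "our starting data". The dataset and the object name mpg_by_cyl are illustrative choices, not taken from the course materials. The summary table is never edited by hand, so it can be thrown away and rebuilt from the data and the code.

```r
# Starting data: mtcars ships with R, so nothing needs to be downloaded.
# Transformation: average miles-per-gallon within each cylinder count.
mpg_by_cyl <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)

# The result is an intermediate object. Delete it and rerun the line
# above to get the same table back.
mpg_by_cyl
```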
Desired style | Use the following Markdown annotation |
---|---|
Heading 1 | # Heading 1 |
Heading 2 | ## Heading 2 |
Heading 3 | ### Heading 3 (Actual heading styles will vary.) |
Paragraph | Just start typing |
Bold | **Bold** |
Italic | *Italic* |
Images | ![Alternate text for image](path/image.jpg) |
Hyperlinks | [Link text](https://www.visualizingsociety.com/) |
Unordered Lists | |
- First | - First |
- Second. | - Second |
- Third | - Third |
Ordered Lists | |
1. First | 1. First |
2. Second. | 2. Second |
3. Third | 3. Third |
Footnote.¹ | Footnote[^notelabel] |
¹The note’s content. | [^notelabel]: The note's content. |
TYPE OUT
YOUR CODE
BY HAND
GETTING ORIENTED
library(tidyverse)
Loading tidyverse: ggplot2 <| Draw graphs
Loading tidyverse: tibble <| Nicer data tables
Loading tidyverse: tidyr <| Tidy your data
Loading tidyverse: readr <| Get data into R
Loading tidyverse: purrr <| Fancy Iteration
Loading tidyverse: dplyr <| Action verbs for tables
Output:
This is equivalent to running the code above, typing my_numbers at the console, and hitting enter.
By convention, code output in documents is prefixed by ##
Also by convention, outputting vectors, etc, gets a counter keeping track of the number of elements. For example,
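Printing a vector long enough to wrap shows the counter at the start of each line. This is a small sketch: the vector 1:30 is just an illustrative choice, and where the lines break depends on the width of your console.

```r
# A vector of thirty integers. Each printed line starts with the index
# of its first element in square brackets, e.g. [1].
thirty_numbers <- 1:30
print(thirty_numbers)

# The counter only tracks position; it is not part of the data.
length(thirty_numbers)
```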
Logical equality and inequality (yielding a TRUE or FALSE result) is done with == and !=. Other logical operators include <, >, <=, >=, and ! for negation.
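A short sketch of these operators at work. The values are made up for illustration.

```r
# Comparisons return TRUE or FALSE
3 == 3      # equal to
3 != 4      # not equal to
2 < 5       # less than
2 >= 5      # greater than or equal to
!TRUE       # negation

# Comparisons are vectorized: one result per element
some_values <- c(1, 5, 10)   # illustrative values
is_big <- some_values > 4
is_big
```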
Or it’s a really bad idea to try to use them
There are a few built-in objects:
[1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s"
[20] "t" "u" "v" "w" "x" "y" "z"
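The output above is what you get from printing letters. A sketch of a few other objects that come with base R:

```r
# Objects that exist in every R session without loading anything
letters            # the lowercase alphabet
LETTERS[1:3]       # the first three uppercase letters
pi                 # the mathematical constant
month.name[1]      # the first month name

length(letters)    # how many elements the vector holds
```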
In fact, this is mostly what we will be doing.
Objects are created by assigning a thing to a name:
The c() function combines or concatenates things
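A minimal sketch of assignment with c(). The name my_numbers appears elsewhere in these notes, but the values here are made up for illustration.

```r
# Create an object by assigning the output of c() to a name
my_numbers <- c(1, 2, 3, 1, 3, 5, 25)

# Typing the name at the console prints the object
my_numbers

# c() also combines existing vectors into a longer one
more_numbers <- c(my_numbers, 100, 200)
length(more_numbers)
```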
Keyboard shortcut for the assignment operator <-: press option and - on a Mac, or alt and - on Windows.
You can use = as well as <- for assignment.
But = has a different meaning when used in functions.
We use <- for assignment throughout.
If you don’t name the arguments, R assumes you are providing them in the order the function expects.
What arguments? Which order? Read the function’s help page
[1] NA
[1] 32.44444
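As a sketch of how argument names and order matter, here is mean() applied to a vector with a missing value. The vector is invented for illustration and is not the data behind the output shown above.

```r
values_with_gap <- c(10, 20, NA, 30)   # illustrative data, one missing value

# By default, a missing value makes the mean missing too
mean(x = values_with_gap)

# Naming the na.rm argument tells mean() to drop missing values first
mean(x = values_with_gap, na.rm = TRUE)

# Unnamed arguments are matched by position: the first is taken as x
mean(values_with_gap, na.rm = TRUE)
```

The help page (?mean) lists the arguments and their order.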
Or select from one of several options
There are all kinds of functions. They return different things.
You can assign the output of a function to a name, which turns it into an object. (Otherwise it’ll send its output to the console.)
Objects hang around in your work environment until they are overwritten by you, or are deleted.
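A sketch of an object's life cycle. The name my_total is hypothetical.

```r
# Without assignment, the result is printed and then gone
sum(1:10)

# With assignment, the result is stored under a name
my_total <- sum(1:10)
exists("my_total")        # the object is now in the environment

# Assigning to the same name overwrites the old value
my_total <- my_total * 2
my_total

# rm() deletes the object
rm(my_total)
exists("my_total")
```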
Nested functions are evaluated from the inside out.
Instead of deeply nesting functions in parentheses, we can use the pipe operator:
Read this operator as “and then”
Better, vertical space is free in R:
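A small sketch of the same computation written nested and then piped across several lines, using only base R functions. It needs R 4.1 or later for |>, and the numbers are illustrative.

```r
squares <- c(1, 4, 9, 16)   # illustrative values

# Nested: read from the inside out (sqrt, then mean, then round)
nested_result <- round(mean(sqrt(squares)), 1)

# Piped: take squares, and then sqrt, and then mean, and then round
piped_result <- squares |>
  sqrt() |>
  mean() |>
  round(1)

# Both forms compute the same value
identical(nested_result, piped_result)
```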
Not great, Bob:
Notice how the first thing you read is the last operation performed.
We can use vertical space and indents, but it’s really not much better:
Much nicer:
eggs |>
  get_from_fridge() |>
  crack_eggs(into = "bowl") |>
  whisk(len = 40) |>
  pour_in_pan(temp = "med-high") |>
  stir() |>
  serve()
%>%
The Base R pipe operator, |>, is a relatively recent addition to R (it arrived in version 4.1).
Piping operations were originally introduced in a package called magrittr, where the operator took the form %>%
It’s been so successful that a version of it has been incorporated into Base R. For our purposes, they’re the same.
Packages are loaded into your working environment using the library() function:
# A tibble: 1,704 × 6
country continent year lifeExp pop gdpPercap
<fct> <fct> <int> <dbl> <int> <dbl>
1 Afghanistan Asia 1952 28.8 8425333 779.
2 Afghanistan Asia 1957 30.3 9240934 821.
3 Afghanistan Asia 1962 32.0 10267083 853.
4 Afghanistan Asia 1967 34.0 11537966 836.
5 Afghanistan Asia 1972 36.1 13079460 740.
6 Afghanistan Asia 1977 38.4 14880372 786.
7 Afghanistan Asia 1982 39.9 12881816 978.
8 Afghanistan Asia 1987 40.8 13867957 852.
9 Afghanistan Asia 1992 41.7 16317921 649.
10 Afghanistan Asia 1997 41.8 22227415 635.
# ℹ 1,694 more rows
You need only install a package once (and occasionally update it):
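Installation is a one-time step, so it is usually run at the console rather than kept live in a script. Here is a sketch using the palmerpenguins package as the example. The install line is commented out so that running this block does not download anything.

```r
# Run once, at the console, to download and install from CRAN:
# install.packages("palmerpenguins")

# Check whether the package is already installed, without attaching it
is_installed <- requireNamespace("palmerpenguins", quietly = TRUE)
is_installed
```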
But you must load the package in each R session before you can access its contents:
## To load a package, usually at the start of your RMarkdown document or script file
library(palmerpenguins)
penguins
# A tibble: 344 × 8
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
<fct> <fct> <dbl> <dbl> <int> <int>
1 Adelie Torgersen 39.1 18.7 181 3750
2 Adelie Torgersen 39.5 17.4 186 3800
3 Adelie Torgersen 40.3 18 195 3250
4 Adelie Torgersen NA NA NA NA
5 Adelie Torgersen 36.7 19.3 193 3450
6 Adelie Torgersen 39.3 20.6 190 3650
7 Adelie Torgersen 38.9 17.8 181 3625
8 Adelie Torgersen 39.2 19.6 195 4675
9 Adelie Torgersen 34.1 18.1 193 3475
10 Adelie Torgersen 42 20.2 190 4250
# ℹ 334 more rows
# ℹ 2 more variables: sex <fct>, year <int>
Load the packages we need: tidyverse and gapminder
New object named p gets the output of the ggplot() function, given these arguments
Notice how one of the arguments, mapping, is itself taking the output of a function named aes()
Show me the output of the p object and the geom_point() function.
The + here acts just like the |> pipe, but for ggplot functions only. (This is an accident of history.)
R objects are just lists of stuff to use or things to do
The core idea, which we’ll focus on more formally next week, is that we have data, arranged in columns, that we want to represent visually on some sort of plot.
That means we need a mapping — a link, a connection, a representation — between things in our table and stuff we can draw. That is what the mapping argument is for.
And we need a geom — a kind of plot, a particular sort of graph — that we draw with that.
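Put together, the pieces described above might look like the following. This is a sketch, assuming the ggplot2 and gapminder packages are installed. The choice of gdpPercap and lifeExp as the mapped columns is an illustrative assumption, taken from the gapminder column names shown earlier.

```r
library(ggplot2)
library(gapminder)

# Data plus a mapping: which columns go on which axes
p <- ggplot(data = gapminder,
            mapping = aes(x = gdpPercap, y = lifeExp))

# Add a geom to say what kind of plot to draw with that mapping
p_points <- p + geom_point()

# Printing the object draws the plot
p_points
```

Swapping geom_point() for another geom changes the kind of plot while the data and the mapping stay the same.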
Let’s try some live examples …
How might we improve or extend this graph based on the data we have? Or how might we look at it differently?
```