Example 02: Writing R Code

This week we are going to jump into writing R code. I encourage you to experiment and try things. As we go, we will develop a good working understanding of how R works. But the best way to make this happen is to just start doing things and talk about them as we go.

When starting an R work session, we typically load the packages we will need. This is like taking a book off your shelf to refer to. We only need to do this once per session.

Code
library(tidyverse)

For now, don’t worry about any messages or warnings you get. But read them and think about what they are trying to tell you.

R basics

How do we write in R? How do we make it do things?

To start, we can say: in R, everything has a name and everything is an object. You do things to named objects with functions (which are themselves objects!). And you create an object by assigning a thing to a name.

Assignment is the act of attaching a thing to a name. It is represented by <- or = and you can read it as “gets” or “is”. Type it with the < key and then the - key. Better, there is a keyboard shortcut: on macOS, press Option and the - (minus or hyphen) key together. On Windows, press Alt and the - key.
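
As a small sketch of these ideas, the chunk below assigns a value to a name, uses that name, and checks that a function is itself an object. The name my_value is made up for illustration.

```r
## Assign the number 10 to the (hypothetical) name my_value
my_value <- 10

## The name now stands for the value
my_value + 5        # 15

## Functions are objects too: sqrt is a name with a function attached to it
is.function(sqrt)   # TRUE
class(my_value)     # "numeric"
```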

Objects

We’re going to use the c() function (c for concatenate) to stick some numbers together into a vector. Then we will assign that vector to the name my_numbers.

Code
## Inside code chunks, lines beginning with a # character are comments
## Comments are ignored by R

my_numbers <- c(1, 1, 2, 4, 1, 3, 1, 5) # Anything after a # character is ignored as well

## Now we have an object by this name
my_numbers 
[1] 1 1 2 4 1 3 1 5

Again, in that previous chunk we created an object by assigning something (the result of a function) to a name. Now that thing exists in our project environment.

Code
my_numbers
[1] 1 1 2 4 1 3 1 5
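
If you want to confirm that an object really is in your environment, one way (a minimal sketch, not the only approach) is to ask R directly with exists() and ls(), both of which are part of base R.

```r
my_numbers <- c(1, 1, 2, 4, 1, 3, 1, 5)

## Ask whether an object with this name exists
exists("my_numbers")        # TRUE

## ls() lists the names of the objects in the current environment
"my_numbers" %in% ls()      # TRUE
```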

R has a few built-in objects.

Code
letters
 [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s"
[20] "t" "u" "v" "w" "x" "y" "z"
Code
LETTERS
 [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q" "R" "S"
[20] "T" "U" "V" "W" "X" "Y" "Z"
Code
pi
[1] 3.141593

But mostly we will be creating objects.

R is a calculator

You don’t have to make objects. You can just treat R like a calculator that spits out answers at the console.

Code
(31 * 12) / 2^4
[1] 23.25
Code
sqrt(25)
[1] 5
Code
log(100) # log() is the natural logarithm (base e)
[1] 4.60517
Code
log10(100) # log10() is the logarithm in base 10
[1] 2
[1] 2

The commands that look like this() are called functions.

But the result of anything you do along these lines can, if you want, be assigned to a name, as in my_five <- sqrt(25).
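
A short sketch of what that buys you: once the result has a name, you can reuse it in later calculations.

```r
## Store the result of a calculation under a name
my_five <- sqrt(25)

## Use the stored value in further calculations
my_five * 2     # 10
my_five == 5    # TRUE
```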

You can do logic

Code
4 < 10
[1] TRUE
Code
4 > 2 & 1 > 0.5 # The "&" means "and"
[1] TRUE
Code
4 < 2 | 1 > 0.5 # The "|" means "or"
[1] TRUE
Code
4 < 2 | 1 < 0.5
[1] FALSE

A logical test for equality:

Code
2 == 2 # Write `=` twice
[1] TRUE

Not this:

Code
## This will cause an error, because R will think you are trying to assign a value
2 = 2

## Error in 2 = 2 : invalid (do_set) left-hand side to assignment

Testing for “not equal to” or “is not”:

Code
3 != 7 # Write `!` and then `=` to make `!=`
[1] TRUE
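
The same operators also work on a whole vector at once, returning one TRUE or FALSE per element. The sketch below reuses the my_numbers values from earlier in these notes.

```r
my_numbers <- c(1, 1, 2, 4, 1, 3, 1, 5)

## Compare every element to 2
my_numbers > 2
## FALSE FALSE FALSE TRUE FALSE TRUE FALSE TRUE

## TRUE counts as 1, so sum() counts how many elements pass the test
sum(my_numbers > 2)   # 3

## Combine tests with & (and) and | (or)
my_numbers > 1 & my_numbers != 4
## FALSE FALSE TRUE FALSE FALSE TRUE FALSE TRUE
```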

More about objects

Code
my_numbers # We created this a few minutes ago
[1] 1 1 2 4 1 3 1 5
Code
letters  # This one is built-in
 [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s"
[20] "t" "u" "v" "w" "x" "y" "z"
Code
pi  # Also built-in
[1] 3.141593

Creating objects: assign a thing (usually the result of a function) to a name.

Code
## this object... gets ... the output of this function
my_numbers <- c(1, 2, 3, 1, 3, 5, 25, 10)

your_numbers <- c(5, 31, 71, 1, 3, 21, 6, 52)

You do things with functions

Functions usually take input, perform actions, and then return output.

Code
# Calculate the mean of my_numbers with the mean() function
mean(x = my_numbers)
[1] 6.25

The instructions you can give a function are its arguments. Here, x is saying “this is the thing I want you to take the mean of”.

If you provide arguments in the “right” order (the order the function expects), you don’t have to name them.

Code
mean(my_numbers)
[1] 6.25
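
To see the difference between positional and named arguments, here is a small sketch using base R's round() function, whose arguments are x (the number) and digits (how many decimal places to keep).

```r
## Unnamed arguments are matched by position: x first, then digits
round(3.14159, 2)               # 3.14

## Named arguments can be given in any order
round(digits = 2, x = 3.14159)  # 3.14

## Swapping unnamed arguments changes the meaning:
## this rounds the number 2 to 3.14159 digits, which is not what we want
round(2, 3.14159)
```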

Look at the help for mean() with ?mean to learn what trim is doing.

Code
## The sample() function 
x <- sample(x = 1:100, size = 100, replace = TRUE) # What does each piece do here?
mean(x)
[1] 47.38
Code
mean(x, trim = 0.1) 
[1] 46.825

For functions with more than one or two arguments, explicitly naming arguments is good practice, especially when learning the language.
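
Because sample() draws at random, the means shown above will differ each time the chunk is run. The sketch below uses a small made-up vector, skewed, to show what trim does, and then uses set.seed() to make a random draw repeatable. The seed value 1234 is arbitrary.

```r
## A small vector with one extreme value
skewed <- c(1, 2, 3, 4, 100)

mean(skewed)               # 22
mean(skewed, trim = 0.2)   # 3: drops the lowest and highest 20% before averaging

## set.seed() fixes the random number generator, so the same draw comes back
set.seed(1234)
x1 <- sample(x = 1:100, size = 100, replace = TRUE)
set.seed(1234)
x2 <- sample(x = 1:100, size = 100, replace = TRUE)
identical(x1, x2)          # TRUE
```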

Data

A few datasets come built-in, for convenience. Here is one:

Code
mpg
# A tibble: 234 × 11
   manufacturer model      displ  year   cyl trans drv     cty   hwy fl    class
   <chr>        <chr>      <dbl> <int> <int> <chr> <chr> <int> <int> <chr> <chr>
 1 audi         a4           1.8  1999     4 auto… f        18    29 p     comp…
 2 audi         a4           1.8  1999     4 manu… f        21    29 p     comp…
 3 audi         a4           2    2008     4 manu… f        20    31 p     comp…
 4 audi         a4           2    2008     4 auto… f        21    30 p     comp…
 5 audi         a4           2.8  1999     6 auto… f        16    26 p     comp…
 6 audi         a4           2.8  1999     6 manu… f        18    26 p     comp…
 7 audi         a4           3.1  2008     6 auto… f        18    27 p     comp…
 8 audi         a4 quattro   1.8  1999     4 manu… 4        18    26 p     comp…
 9 audi         a4 quattro   1.8  1999     4 auto… 4        16    25 p     comp…
10 audi         a4 quattro   2    2008     4 manu… 4        20    28 p     comp…
# ℹ 224 more rows
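
A dataset like this is an object too, so you can use functions on it. Here is a brief sketch using a few base R functions to look at its size and columns. It assumes ggplot2 is installed, since that is the package that provides mpg.

```r
library(ggplot2)   # mpg ships with ggplot2, which library(tidyverse) also loads

nrow(mpg)      # 234 rows
ncol(mpg)      # 11 columns
names(mpg)     # the column names

## Pull out one column with $ and summarize it
mean(mpg$hwy)
```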

Graphs

Drawing a graph with ggplot requires two kinds of statements: one saying what the data is and what relationship we want to plot, and a second saying what kind of plot we want.

The first one is done by the ggplot() function.

Code
ggplot(data = mpg, 
       mapping = aes(x = displ, y = hwy))

You can see that by itself it draws only an empty plot area with axes. No data appears, because we have not yet said what kind of plot we want.

But if we add a function saying what kind of plot, we get a result:

Code
ggplot(data = mpg, 
       mapping = aes(x = displ, y = hwy)) +
  geom_point()

At this stage, a lot of this may seem obscure. What is a mapping? What is this aes() thing? Why do we “add” the two things together? Don’t worry about it for now. We will go through this soon enough.
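
One thing that connects to what we have already seen is that a plot is an object like any other. As a rough sketch, the chunk below stores the first statement under a name and then adds a geom to it. The names p and p_points are made up for illustration.

```r
library(ggplot2)

## Store the data-and-mapping statement under a (hypothetical) name
p <- ggplot(data = mpg,
            mapping = aes(x = displ, y = hwy))

## Adding a geom to the stored object gives a new plot with one layer
p_points <- p + geom_point()

p_points   # printing the object draws the plot
```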

In the meantime, let’s keep messing about:

Code
# The gapminder data
library(gapminder)
gapminder
# A tibble: 1,704 × 6
   country     continent  year lifeExp      pop gdpPercap
   <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
 1 Afghanistan Asia       1952    28.8  8425333      779.
 2 Afghanistan Asia       1957    30.3  9240934      821.
 3 Afghanistan Asia       1962    32.0 10267083      853.
 4 Afghanistan Asia       1967    34.0 11537966      836.
 5 Afghanistan Asia       1972    36.1 13079460      740.
 6 Afghanistan Asia       1977    38.4 14880372      786.
 7 Afghanistan Asia       1982    39.9 12881816      978.
 8 Afghanistan Asia       1987    40.8 13867957      852.
 9 Afghanistan Asia       1992    41.7 16317921      649.
10 Afghanistan Asia       1997    41.8 22227415      635.
# ℹ 1,694 more rows

A few graphs. Look at these and, even if things aren’t clear in detail just yet, think about how the code is related to what you see.

A

Code
ggplot(data = gapminder, 
       mapping = aes(x = gdpPercap, y = lifeExp)) +
  geom_point()

B

Code
ggplot(data = gapminder, 
       mapping = aes(x = gdpPercap, y = lifeExp)) +
  geom_smooth()
`geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'

C

Code
ggplot(data = gapminder, 
       mapping = aes(x = gdpPercap, y = lifeExp)) +
  geom_point() + 
  geom_smooth()
`geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'

D

Code
ggplot(data = gapminder, 
       mapping = aes(x = gdpPercap, y = lifeExp)) +
  geom_point() + 
  facet_wrap(~ continent)

E

Code
ggplot(data = gapminder, 
       mapping = aes(x = lifeExp)) +
  geom_histogram() 
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

F

Code
ggplot(data = gapminder, 
       mapping = aes(x = lifeExp)) +
  geom_histogram() +
  facet_wrap(~ continent)
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
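
The message under the last two plots is a suggestion, not an error. One way to follow it, sketched here, is to set binwidth yourself. The value of 5 (years of life expectancy per bar) and the name p_hist are arbitrary choices for illustration. Try other values and see how the picture changes.

```r
library(ggplot2)
library(gapminder)

## Same histogram as E, but with an explicit bin width
p_hist <- ggplot(data = gapminder,
                 mapping = aes(x = lifeExp)) +
  geom_histogram(binwidth = 5)   # each bar covers 5 years of life expectancy

p_hist
```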