07 — Social Data, Social Categories, and the State

Kieran Healy

February 21, 2024

Data and the State

Load our libraries

library(here)       # manage file paths
library(socviz)     # data and some useful functions
library(tidyverse)  # your friend and mine
library(tidycensus) # Tidily interact with the US Census

Problem Set review

Grouping

library(palmerpenguins)
penguins
# A tibble: 344 × 8
   species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
   <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
 1 Adelie  Torgersen           39.1          18.7               181        3750
 2 Adelie  Torgersen           39.5          17.4               186        3800
 3 Adelie  Torgersen           40.3          18                 195        3250
 4 Adelie  Torgersen           NA            NA                  NA          NA
 5 Adelie  Torgersen           36.7          19.3               193        3450
 6 Adelie  Torgersen           39.3          20.6               190        3650
 7 Adelie  Torgersen           38.9          17.8               181        3625
 8 Adelie  Torgersen           39.2          19.6               195        4675
 9 Adelie  Torgersen           34.1          18.1               193        3475
10 Adelie  Torgersen           42            20.2               190        4250
# ℹ 334 more rows
# ℹ 2 more variables: sex <fct>, year <int>

Grouping

  • Always ask the question “What is a row in this table?”
  • Always ask the question “What do I want a row to be in the table I make?”
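One practical way to answer the first question is to check whether a candidate set of columns identifies each row uniquely. Below is a minimal dplyr sketch. It uses a small made-up table: the `panel` data and its columns are hypothetical and do not come from the course materials. If counting a combination of columns never yields a count above 1, that combination is what "a row" means in the table.

```r
library(dplyr)

# Hypothetical panel data: each person may appear in several survey waves
panel <- tibble(
  person = c("a", "a", "b", "b", "c"),
  wave   = c(1, 2, 1, 2, 1),
  income = c(100, 110, 90, 95, 120)
)

# Does (person, wave) identify rows? Count each combination and keep any
# that occur more than once; zero rows back means it is a unique key
key_dupes <- panel |>
  count(person, wave) |>
  filter(n > 1)

# Does person alone identify rows? Here "a" and "b" each appear twice,
# so a row is not "a person" but "a person in a wave"
person_dupes <- panel |>
  count(person) |>
  filter(n > 1)

nrow(key_dupes)     # 0
nrow(person_dupes)  # 2
```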

penguins |> 
  group_by(island) |> 
  summarize(mean_bl_by_island = mean(bill_length_mm, na.rm = TRUE))
# A tibble: 3 × 2
  island    mean_bl_by_island
  <fct>                 <dbl>
1 Biscoe                 45.3
2 Dream                  44.2
3 Torgersen              39.0

Grouping

penguins |> 
  group_by(species) |> 
  summarize(mean_bl_by_species = mean(bill_length_mm, na.rm = TRUE))
# A tibble: 3 × 2
  species   mean_bl_by_species
  <fct>                  <dbl>
1 Adelie                  38.8
2 Chinstrap               48.8
3 Gentoo                  47.5

Grouping

penguins |> 
  group_by(island) |> 
  mutate(mean_bl_by_island = mean(bill_length_mm, na.rm = TRUE)) 
# A tibble: 344 × 9
# Groups:   island [3]
   species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
   <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
 1 Adelie  Torgersen           39.1          18.7               181        3750
 2 Adelie  Torgersen           39.5          17.4               186        3800
 3 Adelie  Torgersen           40.3          18                 195        3250
 4 Adelie  Torgersen           NA            NA                  NA          NA
 5 Adelie  Torgersen           36.7          19.3               193        3450
 6 Adelie  Torgersen           39.3          20.6               190        3650
 7 Adelie  Torgersen           38.9          17.8               181        3625
 8 Adelie  Torgersen           39.2          19.6               195        4675
 9 Adelie  Torgersen           34.1          18.1               193        3475
10 Adelie  Torgersen           42            20.2               190        4250
# ℹ 334 more rows
# ℹ 3 more variables: sex <fct>, year <int>, mean_bl_by_island <dbl>
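The two island examples differ only in `summarize()` versus `mutate()`, but they produce tables whose rows mean different things. The following sketch uses a hypothetical five-row toy table (the `birds` data and `bill_len` column are made up for illustration). `summarize()` returns one row per group. `mutate()` keeps every original row and repeats the group mean within each group, which makes row-level comparisons with the group mean straightforward.

```r
library(dplyr)

# Hypothetical toy data: five birds on two islands, one missing measurement
birds <- tibble(
  island   = c("A", "A", "A", "B", "B"),
  bill_len = c(40, 42, 44, 50, NA)
)

# summarize(): one row per island
by_island <- birds |>
  group_by(island) |>
  summarize(mean_len = mean(bill_len, na.rm = TRUE))

# mutate(): every bird kept; the island mean is repeated on each row,
# so each bird can be compared with its own island's average
with_means <- birds |>
  group_by(island) |>
  mutate(mean_len = mean(bill_len, na.rm = TRUE),
         diff_from_mean = bill_len - mean_len) |>
  ungroup()

nrow(by_island)   # 2
nrow(with_means)  # 5
```

Calling `ungroup()` at the end is a defensive choice: it stops the island grouping from silently carrying over into later steps.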

Grouping and ranking

library(nycdogs)
nyc_license |> 
  group_by(borough, animal_name) |> 
  summarize(n_dogs = n()) |> 
  slice_max(n_dogs, n = 5)
# A tibble: 30 × 3
# Groups:   borough [6]
   borough  animal_name       n_dogs
   <chr>    <chr>              <int>
 1 Bronx    Bella                777
 2 Bronx    Max                  688
 3 Bronx    Rocky                504
 4 Bronx    Princess             499
 5 Bronx    Coco                 481
 6 Brooklyn Unknown             3417
 7 Brooklyn Name                1494
 8 Brooklyn Bella               1335
 9 Brooklyn Max                 1200
10 Brooklyn Name Not Provided   1074
# ℹ 20 more rows
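Two details explain the output above. First, by default `summarize()` drops only the last grouping variable, so the result is still grouped by `borough`, and that is why `slice_max()` ranks names within each borough. Second, values such as "Unknown", "Name", and "Name Not Provided" record a missing name rather than an actual name. The sketch below filters those placeholders out before ranking. It runs on a hypothetical miniature license table rather than the real `nyc_license` data, and it makes the borough grouping explicit.

```r
library(dplyr)

# Hypothetical miniature license table: one row per licensed dog
licenses <- tibble(
  borough     = c(rep("Bronx", 6), rep("Brooklyn", 6)),
  animal_name = c("Bella", "Bella", "Max", "Max", "Max", "Coco",
                  "Unknown", "Unknown", "Unknown", "Bella", "Bella", "Rocky")
)

# Placeholder values seen in the real output, treated here as non-names
placeholders <- c("Unknown", "Name", "Name Not Provided")

top_names <- licenses |>
  filter(!animal_name %in% placeholders) |>
  # count() tallies each borough/name pair and returns an ungrouped table
  count(borough, animal_name, name = "n_dogs") |>
  # so group explicitly to rank names within each borough
  group_by(borough) |>
  slice_max(n_dogs, n = 1) |>
  ungroup()

top_names  # Bronx: Max (3); Brooklyn: Bella (2)
```

`slice_max()` keeps ties by default (`with_ties = TRUE`), so a borough can return more than `n` rows when counts are equal.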

Groups and relationships

penguins |> 
  ggplot(mapping = aes(x = bill_length_mm, y = bill_depth_mm)) + 
  geom_point() + 
  geom_smooth(method = "lm", color = "black", se = FALSE)

Groups and relationships

penguins |> 
  ggplot(mapping = aes(x = bill_length_mm, y = bill_depth_mm)) + 
  geom_point(mapping = aes(color = species)) + 
  geom_smooth(method = "lm", color = "black", se = FALSE)

Groups and relationships

penguins |> 
  ggplot(mapping = aes(x = bill_length_mm, y = bill_depth_mm)) + 
  geom_point(mapping = aes(color = species)) + 
  geom_smooth(method = "lm", color = "black", se = FALSE) + 
  geom_smooth(mapping = aes(color = species, fill = species), 
              method = "lm")

Simpson’s Paradox

  • An aggregate trend or relationship between two variables appears to reverse when the data are broken out by category

  • Alternatively, a trend visible within each of several groups disappears or reverses when the groups are aggregated
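The penguin plots above are a common illustration of this: the pooled line of bill depth on bill length slopes down, while each species' line slopes up. The same reversal appears in numbers. The sketch below uses base R on a small synthetic dataset (the values are invented purely to produce the paradox, not drawn from any real data). Within each of three groups, `y` rises by exactly 1 per unit of `x`, but groups further right sit lower, so a single pooled regression slopes downward.

```r
# Synthetic data: three groups, each with an upward within-group trend,
# but each successive group shifted down
x     <- c(1, 2, 3,  4, 5, 6,  7, 8, 9)
y     <- c(10, 11, 12,  6, 7, 8,  2, 3, 4)
group <- rep(c("g1", "g2", "g3"), each = 3)
d <- data.frame(x, y, group)

# Pooled slope: one regression that ignores the groups
pooled_slope <- coef(lm(y ~ x, data = d))[["x"]]

# Within-group slopes: one regression per group
group_slopes <- sapply(split(d, d$group),
                       function(g) coef(lm(y ~ x, data = g))[["x"]])

pooled_slope  # -1.1
group_slopes  # 1 for each group
```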

What is data?

A Trace

A Record

An Account

What is data?

A Story

A Memory

A Promise

What is data?

An Action

A Device

A Resource

Example: The U.S. Census

The U.S. Census

1790 Census record, North Carolina

The U.S. Census

1790

  • Number of free white males aged under 16 years
  • Number of free white males aged 16 years and upward
  • Number of free white females
  • Number of other free persons
  • Number of slaves

1820

  • The number of free White males and females
  • The number of male and female slaves
  • The number of free colored males and females
  • Number of foreigners not naturalized

The U.S. Census

1830

  • The number of slaves and free colored persons of each sex
  • Number of foreigners not naturalized

1850

  • Free Inhabitants Questionnaire
  • Slave Inhabitants Questionnaire
  • Individual enslaved people listed by owner and assigned a number; names not recorded

The U.S. Census

1860

  • “Color” question, recorded as White, Black, Mulatto, Chinese, Indian

1890

  • “Race”, recorded as White, Black, Mulatto, Quadroon, Octoroon, Chinese, Japanese, or Indian.

The U.S. Census

1900

  • “Color or Race”, recorded as White, Black, Chinese, Japanese, Indian

1910

  • White, Black, Mulatto, Chinese, Japanese, Indian, Other

The U.S. Census

1930

  • “Mexican” added as a racial category

“Mexican” category

1940

1940 race questions

The U.S. Census

1970

The U.S. Census

1980

  • Race and Ethnicity

The U.S. Census

1990

The U.S. Census

2000

The U.S. Census

2010

The U.S. Census

2020

Social Classification

Categories and Classes

American Beef Cuts

Categories and Classes

French Beef Cuts

ICD Codes

CDC WONDER