This class will do two things. First, it will teach you how to use modern, widely used tools to create insightful, beautiful, reproducible visualizations of social science data. Second, it will introduce you to the theory and practice of efforts to visualize sociological data, and society more generally. We will think about different ways of looking at social science data, about where data comes from in the first place, and about the implications of choosing to represent it in different ways.
By the end of the course you will:
- Understand the basic principles behind effective data visualization.
- Have a practical sense for why some graphs and figures work well, while others may fail to inform or actively mislead.
- Know how to create a wide range of plots in R using ggplot2.
- Know how to refine plots for effective presentation.
- Have an understanding of some issues surrounding the collection and representation of data in the social sciences and beyond.
I strongly recommend you buy two books:
- Kieran Healy, Data Visualization: A Practical Introduction (Princeton: Princeton University Press, 2019), http://socviz.co/. [Draft version free online; print version at Amazon or other bookshops.]
- Hadley Wickham and Garrett Grolemund, R for Data Science: Import, Tidy, Transform, Visualize, and Model Data (Sebastopol, California: O’Reilly Media, 2017), http://r4ds.had.co.nz/. [Free online; print version at Amazon or other bookshops.]
You should also consider buying this one:
- Claus E. Wilke, Fundamentals of Data Visualization (Sebastopol, California: O’Reilly Media, 2019), https://serialmentor.com/dataviz/. [Draft version free online; print version at Amazon or other bookshops.]
We will also read material from the following books, amongst other sources:
- Whitney Battle-Baptiste and Britt Rusert, W. E. B. Du Bois’s Data Portraits: Visualizing Black America (New York: Princeton Architectural Press, 2018).
- John Berger, Ways of Seeing (London: BBC Books / Penguin, 1972).
- Scott Berinato, Good Charts: The HBR Guide to Making Smarter, More Persuasive Data Visualizations (Cambridge, MA: Harvard Business Review Press, 2016).
- Jacques Bertin, Semiology of Graphics (Redlands, CA: ESRI Press, 2010).
- Alberto Cairo, The Truthful Art: Data, Charts, and Maps for Communication (Berkeley, California: New Riders, 2016).
- William S. Cleveland, Visualizing Data (Hobart Press, 1994).
- Kenneth Field, Cartography (Redlands, CA: ESRI Press, 2018).
- Stephen Few, Now You See It: Simple Visualization Techniques for Quantitative Analysis (Oakland, CA: Analytics Press, 2009).
- Manuel Lima, The Book of Trees (New York: Princeton Architectural Press, 2014).
- Ellen Lupton, Thinking with Type: A Critical Guide for Designers, Writers, Editors, & Students, 2nd ed. (New York: Princeton Architectural Press, 2010).
- Tamara Munzner, Visualization Analysis and Design, AK Peters Visualization Series (Boca Raton, FL: CRC Press, 2014).
- Edward R. Tufte, The Visual Display of Quantitative Information (Cheshire, CT: Graphics Press, 1983).
- Colin Ware, Visual Thinking for Design (Waltham, MA: Morgan Kaufmann, 2008).
- Nathan Yau, Visualize This: The FlowingData Guide to Design, Visualization, and Statistics (New York: Wiley, 2011).
We will do all of our visualization work in this class using the programming language R. We will use RStudio to manage our code and projects. R and RStudio are widely used tools for data analysis in academia and industry.
You will need to install some software first. Here is what to do:
- Get the most recent version of R. R is free and available for Windows, Mac, and Linux operating systems. Download the version of R compatible with your operating system. If you are running Windows or macOS, you should choose one of the precompiled binary distributions (i.e., ready-to-run applications) linked at the top of the R Project’s webpage.
- Once R is installed, download and install RStudio. RStudio is an “Integrated Development Environment”, or IDE. This means it is a front-end for R that makes it much easier to work with. RStudio is also free, and available for Windows, Mac, and Linux platforms.
- Install the tidyverse library and several other add-on packages for R. These libraries provide useful functionality that we will take advantage of throughout the course. You can learn more about the tidyverse’s family of packages at its website.
To install the tidyverse and some additional useful packages, make sure you have an Internet connection and then launch RStudio. Type the following lines of code at R’s command prompt, located in the window named “Console”, and hit return. In the code below, the `<-` arrow is made up of two keystrokes, first `<` and then the short dash or minus symbol, `-`.

    my_packages <- c("tidyverse", "broom", "coefplot", "cowplot",
                     "gapminder", "GGally", "ggraph", "ggrepel", "ggridges",
                     "gridExtra", "here", "maps", "mapproj", "mapdata",
                     "MASS", "quantreg", "rlang", "scales", "survey",
                     "srvyr", "usethis", "devtools")

    install.packages(my_packages, repos = "http://cran.rstudio.com")

RStudio should then download and install these packages for you. It may take a little while to download everything.
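Once the installation finishes, one way to confirm that everything arrived is to compare the package list against what R reports as installed. The following is a minimal sketch using only base R. The helper name `missing_packages` and the shortened `to_check` vector are hypothetical and used for illustration; they are not part of the course materials.

```r
# Hypothetical helper (not part of the course materials): returns the
# names in `wanted` that are not installed in any of R's library paths.
missing_packages <- function(wanted) {
  installed <- rownames(installed.packages())
  setdiff(wanted, installed)
}

# A shortened list for illustration; substitute the full package list
# from the installation step if you want to check everything.
to_check <- c("tidyverse", "gapminder", "ggrepel")

still_needed <- missing_packages(to_check)

if (length(still_needed) == 0) {
  message("All packages are installed.")
} else {
  message("Still missing: ", paste(still_needed, collapse = ", "))
}
```

If any names are reported as missing, passing just those names to `install.packages()` is usually enough; there should be no need to reinstall the whole list.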
With these packages available, you can then install one last package that’s useful specifically for this course.
| Dates | Topics |
|---|---|
| August 21st / 23rd | Orientation and Setup |
| August 28th / 30th | Ways of Seeing / Expressing Yourself with R |
| September 4th / 6th | Looking at Data / Working Tidily with Data |
| September 11th / 13th | Making Graphs / Showing the Right Numbers |
| September 18th / 20th | (No class this week) |
| September 25th / 27th | Groups, Kinds, and Comparisons / Working with Tables |
| October 2nd / 4th | Time, Trends, and Flows / Expanding your Visual Vocabulary |
| October 9th / 11th | Populations and Distributions / Beeswarms, Pyramids, and Lexis Surfaces |
| October 16th / 18th | Trees, Ties, Relations / Visualizing Networks and Hierarchies |
| October 23rd / 25th | Midterm Workshop / Midterm Presentations |
| October 30th / November 1st | Space and Place / Maps I |
| November 6th / 8th | Nations and States / Maps II |
| November 13th / 15th | Design Thinking / Refining Plots |
| November 20th / 22nd | Representing and Intervening |
| November 27th / 29th | Thanksgiving (No Class) |
| December 4th / 6th | Final Projects |
As the weeks go by, consult the Schedule Page for more information on weekly topics, problem sets, readings, and other materials. The schedule is likely to change as we go. Links to readings, assignments, and other materials from class will be posted on that page.
- Attendance is required. I am a reasonable person; if you need to be absent please let me know in advance insofar as that is possible.
- Do the assigned readings in advance of class.
- Submit memos, problem sets, or other assigned work on the day they are due.
- You will need to bring a laptop to class and use it for note-taking and in-class work.
- Because it is hard even for me to compete with the interconnected entirety of human knowledge, friendship, entertainment, and retailing options available online, I may occasionally ask you to close the laptop.
- If I find you’re engaging in non-class-related laptop use that is distracting to me or other students, I will warn you about it once. After that I’ll penalize your grade.
Required Work and Grading
There are three kinds of assignments for the course: memos, problem sets, and projects.
- Each week you will write one short Reflection Memo (250 to 500 words) on the reading. Together these will account for 25% of your grade. Memos are due on the Tuesday of each week.
- Problem Sets will let you practice your visualization skills. Five problem sets will together be worth 25% of your grade. Problem sets are due on Fridays the week after they are given out.
- A Midterm project will let you test out your data visualization skills. I will assign the dataset and set the parameters of the project. The project is due on Friday, October 24th. It will be worth 25% of your grade.
- A Final Project will allow you to develop a more substantial visualization or series of visualizations. I will suggest several possible datasets, or you can propose your own. The project is due on Friday, December 6th. It will be worth 25% of your grade.
There is no final exam for the class.
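The four components above combine into an overall grade as a weighted average. As a rough sketch in R: the weights come from the percentages listed above, while the scores, their 0 to 100 scale, and the variable names are made up for illustration and do not reflect how grades are actually recorded.

```r
# Component weights as stated in the syllabus: each is 25% of the grade.
weights <- c(memos = 0.25, problem_sets = 0.25,
             midterm = 0.25, final_project = 0.25)

# Hypothetical scores on an assumed 0-100 scale, for illustration only.
scores <- c(memos = 90, problem_sets = 80,
            midterm = 85, final_project = 95)

# Sanity check: the weights should account for the whole grade.
stopifnot(isTRUE(all.equal(sum(weights), 1)))

# Weighted average: multiply each score by its weight, then add them up.
# Indexing scores by name keeps components aligned with their weights.
overall <- sum(weights * scores[names(weights)])
overall
```

Because all four weights are equal, this reduces to the plain mean of the four component scores; with the made-up numbers here, that is 87.5.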
Duke Community Standard
Like all classes at the university, this course is conducted under the Duke Community Standard. Duke University is a community dedicated to scholarship, leadership, and service and to the principles of honesty, fairness, respect, and accountability. Citizens of this community commit to reflect upon and uphold these principles in all academic and nonacademic endeavors, and to protect and promote a culture of integrity. To uphold the Duke Community Standard you will not lie, cheat, or steal in academic endeavors; you will conduct yourself honorably in all your endeavors; and you will act if the Standard is compromised.