Intro to R

Fall 2023 Intro to R workshop.

Bonus: A short intro to ggplot2

Joy Nyaanga / 2023-10-26


On day 3 we spent time discussing the {tidyverse} - a collection of R packages that share a common design philosophy, grammar, and structure. {ggplot2} is a powerful package within the tidyverse for producing elegant graphics piece by piece.

The gg in ggplot2 means Grammar of Graphics, a graphic concept which describes plots by using a “grammar”. Similar to how sentences are composed of various parts of speech (e.g., nouns, verbs, adjectives), {ggplot2} allows us to create figures using a standard syntax.

According to this concept, a plot can be divided into fundamental parts, mainly: Plot = data + aesthetics + geometry. We will take a look at each of these parts and see how together they build a plot.

Designating data

As with any R package, we must load {ggplot2} before we can use its functions.

library(ggplot2)

Now we can begin to use functions within the {ggplot2} package. Step one is to tell R to set up a plot. We do this by calling ggplot2::ggplot(). This opens a blank plot in the Plots tab, which by default sits in the lower-right pane of RStudio.

ggplot2::ggplot()

Notice that the function we use to create a plot is ggplot(), whereas the package is called ggplot2.

Now that we have the framework for a plot set, we can specify which data set we are interested in visualizing. Let’s use the palmerpenguins data set once more.

library(palmerpenguins)
penguins <- palmerpenguins::penguins
head(penguins)
## # A tibble: 6 × 8
##   species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
##   <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
## 1 Adelie  Torgersen           39.1          18.7               181        3750
## 2 Adelie  Torgersen           39.5          17.4               186        3800
## 3 Adelie  Torgersen           40.3          18                 195        3250
## 4 Adelie  Torgersen           NA            NA                  NA          NA
## 5 Adelie  Torgersen           36.7          19.3               193        3450
## 6 Adelie  Torgersen           39.3          20.6               190        3650
## # ℹ 2 more variables: sex <fct>, year <int>

The first element in data visualization is your data. Let’s tell R that we want to plot the data held in the variable penguins.

ggplot2::ggplot(penguins)

I will not actually execute this block, but give it a try yourself. Notice that nothing was added to our blank plot! This is because we must also add the second element of data visualization: aesthetics.

Assigning aesthetics

Aesthetics include things like which variable goes on the X axis, which variable goes on the Y axis, and what size, shape, or color you want your points/lines/bars to be. Let’s explore the relationship between bill length (x-axis) and bill depth (y-axis) in our penguins data set.

Note: To build up a plot in {ggplot2}, we join its pieces with the + operator.

ggplot2::ggplot(penguins) + aes(x = bill_length_mm, y = bill_depth_mm)

Notice we now have axis labels! But nothing has shown up within the bounds of our plot because we are still missing our geometry.
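
The mapping can also be passed straight to ggplot() as its second argument, which is the form you will often see elsewhere. A minimal sketch, assuming the palmerpenguins package is installed; the object names p_added and p_inline are made up for illustration:

```r
library(ggplot2)
penguins <- palmerpenguins::penguins

# Mapping added as a separate piece with +
p_added <- ggplot(penguins) + aes(x = bill_length_mm, y = bill_depth_mm)

# Same mapping supplied inside the ggplot() call
p_inline <- ggplot(penguins, aes(x = bill_length_mm, y = bill_depth_mm))

# Both objects carry the same aesthetics
names(p_added$mapping)
names(p_inline$mapping)
```

Either style works; adding aes() separately keeps the "data + aesthetics + geometry" pieces visually distinct.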

Adding geometric objects

The final element we need to add is a geometric object. Geometric objects define the type of plot we want to make. Let’s try plotting simple points. We will use geom_point().

ggplot2::ggplot(penguins) + aes(x = bill_length_mm, y = bill_depth_mm) + geom_point()

And now we can see our data!
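
Because a plot is built piece by piece, it can also be stored in a variable and extended one step at a time. The sketch below assumes palmerpenguins is installed; the name p is arbitrary:

```r
library(ggplot2)
penguins <- palmerpenguins::penguins

p <- ggplot(penguins)                                # data
p <- p + aes(x = bill_length_mm, y = bill_depth_mm)  # aesthetics
p <- p + geom_point()                                # geometry

# Printing the object draws the plot
p
```

Expect a warning that some rows were removed: penguins such as the one in row 4 of the head() output have NA for both bill measurements, so they cannot be drawn.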

There are several plot types we can create using the {ggplot2} package. Here are a few key types:

PLOT TYPE                  GGPLOT2 FUNCTION
Initialize a ggplot        ggplot()
Scatter plot               geom_point()
Box plot                   geom_boxplot()
Violin plot                geom_violin()
Jittered scatter plot      geom_jitter()
Dot plot                   geom_dotplot()
Bar chart                  geom_bar() or geom_col()
Line plot                  geom_line()
Histogram                  geom_histogram()
Density plot               geom_density()
Error bars                 geom_errorbar()
Title and axis labels      labs()
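
Swapping in a different function from this list changes the plot type while the rest of the recipe stays the same. As a rough sketch, here is a histogram of body mass; the bin width of 250 g and the name p_hist are arbitrary choices for illustration:

```r
library(ggplot2)
penguins <- palmerpenguins::penguins

# A histogram needs only an x variable; the counts on the y-axis are computed for us
p_hist <- ggplot(penguins) +
  aes(x = body_mass_g) +
  geom_histogram(binwidth = 250)  # bin width in grams

p_hist
```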

Adding more aesthetics

Now that we have the basics, we can play around with aesthetics and other geometric objects. Let’s start by adding to the aesthetics. Say we were interested in observing the relationship between bill length (x-axis) and bill depth (y-axis) by island.

ggplot2::ggplot(penguins) +
  aes(x = bill_length_mm, y = bill_depth_mm, color = island) +
  geom_point()

When we add color to the aes() call, R not only colors each point by its associated island, but also puts this information in the plot legend.

Note: Typically we place information in aes() when we want to use information that is in our data frame. What if we placed color as an argument in geom_point()? Give it a try.
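
For comparison, here is a sketch of both approaches. Inside aes(), color is mapped to a column of the data; as an argument to geom_point(), color is a fixed value applied to every point, so no legend is drawn. The color "steelblue" and the object names are arbitrary:

```r
library(ggplot2)
penguins <- palmerpenguins::penguins

# Mapped: color depends on the island column, and a legend appears
p_mapped <- ggplot(penguins) +
  aes(x = bill_length_mm, y = bill_depth_mm, color = island) +
  geom_point()

# Fixed: every point gets the same color, with no legend
p_fixed <- ggplot(penguins) +
  aes(x = bill_length_mm, y = bill_depth_mm) +
  geom_point(color = "steelblue")

p_mapped
p_fixed
```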

Layering geometric objects

Not only can we change the aesthetics but we can also change the geometric objects we are plotting. Let’s explore our data using boxplots instead…

Say we are interested in exploring how body mass differs between the sexes.

ggplot2::ggplot(penguins) +
  aes(x = sex, y = body_mass_g) +
  geom_boxplot()
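
Some penguins have no recorded sex, so this plot includes an extra box labelled NA. If you would rather leave those birds out, one option is to filter them with {dplyr} before plotting. This is only a sketch; the name penguins_sexed is made up for illustration:

```r
library(ggplot2)
library(dplyr)
penguins <- palmerpenguins::penguins

# Keep only the rows where sex was recorded
penguins_sexed <- penguins %>%
  dplyr::filter(!is.na(sex))

ggplot(penguins_sexed) +
  aes(x = sex, y = body_mass_g) +
  geom_boxplot()
```

Whether to drop or show the NA group depends on the question you are asking, so treat this as a choice rather than a required step.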

Although boxplots are great, they do hide the underlying data. To avoid this, we can also plot each observation as a point on top of the boxplots. This is as simple as adding geom_point() at the end with a +:

ggplot2::ggplot(penguins) +
  aes(x = sex, y = body_mass_g) +
  geom_boxplot() +
  geom_point()

I like to use a geometric object called geom_jitter(). It is similar to geom_point() but adds a small amount of random noise to each point’s position, so points do not lie directly on top of each other.

ggplot(penguins) +
  aes(x = sex, y = body_mass_g) +
  geom_boxplot() +
  geom_jitter()

Using geom_jitter(), I also like to specify how much wiggle (or jitter) the points have. I like to keep their jitter pretty narrow. This can be changed by adding a width designation:

ggplot(penguins) +
  aes(x = sex, y = body_mass_g) +
  geom_boxplot() +
  geom_jitter(width = 0.2)
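
Two details are worth knowing here. The jitter is random, so the points land in slightly different places each time the plot is drawn, and by default geom_jitter() nudges points vertically as well as horizontally. A sketch of one way to handle both, assuming you want the body mass values left exactly where they are; the seed 42 is arbitrary:

```r
library(ggplot2)
penguins <- palmerpenguins::penguins

p_jit <- ggplot(penguins) +
  aes(x = sex, y = body_mass_g) +
  geom_boxplot(outlier.shape = NA) +      # hide boxplot outliers so they are not drawn twice
  geom_jitter(width = 0.2, height = 0)    # sideways jitter only

set.seed(42)  # fix the random jitter so the plot looks the same on every run
p_jit
```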

Something to keep in mind: the order in which you add layers to a plot is the order in which they are drawn, so later layers sit on top of earlier ones. As a personal preference I like having points in front of boxplots, which is why I add geom_boxplot() first and geom_jitter() second.
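
To see the effect of ordering, compare the two plots in this sketch. The object names are made up for illustration:

```r
library(ggplot2)
penguins <- palmerpenguins::penguins

# Boxplot first, points second: points are drawn over the boxes
p_points_on_top <- ggplot(penguins) +
  aes(x = sex, y = body_mass_g) +
  geom_boxplot() +
  geom_jitter(width = 0.2)

# Points first, boxplot second: boxes cover some of the points
p_boxes_on_top <- ggplot(penguins) +
  aes(x = sex, y = body_mass_g) +
  geom_jitter(width = 0.2) +
  geom_boxplot()

p_points_on_top
p_boxes_on_top
```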

Adding plot labels

The final thing I want to share is adding axis labels and titles to a plot. For this we use the function labs().

ggplot2::ggplot(penguins) +
  aes(x = sex, y = body_mass_g) +
  geom_boxplot() +
  geom_jitter(width = 0.2) +
  labs(x = "Sex", y = "Body mass (g)", title = "Body mass of penguins given sex")

And that is really about it. You can play around with what you want to plot in x and y, as well as aesthetics like size and color.

One more option is alpha, which controls transparency. It ranges from 0 (fully transparent) to 1 (fully opaque), so the smaller the alpha, the more transparent the object.

ggplot2::ggplot(penguins) +
  aes(x = sex, y = body_mass_g) +
  geom_boxplot(size = 0.5) +
  geom_jitter(width = 0.2, alpha = 0.5, aes(color = island)) +
  labs(x = "Sex", y = "Body mass (g)", title = "Body mass of penguins given sex")
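
The legend in this plot is titled with the raw column name, island. labs() can rename it in the same way as the axes, by using the name of the aesthetic. A short sketch; the object name p_legend is made up for illustration:

```r
library(ggplot2)
penguins <- palmerpenguins::penguins

p_legend <- ggplot(penguins) +
  aes(x = sex, y = body_mass_g) +
  geom_boxplot() +
  geom_jitter(width = 0.2, alpha = 0.5, aes(color = island)) +
  labs(x = "Sex", y = "Body mass (g)",
       color = "Island",  # legend title for the color aesthetic
       title = "Body mass of penguins given sex")

p_legend
```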

Putting it all together

With these basics and the powerful functions from {dplyr} for summarizing data, your plotting possibilities are truly endless.

library(dplyr)
penguins %>%
  dplyr::group_by(island) %>%  # one row per island, so each bar shows a single average
  dplyr::summarize(avg_body_mass = mean(body_mass_g/1000, na.rm = TRUE)) %>%
  ggplot2::ggplot() +
  aes(x = island, y = avg_body_mass, fill = island) +
  geom_col() +
  theme_classic() +
  labs(x = "Island",
       y = "Average body mass (kg)",
       title = "Average penguin body mass across islands")
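
If you want to compare species within each island, keep in mind that grouping by both species and island gives more than one row per island, and geom_col() stacks those rows into a single bar by default. One possible approach, sketched below, is to fill by species and place the bars side by side with position = "dodge". The name mass_by_group is made up for illustration:

```r
library(ggplot2)
library(dplyr)
penguins <- palmerpenguins::penguins

# One row per species-island combination
mass_by_group <- penguins %>%
  dplyr::group_by(species, island) %>%
  dplyr::summarize(avg_body_mass = mean(body_mass_g / 1000, na.rm = TRUE),
                   .groups = "drop")

mass_by_group  # inspect the summary table before plotting

ggplot(mass_by_group) +
  aes(x = island, y = avg_body_mass, fill = species) +
  geom_col(position = "dodge") +  # side-by-side bars instead of stacked
  theme_classic() +
  labs(x = "Island",
       y = "Average body mass (kg)",
       fill = "Species",
       title = "Average penguin body mass by species and island")
```

Printing the summary table first is a cheap way to check that the bars will show what you expect.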