Graphing with {ggplot2}

IOC-R Week 5

Recap Week 4

Brief Recap Week 4

  • Operators
    • Logical: &, |, !; they return TRUE or FALSE.
    • Comparison: ==, !=, <, <=, >, >=.
    • %in%: checks membership

What does the following code do?

(2.1 > 1) & (0.049 > 0.05)
(2.1 > 1) | (0.049 > 0.05)

# df is a data frame, 
# one of the columns is called "pvalue"
df[df$pvalue < 0.05, ]

"a" %in% c("c", "ba")
  • Conditions
    • statements: if, if…else
    • ifelse() function

What does the following code do?

if (lfc > 2 & pvalue < 0.05 ) {
  print("gene of interest")
} else {
  print("gene to remove")
}
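
The if…else statement above expects a single TRUE or FALSE, whereas ifelse() works element by element on vectors. A minimal sketch, assuming made-up lfc and pvalue vectors (the values are hypothetical and not taken from any dataset):

```r
# Hypothetical values for three genes, made up for illustration
lfc <- c(2.5, 0.3, 3.1)
pvalue <- c(0.01, 0.20, 0.08)

# ifelse() tests every element and returns a vector of the same length
gene_status <- ifelse(
  lfc > 2 & pvalue < 0.05,
  "gene of interest",
  "gene to remove"
)
gene_status
```

Only the first gene passes both thresholds, so it is the only one labelled "gene of interest".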

Brief Recap Week 4

  • Functions
## syntax
my_function <- function(arg1, arg2) {
  # code
  return(result)
}

Based on the following code:

  • What will you get when calling times_three(2)?
  • What will you get if you type time_factor in the console?
times_three <- function(number) {
  time_factor <- 3
  res <- number * time_factor
  return(res)
}
times_three(2)
time_factor

R Packages

What are Packages in R?

Packages are collections of functions, data, and documentation.

  • Pre-installed packages: {base}, {utils}, {graphics}, etc.
dim() # from {base}
head() # from {utils}
plot() # from {graphics}

To check the list of installed packages in RStudio, open the “Packages” panel.
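
The same information is also available from the console. A small sketch using installed.packages() from {utils}; the variable name installed is only chosen for illustration:

```r
# installed.packages() returns a matrix with one row per installed package
installed <- rownames(installed.packages())
head(installed)

# Check whether a given package is installed
"ggplot2" %in% installed
```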

Installing Packages

By default, R will install the latest version of a package.

  • Click-button way: in the “Packages” panel, click on the Install button.
  • Via command, i.e., install.packages("ggplot2")
  • CRAN (Comprehensive R Archive Network) is a network of servers around the world that store identical, up-to-date versions of code and documentation for R.
  • Bioconductor is a specialized repository like CRAN, but focused on bioinformatics. It provides R packages for analyzing genomic and biological data.
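
A possible way to script the installation so that it only runs when needed. This is a sketch, not a required workflow: the CRAN mirror URL is one choice among many, and the Bioconductor lines are left commented out because they need an internet connection ("limma" is only an example package name).

```r
# Install ggplot2 from CRAN only if it is not installed yet
# (the mirror URL is one possible choice, not a requirement)
if (!requireNamespace("ggplot2", quietly = TRUE)) {
  install.packages("ggplot2", repos = "https://cloud.r-project.org")
}

# Bioconductor packages are installed through the BiocManager package,
# which itself comes from CRAN; "limma" is only an example.
# install.packages("BiocManager")
# BiocManager::install("limma")

# TRUE once the package can be loaded
ggplot2_available <- requireNamespace("ggplot2", quietly = TRUE)
ggplot2_available
```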

Management of Packages

  • Update: click on the Update button in the “Packages” panel, and a popup will show you the packages that can be updated.

  • Delete: click on the delete button next to the package

Or use remove.packages("tibble").

We’ll talk about package version management with the renv package in session 6 if time allows.

Using Functions from a Package

To use (call) a function from a package, we can either:

  • load the entire package (attach it to the environment)
library(ggplot2) # load the package
ggplot() # call the function to initiate a ggplot

A loaded package will be checked in the “Packages” panel.

You only need to load a package once per R session.

However, if you’re running your script in a non-interactive way, make sure to include the library() calls in your script, ideally at the beginning.

  • or call one function of the package at a time with the syntax pkg_name::fct_name
ggplot2::ggplot()

This approach is recommended if you only need one function from a package.

Graphing with {ggplot2}

Before Plotting …

What message do you want to convey with your figure?

flowchart LR
  A{Which variables?} --> B{Data properties}
  B --> C{Figure type}


Check out these websites: From Data to Viz and The R Graph Gallery (by Yan Holtz)

Compositions of a ggplot

(Figure adapted from QCBS R Workshop Series.)

How to Build a ggplot

“All ggplot2 plots begin with a call to ggplot(), supplying default data and aesthetic mappings, specified by aes(). You then add layers, scales, coords and facets with +.” (ggplot2 Reference)


Example using the built-in dataset iris:

str(iris) # data overview
'data.frame':   150 obs. of  5 variables:
 $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
 $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
 $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...

Basic Plot

library("ggplot2") # load package

Start by calling ggplot():

p0 <- ggplot(
  data = iris, # a data frame
  mapping = aes(x = Sepal.Length, y = Petal.Length)
)
p0

Specify data, x and y axes.

The data should be a data frame containing both variables needed for the plot.

Add a geometric layer:

# geom_point() is used for scatter plots
base_plot <- p0 + geom_point()


base_plot

Use points for visualisation.

Aesthetics - Color

The most common aesthetics: color, fill, shape, size, alpha (transparency), etc.

  • Static aesthetics: a fixed value applied to the whole layer
base_plot_red <- p0 + geom_point(color = "red")
base_plot_red

  • Aesthetic mappings: visual properties that depend on data values (to be used in aes())
base_plot <- p0 + geom_point(aes(color = Species))
base_plot
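
A common pitfall is to put a fixed value inside aes(). The sketch below contrasts the two cases; the object names p_static and p_pitfall are only chosen for illustration, and layer_data() is used to look at the colours ggplot2 actually computes for the layer.

```r
library(ggplot2)

p0 <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length))

# Static: every point is drawn in red, no legend
p_static <- p0 + geom_point(color = "red")

# Pitfall: inside aes(), "red" is treated as a data value (one category),
# so ggplot2 picks a colour from its default palette and adds a legend
p_pitfall <- p0 + geom_point(aes(color = "red"))

# layer_data() returns the values actually used for drawing
unique(layer_data(p_static)$colour)
unique(layer_data(p_pitfall)$colour)
```

In the second plot the points are not red: the string "red" is mapped through the colour scale like any other category.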

Aesthetics - Shape

The most common aesthetics: color, fill, shape, size, alpha (transparency), etc.

  • Static aesthetics: a fixed value applied to the whole layer
p0 + geom_point(shape = 3)

  • Aesthetic mappings: visual properties that depend on data values (to be used in aes())
p0 + geom_point(aes(shape = Species))

Aesthetics - Size

The most common aesthetics: color, fill, shape, size, alpha (transparency), etc.

  • Static aesthetics: a fixed value applied to the whole layer
p0 + geom_point(size = 3)

  • Aesthetic mappings: visual properties that depend on data values (to be used in aes())
p0 + geom_point(aes(size = Petal.Length))

Aesthetics - Alpha

The most common aesthetics: color, fill, shape, size, alpha (transparency), etc.

With no transparency:

p0 + geom_point(size = 3)

Use the alpha (between 0 and 1) parameter:

p0 + geom_point(size = 3, alpha = 0.5)

Labels (Axes - Titles - Legend)

base_plot

Use labs() to modify labels of plot, axes, legend.

p_labs <- base_plot + labs(
  x = "Sepal Length (cm)",
  y = "Petal Length (cm)",
  title = "Scatter plot with customized labels.",
  color = NULL # remove legend title
)
p_labs

Add Other Layers

# the scatter plot with modified labels
p_labs

Add a linear regression line using geom_smooth():

p_regline <- p_labs + geom_smooth(method = "lm", se = FALSE)
p_regline

Each geom_*() function adds a new layer to the plot, just like stacking transparent sheets on top of each other to build the final image.
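
Because layers are stacked, the order in which they are added decides what is drawn on top. A small sketch to illustrate this; the object names are hypothetical, and formula = y ~ x is spelled out only to avoid the informational message from geom_smooth().

```r
library(ggplot2)

p0 <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length))

# Points first, regression line drawn on top of them
p_line_on_top <- p0 +
  geom_point(size = 3) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# Same layers in reverse order: the points now partly cover the line
p_points_on_top <- p0 +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  geom_point(size = 3)

# One data table per layer, numbered in drawing order
nrow(layer_data(p_line_on_top, 1))   # layer 1 holds the points
nrow(layer_data(p_points_on_top, 2)) # layer 2 holds the points
```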

Aesthetic Mappings

ggplot(
  data = iris,
  mapping = aes(
    x = Sepal.Length,
    y = Petal.Length
  )
) +
  geom_point(aes(color = Species)) +
  geom_smooth(method = "lm", se = FALSE)

Compare the following code with the previous one. What differs between the two code snippets? What differs between the resulting graphs?

ggplot(
  data = iris,
  mapping = aes(
    x = Sepal.Length,
    y = Petal.Length,
    color = Species
  )
) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)
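
One way to check your answer without looking at the figures is to count how many regression lines geom_smooth() fits in each version. This is only a sketch: the names p_layer and p_global are made up here, and formula = y ~ x is added to silence the message from geom_smooth().

```r
library(ggplot2)

# color mapped inside geom_point() only
p_layer <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length)) +
  geom_point(aes(color = Species)) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# color mapped in ggplot(), so every layer inherits it
p_global <- ggplot(
  iris,
  aes(x = Sepal.Length, y = Petal.Length, color = Species)
) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# Layer 2 is geom_smooth(); count the groups it fitted a line for
n_lines_layer <- length(unique(layer_data(p_layer, 2)$group))
n_lines_global <- length(unique(layer_data(p_global, 2)$group))
c(layer = n_lines_layer, global = n_lines_global)
```

Mappings given in ggplot() are inherited by all layers, while mappings given in a geom_*() call apply to that layer only.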

Commonly Used Geometries (1)

  • Boxplot:
ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  geom_boxplot()

  • Violin plot:
ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  geom_violin()

Commonly Used Geometries (2)

  • Histogram:
ggplot(iris, aes(x = Sepal.Length)) +
  geom_histogram()

  • Bar plot:
ggplot(iris, aes(x = Species)) +
  geom_bar()

geom_histogram() and geom_bar() only require one variable, mapped to the x-axis. The y-axis (the number of observations per bin or per category) is computed automatically.
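
To see what is computed for the y-axis, the counts can be extracted with layer_data(). A short sketch; bins = 20 is an arbitrary choice made here to replace the default of 30 bins, and the object names are only illustrative.

```r
library(ggplot2)

p_bar <- ggplot(iris, aes(x = Species)) + geom_bar()
p_hist <- ggplot(iris, aes(x = Sepal.Length)) + geom_histogram(bins = 20)

# The computed y values are stored in the "count" column
bar_counts <- layer_data(p_bar)$count
hist_counts <- layer_data(p_hist)$count

bar_counts          # one count per species
table(iris$Species) # same numbers, computed with base R
sum(hist_counts)    # every observation falls into one bin
```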

Themes

  • Built-in themes (theme_*()): theme_grey() (default), theme_bw(), theme_light(), theme_classic(), etc.
p_labs + theme_classic()

  • Use theme() function to tweak elements, e.g.:
p_labs + theme(
  legend.position = "none", # hide legend
  plot.title = element_text(
    hjust = 0.5, # center plot title
    size = 5 # plot title size
  ),
  axis.text.x = element_text(angle = 90) # rotate x-axis text
)
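
When a built-in theme and theme() tweaks are combined, their order matters. A minimal sketch, with hypothetical object names, assuming a scatter plot coloured by Species:

```r
library(ggplot2)

p <- ggplot(
  iris,
  aes(x = Sepal.Length, y = Petal.Length, color = Species)
) +
  geom_point()

# Built-in theme first, then tweaks: the legend is hidden
p_tweaked <- p + theme_classic() + theme(legend.position = "none")

# Tweaks first, then built-in theme: theme_classic() replaces the
# earlier settings, so the legend is shown again
p_reset <- p + theme(legend.position = "none") + theme_classic()
```

A safe habit is to add the theme_*() call first and the theme() adjustments after it.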

Saving Your Graphs

Use ggsave() to save plots in high resolution for publications.

ggsave(
  filename = "path/to/figure.png", # figure file name
  plot = last_plot(), 
  # save by default the last figure,
  # you can provide the figure name to specify the plot to be saved.
  device = "png",
  # can be one of "eps", "ps", "tex" (pictex), "pdf",
  # "jpeg", "tiff", "png", "bmp", "svg" or "wmf"
  width = 6.3,
  height = 4.7,
  units = "in", # can be one of "in", "cm", "mm" or "px"
  dpi = 300 # plot resolution
)

Save the basic plot to the outputs folder in your project. Check the saved figure via the Files panel in RStudio.

ggsave(
  filename = "../outputs/basic_scatter_plot.png",
  plot = base_plot,
  width = 5, height = 5, units = "cm", dpi = 150
)
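
ggsave() may fail if the target folder does not exist, so it can help to create it first. The sketch below writes to a folder inside the session's temporary directory (a hypothetical location chosen so the example runs anywhere) and saves a PDF to show that the device can be inferred from the file extension.

```r
library(ggplot2)

p <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length)) +
  geom_point()

# Hypothetical output folder inside the temporary directory;
# in a project, replace it with your own outputs folder
out_dir <- file.path(tempdir(), "outputs")
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

# No `device` given: ggsave() picks the PDF device from the ".pdf" extension
out_file <- file.path(out_dir, "basic_scatter_plot.pdf")
ggsave(filename = out_file, plot = p, width = 12, height = 10, units = "cm")

file.exists(out_file)
```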

Let’s Practice!

Today’s Goals

  • Install new R packages
  • Create basic plots with ggplot2