Microcredencial “agRo-al” - Session 8


Visualization and graphics functions in R

R offers a wide variety of graphics facilities. Its base graphical functions are further complemented by high-performing packages such as ggplot2, and the many extension packages that depend on ggplot2 (e.g. ggridges, ggcorrplot, ggradar, gganimate, etc.) expand the graphical possibilities even more 1.

A preview of what base R graphics can do is available by running the following command in the R console:

demo(graphics)

Unlike most R functions, a graphical function does not return its result as an ordinary R object to be stored; instead, its output is sent to a graphical device, such as a graphical window or a file.

After a graphical function runs successfully, R opens a graphical window (graphical device) and displays the graph or plot. In RStudio, the graphical device is integrated into the GUI, in the Plots tab. The default screen device depends on the operating system: X11 on Linux, windows on Windows, and quartz on macOS. On Windows, X11() is also accepted as an alias of windows(). From the R console, the following function opens a new graphical device:

X11()
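As a complement, here is a minimal sketch of how devices can be opened, inspected, and closed from code. It assumes dev.new(), which opens the default device for the current platform and session, so it does not depend on X11 being available; in a non-interactive session the default is usually a PDF file device.

```r
# Open the default graphics device for this platform/session
dev.new()

# dev.cur() returns the number of the active device, named by its type
active <- dev.cur()
print(active)

# Draw something on the active device
plot(1:10, main = "Drawn on the active device")

# dev.list() lists every open device; dev.off() closes the active one
print(dev.list())
dev.off()
```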

To understand how R deals with graphics, we should know there are two types of graphical functions:

  • the high-level plotting functions, which create a new graph.

  • the low-level plotting functions, which add elements to existing graphs.
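The distinction between the two types can be sketched with the built-in cars dataset (chosen here only for illustration): one high-level call creates the graph, and the low-level calls that follow draw on top of it.

```r
# High-level call: starts a new plot (axes, box, labels, and points)
plot(cars$speed, cars$dist,
     xlab = "Speed (mph)", ylab = "Stopping distance (ft)")

# Low-level calls: draw on top of the plot created above
fit <- lm(dist ~ speed, data = cars)
abline(fit, col = "red")                 # fitted regression line
points(mean(cars$speed), mean(cars$dist),
       pch = 19, col = "blue")           # mark the point of means
```

Calling a low-level function before any high-level one fails, because there is no existing graph to add to.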


High-level graphical functions

The following functions comprise a small repertoire of the most commonly used ones:

  • plot(x): plots the values of x (on the y-axis) against their index (on the x-axis).

  • plot(x, y): plot of x values on the x-axis, and y values on the y-axis.

  • pie(x): a circular pie-chart.

  • boxplot(x): a box-and-whiskers plot.

  • stripchart(x): plot the x values on a line (alternative to boxplot() for small sample sizes).

  • hist(x): histogram of the frequencies of x values.

  • barplot(x): bar chart with one bar per value of x (bar heights are the x values).

  • dotchart(x): plots a Cleveland dot plot of a numeric vector; if x is a matrix, the values are grouped by column, with one line per row within each group.

  • qqnorm(x): quantiles of x values with respect to the normal distribution.

  • qqplot(x, y): quantiles of y values with respect to the quantiles of x ones.

## Examples of plots

## plot(x)
plot(log10(seq(1:500) * 1/seq(1:100)))

## plot(x, y)
plot(seq(1,5000,10), sqrt(seq(1:500)))

## pie(x)
pie(rep(1,12))

## boxplot(x)
boxplot(iris$Sepal.Length ~ iris$Species)

## stripchart(x)
stripchart(iris$Sepal.Length ~ iris$Species)

## hist(x)
hist(iris$Sepal.Width)

## barplot(x)
barplot(iris$Sepal.Width)

## Sorting the data in decreasing order and plotting the variable again
iris.ordered <- iris[order(iris$Sepal.Width, decreasing = TRUE), ]
barplot(iris.ordered$Sepal.Width)

## dotchart(x)
dotchart(iris$Petal.Width)

## qqnorm(x)
x <- rt(100, df = 5)
qqnorm(x)

## qqplot(x, y)
x <- sample(200, replace = TRUE)
y <- rt(200, df = 3)
qqplot(x, y)

Some arguments are common to most of the functions mentioned above. Here are the most relevant ones for customizing the output plots:

  • xlim=, ylim=: set the lower and upper limits of the “x” and “y” axes. Each must be a numeric vector of length two, e.g. c(0, 10).

  • xlab=, ylab=: set the axis titles; each must be a character string.

  • main=: sets the plot title; must be a character string.
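A short sketch of these arguments in use, with the built-in mtcars dataset (an illustrative choice; the axis limits are arbitrary values that enclose the data):

```r
plot(mtcars$wt, mtcars$mpg,
     xlim = c(0, 6),                       # lower and upper limits of the x-axis
     ylim = c(0, 40),                      # lower and upper limits of the y-axis
     xlab = "Weight (1000 lbs)",
     ylab = "Miles per gallon",
     main = "Fuel economy vs weight (mtcars)")

# par("usr") reports the axis extremes actually used by the current plot
limits <- par("usr")
print(limits)
```

By default R extends each axis slightly beyond the requested limits, so the values reported by par("usr") are a little wider than xlim and ylim.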


Low-level graphical functions

The following are examples of the most commonly used low-level graphical functions in base R:

  • points(x, y): adds points at the x and y coordinates.

  • lines(x, y): same as points(), but draws connected lines.

  • text(x, y, labels): adds the labels text at the x and y coordinates.

  • abline(a, b): draws a line of slope b and intercept a; a fitted lm object can be passed instead, as in abline(reg).

  • abline(h = y): draws a horizontal line at the y ordinate.

  • abline(v = x): draws a vertical line at the x abscissa.

  • title(): adds a title text.
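The functions listed above can be combined on a single graph. The following sketch uses made-up data (sine and cosine curves) purely for illustration:

```r
x <- seq(0, 2 * pi, length.out = 50)

# type = "n" sets up the axes without drawing any data
plot(x, sin(x), type = "n", ylab = "value")

lines(x, sin(x), col = "blue")                # connected line
points(x, cos(x), pch = 20, col = "red")      # individual points
abline(h = 0, lty = 2)                        # horizontal reference line at y = 0
abline(v = pi, lty = 3)                       # vertical reference line at x = pi
text(x = pi, y = 0.8, labels = "x = pi", pos = 4)
title("sin (line) and cos (points)")
```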

Next, we review a few examples of how these low-level functions work in R graph design:

## Adding points to a boxplot
boxplot(iris$Petal.Length ~ iris$Species)
points(iris$Petal.Length ~ iris$Species)

## Adding title and abline to a scatter plot
plot(mtcars$hp, mtcars$disp)
reg1 <- lm(mtcars$disp ~ mtcars$hp)
abline(reg1)
title("Horsepower and Displacement linear regression")

Graph arguments

In addition to the high-level and low-level functions that control how data are plotted, many graphical parameters can be set, either as arguments of the plotting functions or through par(). The most useful ones are: bg (background color; when passed to a plotting function, the fill color of symbols pch 21 to 25), cex (size of text and symbols, with variants cex.axis, cex.lab, cex.main), col (color of symbols and lines), las (orientation of axis labels, integer), lty (line type, integer), lwd (line width), mar (plot margins, four values, set through par()), and pch (symbol type). Symbols and line types are selected by integer codes passed to the pch and lty arguments:

[Figure: R “pch” argument]

[Figure: R “lty” argument]
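The symbol and line-type codes can also be displayed directly in R. The sketch below draws its own reference charts for pch values 0 to 25 and lty values 1 to 6; the layout (a 3 x 9 grid and the fill color) is an arbitrary choice, not a reproduction of the original figures.

```r
# Symbols: pch values 0 to 25 laid out on a grid
pch_values <- 0:25
col_pos <- pch_values %% 9          # column position of each symbol
row_pos <- 3 - pch_values %/% 9     # row position (first row at the top)
plot(col_pos, row_pos, pch = pch_values, cex = 2, bg = "lightblue",
     xlim = c(-0.5, 8.5), ylim = c(0.5, 3.5),
     axes = FALSE, xlab = "", ylab = "", main = "pch values")
text(col_pos, row_pos, labels = pch_values, pos = 1, offset = 1)

# Line types: lty values 1 to 6 drawn as horizontal lines
lty_values <- 1:6
plot(c(0, 1), c(0.5, 6.5), type = "n", axes = FALSE,
     xlab = "", ylab = "lty", main = "lty values")
axis(2, at = lty_values, las = 1)
abline(h = lty_values, lty = lty_values, lwd = 2)
```

Only symbols 21 to 25 are filled with the bg color; the others ignore it.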

To better understand the role of each of the above-listed arguments, we will review them in the following plotting script:

## Read the external dataset
mydata <- read.csv("BEDCA_dataset.csv", header = T, row.names = "food_id", sep = "\t")

## Attaching my dataset to the R search path
attach(mydata)

## Listing the column names to choose the variables to compare
colnames(mydata)

## Computing statistical parameters of interest
reg1 <- lm(Energy_.kcal. ~ Total_lipid_.fat._.g.)
cor1 <- cor.test(Total_lipid_.fat._.g., Energy_.kcal.)

## Plotting variables in a scatter plot with appearance arguments - plot(x, y)
plot(Total_lipid_.fat._.g., Energy_.kcal.,
     pch = 21,
     cex = 1,
     col = "black",
     xlab = "Total fat (g per 100g)",
     ylab = "Total energy (kcal per 100g)",
     cex.axis = 0.9,
     cex.lab = 1.2,
     main = "Fat vs Energy content in BEDCA food",  # plot title; title() is a separate low-level call
     bg = "lightblue")

## Executing low-level functions to add information
abline(reg1, lty = 1, lwd = 3.0, col = "darkgrey")
text(x = 70, y = 100, paste("Intercept =", round(reg1$coefficients[1], 3)), cex = 0.7)
text(x = 70, y = 200, paste("Slope =", round(reg1$coefficients[2], 3)), cex = 0.7)
text(x = 70, y = 300, paste("R =", round(cor1$estimate, 3)), cex = 0.7)
text(x = 70, y = 400, paste("p =", format(cor1$p.value, scientific = T)), cex = 0.7)



## Using an alternative high-level function for categorical variables (boxplot)

## Subsetting the dataset to keep three levels of the food_group factor
mydatax3var <- subset(mydata, food_group %in% c("Fats_and_oils", "Fruits_and_fruit_products", "Milk_and_milk_products"))
boxplot(mydatax3var$Energy_.kcal. ~ mydatax3var$food_group,
        col = c("blue", "red", "green"),
        horizontal = TRUE,
        xlab = "Energy content (kcal/100g)",
        ylab = "Food type",
        ylim = c(min(mydatax3var$Energy_.kcal.) - 1, max(mydatax3var$Energy_.kcal.) + 1),
        notch = FALSE)
stripchart(mydatax3var$Energy_.kcal. ~ mydatax3var$food_group,
           col = c("black","black", "black"),
           method = "jitter",
           pch = 21,
           cex = 1,
           vertical = F,
           add = T)

To save a plot to an external file, call the device function that matches the intended file type (tiff(), png(), bmp(), or jpeg()) and set its main size and resolution arguments.

## Set the format and parameters of the output file
tiff(filename = <plotName>, width = <integer for units>, height = <integer for units>, units = <"px", "in", "cm", or "mm">, res = <integer for ppi>)

## Then, state the code for your plot
plot(...)

## Close the graphical device for saving it in the working directory
dev.off()
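As a concrete instance of the template above, here is a minimal sketch that writes a PNG file. The file location (a temporary file), the dimensions, and the resolution are illustrative assumptions, and the plot uses the built-in mtcars dataset.

```r
# Hypothetical output path; tempfile() is used so the sketch does not
# overwrite anything in the working directory
out_file <- tempfile(fileext = ".png")

# Open a PNG file device: 12 x 10 cm at 300 ppi
png(filename = out_file, width = 12, height = 10, units = "cm", res = 300)

# Every plotting call now goes to the file instead of the screen
plot(mtcars$hp, mtcars$disp,
     xlab = "Horsepower", ylab = "Displacement (cu. in.)")
abline(lm(disp ~ hp, data = mtcars))

# Closing the device writes the file to disk
dev.off()

file.exists(out_file)
```

Low-level calls must also be issued before dev.off(); anything drawn after the device is closed does not reach the file.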

Advanced graphical functions with “ggplot”

ggplot2 is an R package for data visualization. Compared to the base R graphics functions reviewed earlier, ggplot2 has a more elaborate syntax, but it makes it easier to produce polished graphics for the representation of scientific data. As in base R plotting, a ggplot2 graph is built step by step: an initial call creates the plot, and further functions and arguments add to it and tune it. The ggplot2 package is part of the tidyverse, a collection of R packages designed for data manipulation and representation in data science 2. You can therefore install it as part of the tidyverse suite or as a standalone package (see Session #6).

In a general fashion, the elements to consider to build a graph with ggplot2 are:

  • A data frame or tibble (.data) containing the data to visualize.

  • Aesthetics (aes), meaning the mappings between data variables (numerical and categorical) and the visual properties of the graph (e.g. position, shapes, colors, labels, etc.).

  • Geoms (geom_), indicating and defining the geometric elements in the plot (points, lines, circles, etc.).
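The three elements listed above can be seen together in a minimal sketch. It assumes ggplot2 is installed and uses the built-in iris dataset rather than the course data, only to keep the example self-contained.

```r
library(ggplot2)

# Data: the built-in iris data frame
# Aesthetics: map columns to x, y, and color
# Geom: draw one point per row
p <- ggplot(data = iris,
            mapping = aes(x = Sepal.Length, y = Petal.Length, color = Species)) +
  geom_point()

print(p)   # printing the object renders the plot
```

Unlike base graphics, the plot is stored as an ordinary R object (p here), so it can be extended later with + and drawn as many times as needed.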

Learning all the functions and capabilities of ggplot2 would take an entire training course. Here we will only inspect its main plotting functions and arguments; those interested in advanced routines can visit the package website to understand in depth how ggplot2 works (https://ggplot2.tidyverse.org/).

For comparison, we will build plots similar to those made in the previous section, using the same datasets and objects. They will thus serve to contrast the arguments and the scripting style.

## Loading the ggplot library
library(ggplot2)

## Configuring the functions and arguments for the scatter plot
ggplot(mydata, aes(Total_lipid_.fat._.g., Energy_.kcal.)) +
  geom_point() +
  #geom_point(color = "black", size = 4, shape = 21, fill = "red") +
  xlab("Total fat (g) per 100g") +
  ylab("Total energy (kcal) per 100g") +
  geom_smooth(method = lm, linetype = "solid", fill ="pink", color ="black") +
  annotate(geom = "text", x = 75, y = 250, label = paste("Pearson's R =", round(cor1$estimate, 3), sep = " "), color = "black", size=5) + 
  annotate(geom = "text", x = 75, y = 180, label = paste("p =", format(cor1$p.value, scientific = T), sep =" "), color = "black", size = 5) +
  theme_classic() + 
  theme(axis.text = element_text(size = 10), axis.title = element_text(size = 15, face = "bold"))


## Configuring the functions and arguments for the boxplot
ggplot(mydatax3var, aes(Energy_.kcal., food_group, fill = food_group)) +
 geom_boxplot(notch = F, outlier.colour = "black", outlier.shape = 19, outlier.size = 2) +
 #scale_fill_manual(values = c("#E41A1C", "#377EB8", "#C77CFF")) +
 #scale_fill_brewer(palette = "Dark2") +
 geom_jitter(colour = "black", size = 2.5, shape = 21, width = 0.1, height = 0.1) +
 guides(fill = "none") +
 ggtitle("BEDCA dataset") +
 #theme_classic(base_size = 20) + # to set all "theme" arguments at once
 ylab("Food type") +
 xlab("Energy content (kcal/100 g)") +
 theme(axis.text = element_text(size = 13), axis.title = element_text(size = 16, face = "bold"), title = element_text(size = 20, face = "bold")) +
 theme(plot.margin = unit(c(0.25,0.25,0.25,0.25), "cm"))

To finish, save all the plots generated previously in the working directory (PNG versions of R base and ggplot functions) and compare their resolutions and appearance as a whole.
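For the ggplot2 versions, one possible route is ggsave(), sketched below under a few assumptions: ggplot2 is installed, the example plot is built from the built-in iris dataset rather than the course data, and the output goes to a temporary file with arbitrary dimensions.

```r
library(ggplot2)

# A small example plot; any ggplot object can be saved the same way
p <- ggplot(iris, aes(Species, Sepal.Length, fill = Species)) +
  geom_boxplot()

# Hypothetical output path, written to a temporary directory
gg_file <- tempfile(fileext = ".png")

# ggsave() infers the PNG device from the file extension
ggsave(filename = gg_file, plot = p,
       width = 12, height = 10, units = "cm", dpi = 300)

file.exists(gg_file)
```

Using the same width, height, and resolution for the base R version (through png()) makes the two outputs directly comparable.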


  1. https://mode.com/blog/r-ggplot-extension-packages

  2. https://www.tidyverse.org/