As mentioned earlier in the course introduction, one of the most valuable features of R is its endless source of packages (libraries) for analysing data from mathematical, statistical, and graphical perspectives.
The standard R installation usually includes a small set of packages distributed alongside any R version. Indeed, some of them are automatically loaded in memory when R is launched. To inspect such libraries, we can use the search() function:
search() ## Explore attached R packages and objects
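As a minimal sketch of the difference between packages shipped with R and packages actually attached at start-up, the following lines compare both sets. The object names base_pkgs and attached are only illustrative.

```r
## Packages shipped with every R installation (priority "base")
base_pkgs <- rownames(installed.packages(priority = "base"))

## Packages currently attached to the search path
attached <- sub("^package:", "", grep("^package:", search(), value = TRUE))

## Base packages that are installed but not attached in this session
setdiff(base_pkgs, attached)
```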
However, the packages bundled with R are few compared with the universe of tools available in the official R repository, CRAN (The Comprehensive R Archive Network) 1. Another trusted repository of R packages is Bioconductor, an open-source software project for bioinformatics 2. It is highly specialised, serving developers and users who build and apply novel algorithms and tools to explore biological data. R packages can also be installed from multiple alternative (non-official) sources, generally author repositories, e.g. personal or institutional GitHub spaces or independent open-source projects 3 4.
Consequently, many R packages dedicated to statistical methods and graphics are distributed separately and must be installed and loaded in R before working with them. Nevertheless, several of these packages are cornerstones for developers and are frequently used as building blocks of third-party packages; they are therefore usually installed alongside other R packages as dependencies (declared in the Imports field of the package DESCRIPTION file and imported through its NAMESPACE file). Examples of such packages are:
boot: resampling and bootstrapping methods
class: classification methods
cluster: clustering methods
dplyr: grammar of data manipulation
lattice: Lattice (Trellis) graphics
lme4: linear and generalized linear mixed-effects models
MASS: functions, tools and data sets from the libraries of “Modern Applied Statistics with S” (S being the language from which R derives).
nlme: linear and non-linear mixed-effects models
survival: survival analyses
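To see which of the packages listed above are already present on a given system, a short check along these lines may help. It is only a sketch: the data frame name status is illustrative, and the "priority" column simply reports what R records for each package ("recommended" marks packages normally bundled with standard R distributions, NA means no priority or not installed).

```r
## Packages named in the list above
pkgs <- c("boot", "class", "cluster", "dplyr", "lattice",
          "lme4", "MASS", "nlme", "survival")

## Matrix describing every package installed in the local libraries
inst <- installed.packages()

## One row per package: is it installed, and with which priority?
status <- data.frame(
  package   = pkgs,
  installed = pkgs %in% rownames(inst),
  priority  = inst[match(pkgs, rownames(inst)), "Priority"],
  row.names = NULL
)
status
```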
The steps and code to install an R package depend on the operating system and on whether the package will be installed from source code or from pre-compiled binaries. When binaries are an option, it is recommended to use the pre-compiled packages available on the CRAN repository.
In the next sections, we will install a few packages of interest from the two main repositories for R packages useful for the next sessions. We will also review and explore the most conventional way to install packages from non-canonical R repositories.
The first step before launching the installation code from the R console is to confirm that the package exists in the CRAN repository. So, go to the repository website and search for it. Also check that the available version is the one you need, and note the dependencies required by the installation (“Imports” field). This last piece of information will be critical to understanding any error message resulting from a failed installation.
Packages to install:
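The same check can be sketched from the R console with available.packages(), which returns the package index of a repository as a matrix. The helper cran_info() below is hypothetical (it is not part of R), and the mirror URL in the usage lines is just one possible choice.

```r
## Hypothetical helper: look up packages in a CRAN-style package index
cran_info <- function(pkgs, db) {
  found <- pkgs %in% rownames(db)
  if (any(!found)) {
    message("Not found in the index: ", paste(pkgs[!found], collapse = ", "))
  }
  ## Keep only the version and the "Imports" dependencies of the packages found
  db[pkgs[found], c("Version", "Imports"), drop = FALSE]
}

## Usage (needs an internet connection; the mirror URL is an assumption):
## db <- available.packages(repos = "https://cloud.r-project.org")
## cran_info(c("ggplot2", "readxl"), db)
```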
ggplot2
readxl
If the packages are available at CRAN and there is no apparent issue with their dependencies, we can now install them as follows:
install.packages("package_name")
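For the two packages named above, one possible approach is to install only what is missing, so that repeated runs of a script do not reinstall everything. The function install_if_missing() is a hypothetical helper written for this illustration, not an R built-in.

```r
## Hypothetical helper: install only the packages that are not yet present
install_if_missing <- function(pkgs, installer = utils::install.packages) {
  to_install <- setdiff(pkgs, rownames(utils::installed.packages()))
  if (length(to_install) > 0) installer(to_install)
  ## Return (invisibly) the names that had to be installed
  invisible(to_install)
}

## Usage (needs an internet connection):
## install_if_missing(c("ggplot2", "readxl"))
```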
The installation time depends on the network’s bandwidth and traffic and on the performance of your PC processor. It also depends on how many dependencies (additional R packages) must be installed to satisfy the main package installation. The first time you install a package, the system often asks which CRAN mirror (secondary repository) to download from. Please select the one geographically nearest to you.
Troubleshooting. The most common installation issue is missing (or outdated) dependencies, whether you are installing from CRAN or Bioconductor. Missing system libraries needed to install the dependencies or the main package are also frequent (on Mac, Linux, and Windows). So, check the nature of the conflict in the error message and, if needed, install the missing component separately from its own source.
The successful installation of the desired R package does not grant you direct access to its functions. To use them, you need to load the package into the R environment, which makes all the functions and datasets implemented in it available.
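The mirror can also be set in advance for the current session, which avoids the interactive prompt. The sketch below assumes the "cloud" mirror URL, which is only one possible choice; any mirror listed on the CRAN website would work the same way.

```r
## Set the CRAN mirror for the current session (the URL is an example choice)
options(repos = c(CRAN = "https://cloud.r-project.org"))

## Confirm which repositories R will use
getOption("repos")

## Alternatively, pick a mirror from an interactive menu:
## chooseCRANmirror()
```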
## Loading the installed libraries
library(ggplot2)
library(readxl)
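The difference between an installed package and a loaded one can be illustrated without installing anything, using "tools", a package that ships with R but is usually not attached at start-up. This is only a sketch to show the mechanism; the file name is reused from this session purely as an example string.

```r
## "tools" is installed with R; check whether it is attached in this session
"package:tools" %in% search()

## Attaching it makes its functions available by their bare names
library(tools)
"package:tools" %in% search()
file_ext("BEDCA_dataset.xlsx")

## A single function can also be called without attaching, via package::function
tools::file_path_sans_ext("BEDCA_dataset.xlsx")
```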
Now, we can test whether the libraries work properly. To do so, we can download the BEDCA dataset in Excel format and try to open it with the functions implemented in the readxl package:
## Declare the URL of the file to download
url <- "https://github.com/agRo-al/agro-al.github.io/raw/refs/heads/main/BEDCA_dataset.xlsx"
## Set the name and location where you want to save it (agroal working directory)
file_path <- "/home/user/agroal/BEDCA_dataset.xlsx"
## Execute the download.file() function (mode = "wb" keeps the binary Excel file intact, notably on Windows)
download.file(url, file_path, mode = "wb")
## Load the Excel file from the saved location with the read_excel() function of "readxl"
myexcel <- read_excel(file_path)
## Visualise the first rows of the data frame, shown as "tibble"
head(myexcel)
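The ggplot2 installation can be tested in a similar spirit. Since the column names of the BEDCA dataset are not described here, the sketch below uses mtcars, a dataset built into R, as a stand-in; the object name p is illustrative.

```r
## Quick smoke test for ggplot2 using a dataset that ships with R
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  ## Scatter plot of car weight (wt) against miles per gallon (mpg)
  p <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()
  print(p)
} else {
  message("ggplot2 is not installed; run install.packages(\"ggplot2\") first")
}
```

If a scatter plot appears in the graphics device, the package is installed and loaded correctly.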
To keep track of the packages installed in your system and to keep them updated, you should regularly use the following functions for library management:
## Retrieve details of R packages installed in your system
installed.packages()
## Checks the installed package versions against those available on CRAN and offers to update them one by one
update.packages()
## Inquires the version installed for a specific package
packageVersion("package_name")
## Downloads package files from CRAN-like repositories into a local folder, without installing them (unlike install.packages())
download.packages("package_name", destdir = "path/to/destdir")
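As a small illustration of these management functions, the sketch below uses "stats", which is always installed with R, so it runs on any system. The version threshold "4.0.0" is an arbitrary example value.

```r
## installed.packages() returns a matrix; keep a few informative columns
inst <- installed.packages()
head(inst[, c("Package", "Version", "Priority")])

## packageVersion() returns a version object that can be compared directly
v <- packageVersion("stats")
v
v >= "4.0.0"

## List packages with a newer version in the repository (needs network access):
## old.packages()
```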
For various reasons, many R packages are stored outside CRAN. We should check the repositories where such resources are hosted and assess the server location (banned or suspicious sites) to avoid downloading malware. It is also good practice to look for scientific publications supporting the development of the R package and for their associated URLs. One trustworthy source for downloading non-CRAN packages is GitHub, which hosts the work of personal or institutional developers.
The fact that such material is not available on CRAN means it may not comply with the good scripting practices and standards set by CRAN. Moreover, those R packages are often poorly maintained, leaving them outdated with respect to their CRAN dependencies, which are constantly updated.
The conventional way to install non-CRAN packages, regardless of the OS, is first to download the source tarball of the desired package, with extension .tar or .tar.gz (when compressed with gzip). This file is the output R produces when building the package from all its scripts and functions.
NOTE: a tarball is an archive file (usually compressed), typical of Linux/Unix systems, that combines multiple files and directories into a single file for straightforward distribution.
The downloaded tarball should follow this naming structure:
<package_name>_<version>.tar.gz
It can then be installed from the local file as follows:
install.packages("path/to/the/file/packageName_version.tar.gz", repos=NULL, type="source")
or simply (recent R versions infer repos = NULL from the .tar.gz file name):
install.packages("path/to/the/file/packageName_version.tar.gz")
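Before installing, it can be useful to confirm that a downloaded file follows the expected naming structure and to recover the package name and version from it. The sketch below is illustrative only: the path and the version "1.0.0" are hypothetical placeholders, and the regular expression is a simplified assumption about valid names.

```r
## Hypothetical tarball path; replace with the file you actually downloaded
tarball <- "path/to/the/file/packageName_1.0.0.tar.gz"
fname <- basename(tarball)

## Does the file name match <package_name>_<version>.tar.gz?
name_ok <- grepl("^[A-Za-z][A-Za-z0-9.]*_[0-9]+([.-][0-9]+)+\\.tar\\.gz$", fname)

## Extract the two components of the name
pkg <- sub("_.*$", "", fname)
ver <- sub("^[^_]+_(.*)\\.tar\\.gz$", "\\1", fname)

name_ok
pkg
ver
```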
Example: go to the developer's website for the “CoDaSeq” package and explore the installable file. What is the currently available version of this package?
As mentioned above, the Bioconductor repository (https://www.bioconductor.org/) is a trusted source for accessing and downloading R-based packages, specifically for biological research (bioinformatics). It operates as a CRAN-like repository, applying both CRAN-derived standards and its own. The packages hosted in Bioconductor cover statistical and data analysis tools for genomics, metabolomics, proteomics, metagenomics, imaging, protein structures, genomic architecture, and transcriptomics, among other fields. As of April 2025, it hosts almost 2300 packages.
Working with Bioconductor requires BiocManager, a package that manages the installation of Bioconductor packages. In addition, BiocVersion is an important package management tool that tracks the appropriate version of Bioconductor packages and their compatibility with the latest R releases.
Bioconductor has a repository and release schedule that differ from those of R. A consequence of this mismatch is that the Bioconductor version identified by install.packages() is sometimes not the most recent release available. So, BiocManager::install() is the recommended way to install Bioconductor packages 5.
To install the BiocManager, we must type the following command on the R console:
if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
The current release of Bioconductor is version 3.20 (3.21 is the development version); it works with R version 4.4.0. Users of older R and Bioconductor versions must update their installation to take advantage of new features and to access packages that have been added to Bioconductor since the last release 6.
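Since each Bioconductor release is tied to a specific R series, it may be worth checking the running R version first. The sketch below assumes that Bioconductor 3.20 pairs with the R 4.4 series, as stated above; the variable names are illustrative.

```r
## Version of R running in this session
r_ver <- getRversion()
r_ver

## Assumption: Bioconductor 3.20 is meant for the R 4.4 series
compatible <- r_ver >= "4.4.0" && r_ver < "4.5.0"

if (!compatible) {
  message("R ", r_ver, " is outside the 4.4 series; ",
          "Bioconductor 3.20 may not be the matching release.")
}
```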
Then, we can proceed to install core packages of the current Bioconductor distribution:
BiocManager::install(version = "3.20")
As an example of how to install specific Bioconductor packages, we will install some of the most widely used packages for assessing differential gene expression:
## Installation of "limma" expression analysis suite based on linear models
BiocManager::install("limma")
## Installation of "edgeR" expression analysis suite based on negative binomial models and empirical Bayes estimation
BiocManager::install("edgeR")
## Installation of "DESeq2" expression analysis suite based on negative binomial distribution
BiocManager::install("DESeq2")
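After the installation, a quick check such as the following can confirm which of the three packages are actually present. It is a sketch: the names bioc_pkgs and installed_ok are illustrative, and the commented line shows that BiocManager::install() also accepts several package names in one call.

```r
## The three packages installed above
bioc_pkgs <- c("limma", "edgeR", "DESeq2")

## They could also be installed in a single call (needs network access):
## BiocManager::install(bioc_pkgs)

## system.file() returns an empty string when a package is not installed
installed_ok <- vapply(bioc_pkgs,
                       function(p) nzchar(system.file(package = p)),
                       logical(1))
installed_ok
```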