We already know that information in R must be stored in objects with unique and unambiguous names. Next, we need to recognize the different object types we can create:
Character strings: a set of characters between quotes, which may or may not include line breaks. Thus, "spores" and "elevated spore count" are each objects with a single character value.
a <- "elevated spore count"
a
mode(a)
Vectors: an ordered collection of numbers or character strings indexed by the integers 1, 2, …, n, where n is the length of the vector. If a vector is created from both numeric and character values, R automatically converts the numeric values to character strings.
y <- c(12, 58, 96, 35)
y
mode(y)
is.vector(y)
length(y)
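The coercion rule described above can be checked directly. This is a minimal sketch; the vector name `mixed` is made up for illustration.

```r
## Mixing numbers and strings in one vector: every element becomes character
mixed <- c(12, "spores", 96)
mixed
mode(mixed)                  # "character"
## The numbers are now strings, so arithmetic needs an explicit conversion
as.numeric(mixed[1]) + 1     # 13
```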
Factors: a type of vector that stores underlying integer values (e.g. 1, 2, …, n), each with an associated character label. These labels are the levels of the factor. The most common use of a factor is to store a categorical variable that serves as a grouping variable in data analysis.
z <- factor(c("variety.A", "variety.C", "variety.C", "variety.B", "variety.B", "variety.A"))
z
class(z)
levels(z)
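To see the integer values and labels that make up a factor, a short sketch may help. The object name `variety` and its values are hypothetical.

```r
## A small factor built from character labels
variety <- factor(c("variety.A", "variety.C", "variety.B", "variety.C"))
## Levels are sorted alphabetically by default
levels(variety)
## The underlying integer codes give each label's position in levels()
as.integer(variety)          # 1 3 2 3
## Counting the observations per level
table(variety)
```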
Matrix: a two-dimensional collection of values, usually numeric, indexed by pairs of integers or coordinates (i, j). A matrix has an additional attribute (dim) of length 2, giving the number of rows and columns.
b <- matrix(c(20,25,32,14,65,26), nrow = 3, ncol = 2)
b
class(b)
dim(b)
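By default, matrix() fills the values column by column. The sketch below compares this with the byrow = TRUE argument; the names `vals`, `by_col` and `by_row` are illustrative only.

```r
vals <- c(20, 25, 32, 14, 65, 26)
## Default: values fill the first column, then the second
by_col <- matrix(vals, nrow = 3, ncol = 2)
## byrow = TRUE: values fill the first row, then the second, and so on
by_row <- matrix(vals, nrow = 3, ncol = 2, byrow = TRUE)
by_col[1, 2]                 # 14
by_row[1, 2]                 # 25
```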
List: contains an ordered collection of objects, which can be of different types.
mylist <- list(z, c(10,20,30), "56", y)
mylist
class(mylist)
mylist[1] ##example of indexation, see next section
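Because a list can hold objects of different types, it matters whether an element is extracted as a sub-list or as the object itself. A small sketch, using a hypothetical list named `sample_list`:

```r
sample_list <- list(c("a", "b"), c(10, 20, 30), "56")
## Single brackets return a list of length one
class(sample_list[2])        # "list"
## Double brackets return the stored object itself
class(sample_list[[2]])      # "numeric"
## The two forms of indexing can be chained
sample_list[[2]][3]          # 30
```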
Data frame: a widely used type of list for storing datasets, with a row for each observation and a column for each variable. Each variable can be of a different type. The function read.table returns a data frame. When creating one from scratch, you can name its elements (columns). The vectors joined in a data frame must be of the same length.
mydf <- data.frame(Outcome = c(1, 0, 1, 1, 0, 1, 1, 0),
                   Treated = c("yes", "yes", "no", "no", "no", "no", "yes", "no"),
                   Length = c(24, 55, 39, 18, 34, 56, 25, 30))
mydf
class(mydf)
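To confirm that a data frame is a list whose columns may differ in type, one option is to inspect a small example. The name `small_df` is hypothetical, and stringsAsFactors = FALSE is set explicitly so the result does not depend on the R version.

```r
small_df <- data.frame(Outcome = c(1, 0, 1),
                       Treated = c("yes", "no", "no"),
                       stringsAsFactors = FALSE)
## Compact overview of the columns and their types
str(small_df)
## A data frame is a special kind of list
is.list(small_df)            # TRUE
## Class of each column
sapply(small_df, class)
```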
Array: a generalization of the matrix to any number of dimensions. In the array function, the dim argument sets the size of each dimension; in the example below, 48 values are arranged as two 4 x 6 matrices.
myarray <- array(c(1:48), dim = c(4, 6, 2))
myarray
class(myarray)
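An array element is addressed with one index per dimension. The sketch below rebuilds the same 4 x 6 x 2 layout under the illustrative name `demo_array`.

```r
demo_array <- array(1:48, dim = c(4, 6, 2))
dim(demo_array)              # 4 6 2
## Row 2, column 3 of the first and of the second 4 x 6 slice
demo_array[2, 3, 1]          # 10
demo_array[2, 3, 2]          # 34
## Leaving an index empty selects everything along that dimension
demo_array[, , 2]
```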
Indexing is the most efficient way to access or select particular elements of an R object. The index can be numeric or logical. For example, to access the eighth value of a vector y, we type y[8], which can be used either to extract this value or to modify it.
IMPORTANT NOTE: From now on, we will write the R code more carefully. Good scripting practices include consistent indentation and formatting, as well as code documentation and organization.
## Creating a vector from a regular sequence (1 to 100 in steps of 12)
y <- seq(1, 100, 12)
## Accessing the 8th element of the vector
y[8]
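The same brackets can be used to modify a value or to select by a logical condition. A minimal sketch, using a separate vector named `v` (hypothetical) so that y is left untouched:

```r
v <- seq(1, 100, 12)
length(v)                    # 9
v[8]                         # 85
## Replacing the 8th value
v[8] <- 0
## Numeric indexing with several positions at once
v[c(1, 8)]                   # 1 0
## Logical indexing: keep the values greater than 50
v[v > 50]                    # 61 73 97
```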
Indexing the elements of a two-dimensional object (e.g. a matrix or data frame) is a little more complex: you need to give the position of the element as a row number and a column number:
## Creating a 3 x 3 matrix from a regular sequence
y <- matrix(seq(1, 100, 12), ncol = 3, nrow = 3)
## Accessing the element at the intersection of the second row and the third column
y[2, 3]
## Selecting all the elements of the third column
y[, 3]
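Rows and sub-blocks can be selected in the same way. The sketch below uses a copy of the matrix under the illustrative name `m`; drop = FALSE is a standard argument of the bracket operator that keeps the result as a matrix.

```r
m <- matrix(seq(1, 100, 12), nrow = 3, ncol = 3)
## Selecting all the elements of the second row
m[2, ]                       # 13 49 85
## Selecting rows 1 and 2 of columns 1 and 3
m[1:2, c(1, 3)]
## A single column is returned as a plain vector unless drop = FALSE is set
dim(m[, 3, drop = FALSE])    # 3 1
```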
If your R object is composed of other objects, indexing becomes more complex and can be done in several ways:
## A default dataset implemented in R for training
mtcars
## Recognizing the dimension of the object
dim(mtcars)
## Checking the mode of the object (a data frame is stored as a list)
mode(mtcars)
## Accessing the fourth element (column) of the list
mtcars[4]
## Accessing the "hp" element (column) of the list
mtcars["hp"]
## Accessing the second element of the fourth column
mtcars[2, 4]
## Accessing the values of the fourth column as a vector (recursive indexing)
mtcars[[4]]
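The difference between these forms is easiest to see from the class of what each one returns. A short sketch on the same dataset, where the fourth column is "hp":

```r
## Single brackets keep the data frame structure
class(mtcars[4])                     # "data.frame"
## Double brackets return the column itself as a vector
class(mtcars[[4]])                   # "numeric"
## The $ operator is equivalent to double brackets with the column name
identical(mtcars[[4]], mtcars$hp)    # TRUE
## Row and column positions can be mixed with names
mtcars[2, "hp"]                      # 110
```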
Indexing can also be used to remove one or several rows or columns by using negative values. Thus, you can filter out unneeded information from the R object and simplify your dataset for downstream analyses:
## Checking original "mtcars" object - top view
head(mtcars)
## Removing a particular column of the "mtcars" object
mynewmtcars <- mtcars[,-7]
## Checking the resulting new object
head(mynewmtcars)
## Removing a range of columns (several vectors at once)
mynewmtcars <- mtcars[,-c(7:10)]
## Comparing the dimensions of the original and the new object
dim(mtcars)
dim(mynewmtcars)
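Column positions can shift when a dataset changes, so removing columns by name is sometimes safer. The sketch below is one possible approach; `drop_cols` and `kept` are hypothetical names, and the four names listed are columns 7 to 10 of "mtcars".

```r
## Names of the columns to remove
drop_cols <- c("qsec", "vs", "am", "gear")
## Keeping every column whose name is not in drop_cols
kept <- mtcars[, !(names(mtcars) %in% drop_cols)]
names(kept)
dim(kept)                    # 32 7
```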
Logical indexing is also possible: a comparison operator evaluates every element of the object and returns TRUE or FALSE for each one, and only the elements marked TRUE are selected.
## Retrieving the observations that match a desired condition - above the 75th percentile of the distribution
mtcars$hp[mtcars$hp > quantile(mtcars$hp, 0.75)]
## Finding the positions of the observations with those values
## (which() is used because match() returns only the first position of a repeated value)
which(mtcars$hp > quantile(mtcars$hp, 0.75))
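It can help to break logical indexing into separate steps. This is a sketch only; `cars`, `cutoff` and `above` are illustrative names, and datasets::mtcars is used so the example starts from the unmodified dataset.

```r
cars <- datasets::mtcars
## Step 1: the threshold (75th percentile of hp)
cutoff <- quantile(cars$hp, 0.75)
## Step 2: a logical vector with one TRUE or FALSE per observation
above <- cars$hp > cutoff
## Step 3: counting, locating and naming the matching observations
sum(above)                   # 7
which(above)
rownames(cars)[above]
```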
Moreover, you can modify particular elements of the vector or object by applying comparison operators.
## Recoding a numeric variable (in mtcars, am is coded 0 = automatic, 1 = manual)
mtcars$am[mtcars$am == 1] <- "manual"
mtcars$am[mtcars$am == 0] <- "automatic"
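An alternative is to recode the variable as a factor on a copy of the dataset, which leaves the original column intact. This is a sketch of one possible approach; `cars_copy` and `am_label` are hypothetical names, and the labels follow the mtcars help page, where am is coded 0 = automatic and 1 = manual.

```r
## Working on a copy of the unmodified dataset
cars_copy <- datasets::mtcars
## Mapping the codes 0 and 1 to labels in a new factor column
cars_copy$am_label <- factor(cars_copy$am,
                             levels = c(0, 1),
                             labels = c("automatic", "manual"))
## Counting the observations per label
table(cars_copy$am_label)
```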
Regardless of the original attribute (mode or class) of an R object, you can convert it to a different object type using the as.xxxxx family of functions (e.g. as.factor, as.numeric). Let's try some of them:
x <- c("high", "low", "medium")
x
class(x)
as.factor(x)
x <- c(1,1,1,1,0,0,1,0,1,0)
x
class(x)
as.factor(x)
x <- c(TRUE, TRUE, FALSE, TRUE, FALSE, TRUE)
x
class(x)
as.numeric(x)
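One conversion deserves extra care: applying as.numeric to a factor returns its underlying integer codes, not the numbers shown in its labels. A minimal sketch, using a hypothetical factor `f`:

```r
f <- factor(c("10", "20", "10", "30"))
## Returns the integer codes of the levels
as.numeric(f)                    # 1 2 1 3
## Converting to character first recovers the original numbers
as.numeric(as.character(f))      # 10 20 10 30
```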
Using the "mtcars" dataset, write the code to answer the following questions: