Microcredencial “agRo-al” - Session 4


Creating R objects

To recap concepts from previous sessions, we will create different R objects from scratch and explore them in detail. Start from the following instruction:

==>Create a numeric vector called “mydata” that ranges from 1 to 1000 in steps of eight<==

Then write the appropriate function calls and indexing code to answer the following questions:

  • How many elements does your vector contain?
  • What is the 83rd element of your vector?
  • What is the result of multiplying the 52nd element by four?
  • What is the sum of all the elements of your vector?
  • Does your vector contain any round hundred values (100, 200, and so on up to 1000)?

The solution is:

mydata <- seq(1, 1000, 8) ## Create a regular (non-random) sequence from 1 to 1000 in steps of 8

mydata

length(mydata) ## Checking the vector length

mydata[83] ## Selecting the 83rd element of the vector

mydata[52] * 4 ## Multiplying the 52nd element by four

sum(mydata) ## Summing all vector elements

## Creating the two datasets to compare
mydata <- seq(1, 1000, 8)
mydata2 <- seq(100, 1000, 100)

## Comparing datasets
mydata2 %in% mydata
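
The comparison above returns one logical value per round hundred. A minimal sketch of how to condense it into a single answer with any() is shown below; it also cross-checks the earlier answers against the arithmetic-progression formulas. The helper names first, step, and n are introduced here only for illustration.

```r
mydata  <- seq(1, 1000, 8)
mydata2 <- seq(100, 1000, 100)

## Single logical answer: is at least one round hundred present?
any(mydata2 %in% mydata)

## Which round hundreds are present (an empty vector if none)
intersect(mydata2, mydata)

## Cross-check with the arithmetic-progression formulas
first <- 1
step  <- 8
n     <- floor((1000 - first) / step) + 1    # expected number of elements
n == length(mydata)
first + (83 - 1) * step == mydata[83]        # expected 83rd element
n * (first + mydata[n]) / 2 == sum(mydata)   # expected sum of the sequence
```

Every element of mydata has the form 1 + 8k and is therefore odd, so no round hundred can appear in it and any() returns FALSE.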

Now, let’s write slightly more complex code to create a data frame:

## Take values from a pre-created vector
mydf <- as.data.frame(mydata)

## Evaluate the dimension attribute of our data frame
dim(mydf)

## Visual inspection of the data frame
mydf$mydata

## Randomly select 63 of the 125 row positions for grouping variable 1
groupA <- sample.int(125, 63, replace = FALSE)

## Set the remaining row positions for grouping variable 2
groupB <- setdiff(1:125, groupA)

## Evaluate total number of elements in grouping variables
length(groupA) + length(groupB)

## Pre-create a placeholder column for the discriminant variable
mydf$feature <- NA_character_

## Assign the "groupA" label to the randomly selected positions
mydf$feature[groupA] <- "groupA"

## Assign the "groupB" label to the remaining positions
mydf$feature[groupB] <- "groupB"
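
A quick sanity check on this assignment can be sketched as follows. The block rebuilds the objects so that it runs on its own; the seed value 1 is an arbitrary choice made here for reproducibility and is not part of the original exercise.

```r
set.seed(1)  # arbitrary seed, assumed here only for reproducibility
mydf <- data.frame(mydata = seq(1, 1000, 8))

groupA <- sample.int(125, 63, replace = FALSE)
groupB <- setdiff(1:125, groupA)

mydf$feature <- NA_character_
mydf$feature[groupA] <- "groupA"
mydf$feature[groupB] <- "groupB"

## The two groups must not share any row position
length(intersect(groupA, groupB))

## Count rows per label; useNA = "ifany" would reveal unassigned rows
table(mydf$feature, useNA = "ifany")
```

With 125 rows and 63 positions drawn for group A, the table should report 63 rows labelled groupA and 62 labelled groupB.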

Then, evaluate the mode attribute of each column of the data frame:

is.numeric(mydf$mydata)
is.factor(mydf$feature)
is.character(mydf$feature)
## OR
mode(mydf$feature)
mode(mydf$mydata)
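
Note that mode() reports how the values are stored, whereas class() reports how R treats the object, and the two can differ. The following sketch uses a small made-up vector x (not the exercise data) to illustrate the difference, which becomes relevant once a character column is converted into a factor:

```r
x <- c("groupA", "groupB", "groupA")  # hypothetical toy vector

mode(x)          # "character"
class(x)         # "character"

f <- as.factor(x)
mode(f)          # "numeric": factors are stored as integer codes
class(f)         # "factor"
is.character(f)  # FALSE
levels(f)        # "groupA" "groupB"
```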

Finally, we convert the feature column into a factor, that is, a discriminant (grouping) variable, so that the mydata values can be assessed by group. This completes the data frame formatting:

## Convert characters into a discriminant factor
mydf$feature <- as.factor(mydf$feature)

## Assess the factor assignment
levels(mydf$feature)

## The data frame is ready for statistical and graphical assessments
tapply(mydf$mydata, mydf$feature, median)
boxplot(mydf$mydata ~ mydf$feature)

Multiclass R objects and interactions

With the code below, we will build a data frame that holds one numeric variable and three grouping variables per observation:

my_variables <- rep(c("Yes", "No"), 500) ## Character vector with two alternating values
set.seed(2025) ## Seed for the random number generator
my_numbers <- sample(450:1800, 1000, replace = TRUE) ## Randomly draw 1000 values between 450 and 1800

Then, we can evaluate their attributes. Once we have established the nature of each vector, we merge them into the desired data frame.
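
A minimal sketch of that attribute check is given below. It recreates both vectors with the same seed so that it can be run independently.

```r
my_variables <- rep(c("Yes", "No"), 500)
set.seed(2025)
my_numbers <- sample(450:1800, 1000, replace = TRUE)

## Storage mode and length of each vector
mode(my_variables)
length(my_variables)
mode(my_numbers)
length(my_numbers)

## class() is more specific than mode() for numbers
class(my_numbers)

## The sampled values stay within the requested bounds
range(my_numbers)
```

Both vectors have the same length, so they can be combined column by column, as in the code that follows.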

mydf <- as.data.frame(my_numbers) ## Starting the data frame from the numeric vector
mydf[, 2] <- my_variables ## Adding the character vector as a second column
head(mydf) ## Checking top entries of my data frame
dim(mydf) ## Evaluating the dimension attribute
colnames(mydf) <- c("Score", "Growth") ## Setting proper column (variable) names
head(mydf)

Now, let’s create new discriminant factors:

mydf$Sex <- rep(c("Female", "Male"), 500) ## New variable with pre-defined "Sex" column name
mydf$Enzyme.act <- rep(c("High", "Mild", "Low", "None"), 250) ## New variable with pre-defined "Enzyme.act" column name
head(mydf)

Evaluate the attributes of the variables assembled in our data frame, including their mode, length, dimensions, and factor levels.
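
One possible way to run these checks is sketched below. The block rebuilds the data frame in a single data.frame() call so that it is self-contained, which is a shortcut assumed here rather than the step-by-step construction used above. Note that the grouping columns are still character vectors at this point, so levels() returns NULL for them until they are converted with factor().

```r
set.seed(2025)
mydf <- data.frame(
  Score      = sample(450:1800, 1000, replace = TRUE),
  Growth     = rep(c("Yes", "No"), 500),
  Sex        = rep(c("Female", "Male"), 500),
  Enzyme.act = rep(c("High", "Mild", "Low", "None"), 250),
  stringsAsFactors = FALSE  # keep text columns as character vectors
)

dim(mydf)                        # rows and columns
sapply(mydf, mode)               # storage mode of each column
sapply(mydf, length)             # length of each column
levels(mydf$Sex)                 # NULL: not a factor yet
levels(factor(mydf$Enzyme.act))  # levels after conversion, sorted alphabetically
str(mydf)                        # compact overview of the whole structure
```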

Once you are familiar with the structure of the data frame, we can explore the dataset in detail. The most intuitive analysis is to look at the distribution of each variable on its own, with or without taking the grouping factors into account.

## Computing the mean of the numeric variable
mean(mydf$Score)

## Retrieving the standard deviation of the numeric variable
sd(mydf$Score)

## Computing the quartiles (minimum, Q1, median, Q3, maximum)
quantile(mydf$Score)

## Calculating the 95% Confidence Interval (CI) of the mean
t.test(mydf$Score)$conf.int ## Extracting a single element from the object returned by t.test()
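
To see where this interval comes from, the 95% CI of the mean can be rebuilt by hand as the mean plus or minus the critical t value times the standard error, using n - 1 degrees of freedom. The sketch below regenerates the scores with the same seed and compares both results; the names n, se, t_crit, and ci_manual are illustrative.

```r
set.seed(2025)
Score <- sample(450:1800, 1000, replace = TRUE)

n      <- length(Score)
se     <- sd(Score) / sqrt(n)    # standard error of the mean
t_crit <- qt(0.975, df = n - 1)  # two-sided 95% critical value

## Manual interval: mean -/+ critical value * standard error
ci_manual <- mean(Score) + c(-1, 1) * t_crit * se

## Interval reported by t.test(), stripped of its attributes
ci_ttest <- as.numeric(t.test(Score)$conf.int)

all.equal(ci_manual, ci_ttest)
```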

Once the distribution of the numeric variable has been evaluated globally, we can assess it within the levels of a discriminant variable (factor). For instance, we will compute basic statistics for the numeric variable Score grouped by the factor Enzyme.act:

## Function to evaluate a numeric variable distribution across factors
tapply(mydf$Score, mydf$Enzyme.act, mean)
tapply(mydf$Score, mydf$Enzyme.act, sd)
tapply(mydf$Score, mydf$Enzyme.act, quantile)

Please compute the same statistics for the numeric variable Score across the Sex and Growth factors.
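
One possible approach to this exercise, shown only as a sketch, is to loop over the names of the grouping columns so that the same tapply() calls are not retyped for each factor. The data frame is rebuilt here to keep the block self-contained, and the loop variable grp is an illustrative name.

```r
set.seed(2025)
mydf <- data.frame(
  Score  = sample(450:1800, 1000, replace = TRUE),
  Growth = rep(c("Yes", "No"), 500),
  Sex    = rep(c("Female", "Male"), 500),
  stringsAsFactors = FALSE
)

## Repeat the same three summaries for each grouping column
for (grp in c("Sex", "Growth")) {
  cat("Grouping by", grp, "\n")
  print(tapply(mydf$Score, mydf[[grp]], mean))
  print(tapply(mydf$Score, mydf[[grp]], sd))
  print(tapply(mydf$Score, mydf[[grp]], quantile))
}
```

Because both columns alternate row by row, the Female and Yes groups contain exactly the same observations in this dataset, so their statistics coincide (and likewise for Male and No).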

If we need to inspect certain results visually, we can quickly plot the distribution grouped by a particular discriminant variable (factor):

## Make a basic boxplot
boxplot(mydf$Score ~ mydf$Enzyme.act)

## Plot add-on: overlay the individual observations as jittered points
stripchart(Score ~ Enzyme.act, data = mydf, vertical = TRUE, method = "jitter", jitter = 0.2, pch = 21, cex = 0.5, add = TRUE, col = "black")

The resulting plot should look similar to the one shown below: