Meritshot Tutorials
R-Data Structures
A data structure is a particular way of organizing data in a computer so that it can be used effectively. The idea is to reduce the space and time complexities of different tasks. Data structures in R programming are tools for holding multiple values.
R’s base data structures are often organized by their dimensionality (1D, 2D, or nD) and by whether they’re homogeneous (all elements must be of the same type) or heterogeneous (the elements can be of different types). This gives rise to the data structures most frequently used in data analysis.
The most essential data structures used in R include:
- Vector
- Lists
- Dataframes
- Matrices
- Arrays
- Factors
The Different Kinds of Data Structures in R
Let’s take a little time to familiarize ourselves with the data structures in R. Along the way, we will also meet some common R functions.
Vector:
In R, a vector is a basic data structure that can hold multiple values of the same type (e.g., numeric, character, logical). Vectors are essential for storing data in R, and they can be created using various functions.
Technically, vectors can be one of two types:
- atomic vectors
- lists
although the term “vector” most commonly refers to the atomic types, not to lists.
We can categorize atomic vectors into the following types:
- Numeric vector (1, 808, 6527, 742, 268)
- Integer vector (positive and negative whole numbers, written with an L suffix, e.g., 1L, -5L)
- Character vector ("a", "efjvfVF", "fbyvkdsb sbv", "ffWVWVVRV")
- Logical vector (TRUE/FALSE)
- Complex vector (complex numbers of the form a+bi)
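As a quick check of the vector types listed above, the following sketch builds one small vector of each type and inspects it with typeof(). The variable names and values are illustrative and not part of the original tutorial. The last lines show what happens when types are mixed: R coerces every element to a common type.

```r
# One example vector per atomic type (names and values are illustrative)
num_vec <- c(1.5, 808, 6527)      # numeric (stored as double)
int_vec <- c(1L, -5L, 42L)        # integer: note the L suffix
chr_vec <- c("a", "b", "c")       # character
lgl_vec <- c(TRUE, FALSE, TRUE)   # logical
cpl_vec <- c(1+2i, 3-1i)          # complex

print(typeof(num_vec))  # "double"
print(typeof(int_vec))  # "integer"
print(typeof(chr_vec))  # "character"
print(typeof(lgl_vec))  # "logical"
print(typeof(cpl_vec))  # "complex"

# Mixing types in c() coerces everything to the most general type
mixed <- c(1, "a", TRUE)
print(typeof(mixed))    # "character"
print(mixed)            # "1" "a" "TRUE"
```

Note that typeof() reports "double" for ordinary numeric vectors, while class() reports "numeric".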
Here’s a detailed guide on how to use vectors in R:
1. Creating a Vector:
You can create a vector using the c() function (short for concatenate).
Example: Numeric Vector
# Creating a numeric vector
numbers <- c(1, 2, 3, 4, 5)
print(numbers) # Output: 1 2 3 4 5
Example: Character Vector
# Creating a character vector
names <- c("Aaditya", "Dhruv", "Shreya")
print(names) # Output: "Aaditya" "Dhruv" "Shreya"
Example: Logical Vector
# Creating a logical vector
logical_values <- c(TRUE, FALSE, TRUE)
print(logical_values) # Output: TRUE FALSE TRUE
2. Accessing Elements in a Vector:
You can access elements of a vector using square brackets [] and specifying the index number. Indexing in R starts at 1.
Example: Accessing Elements
# Accessing the first element
print(numbers[1]) # Output: 1
# Accessing multiple elements
print(numbers[c(2, 4)]) # Output: 2 4
3. Modifying Elements in a Vector
You can modify vector elements by assigning new values to specific indices.
Example: Modifying Elements
# Changing the second element
numbers[2] <- 10
print(numbers) # Output: 1 10 3 4 5
4. Vector Operations
You can perform arithmetic operations on vectors element-wise.
Example: Vector Addition
# Adding two vectors
vector1 <- c(1, 2, 3)
vector2 <- c(4, 5, 6)
result <- vector1 + vector2
print(result) # Output: 5 7 9
Example: Multiplying by a Scalar
# Multiplying a vector by a scalar
result <- numbers * 2
print(result) # Output: 2 20 6 8 10
5. Length of a Vector
You can find the number of elements in a vector using the length() function.
Example: Length of a Vector
# Checking the length of the vector
print(length(numbers)) # Output: 5
6. Filtering a Vector
You can filter a vector based on a condition.
Example: Filtering Numeric Values
# Filtering values greater than 3
filtered <- numbers[numbers > 3]
print(filtered) # Output: 10 4 5
7. Combining Vectors
You can combine two or more vectors using the c() function.
Example: Combining Vectors
# Combining two vectors
combined <- c(vector1, vector2)
print(combined) # Output: 1 2 3 4 5 6
8. Checking Vector Type
You can check the type of a vector using the class() function.
Example: Checking Vector Type
# Checking the type of the vector
print(class(numbers)) # Output: "numeric"
Vectors are a fundamental data structure in R, and you can create, modify, and perform operations on them easily. You can work with vectors containing numeric, character, or logical data, and perform arithmetic, access elements, filter, and combine vectors as needed.
List:
In R, lists act as containers. Unlike atomic vectors, the contents of a list are not restricted to a single mode and can encompass any mixture of data types.
Lists are sometimes called generic vectors, because the elements of a list can be any type of R object, even lists containing further lists. This property makes them fundamentally different from atomic vectors.
“A list is a special type of vector in which each element can be a different type.”
Create lists using list() or coerce other objects using as.list(). An empty list of the required length can be created using vector("list", length = n).
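The three ways of creating a list mentioned above can be sketched as follows. The variable names and values are illustrative assumptions, not part of the original tutorial.

```r
# Build a list directly from values of different types
person <- list("Roshan", 28, TRUE)
print(length(person))            # 3

# Coerce an atomic vector to a list: each value becomes one list element
from_vector <- as.list(c(10, 20, 30))
print(length(from_vector))       # 3

# Pre-allocate an empty list of length 3; every element starts as NULL
empty_list <- vector("list", length = 3)
print(length(empty_list))        # 3
print(is.null(empty_list[[1]]))  # TRUE
```

Pre-allocating with vector("list", length = n) is convenient when the elements will be filled in later, for example inside a loop.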
Characteristics of Lists
- Heterogeneous Elements: Lists can store multiple data types, such as numeric, character, logical, etc.
- Flexible Size: Lists can hold elements of varying sizes.
- Named Elements: List elements can be named for easy access.
Creating a List
You create a list using the list() function. Here’s how to use and manipulate lists in R.
1. Creating a List
A list can contain different data types, such as numbers, strings, vectors, and other lists.
Example: Simple List
# Creating a list with different data types
my_list <- list(name = "Roshan", age = 28, scores = c(85, 90, 98))
print(my_list)
Output:
$name
[1] "Roshan"
$age
[1] 28
$scores
[1] 85 90 98
Explanation:
- name is a character string
- age is a numeric value
- scores is a numeric vector
Note that age above is stored as a numeric (double) value. Whole-number counts can instead be stored as integers, as the following example shows.
Daily Life Example: Counting Items in a Store
Integers are ideal for representing things that cannot be divided into parts. For instance, the number of items in a shopping cart or the number of employees in a company must be whole numbers (integers).
Scenario: Counting Items in a Warehouse
Imagine you are managing a warehouse and need to track the stock count of different items. The number of items must be a whole number because you cannot have half of a product in stock.
stock_count <- 150L # Recording stock count as an integer
print(stock_count) # Print the stock count
Output:
[1] 150
print(class(stock_count)) # Print the class of stock_count
Output:
[1] "integer"
print(typeof(stock_count)) # Print the type of stock_count
Output:
[1] "integer"
2. Accessing List Elements
You can access elements of a list using either double square brackets [[ ]] or the $ operator for named elements.
Example: Accessing Elements by Index
# Access the first element (name)
print(my_list[[1]]) # Output: "Roshan"
# Access the third element (scores)
print(my_list[[3]]) # Output: 85 90 98
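The $ operator and the difference between double and single brackets can be sketched as follows. The list student is an illustrative stand-in that mirrors the structure used in this section.

```r
# Illustrative list with named elements
student <- list(name = "Roshan", age = 28, scores = c(85, 90, 98))

# $ and [[ ]] both return the element itself
print(student$name)              # "Roshan"
print(student[["scores"]])       # 85 90 98

# Single brackets return a sub-list, not the element
sub_list <- student["age"]
print(class(sub_list))           # "list"
print(class(student[["age"]]))   # "numeric"

# Access can be chained: the second value inside scores
print(student$scores[2])         # 90
```

In short, use [[ ]] or $ when you need the value, and [ ] when you want a smaller list.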
3. Modifying List Elements
You can modify elements in a list by assigning new values to specific indices or names.
Example: Modifying List Elements
# Assigning a value to the age element
my_list$age <- 28
print(my_list$age) # Output: 28
4. Adding Elements to a List
You can add a new element to a list by assigning a value to a name that does not exist yet.
Example: Adding a New Element
# Adding a new element to the list
my_list$city <- "Delhi"
print(my_list)
Output:
$name
[1] "Roshan"
$age
[1] 28
$scores
[1] 85 90 98
$city
[1] "Delhi"
5. Removing Elements from a List
To remove an element from a list, assign it NULL.
Example: Removing an Element
# Removing the city element
my_list$city <- NULL
print(my_list)
6. Length of a List
You can find the number of elements in a list using the length() function.
Example: Length of a List
# Finding the length of the list
print(length(my_list)) # Output: 3
7. Converting Lists to Vectors
Sometimes you may need to convert a list to a vector. This can be done using the unlist() function, which simplifies a list to produce a vector.
Example: Convert List to Vector
# Converting a list to a vector
vector_from_list <- unlist(my_list)
print(vector_from_list)
Output:
    name      age  scores1  scores2  scores3
"Roshan"     "28"     "85"     "90"     "98"
Because a vector must be homogeneous, unlist() coerces every value to character here.
Lists are highly flexible and allow you to organize diverse data types in a single structure, making them essential for managing complex datasets in R.
Matrices
In R, a matrix is a two-dimensional data structure used to store elements of the same type (numeric, character, or logical). Matrices are a key part of R’s ability to perform mathematical and statistical operations efficiently. They are particularly useful for handling tabular data and performing operations like matrix multiplication, addition, or element-wise operations.
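Syntax
The basic syntax for creating a matrix is as follows:
matrix(data, nrow, ncol, byrow, dimnames)
- data: the vector of values used to fill the matrix
- nrow, ncol: the number of rows and columns
- byrow: if TRUE, the matrix is filled row by row; the default (FALSE) fills it column by column
- dimnames: an optional list of row names and column names
Example: the data argument is typically a vector such as:
# Daily high temperatures for a week
daily_temperatures <- c(72, 75, 78, 79, 74, 70, 68)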
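The byrow and dimnames arguments of matrix() are easiest to understand with a small example. The sketch below uses made-up temperature data for two hypothetical cities; the names temps and temp_matrix are illustrative.

```r
# Illustrative data: high temperatures for 2 cities over 3 days
temps <- c(72, 75, 78, 68, 70, 74)

# byrow = TRUE fills the matrix row by row instead of column by column;
# dimnames takes a list holding the row names and the column names
temp_matrix <- matrix(
  temps,
  nrow = 2, ncol = 3,
  byrow = TRUE,
  dimnames = list(c("CityA", "CityB"), c("Mon", "Tue", "Wed"))
)
print(temp_matrix)

# Named dimensions can be used for indexing
print(temp_matrix["CityB", "Tue"])  # 70
```

With byrow = TRUE the first three values form the first row, so CityA gets 72, 75, 78 and CityB gets 68, 70, 74.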
Characteristics of Matrices:
- Two-Dimensional Structure: Matrices have rows and columns.
- Homogeneous Elements: All elements in a matrix must be of the same type (e.g., all numeric or all character).
- Fixed Size: Matrices have a fixed number of rows and columns.
Creating Matrices
Matrices can be created using the matrix() function in R.
Example: Creating a Numeric Matrix
# Creating a 3×3 matrix
my_matrix <- matrix(1:9, nrow = 3, ncol = 3)
print(my_matrix)
Output:
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
Explanation:
- 1:9 specifies the numbers to fill the matrix. By default, the matrix is filled column by column.
- nrow = 3 and ncol = 3 specify the number of rows and columns.
Accessing Matrix Elements
You can access elements in a matrix using row and column indices.
Example: Accessing an Element
# Accessing the element in the 2nd row, 3rd column
print(my_matrix[2, 3]) # Output: 8
Example: Accessing a Whole Row or Column
# Accessing the 1st row
print(my_matrix[1, ]) # Output: 1 4 7
# Accessing the 2nd column
print(my_matrix[, 2]) # Output: 4 5 6
Modifying Matrix Elements
You can modify elements of a matrix by specifying their position.
Example: Modifying an Element
# Changing the element in the 3rd row, 1st column
my_matrix[3, 1] <- 10
print(my_matrix)
Output:
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]   10    6    9
Matrix Operations
Matrices in R support various arithmetic operations like addition, subtraction, multiplication, and division, either element-wise or matrix-specific.
1. Element-Wise Operations
You can perform element-wise operations (addition, multiplication, etc.) between two matrices of the same dimensions.
# Creating another 3×3 matrix
matrix2 <- matrix(9:1, nrow = 3, ncol = 3)
# Adding two matrices element-wise
result_add <- my_matrix + matrix2
print(result_add)
Output:
     [,1] [,2] [,3]
[1,]   10   10   10
[2,]   10   10   10
[3,]   17   10   10
2. Matrix Multiplication
Matrix multiplication is done using the %*% operator.
# Matrix multiplication
result_mult <- my_matrix %*% matrix2
print(result_mult)
Output:
     [,1] [,2] [,3]
[1,]   90   54   18
[2,]  114   69   24
[3,]  201  126   51
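Because * and %*% are easy to confuse, here is a minimal sketch contrasting the two on small 2×2 matrices. The matrices a and b are illustrative and chosen so the results can be checked by hand.

```r
a <- matrix(1:4, nrow = 2)   # rows: (1, 3) and (2, 4)
b <- matrix(5:8, nrow = 2)   # rows: (5, 7) and (6, 8)

# * multiplies elements in matching positions
elementwise <- a * b
print(elementwise)
#      [,1] [,2]
# [1,]    5   21
# [2,]   12   32

# %*% multiplies rows of a by columns of b
matrix_prod <- a %*% b
print(matrix_prod)
#      [,1] [,2]
# [1,]   23   31
# [2,]   34   46
```

For %*%, the number of columns of the first matrix must equal the number of rows of the second; for *, both matrices should have the same dimensions.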
3. Transposing a Matrix
You can transpose a matrix (swap rows and columns) using the t() function.
# Transposing the matrix
transposed_matrix <- t(my_matrix)
print(transposed_matrix)
Output:
     [,1] [,2] [,3]
[1,]    1    2   10
[2,]    4    5    6
[3,]    7    8    9
Combining Matrices
You can combine matrices by rows or columns using rbind() or cbind().
Example: Combining Matrices by Rows
# Row-binding two matrices
combined_rows <- rbind(my_matrix, matrix2)
print(combined_rows)
Output:
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]   10    6    9
[4,]    9    6    3
[5,]    8    5    2
[6,]    7    4    1
Example: Combining Matrices by Columns
# Column-binding two matrices
combined_columns <- cbind(my_matrix, matrix2)
print(combined_columns)
Output:
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    4    7    9    6    3
[2,]    2    5    8    8    5    2
[3,]   10    6    9    7    4    1
Applying Functions to Matrices
You can apply functions to rows or columns of a matrix using the apply() function.
Example: Applying a Function to Each Row
# Applying the sum function to each row
row_sums <- apply(my_matrix, 1, sum)
print(row_sums) # Output: 12 15 25
Explanation:
- apply(my_matrix, 1, sum) applies the sum function to each row (specified by 1).
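The second argument of apply() selects the direction: 1 for rows, 2 for columns. The sketch below, using an illustrative matrix m, shows the column case, a custom function, and the built-in shortcut rowSums().

```r
m <- matrix(1:6, nrow = 2)   # rows: (1, 3, 5) and (2, 4, 6)

# The second argument 2 applies the function to each column
col_sums <- apply(m, 2, sum)
print(col_sums)              # 3 7 11

# Any function can be passed, including an anonymous one
row_ranges <- apply(m, 1, function(r) max(r) - min(r))
print(row_ranges)            # 4 4

# rowSums() and colSums() are built-in shortcuts for the common cases
print(rowSums(m))            # 9 12
```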
Matrices in R are powerful for performing mathematical computations and handling two-dimensional data, making them essential for various types of data analysis.
Arrays
In R, an array is a data structure that can store data in any number of dimensions (1D, 2D, 3D, or higher). Arrays are generalizations of matrices: a matrix is simply a two-dimensional array. They allow you to work with multi-dimensional data where all elements must be of the same type (numeric, character, or logical).
Characteristics of Arrays:
- Multi-dimensional: Arrays can have more than two dimensions (e.g., 3D, 4D).
- Homogeneous Elements: All elements in an array must be of the same type.
- Fixed Size: Arrays have fixed dimensions and sizes once created.
Creating an Array
You create an array using the array() function in R. The array() function requires the data to fill the array and the dimensions of the array.
Syntax of array() Function:
array(data, dim = c(nrow, ncol, nmatrices))
- data: The data to be stored in the array.
- dim: A vector specifying the dimensions (e.g., rows, columns, and matrices for a 3D array).
1. Creating a Simple Array
Example: Creating a 3D Array (2x3x2)
# Creating a 2x3x2 array
my_array <- array(1:12, dim = c(2, 3, 2))
print(my_array)
Output:
, , 1
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
, , 2
     [,1] [,2] [,3]
[1,]    7    9   11
[2,]    8   10   12
Explanation:
- 1:12 provides the data to fill the array.
- dim = c(2, 3, 2) specifies that the array has 2 rows, 3 columns, and 2 layers (3D array).
2. Accessing Elements in an Array
You can access elements in an array using square brackets [] and specifying the indices for each dimension.
Example: Accessing Elements in a 3D Array
# Accessing element in the 1st row, 2nd column, 2nd matrix (layer)
print(my_array[1, 2, 2]) # Output: 9
Example: Accessing an Entire Row or Column
# Accessing the 1st row of the 1st matrix
print(my_array[1, , 1]) # Output: 1 3 5
# Accessing the 2nd column of the 2nd matrix
print(my_array[, 2, 2]) # Output: 9 10
3. Modifying Elements in an Array
You can modify elements in an array by assigning new values to specific indices.
Example: Modifying Elements in an Array
# Changing the element in the 2nd row, 3rd column, 1st matrix
my_array[2, 3, 1] <- 100
print(my_array)
Output:
, , 1
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4  100
, , 2
     [,1] [,2] [,3]
[1,]    7    9   11
[2,]    8   10   12
4. Combining Arrays
You can combine arrays using functions like rbind() and cbind(). However, this only works for two-dimensional arrays whose dimensions align for row or column binding.
Example: Row Binding Two Arrays
# Creating two 2D arrays
array1 <- array(1:6, dim = c(2, 3))
array2 <- array(7:12, dim = c(2, 3))
# Row binding two arrays
combined_array <- rbind(array1, array2)
print(combined_array)
Output:
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
[3,]    7    9   11
[4,]    8   10   12
5. Dimensions of an Array
You can use the dim() function to retrieve or set the dimensions of an array.
Example: Checking Dimensions
# Checking the dimensions of the array
print(dim(my_array)) # Output: [1] 2 3 2
Example: Changing Dimensions
# Changing the dimensions of the array
dim(my_array) <- c(3, 2, 2) # Changes the array to 3 rows, 2 columns, 2 layers
print(my_array)
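To see what changing the dimensions actually does to the data, the sketch below reshapes a fresh, illustrative array arr and inspects it. The values stay in the same column-major order; only the way they are indexed changes.

```r
arr <- array(1:12, dim = c(2, 3, 2))
print(arr[1, 2, 2])        # 9

# Reassigning dim() keeps the 12 values in the same order
# and only changes how they are indexed
dim(arr) <- c(3, 2, 2)
print(dim(arr))            # 3 2 2

# First layer now holds 1..6 in a 3x2 layout
print(arr[, , 1])
#      [,1] [,2]
# [1,]    1    4
# [2,]    2    5
# [3,]    3    6

# The same position now refers to a different value
print(arr[1, 2, 2])        # 10
```

The product of the new dimensions must equal the number of elements (here 3 × 2 × 2 = 12); otherwise R raises an error.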
Arrays in R are powerful tools for handling multi-dimensional data, making them essential for tasks that involve more complex data structures, such as mathematical modeling, image processing, and more.
Data Frames
In R, a data frame is a two-dimensional, table-like data structure that stores data in rows and columns, much like a spreadsheet or SQL table. Data frames are one of the most important and commonly used data structures in R, especially for data analysis and manipulation. Each column in a data frame can contain different types of data (numeric, character, logical, etc.), but all elements within a column must be of the same type.
Characteristics of Data Frames:
- Heterogeneous Columns: Each column can contain a different data type (e.g., numeric, character, or logical), but all values within a column must be of the same type.
- Tabular Structure: Data frames have rows and columns, where rows represent observations and columns represent variables.
- Flexible Size: Data frames can grow or shrink as needed by adding or removing rows or columns.
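The "flexible size" property can be sketched as follows. The data frame df and its values are made up for illustration; it is built with the data.frame() function that the next section covers in detail.

```r
# Illustrative data frame (names and values are made up for this sketch)
df <- data.frame(Name = c("Asha", "Ravi"), Age = c(30, 25))

# Add a column by assigning to a new name
df$City <- c("Pune", "Delhi")

# Add a row with rbind(); the new row needs the same column names
df <- rbind(df, data.frame(Name = "Meera", Age = 41, City = "Goa"))

# Remove a column by assigning NULL
df$Age <- NULL

print(dim(df))   # 3 2
print(df)
```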
Creating Data Frames
You can create a data frame using the data.frame() function.
Syntax of data.frame() Function:
data.frame(column1 = c(values), column2 = c(values), …)
1. Creating a Data Frame
Example: Simple Data Frame
# Creating a data frame
students_df <- data.frame(
  Name = c("Roshan", "Harish", "Aaditya"),
  Age = c(27, 22, 24),
  Scores = c(95, 90, 88)
)
print(students_df)
Output:
     Name Age Scores
1  Roshan  27     95
2  Harish  22     90
3 Aaditya  24     88
Explanation:
- Name is a character column storing the names of the students.
- Age is a numeric column storing the ages of the students.
- Scores is a numeric column storing the exam scores of the students.
2. Accessing Data in a Data Frame
You can access specific elements, rows, or columns in a data frame using square brackets [] or the $ operator for column access.
Example: Accessing Columns by Name
# Accessing the Name column
print(students_df$Name) # Output: "Roshan" "Harish" "Aaditya"
Example: Accessing Elements by Index
# Accessing the element in the 2nd row and 3rd column (Harish's score)
print(students_df[2, 3]) # Output: 90
Example: Accessing Rows or Columns
# Accessing the 1st row (Roshan's information)
print(students_df[1, ])
# Output:
#     Name Age Scores
# 1 Roshan  27     95
# Accessing the Scores column
print(students_df[, "Scores"]) # Output: 95 90 88
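Square brackets also accept a logical condition in the row position, which is a common way to pick out rows of interest. The following is a minimal sketch on an illustrative data frame people; the names and values are assumptions made for this example.

```r
# Illustrative data frame
people <- data.frame(
  Name = c("Asha", "Ravi", "Meera"),
  Age = c(30, 25, 41),
  Score = c(88, 92, 79)
)

# Rows where Score is above 80, all columns
high_scores <- people[people$Score > 80, ]
print(high_scores)

# Selected rows and selected columns at once
print(people[people$Age > 28, c("Name", "Age")])

# [[ ]] extracts a single column as a vector, like $
print(people[["Age"]])   # 30 25 41
```

Leaving the position after the comma empty, as in the first selection, keeps every column.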
3. Combining Data Frames
You can combine two data frames using cbind() (column-wise) and rbind() (row-wise).
Example: Column Binding Two Data Frames
# Combining two data frames column-wise
df1 <- data.frame(ID = 1:3, Age = c(27, 26, 25))
df2 <- data.frame(Name = c("Roshan", "Nilesh", "Aaditya"))
combined_df <- cbind(df1, df2)
print(combined_df)
Output:
  ID Age    Name
1  1  27  Roshan
2  2  26  Nilesh
3  3  25 Aaditya
Data frames are crucial in R for handling datasets, performing data analysis, and conducting statistical operations, making them one of the most frequently used data structures in the R programming language.
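Row-wise combination with rbind() works the same way, provided both data frames have the same column names. A minimal sketch with two illustrative data frames, batch1 and batch2:

```r
# Two illustrative data frames with identical column names
batch1 <- data.frame(ID = 1:2, Name = c("Asha", "Ravi"))
batch2 <- data.frame(ID = 3:4, Name = c("Meera", "Kiran"))

# rbind() stacks the rows of the second data frame under the first
all_rows <- rbind(batch1, batch2)
print(all_rows)
print(nrow(all_rows))   # 4
```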
Factor
In R, a factor is a data structure used to represent categorical data. Factors are particularly useful for storing data that has a limited number of unique values, such as gender, age groups, survey responses, or any type of categorical data. Factors store both the values and the levels (categories), making them ideal for handling and analyzing categorical data in statistical modeling.
Characteristics of Factors:
- Categorical Data Representation: Factors store categorical variables with a fixed number of unique values called levels.
- Levels: Factors store the distinct categories as levels and can be ordered or unordered.
- Efficient Storage: Factors can be more efficient than character vectors for categorical data, as they store categories as integers under the hood.
- Useful for Analysis: Many statistical functions in R, such as regression, treat factors as categorical variables.
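The "integers under the hood" point can be made concrete with a short sketch. The factor sizes is illustrative; it is created with the factor() function described in the next section.

```r
# Illustrative categorical data
sizes <- factor(c("small", "large", "small", "medium"))

# The distinct categories, sorted alphabetically by default
print(levels(sizes))       # "large" "medium" "small"

# Each value is stored as the position of its level
print(as.integer(sizes))   # 3 1 3 2
print(typeof(sizes))       # "integer"

# table() counts how often each level occurs
print(table(sizes))
```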
Creating Factors
You can create a factor using the factor() function.
Syntax of factor() Function:
factor(x, levels = c(…), ordered = TRUE/FALSE)
Explanation
- x: The vector containing the values to be converted to a factor.
- levels: A vector specifying the unique categories.
- ordered: Whether the factor is ordered (i.e., if the categories have a meaningful order).
1. Creating a Factor
Example: Simple Factor
# Creating a factor for gender
gender <- factor(c("Male", "Female", "Female", "Male", "Female"))
print(gender)
Output:
[1] Male   Female Female Male   Female
Levels: Female Male
Explanation:
- The factor gender has two levels: "Female" and "Male".
- R automatically detects the unique categories and stores them as levels, sorted alphabetically by default.
2. Specifying Levels in Factors
You can explicitly specify the levels of a factor using the levels argument.
Example: Specifying Levels
# Specifying levels for education levels
education <- factor(c("High School", "College", "Masters", "PhD", "Masters"), levels = c("High School", "College", "Masters", "PhD"))
print(education)
Output:
[1] High School College Masters PhD Masters
Levels: High School College Masters PhD
Explanation:
- The levels argument defines the specific categories for the factor and the order in which they are stored.
3. Ordered Factors
For ordinal data (where categories have a meaningful order), you can create ordered factors by setting ordered = TRUE.
Example: Creating an Ordered Factor
# Creating an ordered factor for satisfaction levels
satisfaction <- factor(c("Low", "Medium", "High", "Medium", "Low"), levels = c("Low", "Medium", "High"), ordered = TRUE)
print(satisfaction)
Output:
[1] Low    Medium High   Medium Low
Levels: Low < Medium < High
Explanation:
- ordered = TRUE makes the factor levels ordered, indicating a progression from “Low” to “High”.
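One practical benefit of an ordered factor is that comparison operators respect the level order. A minimal sketch, using an illustrative factor ratings:

```r
ratings <- factor(
  c("Low", "High", "Medium", "Low"),
  levels = c("Low", "Medium", "High"),
  ordered = TRUE
)

# Comparisons follow the order Low < Medium < High
print(ratings > "Low")               # FALSE TRUE TRUE FALSE

# Keep only the values that are at least "Medium"
print(ratings[ratings >= "Medium"])

# max() and min() also work on ordered factors
print(max(ratings))                  # High
```

The same comparisons on an unordered factor are not meaningful, and R returns NA with a warning.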
4. Converting Factors to Numeric and Character
You can convert factors back to numeric or character values using as.numeric() or as.character().
Example: Converting a Factor to Numeric
# Converting satisfaction to numeric values (indices of the levels)
numeric_satisfaction <- as.numeric(satisfaction)
print(numeric_satisfaction) # Output: 1 2 3 2 1
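Since as.numeric() returns the level positions rather than the labels, a factor whose labels look like numbers needs one extra step. The sketch below uses an illustrative factor years to show as.character() and the two-step conversion.

```r
# A factor whose labels happen to be numbers (illustrative data)
years <- factor(c("2021", "2019", "2021", "2020"))

# as.character() returns the labels as text
print(as.character(years))              # "2021" "2019" "2021" "2020"

# as.numeric() returns level positions, not the labels
print(as.numeric(years))                # 3 1 3 2

# Convert to character first to recover the original numbers
print(as.numeric(as.character(years)))  # 2021 2019 2021 2020
```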
5. Factors in Data Frames
Factors are often used in data frames, especially for categorical columns.
Example: Factors in a Data Frame
# Creating a data frame with factors
survey_df <- data.frame(
  Respondent = c("Roshan", "Aaditya", "Chalsee", "Harish"),
  Satisfaction = factor(c("High", "Medium", "High", "Medium"), levels = c("Low", "Medium", "High"), ordered = TRUE)
)
print(survey_df)
Output:
  Respondent Satisfaction
1     Roshan         High
2    Aaditya       Medium
3    Chalsee         High
4     Harish       Medium
Factors are essential in R for representing categorical data efficiently, and they play a significant role in data analysis, especially in summarizing, visualizing, and modeling categorical data.
Conclusion:
- Vectors are for homogeneous one-dimensional data.
- Lists can hold heterogeneous data, including other data structures.
- Matrices are for two-dimensional homogeneous data.
- Arrays are for multi-dimensional homogeneous data.
- Data Frames are for heterogeneous tabular data.
- Factors are for categorical variables with specific levels.
Each structure has its ideal use case, depending on the type and complexity of the data.