Meritshot Tutorials

R-Data Structures

A data structure is a particular way of organizing data in a computer so that it can be used effectively. The idea is to reduce the space and time complexities of different tasks. Data structures in R programming are tools for holding multiple values.

R’s base data structures are often organized by their dimensionality (1D, 2D, or nD) and by whether they’re homogeneous (all elements must be of the same type) or heterogeneous (elements can be of different types). This gives rise to the six data structures most frequently used in data analysis.

The most essential data structures used in R include:

  • Vector
  • Lists
  • Dataframes
  • Matrices
  • Arrays
  • Factors

The Different Kinds of Data Structures in R

Let’s take a little time to familiarize ourselves with the data structures in R. In the process, we can familiarize ourselves with common R functions.

Vector:

In R, a vector is a basic data structure that can hold multiple values of the same type (e.g., numeric, character, logical). Vectors are essential for storing data in R, and they can be created using various functions.

Technically, vectors can be one of two types:

  • atomic vectors
  • lists

although the term “vector” most commonly refers to the atomic types, not to lists.

We can categorize atomic vectors into the following types:

  • Numeric vector (e.g., 1, 808, 6527, 742, 268, or decimals such as 3.14)
  • Integer vector (positive and negative whole numbers, written with an L suffix, e.g., 5L)
  • Character vector ("a", "efjvfVF", "fbyvkdsb sbv", "ffWVWVVRV")
  • Logical vector (TRUE/FALSE)
  • Complex vector (complex numbers of the form a+bi)
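As a rough illustration of the types listed above, the sketch below builds one small vector of each kind and inspects it with typeof(). The variable names and values are made up for this example. It also shows what happens when types are mixed inside c(): R coerces every element to one common type, which is what "homogeneous" means in practice.

```r
# One small vector of each basic type (values are illustrative only)
num_vec <- c(1.5, 808, 6527)   # numeric (stored as double)
int_vec <- c(-3L, 0L, 42L)     # integer: the L suffix marks whole numbers
chr_vec <- c("a", "efjvfVF")   # character
lgl_vec <- c(TRUE, FALSE)      # logical
cpl_vec <- c(2+3i, 1-1i)       # complex

print(typeof(num_vec))  # "double"
print(typeof(int_vec))  # "integer"
print(typeof(chr_vec))  # "character"
print(typeof(lgl_vec))  # "logical"
print(typeof(cpl_vec))  # "complex"

# Mixing types in c() coerces all elements to one common type
mixed <- c(1, "a", TRUE)
print(typeof(mixed))    # "character"
```

Note that typeof() reports "double" for ordinary numeric values, while class() reports "numeric" for the same vector.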

Here’s a detailed guide on how to use vectors in R:

1.  Creating a Vector:

You can create a vector using the c() function (short for concatenate).

Example: Numeric Vector

# Creating a numeric vector
numbers <- c(1, 2, 3, 4, 5)
print(numbers) # Output: 1 2 3 4 5

Example: Character Vector

# Creating a character vector

names <- c("Aaditya", "Dhruv", "Shreya")
print(names) # Output: "Aaditya" "Dhruv" "Shreya"

Example: Logical Vector

# Creating a logical vector

logical_values <- c(TRUE, FALSE, TRUE)

print(logical_values) # Output: TRUE FALSE TRUE

2.  Accessing Elements in a Vector

You can access elements of a vector using square brackets [] and specifying the index number.

Example: Accessing Elements

# Accessing the first element
print(numbers[1]) # Output: 1
# Accessing multiple elements
print(numbers[c(2, 4)]) # Output: 2 4

3.  Modifying Elements in a Vector

You can modify vector elements by assigning new values to specific indices.

Example: Modifying Elements

# Changing the second element
numbers[2] <- 10
print(numbers) # Output: 1 10 3 4 5

4.  Vector Operations

You can perform arithmetic operations on vectors element-wise.

Example: Vector Addition

# Adding two vectors
vector1 <- c(1, 2, 3)
vector2 <- c(4, 5, 6)
result <- vector1 + vector2
print(result) # Output: 5 7 9

Example: Multiplying by a Scalar

# Multiplying a vector by a scalar

result <- numbers * 2

print(result) # Output: 2 20 6 8 10

5.  Length of a Vector

You can find the number of elements in a vector using the length() function.

Example: Length of a Vector

# Checking the length of the vector
print(length(numbers)) # Output: 5

6.  Filtering a Vector

You can filter a vector based on a condition.

Example: Filtering Numeric Values

# Filtering values greater than 3
filtered <- numbers[numbers > 3]
print(filtered) # Output: 10 4 5

7.  Combining Vectors

You can combine two or more vectors using the c() function.

Example: Combining Vectors

# Combining two vectors

combined <- c(vector1, vector2)

print(combined) # Output: 1 2 3 4 5 6

8.  Checking Vector Type

You can check the type of a vector using the class() function.

Example: Checking Vector Type

# Checking the type of the vector

print(class(numbers)) # Output: "numeric"

Vectors are a fundamental data structure in R, and you can create, modify, and perform operations on them easily. You can work with vectors containing numeric, character, or logical data, and perform arithmetic, access elements, filter, and combine vectors as needed.

List:

In R, lists act as containers. Unlike atomic vectors, the contents of a list are not restricted to a single mode and can encompass any mixture of data types.

Lists are sometimes called generic vectors, because the elements of a list can be any type of R object, even lists containing further lists. This property makes them fundamentally different from atomic vectors.

“A list is a special type of vector in which each element can be a different type.”

Create lists using list() or coerce other objects using as.list(). An empty list of the required length can be created using vector().
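The two helpers mentioned above, as.list() and vector(), can be sketched as follows. The variable names are illustrative; the only assumption is that vector() is called with mode "list", which is how it returns a list rather than an atomic vector.

```r
# Coerce an atomic vector to a list: each element becomes one list item
scores_list <- as.list(c(85, 90, 98))
print(length(scores_list))       # 3
print(class(scores_list))        # "list"

# Pre-allocate an empty list of length 3; every slot starts as NULL
empty_list <- vector("list", 3)
print(length(empty_list))        # 3
print(is.null(empty_list[[1]]))  # TRUE
```

Pre-allocating with vector("list", n) is a common pattern when a loop will fill the list one slot at a time.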

Characteristics of Lists

  1. Heterogeneous Elements: Lists can store multiple data types, such as numeric, character, logical, etc.
  2. Flexible Size: Lists can hold elements of varying sizes.
  3. Named Elements: List elements can be named for easy access.

Creating a List

You create a list using the list() function. Here’s how to use and manipulate lists in R.

1.  Creating a List

A list can contain different data types, such as numbers, strings, vectors, and other lists.

Example: Simple List

# Creating a list with different data types

my_list <- list(name = "Roshan", age = 28, scores = c(85, 90, 98))
print(my_list)

Output:

$name
[1] "Roshan"

$age
[1] 28

$scores
[1] 85 90 98

Explanation:

  • name is a character string
  • age is a numeric value
  • scores is a numeric vector

Daily Life Example: Counting Items in a Store

Integers are ideal for representing things that cannot be divided into parts. For instance, the number of items in a shopping cart or the number of employees in a company must be whole numbers (integers).

Scenario: Counting Items in a Warehouse

Imagine you are managing a warehouse and need to track the stock count of different items. The number of items must be a whole number because you cannot have half of a product in stock.

stock_count <- 150L # Recording stock count as an integer
print(stock_count) # Print the stock count

Output:

[1] 150

print(class(stock_count)) # Print the class of stock_count

Output:

[1] "integer"

print(typeof(stock_count)) # Print the type of stock_count

Output:

[1] "integer"

2. Accessing List Elements

You can access elements of a list using either double square brackets [[ ]] or the $ operator for named elements.

Example: Accessing Elements by Index

# Access the first element (name)
print(my_list[[1]]) # Output: "Roshan"
# Access the third element (scores)
print(my_list[[3]]) # Output: 85 90 98
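Access by name works the same way as access by index. The sketch below rebuilds the same list so that it runs on its own, then uses the $ operator and double brackets with a name. It also contrasts single and double brackets, a common source of confusion: single brackets return a smaller list, while double brackets return the element itself. The name sub_list is made up for this example.

```r
my_list <- list(name = "Roshan", age = 28, scores = c(85, 90, 98))

# Access named elements with $ or with [[ "name" ]]
print(my_list$name)             # "Roshan"
print(my_list[["scores"]])      # 85 90 98

# Single brackets keep the list wrapper; double brackets extract the element
sub_list <- my_list["age"]
print(class(sub_list))          # "list"
print(class(my_list[["age"]]))  # "numeric"
```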

3. Modifying List Elements

You can modify elements in a list by assigning new values to specific indices or names.

Example: Modifying List Elements

# Changing the age element
my_list$age <- 28
print(my_list$age) # Output: 28

4. Adding Elements to a List

You can add a new element to a list by assigning a value to a name that does not exist yet.

Example: Adding a New Element

# Adding a new element to the list
my_list$city <- "Delhi"
print(my_list)

Output:

$name
[1] "Roshan"

$age
[1] 28

$scores
[1] 85 90 98

$city
[1] "Delhi"

5. Removing Elements from a List

To remove an element from a list, assign it NULL.

Example: Removing an Element

# Removing the city element
my_list$city <- NULL
print(my_list)

6. Length of a List

You can find the number of elements in a list using the length() function.

Example: Length of a List

# Finding the length of the list

print(length(my_list)) # Output: 3

7. Converting Lists to Vectors

Sometimes you may need to convert a list to a vector. This can be done using the unlist() function, which simplifies a list to produce a vector.

Example: Convert List to Vector

# Converting a list to a vector

vector_from_list <- unlist(my_list)
print(vector_from_list)

Output:

    name      age  scores1  scores2  scores3
"Roshan"     "28"     "85"     "90"     "98"

Because the list mixes character and numeric values, unlist() coerces every element to character.

Lists are highly flexible and allow you to organize diverse data types in a single structure, making them essential for managing complex datasets in R.


Matrices

In R, a matrix is a two-dimensional data structure used to store elements of the same type (numeric, character, or logical). Matrices are a key part of R’s ability to perform mathematical and statistical operations efficiently. They are particularly useful for handling tabular data and performing operations like matrix multiplication, addition, or element-wise operations.

Syntax

The basic syntax for creating a matrix is as follows:

matrix(data, nrow, ncol, byrow, dimnames)

Example:

# Daily high temperatures for a week

daily_temperatures <- c(72, 75, 78, 79, 74, 70, 68)

Characteristics of Matrices:

  1. Two-Dimensional Structure: Matrices have rows and columns.
  2. Homogeneous Elements: All elements in a matrix must be of the same type (e.g., all numeric or all character).
  3. Fixed Size: Matrices have a fixed number of rows and columns.

Creating Matrices

Matrices can be created using the matrix() function in R.

Example: Creating a Numeric Matrix

# Creating a 3×3 matrix

my_matrix <- matrix(1:9, nrow = 3, ncol = 3)
print(my_matrix)

Output:

     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9

Explanation:

  • 1:9 specifies the numbers to fill the matrix.
  • nrow = 3 and ncol = 3 specify the number of rows and columns.
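The output above shows that matrix() fills values column by column by default. As a small sketch of the remaining arguments of matrix(), the example below uses byrow = TRUE to fill row by row and dimnames to label rows and columns. The object names and labels ("r1", "c1", and so on) are invented for illustration.

```r
# Default: values fill column by column
by_col <- matrix(1:6, nrow = 2, ncol = 3)
# byrow = TRUE fills row by row instead
by_row <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)
print(by_col[1, ])  # 1 3 5
print(by_row[1, ])  # 1 2 3

# dimnames takes a list: row names first, then column names
named <- matrix(1:4, nrow = 2,
                dimnames = list(c("r1", "r2"), c("c1", "c2")))
print(named["r2", "c1"])  # 2
```

Once names are set, elements can be selected by label as well as by position.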

Accessing Matrix Elements

You can access elements in a matrix using row and column indices.

Example: Accessing an Element

# Accessing the element in the 2nd row, 3rd column
print(my_matrix[2, 3]) # Output: 8

Example: Accessing a Whole Row or Column

# Accessing the 1st row
print(my_matrix[1, ]) # Output: 1 4 7
# Accessing the 2nd column
print(my_matrix[, 2]) # Output: 4 5 6

Modifying Matrix Elements

You can modify elements of a matrix by specifying their position.

Example: Modifying an Element

# Changing the element in the 3rd row, 1st column
my_matrix[3, 1] <- 10
print(my_matrix)

Output:

     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]   10    6    9

Matrix Operations

Matrices in R support various arithmetic operations like addition, subtraction, multiplication, and division, either element-wise or matrix-specific.

1. Element-Wise Operations

You can perform element-wise operations (addition, multiplication, etc.) between two matrices of the same dimensions.

# Creating another 3×3 matrix

matrix2 <- matrix(9:1, nrow = 3, ncol = 3)

# Adding two matrices element-wise
result_add <- my_matrix + matrix2
print(result_add)

Output:

     [,1] [,2] [,3]
[1,]   10   10   10
[2,]   10   10   10
[3,]   17   10   10

2. Matrix Multiplication

Matrix multiplication is done using the %*% operator.

# Matrix multiplication

result_mult <- my_matrix %*% matrix2
print(result_mult)

Output:

     [,1] [,2] [,3]
[1,]   90   54   18
[2,]  114   69   24
[3,]  201  126   51
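The difference between the * operator and the %*% operator is easier to check by hand on a small case. The sketch below uses two 2×2 matrices, a and b, which are made up for this example and are not part of the tutorial's running data.

```r
a <- matrix(1:4, nrow = 2)   # columns: (1, 2) and (3, 4)
b <- matrix(5:8, nrow = 2)   # columns: (5, 6) and (7, 8)

elementwise <- a * b     # multiplies values at matching positions
product     <- a %*% b   # rows of a combined with columns of b

print(elementwise)
#      [,1] [,2]
# [1,]    5   21
# [2,]   12   32
print(product)
#      [,1] [,2]
# [1,]   23   31
# [2,]   34   46
```

For example, the top-left entry of the product is 1*5 + 3*6 = 23, while the element-wise result at the same position is simply 1*5 = 5.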

3. Transposing a Matrix

You can transpose a matrix (swap rows and columns) using the t() function.

# Transposing the matrix

transposed_matrix <- t(my_matrix)
print(transposed_matrix)

Output:

     [,1] [,2] [,3]
[1,]    1    2   10
[2,]    4    5    6
[3,]    7    8    9

Combining Matrices

You can combine matrices by rows or columns using rbind() or cbind().

Example: Combining Matrices by Rows

# Row-binding two matrices

combined_rows <- rbind(my_matrix, matrix2)
print(combined_rows)

Output:

     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]   10    6    9
[4,]    9    6    3
[5,]    8    5    2
[6,]    7    4    1

Example: Combining Matrices by Columns

# Column-binding two matrices

combined_columns <- cbind(my_matrix, matrix2)
print(combined_columns)

Output:

     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    4    7    9    6    3
[2,]    2    5    8    8    5    2
[3,]   10    6    9    7    4    1

Applying Functions to Matrices

You can apply functions to rows or columns of a matrix using the apply() function.

Example: Applying a Function to Each Row

# Applying the sum function to each row
row_sums <- apply(my_matrix, 1, sum)
print(row_sums) # Output: 12 15 25

Explanation:

  • apply(my_matrix, 1, sum) applies the sum function to each row (specified by 1).
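The same call works on columns by passing 2 as the second argument. The sketch below rebuilds a matrix with the same values as my_matrix (after the earlier modification) under the illustrative name m, so that it runs on its own.

```r
# Same values as my_matrix after the modification above
m <- matrix(c(1, 2, 10, 4, 5, 6, 7, 8, 9), nrow = 3)

col_sums <- apply(m, 2, sum)   # margin 2 = columns
print(col_sums)                # 13 15 24

row_max <- apply(m, 1, max)    # margin 1 = rows, any function can be used
print(row_max)                 # 7 8 10
```

For plain sums, base R also provides rowSums() and colSums(), which give the same totals.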

Matrices in R are powerful for performing mathematical computations and handling two-dimensional data, making them essential for various types of data analysis.

Arrays

In R, an array is a data structure that can store data in any number of dimensions (such as 2D, 3D, or higher). Arrays are generalizations of matrices, and they allow you to work with multi-dimensional data where all elements must be of the same type (numeric, character, or logical).

Characteristics of Arrays:

  1. Multi-dimensional: Arrays can have any number of dimensions (e.g., 2D, 3D, 4D).
  2. Homogeneous Elements: All elements in an array must be of the same type.
  3. Fixed Size: Arrays have fixed dimensions and sizes once created.

Creating an Array

You create an array using the array() function in R. The array() function requires the data to fill the array and the dimensions of the array.

Syntax of array() Function:

array(data, dim = c(nrow, ncol, nmatrices))

  • data: The data to be stored in the array.
  • dim: A vector specifying the dimensions (e.g., rows, columns, and matrices for a 3D array).

1.  Creating a Simple Array

Example: Creating a 3D Array (2x3x2)

# Creating a 2x3x2 array

my_array <- array(1:12, dim = c(2, 3, 2))
print(my_array)

Output:

, , 1

     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6

, , 2

     [,1] [,2] [,3]
[1,]    7    9   11
[2,]    8   10   12

Explanation:

  • 1:12 provides the data to fill the array.
  • dim = c(2, 3, 2) specifies that the array has 2 rows, 3 columns, and 2 layers (3D array).

2.  Accessing Elements in an Array

You can access elements in an array using square brackets [] and specifying the indices for each dimension.

Example: Accessing Elements in a 3D Array

# Accessing element in the 1st row, 2nd column, 2nd matrix (layer)
print(my_array[1, 2, 2]) # Output: 9

Example: Accessing an Entire Row or Column

# Accessing the 1st row of the 1st matrix
print(my_array[1, , 1]) # Output: 1 3 5
# Accessing the 2nd column of the 2nd matrix
print(my_array[, 2, 2]) # Output: 9 10

3.  Modifying Elements in an Array

You can modify elements in an array by assigning new values to specific indices.

Example: Modifying Elements in an Array

# Changing the element in the 2nd row, 3rd column, 1st matrix
my_array[2, 3, 1] <- 100
print(my_array)

Output:

, , 1

     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4  100

, , 2

     [,1] [,2] [,3]
[1,]    7    9   11
[2,]    8   10   12

4.  Combining Arrays

You can combine arrays using functions like rbind() and cbind(). However, this works only for arrays whose dimensions align for row or column binding.

Example: Row Binding Two Arrays

# Creating two 2D arrays

array1 <- array(1:6, dim = c(2, 3))
array2 <- array(7:12, dim = c(2, 3))

# Row binding two arrays
combined_array <- rbind(array1, array2)
print(combined_array)

Output:

     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
[3,]    7    9   11
[4,]    8   10   12

5.  Dimensions of an Array

You can use the dim() function to retrieve or set the dimensions of an array.

Example: Checking Dimensions

# Checking the dimensions of the array
print(dim(my_array)) # Output: [1] 2 3 2

Example: Changing Dimensions

# Changing the dimensions of the array

dim(my_array) <- c(3, 2, 2) # Changes the array to 3 rows, 2 columns, 2 layers
print(my_array)
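Changing dim() does not move or alter any values. It only reinterprets the same sequence of elements, which R stores column by column. The sketch below uses a fresh array named arr (an illustrative name, holding 1 to 12 without the earlier modification) so the effect is easy to follow.

```r
arr <- array(1:12, dim = c(2, 3, 2))
dim(arr) <- c(3, 2, 2)   # same 12 values, new shape

# The first layer now holds values 1 to 6 in a 3 x 2 layout
print(arr[, , 1])
#      [,1] [,2]
# [1,]    1    4
# [2,]    2    5
# [3,]    3    6

print(arr[3, 2, 1])      # 6
print(arr[1, 1, 2])      # 7
```

The product of the new dimensions must equal the number of elements (here 3 * 2 * 2 = 12), otherwise R raises an error.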

Arrays in R are powerful tools for handling multi-dimensional data, making them essential for tasks that involve more complex data structures, such as mathematical modeling, image processing, and more.

Data Frames

In R, a data frame is a two-dimensional, table-like data structure that stores data in rows and columns, much like a spreadsheet or SQL table. Data frames are one of the most important and commonly used data structures in R, especially for data analysis and manipulation. Each column in a data frame can contain different types of data (numeric, character, logical, etc.), but all elements within a column must be of the same type.

Characteristics of Data Frames:

  1. Heterogeneous Columns: Each column can contain a different data type (e.g., numeric, character, or logical), but all values within a column must be of the same type.
  2. Tabular Structure: Data frames have rows and columns, where rows represent observations and columns represent variables.
  3. Flexible Size: Data frames can grow or shrink as needed by adding or removing rows or columns.

Creating Data Frames

You can create a data frame using the data.frame() function.

Syntax of data.frame() Function:

data.frame(column1 = c(values), column2 = c(values), …)

1.  Creating a Data Frame

Example: Simple Data Frame

# Creating a data frame
students_df <- data.frame(
  Name = c("Roshan", "Harish", "Aaditya"),
  Age = c(27, 22, 24),
  Scores = c(95, 90, 88)
)
print(students_df)

Output:

     Name Age Scores
1  Roshan  27     95
2  Harish  22     90
3 Aaditya  24     88

Explanation:

  • Name is a character column storing the names of students.
  • Age is a numeric column storing the age of students.
  • Scores is a numeric column storing the exam scores of students.
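One way to confirm the column types described above is to inspect the data frame with sapply() and str(). This sketch rebuilds the same data frame so that it runs on its own. It passes stringsAsFactors = FALSE explicitly, which is an addition made here: versions of R before 4.0 convert character columns to factors by default, and this argument keeps the result the same across versions.

```r
students_df <- data.frame(
  Name = c("Roshan", "Harish", "Aaditya"),
  Age = c(27, 22, 24),
  Scores = c(95, 90, 88),
  stringsAsFactors = FALSE   # keep Name as character in every R version
)

# Class of each column, returned as a named character vector
col_classes <- sapply(students_df, class)
print(col_classes)

# Compact overview: dimensions, column types, and the first values
str(students_df)

print(dim(students_df))  # 3 3
```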

2.  Accessing Data in a Data Frame

You can access specific elements, rows, or columns in a data frame using square brackets [] or the $ operator for column access.

Example: Accessing Columns by Name

# Accessing the Name column

print(students_df$Name) # Output: "Roshan" "Harish" "Aaditya"

Example: Accessing Elements by Index

# Accessing the element in the 2nd row and 3rd column (Harish's score)
print(students_df[2, 3]) # Output: 90

Example: Accessing Rows or Columns

# Accessing the 1st row (Roshan's information)
print(students_df[1, ]) # Output: Name "Roshan", Age 27, Scores 95
# Accessing the Scores column
print(students_df[, "Scores"]) # Output: 95 90 88

3.  Combining Data Frames

You can combine two data frames using cbind() (column-wise) and rbind() (row-wise).

Example: Column Binding Two Data Frames

# Combining two data frames column-wise
df1 <- data.frame(ID = 1:3, Age = c(27, 26, 25))
df2 <- data.frame(Name = c("Roshan", "Nilesh", "Aaditya"))
combined_df <- cbind(df1, df2)
print(combined_df)

Output:

  ID Age    Name
1  1  27  Roshan
2  2  26  Nilesh
3  3  25 Aaditya
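Row-wise binding with rbind() follows the same pattern, with one requirement: both data frames need the same column names. The sketch below uses two small data frames, df_a and df_b, which are made up for this example.

```r
# Two data frames with identical column names
df_a <- data.frame(ID = 1:2, Name = c("Roshan", "Nilesh"),
                   stringsAsFactors = FALSE)
df_b <- data.frame(ID = 3L, Name = "Aaditya",
                   stringsAsFactors = FALSE)

# rbind() stacks the rows of df_b under the rows of df_a
stacked <- rbind(df_a, df_b)
print(stacked)
print(nrow(stacked))   # 3
```

If the column names differ, rbind() stops with an error instead of guessing how the columns line up.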

Data frames are crucial in R for handling datasets, performing data analysis, and conducting statistical operations, making them one of the most frequently used data structures in the R programming language.

Factor

In R, a factor is a data structure used to represent categorical data. Factors are particularly useful for storing data that has a limited number of unique values, such as gender, age groups, survey responses, or any type of categorical data. Factors store both the values and the levels (categories), making them ideal for handling and analyzing categorical data in statistical modeling.

Characteristics of Factors:

  1. Categorical Data Representation: Factors store categorical variables with a fixed number of unique values called levels.
  2. Levels: Factors store the distinct categories as levels and can be ordered or unordered.
  3. Efficient Storage: Factors are more efficient than character vectors for categorical data as they store categories as integers under the hood.
  4. Useful for Analysis: Many statistical functions in R, such as regression, treat factors as categorical variables.

Creating Factors

You can create a factor using the factor() function.

Syntax of factor() Function:

factor(x, levels = c(…), ordered = TRUE/FALSE)

Explanation

  • x: The vector containing the values to be converted to a factor.
  • levels: A vector specifying the unique categories.
  • ordered: Whether the factor is ordered (i.e., if the categories have a meaningful order).

1.  Creating a Factor

Example: Simple Factor

# Creating a factor for gender

gender <- factor(c("Male", "Female", "Female", "Male", "Female"))
print(gender)

Output:

[1] Male   Female Female Male   Female
Levels: Female Male

Explanation:

  • The factor gender has two levels: “Female” and “Male”.
  • R automatically detects the unique categories and stores them as levels, sorted alphabetically by default.

2.  Specifying Levels in Factors

You can explicitly specify the levels of a factor using the levels argument.

Example: Specifying Levels

# Specifying levels for education levels

education <- factor(c("High School", "College", "Masters", "PhD", "Masters"), levels = c("High School", "College", "Masters", "PhD"))
print(education)

Output:

[1] High School College     Masters     PhD         Masters
Levels: High School College Masters PhD

Explanation:

  • The levels argument defines the specific categories for the factor and the order in which they are stored.

3.  Ordered Factors

For ordinal data (where categories have a meaningful order), you can create ordered factors by setting ordered = TRUE.

Example: Creating an Ordered Factor

# Creating an ordered factor for satisfaction levels

satisfaction <- factor(c("Low", "Medium", "High", "Medium", "Low"), levels = c("Low", "Medium", "High"), ordered = TRUE)
print(satisfaction)

Output:

[1] Low    Medium High   Medium Low
Levels: Low < Medium < High

Explanation:

  • ordered = TRUE makes the factor levels ordered, indicating a progression from “Low” to “High”.
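One practical consequence of an ordered factor is that comparison operators become meaningful. The sketch below recreates the satisfaction factor so that it runs on its own, then compares its values against a level and counts each category with table().

```r
satisfaction <- factor(c("Low", "Medium", "High", "Medium", "Low"),
                       levels = c("Low", "Medium", "High"),
                       ordered = TRUE)

# Which responses rank below "High"?
below_high <- satisfaction < "High"
print(below_high)            # TRUE TRUE FALSE TRUE TRUE

# The highest level that actually occurs in the data
print(max(satisfaction))

# Counts per level, reported in level order
print(table(satisfaction))
```

With an unordered factor, the same comparison is not meaningful: R returns NA values and issues a warning.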

4.  Converting Factors to Numeric and Character

You can convert factors back to numeric or character values using as.numeric() or as.character().

Example: Converting a Factor to Numeric

# Converting satisfaction to numeric values (indices of the levels)
numeric_satisfaction <- as.numeric(satisfaction)
print(numeric_satisfaction) # Output: 1 2 3 2 1
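The character conversion mentioned above can be sketched in the same way. The example also shows a common pitfall with factors whose labels look like numbers: as.numeric() returns the level codes, not the labels, so the labels are converted to character first. The factor named years is made up for this example.

```r
satisfaction <- factor(c("Low", "Medium", "High"),
                       levels = c("Low", "Medium", "High"))

# Converting a factor back to its text labels
labels_chr <- as.character(satisfaction)
print(labels_chr)                 # "Low" "Medium" "High"

# Pitfall: number-like labels are NOT recovered by as.numeric() alone
years <- factor(c("2020", "2018", "2020"))
codes  <- as.numeric(years)                 # level codes
values <- as.numeric(as.character(years))   # the original numbers
print(codes)                      # 2 1 2
print(values)                     # 2020 2018 2020
```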

5.  Factors in Data Frames

Factors are often used in data frames, especially for categorical columns.

Example: Factors in a Data Frame

# Creating a data frame with factors
survey_df <- data.frame(
  Respondent = c("Roshan", "Aaditya", "Chalsee", "Harish"),
  Satisfaction = factor(c("High", "Medium", "High", "Medium"), levels = c("Low", "Medium", "High"), ordered = TRUE)
)
print(survey_df)

Output:

  Respondent Satisfaction
1     Roshan         High
2    Aaditya       Medium
3    Chalsee         High
4     Harish       Medium

Factors are essential in R for representing categorical data efficiently, and they play a significant role in data analysis, especially in summarizing, visualizing, and modeling categorical data.

Conclusion:
  • Vectors are for homogeneous one-dimensional data.

  • Lists can hold heterogeneous data, including other data structures.

  • Matrices are for two-dimensional homogeneous data.
  • Arrays are for multi-dimensional homogeneous data.
  • Data Frames are for heterogeneous tabular data.
  • Factors are for categorical variables with specific levels.

Each structure has its ideal use case, depending on the type and complexity of the data.