Probability in R
In R programming, probability is handled using a variety of functions to calculate probabilities, simulate random variables, and analyze distributions. In probability theory, distributions can be broadly classified into discrete distributions and continuous distributions based on the type of outcomes they represent. Here are some essential topics and functions for working with probability in R:
Mathematical Foundations
- Sample Space: The sample space represents the set of all possible outcomes in a given experiment. It serves as the foundation for calculating probabilities. For instance, when rolling a fair six-sided die, the sample space is {1, 2, 3, 4, 5, 6}.
- Events: An event is a subset of the sample space, representing a specific outcome or set of outcomes. Events can range from simple, such as rolling an even number, to complex, like drawing a red card from a deck.
- Probability Distribution: A probability distribution assigns probabilities to each event in the sample space. For classical probability, all outcomes are equally likely, so each event has the same probability.
Calculating Probabilities in R
R offers various functions and packages for calculating Probability in R and performing statistical analyses. Some commonly used functions include:
- dbinom(): Computes the probability mass function (PMF) for the binomial distribution.
- pnorm(): Calculates the cumulative distribution function (CDF) for the normal distribution.
- dpois(): Computes the PMF for the Poisson distribution.
- punif(): Calculates the CDF for the uniform distribution.
Here is the basic example of calculating Probability in R:
# Define the sample space
sample_space <- c(1, 2, 3, 4, 5, 6)
# Define an event, for example, rolling an even number
event <- c(2, 4, 6)
# Calculate the probability of the event
probability <- length(event) / length(sample_space)
print(probability)
Output: 0.5
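To illustrate the four distribution functions listed above, here is a brief sketch; the parameter values are borrowed from the worked examples later on this page, so treat them as illustrative:
# dbinom(): P(X = 3) for X ~ Binomial(size = 5, prob = 0.5)
dbinom(3, size = 5, prob = 0.5)      # 0.3125
# pnorm(): P(X <= 180) for X ~ Normal(mean = 167, sd = 10)
pnorm(180, mean = 167, sd = 10)
# dpois(): P(X = 15) for X ~ Poisson(lambda = 10)
dpois(15, lambda = 10)
# punif(): P(X <= 45) for X ~ Uniform(0, 60)
punif(45, min = 0, max = 60)         # 0.75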
1. Discrete Distributions
A discrete distribution describes situations where the set of possible outcomes is countable, typically integers (like 0, 1, 2, 3, etc.). These outcomes can occur with certain probabilities.
Key Discrete Distributions:
- Binomial Distribution
- Poisson Distribution
- Geometric Distribution
Characteristics of Discrete Distributions:
- Probability is assigned to specific outcomes.
- The sum of probabilities of all possible outcomes equals 1 (illustrated in the sketch after this list).
- Each outcome has a specific probability.
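As a quick check of the second characteristic, here is a small sketch using the binomial PMF (introduced in the next subsection): the probabilities of all possible outcomes add up to 1.
# PMF of the number of heads in 5 fair coin flips (possible outcomes: 0 to 5)
probs <- dbinom(0:5, size = 5, prob = 0.5)
probs       # probability of 0, 1, ..., 5 heads
sum(probs)  # the probabilities sum to 1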
Binomial Distribution (Discrete)
A binomial distribution models the number of successes in a fixed number of independent trials, where each trial has two possible outcomes (success/failure).
Real-Time Example:
Flipping a Coin: Consider a Diwali game where you flip a coin 5 times, and you want to know the probability of getting exactly 3 heads, assuming the coin is fair (probability of heads = 0.5).
# Probability of getting exactly 3 heads in 5 flips
dbinom(3, size = 5, prob = 0.5)
Contextual Example:
Imagine playing a local street game where you flip a rupee coin multiple times and try to predict the number of heads. This can be modeled using a binomial distribution.
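As a complementary sketch (not part of the original example), the exact binomial probability can be cross-checked with a simulation using rbinom(); the seed and number of simulated rounds below are arbitrary illustrative choices:
# Exact probability of exactly 3 heads in 5 fair flips
dbinom(3, size = 5, prob = 0.5)               # 0.3125
# Cross-check by simulating 10,000 rounds of 5 flips
set.seed(42)                                  # illustrative seed
heads <- rbinom(10000, size = 5, prob = 0.5)  # number of heads in each round
mean(heads == 3)                              # should be close to 0.3125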
Poisson Distribution (Discrete)
The Poisson distribution is used for modeling the number of events that happen in a fixed interval of time or space, where events occur independently, and the average rate is constant.
Real-Time Example:
Customer Arrivals at an Indian Sweet Shop: During festive seasons like Diwali, let’s say a sweet shop receives an average of 10 customers per hour. You want to know the probability that 15 customers will arrive in the next hour.
# Probability that 15 customers arrive in the next hour
dpois(15, lambda = 10)
Contextual Example:
The Poisson distribution can be used to model the number of customers arriving at a crowded Lassi stall in Varanasi or a street vendor selling Golgappas in Delhi.
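Beyond the point probability shown above, the cumulative function ppois() answers range questions for the same sweet-shop scenario; this sketch keeps the same average of 10 customers per hour:
# P(X = 15): exactly 15 customers in the next hour
dpois(15, lambda = 10)
# P(X <= 15): at most 15 customers in the next hour
ppois(15, lambda = 10)
# P(X > 15): more than 15 customers in the next hour
ppois(15, lambda = 10, lower.tail = FALSE)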
Geometric Distribution (Discrete)
The geometric distribution models the number of trials until the first success in a series of independent trials, where each trial has the same probability of success.
Real-Time Example:
First Success in Drawing a Chit: In a local lottery draw, each chit has a 10% chance of winning. You want to know the probability that you will win on your 3rd try.
# Probability of winning on the 3rd try
dgeom(2, prob = 0.1) # '2' represents two failures before the first success
Contextual Example:
You can use the geometric distribution to model how many times you need to pull a chit in a local community lottery before you win, considering the low probability of winning.
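A related question in the same chit-drawing scenario is the chance of winning within the first few tries, which the cumulative function pgeom() answers; a short sketch with the same prob = 0.1:
# P(first win on exactly the 3rd draw): two failures, then a success
dgeom(2, prob = 0.1)   # 0.9^2 * 0.1 = 0.081
# P(winning within the first 3 draws): at most two failures before the first success
pgeom(2, prob = 0.1)   # 1 - 0.9^3 = 0.271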
2. Continuous Distributions
A continuous distribution describes situations where the set of possible outcomes is uncountable, typically real numbers (like 1.5, 2.7, 3.14, etc.). The probability of any single exact value is zero; instead, probabilities are assigned to intervals of values.
Key Continuous Distributions:
- Normal Distribution
- Uniform Distribution
- Exponential Distribution
Characteristics of Continuous Distributions:
- Probabilities are assigned to ranges of values rather than specific values.
- The probability of any single specific outcome is zero.
- The total area under the probability density function (PDF) curve is 1 (see the check after this list).
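These characteristics can be checked numerically; for example, integrating the standard normal PDF over the whole real line gives 1, and probabilities come from intervals rather than single points. A small sketch:
# Total area under the standard normal PDF is 1
integrate(dnorm, lower = -Inf, upper = Inf)
# Probability is assigned to an interval, e.g. P(-1 <= X <= 1) for a standard normal
pnorm(1) - pnorm(-1)   # about 0.683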
Normal Distribution (Continuous)
The normal distribution is the most common continuous probability distribution, often referred to as the “bell curve.” It describes many natural phenomena, such as exam scores or heights.
Real-Time Example:
Height of People in India: The heights of adult men in India can be modeled using a normal distribution. Assume the average height is 167 cm with a standard deviation of 10 cm. You want to know the probability that a randomly selected man is taller than 180 cm.
# Probability that a randomly selected man is taller than 180 cm
1 - pnorm(180, mean = 167, sd = 10)
Contextual Example:
Imagine measuring the heights of men during a recruitment drive for the Indian Army. You can use the normal distribution to predict how many people will exceed a certain height requirement.
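Building on the height example, the same tail probability can also be written with the lower.tail argument, and, as an illustrative extension, scaled to an assumed number of applicants (the 5,000 below is hypothetical):
# P(height > 180 cm) when heights ~ Normal(mean = 167, sd = 10)
pnorm(180, mean = 167, sd = 10, lower.tail = FALSE)          # about 0.097
# Illustrative only: expected count above 180 cm among an assumed 5,000 applicants
5000 * pnorm(180, mean = 167, sd = 10, lower.tail = FALSE)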
Uniform Distribution (Continuous)
In a uniform distribution, all outcomes in a specified range are equally likely. The probability density function (PDF) is constant over the range.
Real-Time Example:
Selecting a Random Time: You are randomly selecting a time between 12:00 PM and 1:00 PM to place an online order during a Flash Sale. The chance of selecting any time within this interval is uniform.
# Probability of selecting a time between 12:30 PM and 12:45 PM
punif(45, min = 0, max = 60) - punif(30, min = 0, max = 60)
Contextual Example:
This could be modeled for someone shopping during a Flash Sale on Flipkart or Amazon India, where they are equally likely to place their order at any time within the sale window.
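To connect the interval probability with simulation, here is a sketch that draws random order times with runif() and checks the 0.25 result; the seed and sample size are illustrative:
# Exact probability: 15-minute window out of a 60-minute sale
punif(45, min = 0, max = 60) - punif(30, min = 0, max = 60)   # 0.25
# Cross-check with 10,000 simulated order times (minutes after 12:00 PM)
set.seed(1)                                                   # illustrative seed
order_times <- runif(10000, min = 0, max = 60)
mean(order_times >= 30 & order_times <= 45)                   # close to 0.25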
Exponential Distribution (Continuous)
The exponential distribution is used to model the time between events in a Poisson process, where events occur continuously and independently at a constant average rate.
Real-Time Example:
Waiting Time for an Auto Rickshaw: The time between the arrival of auto-rickshaws at a particular stand in Bengaluru can be modeled using an exponential distribution. If the average waiting time is 5 minutes, what’s the probability that you will wait more than 10 minutes?
# Probability that waiting time is more than 10 minutes
pexp(10, rate = 1/5, lower.tail = FALSE)
Contextual Example:
In a busy Indian city like Mumbai or Bengaluru, where people often wait for auto-rickshaws, you can use the exponential distribution to estimate waiting times between consecutive auto arrivals.
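As with the other distributions, the exponential result can be cross-checked by simulating waiting times with rexp(); the seed and number of draws below are illustrative:
# Exact probability of waiting more than 10 minutes (average wait 5 minutes)
pexp(10, rate = 1/5, lower.tail = FALSE)   # exp(-2), about 0.135
# Cross-check with 10,000 simulated waiting times
set.seed(7)                                # illustrative seed
waits <- rexp(10000, rate = 1/5)
mean(waits > 10)                           # close to 0.135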
3. Conditional Probability
It refers to the probability of an event occurring given that another event has already occurred. Mathematically, it’s expressed as:
P(A∣B)=P(A∩B)/P(B)
Where:
- P(A∣B) is the conditional probability of event A occurring given that B has occurred.
- P(A∩B) is the probability that both events A and B occur.
- P(B) is the probability of event B.
Real-Time Example: Indian Context
Consider the following situation:
- Event A: It is raining in Delhi.
- Event B: There is heavy traffic in Delhi.
Suppose you know:
- P(A), the probability that it rains in Delhi on any given day, is 0.3.
- P(B), the probability that there is heavy traffic in Delhi, is 0.6.
- P(A∩B), the probability that it rains and there is traffic on the same day, is 0.2.
Now, you want to find the probability that it is raining given that there is heavy traffic.
Calculation in R:
You can compute the conditional probability using basic arithmetic in R.
# Given probabilities
P_A <- 0.3       # Probability of rain
P_B <- 0.6       # Probability of traffic
P_A_and_B <- 0.2 # Probability of rain and traffic
# Conditional probability P(A|B)
P_A_given_B <- P_A_and_B / P_B
P_A_given_B
Output:
This returns the conditional probability that it is raining given that there is heavy traffic: P(A∣B) = 0.2 / 0.6 ≈ 0.333.
Simulating Conditional Probability
You can simulate this scenario to better understand the concept by generating a random sample of data.
Example:
Let’s simulate 1000 days of weather and traffic in Delhi, then estimate the conditional probability from this simulation.
set.seed(123) # For reproducibility
# Simulate rain and traffic (1 = event occurs, 0 = no event)
rain <- rbinom(1000, 1, 0.3)    # Probability of rain = 0.3
traffic <- rbinom(1000, 1, 0.6) # Probability of traffic = 0.6
# Probability of rain given traffic
rain_and_traffic <- sum(rain == 1 & traffic == 1) # Both events occur
total_traffic <- sum(traffic == 1)                # Total number of days with traffic
# Conditional probability estimate
P_A_given_B_simulation <- rain_and_traffic / total_traffic
P_A_given_B_simulation
Real-Time Contextual Example:
This example can be used in a real-world Indian city like Delhi, where during the monsoon season, rain and traffic jams are highly correlated. By using conditional probability, you can estimate how likely it is to rain if you are experiencing heavy traffic on your way to work.
Bayes’ Theorem with Conditional Probability
If you want to apply Bayes’ Theorem in a similar context, the formula becomes useful when you need to update probabilities based on new information.
For example, if you know the likelihood of traffic given rain and the prior probability of rain, you can calculate the probability that it’s raining given heavy traffic (the reverse of the earlier conditional probability).
Example with Bayes' Theorem:
Given:
- P(B∣A): Probability of traffic given rain = 0.8
- P(A): Probability of rain = 0.3
- P(B): Probability of traffic = 0.6
Calculate the conditional probability of rain given traffic:
P_B_given_A <- 0.8 # Probability of traffic given rain
P_A <- 0.3         # Probability of rain
P_B <- 0.6         # Probability of traffic
# Bayes' theorem: P(A|B)
P_A_given_B_bayes <- (P_B_given_A * P_A) / P_B
P_A_given_B_bayes
This formula calculates the reverse probability: how likely it is to rain given that there is traffic. With the values above, P(A∣B) = (0.8 × 0.3) / 0.6 = 0.4.