© 2026 LIBREUNI PROJECT

Probability Theory

Probability theory is the mathematical framework for quantifying uncertainty. In its modern formulation, established by Andrey Kolmogorov in 1933, probability is rooted in measure theory, providing a rigorous foundation for statistical inference, stochastic processes, and information theory.

The Probability Space

A formal probability model is defined by a triplet \((\Omega, \mathcal{F}, P)\), known as a probability space. Each component of this triplet serves a distinct mathematical purpose in capturing the structure of random phenomena.

The sample space \(\Omega\) is a non-empty set containing all possible outcomes of an experiment. An element \(\omega \in \Omega\) represents a single outcome.

The event space \(\mathcal{F}\) is a \(\sigma\)-algebra on \(\Omega\). A collection of subsets \(\mathcal{F} \subseteq 2^\Omega\) is a \(\sigma\)-algebra if it satisfies three conditions:

  1. \(\Omega \in \mathcal{F}\).
  2. If \(A \in \mathcal{F}\), then its complement \(A^c \in \mathcal{F}\).
  3. If \(A_1, A_2, \dots \in \mathcal{F}\), then their countable union \(\bigcup_{i=1}^\infty A_i \in \mathcal{F}\).
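
On a finite sample space the three conditions can be checked mechanically. The following is a minimal sketch, not a general-purpose tool: the helper name `is_sigma_algebra` and the two example families are hypothetical, and the check relies on the fact that on a finite space closure under countable unions reduces to closure under pairwise unions.

```python
from itertools import combinations

def is_sigma_algebra(omega, family):
    """Check the three sigma-algebra conditions on a finite sample space."""
    omega = frozenset(omega)
    family = {frozenset(a) for a in family}
    # Condition 1: the whole sample space is an event.
    if omega not in family:
        return False
    # Condition 2: closed under complements.
    if any(omega - a not in family for a in family):
        return False
    # Condition 3: closed under unions (pairwise suffices on a finite space).
    return all(a | b in family for a, b in combinations(family, 2))

omega = {1, 2, 3, 4}
coarse = [set(), {1, 2}, {3, 4}, omega]   # generated by the single event {1, 2}
broken = [set(), {1}, omega]              # lacks the complement {2, 3, 4}

print(is_sigma_algebra(omega, coarse))    # True
print(is_sigma_algebra(omega, broken))    # False
```

The family `coarse` illustrates that a \(\sigma\)-algebra can be much smaller than the power set: it only distinguishes whether the outcome lies in \(\{1, 2\}\) or not.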

Elements of \(\mathcal{F}\) are called events. Restricting to a \(\sigma\)-algebra rather than the entire power set \(2^\Omega\) becomes necessary for uncountably infinite sample spaces such as the real line \(\mathbb{R}\): assuming the axiom of choice, there exist non-measurable sets (e.g., Vitali sets in \(\mathbb{R}\), or the pieces used in the Banach-Tarski paradox in \(\mathbb{R}^3\)) to which length or volume cannot be consistently assigned.

The probability measure \(P\) is a function \(P: \mathcal{F} \to [0, 1]\) satisfying Kolmogorov’s axioms:

  1. Non-negativity: \(P(A) \ge 0\) for all \(A \in \mathcal{F}\).
  2. Unit measure: \(P(\Omega) = 1\).
  3. Countable additivity: For any countable sequence of pairwise disjoint events \(A_1, A_2, \dots\) (where \(A_i \cap A_j = \emptyset\) for \(i \neq j\)), \(P\left(\bigcup_{i=1}^\infty A_i\right) = \sum_{i=1}^\infty P(A_i)\).

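From these axioms, further properties follow directly. For example, the probability of the empty set must be \(0\). The sequence \(\Omega, \emptyset, \emptyset, \dots\) is pairwise disjoint with union \(\Omega\), so countable additivity gives \(P(\Omega) = P(\Omega) + \sum_{i=2}^\infty P(\emptyset)\), that is, \(1 = 1 + \sum_{i=2}^\infty P(\emptyset)\). Since \(P(\emptyset) \ge 0\), this forces \(P(\emptyset) = 0\). Finite additivity for disjoint events then follows by padding a finite list of events with copies of \(\emptyset\).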
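
As a concrete instance, the axioms can be checked on a small finite model. The sketch below assumes a fair six-sided die with the uniform measure on the full power set; the function name `prob` and the chosen events are illustrative, not part of any library.

```python
from fractions import Fraction

omega = frozenset(range(1, 7))   # outcomes of a fair six-sided die

def prob(event):
    """Uniform probability measure on omega: P(A) = |A| / |omega|."""
    return Fraction(len(frozenset(event) & omega), len(omega))

even, odd = {2, 4, 6}, {1, 3, 5}

print(prob(omega) == 1)                              # unit measure
print(prob(even | odd) == prob(even) + prob(odd))    # additivity for disjoint events
print(prob(set()) == 0)                              # derived property: P(empty set) = 0
print(prob(even), prob({1, 2}))                      # 1/2 1/3
```

Exact rational arithmetic via `Fraction` avoids floating-point noise, so the equalities hold exactly rather than approximately.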

Independence and Conditional Probability

Two events \(A\) and \(B\) are independent if the occurrence of one does not alter the probability of the other. Mathematically, this is defined as:

\[ P(A \cap B) = P(A)P(B) \]

When events are not independent, partial information changes our uncertainty. The conditional probability of an event \(A\) given that event \(B\) has occurred (with \(P(B) > 0\)) is defined as:

\[ P(A \mid B) = \frac{P(A \cap B)}{P(B)} \]
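
Both definitions can be explored by brute-force enumeration. The sketch below assumes two fair dice with all 36 ordered outcomes equally likely; the helper names (`prob`, `cond_prob`) and the three events are hypothetical choices made for illustration.

```python
from fractions import Fraction
from itertools import product

# Sample space: ordered pairs from two fair dice, all equally likely.
omega = list(product(range(1, 7), repeat=2))

def prob(event):
    """P(event), where event is a predicate on outcomes."""
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

def cond_prob(a, b):
    """P(A | B) = P(A and B) / P(B); requires P(B) > 0."""
    return prob(lambda w: a(w) and b(w)) / prob(b)

def first_is_six(w):
    return w[0] == 6

def second_is_even(w):
    return w[1] % 2 == 0

def sum_at_least_ten(w):
    return w[0] + w[1] >= 10

# Independent pair: the joint probability factorizes.
joint = prob(lambda w: first_is_six(w) and second_is_even(w))
print(joint == prob(first_is_six) * prob(second_is_even))   # True

# Dependent pair: knowing the first die is a six raises the chance of a large sum.
print(prob(sum_at_least_ten), cond_prob(sum_at_least_ten, first_is_six))   # 1/6 1/2
```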

Rearranging this definition yields the multiplication rule \(P(A \cap B) = P(A \mid B)P(B)\). Applying the same rule with the roles of \(A\) and \(B\) exchanged gives \(P(A \cap B) = P(B \mid A)P(A)\). Equating the two expressions and dividing by \(P(B) > 0\) leads to Bayes’ Theorem, a foundational result tying forward and inverse probabilities:

\[ P(A \mid B) = \frac{P(B \mid A)P(A)}{P(B)} \]

The denominator \(P(B)\) is often expanded using the Law of Total Probability. For a partition \(A_1, A_2, \dots, A_n\) of the sample space \(\Omega\) into events with \(P(A_i) > 0\), we have:

\[ P(B) = \sum_{i=1}^n P(B \mid A_i)P(A_i) \]

Medical Testing Accuracy

A disease affects 1% of a population. A diagnostic test correctly identifies the disease 99% of the time when a patient is infected (true positive). However, it also incorrectly indicates disease 5% of the time for healthy patients (false positive). A randomly selected individual tests positive.

Using Bayes' Theorem, what is the exact probability that the individual actually has the disease?
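
One way to work this out is to expand the denominator with the Law of Total Probability over the partition "diseased" and "healthy". The sketch below uses only the three numbers stated in the scenario; the variable names are illustrative.

```python
from fractions import Fraction

prior = Fraction(1, 100)              # P(D): prevalence of the disease
sensitivity = Fraction(99, 100)       # P(+ | D): true positive rate
false_positive = Fraction(5, 100)     # P(+ | not D): false positive rate

# Law of Total Probability: P(+) = P(+ | D) P(D) + P(+ | not D) P(not D)
p_positive = sensitivity * prior + false_positive * (1 - prior)

# Bayes' Theorem: P(D | +) = P(+ | D) P(D) / P(+)
posterior = sensitivity * prior / p_positive
print(posterior)   # 1/6
```

Despite the 99% true positive rate, the posterior is only \(1/6\), because healthy individuals are so much more numerous that their false positives outnumber the true positives five to one.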

Random Variables and Distributions

A random variable is not a variable, nor is it inherently random. It is a deterministic function \(X: \Omega \to \mathbb{R}\) that maps outcomes to real numbers. Crucially, \(X\) must be a measurable function. This means that for any Borel set \(B \subseteq \mathbb{R}\), its preimage must be an event in our \(\sigma\)-algebra:

\[ X^{-1}(B) = \{ \omega \in \Omega : X(\omega) \in B \} \in \mathcal{F} \]

The probability distribution of \(X\) is completely determined by its Cumulative Distribution Function (CDF), \(F_X(x)\), defined as:

\[ F_X(x) = P(X \le x) = P(\{ \omega \in \Omega : X(\omega) \le x \}) \]

Every valid CDF is right-continuous and monotonically non-decreasing, with \(\lim_{x \to -\infty} F_X(x) = 0\) and \(\lim_{x \to \infty} F_X(x) = 1\).
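
To make the "function on \(\Omega\)" view concrete, the sketch below treats the sum of two fair dice as a random variable and computes its CDF by counting outcomes in the preimage. The names `X` and `cdf` are illustrative, and the uniform measure on 36 outcomes is an assumption of the example.

```python
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))   # two fair dice

def X(w):
    """Random variable: a deterministic map from an outcome to a real number."""
    return w[0] + w[1]

def cdf(x):
    """F_X(x) = P({w in omega : X(w) <= x}) under the uniform measure."""
    return Fraction(sum(1 for w in omega if X(w) <= x), len(omega))

print(cdf(1), cdf(2), cdf(7), cdf(7.5), cdf(12))   # 0 1/36 7/12 7/12 1
```

The values show the stated properties on a discrete example: the CDF starts at 0, never decreases, stays flat between jump points (`cdf(7) == cdf(7.5)`), and reaches 1.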

Discrete vs. Continuous Distributions

A random variable is discrete if it takes values in a countable set. It is described by a Probability Mass Function (PMF) \(p_X(x) = P(X = x)\). A random variable is continuous if there exists a non-negative Lebesgue-integrable function \(f_X(x)\), called the Probability Density Function (PDF), such that:

\[ F_X(x) = \int_{-\infty}^{x} f_X(t) \, dt \]

For continuous variables, the probability of any single point is zero: \(P(X = x) = 0\). Positive probability is carried only by sets of positive length, such as intervals, with \(P(a \le X \le b) = \int_a^b f_X(t) \, dt\).
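
The relationship between a PDF and its CDF can be checked numerically. The sketch below assumes an exponential distribution with rate 2 (a choice made purely for illustration) and approximates the integral with a midpoint rule; the function names and step count are hypothetical.

```python
import math

RATE = 2.0   # assumed rate parameter of the exponential distribution

def pdf(t):
    """Density of Exponential(RATE): RATE * exp(-RATE * t) for t >= 0, else 0."""
    return RATE * math.exp(-RATE * t) if t >= 0 else 0.0

def cdf_numeric(x, steps=10_000):
    """Midpoint-rule approximation of the integral of the PDF from 0 to x."""
    if x <= 0:
        return 0.0
    h = x / steps
    return h * sum(pdf((k + 0.5) * h) for k in range(steps))

def cdf_exact(x):
    """Closed-form CDF of Exponential(RATE)."""
    return 1.0 - math.exp(-RATE * x) if x > 0 else 0.0

# Probability of an interval is a difference of CDF values.
interval = cdf_numeric(1.0) - cdf_numeric(0.5)
print(cdf_numeric(1.0), cdf_exact(1.0), interval)
```

A single point contributes nothing to the integral, which is the numerical counterpart of \(P(X = x) = 0\).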

Expected Value: The Lebesgue Perspective

The expected value \(\mathbb{E}[X]\) of a random variable is the probability-weighted average of all its possible values. In an elementary context, it is formulated as a sum for discrete variables, \(\sum_i x_i p(x_i)\), and a Riemann integral for continuous variables, \(\int x f(x) \, dx\).

A more unified, rigorous approach utilizes the Lebesgue integral over the probability space:

\[ \mathbb{E}[X] = \int_{\Omega} X(\omega) \, dP(\omega) \]

This single definition naturally covers discrete, continuous, and mixed random variables, treating probability distributions simply as specific measures.

The expected value possesses the critical property of linearity. For any random variables \(X\) and \(Y\) with finite expectations, and constants \(a, b \in \mathbb{R}\):

\[ \mathbb{E}[aX + bY] = a\mathbb{E}[X] + b\mathbb{E}[Y] \]

Linearity holds identically whether \(X\) and \(Y\) are independent or heavily correlated.
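
Linearity can be seen at work even when one variable is a function of the other. The sketch below assumes a fair die roll \(X\) and sets \(Y = X^2\), so the two are as dependent as possible; the helper `expect` and the constants 2 and 3 are illustrative.

```python
from fractions import Fraction

faces = range(1, 7)   # fair die: each face has probability 1/6

def expect(g):
    """E[g(X)] for the die roll X, computed as a probability-weighted sum."""
    return sum(Fraction(g(x), 6) for x in faces)

e_x = expect(lambda x: x)                    # E[X]
e_y = expect(lambda x: x * x)                # E[Y] with Y = X^2
lhs = expect(lambda x: 2 * x + 3 * x * x)    # E[2X + 3Y], computed directly
rhs = 2 * e_x + 3 * e_y                      # 2 E[X] + 3 E[Y]

print(e_x, e_y, lhs, rhs)   # 7/2 91/6 105/2 105/2
```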

Variance and Moments

To quantify the dispersion or spread of a probability distribution around its center, we examine the second central moment, the variance:

\[ \text{Var}(X) = \mathbb{E}[(X - \mathbb{E}[X])^2] = \mathbb{E}[X^2] - (\mathbb{E}[X])^2 \]

The variance is finite only when the second moment \(\mathbb{E}[X^2]\) is finite. Unlike expectation, variance is not a linear operator. For constants \(a, b\):

\[ \text{Var}(aX + b) = a^2 \text{Var}(X) \]

For the sum of two random variables, the variance is given by:

\[ \text{Var}(X + Y) = \text{Var}(X) + \text{Var}(Y) + 2\text{Cov}(X, Y) \]

where \(\text{Cov}(X, Y) = \mathbb{E}[(X - \mathbb{E}[X])(Y - \mathbb{E}[Y])]\). If \(X\) and \(Y\) are independent, their covariance is zero and the variance is additive. The converse fails: zero covariance does not imply independence.
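
These identities can be verified exactly on a small example. The sketch below assumes two fair dice, takes \(X\) to be the first die and \(Y\) to be the sum of both (so the two are correlated), and uses hypothetical helpers `expect`, `var`, and `cov`.

```python
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))   # two fair dice, uniform measure

def expect(g):
    """E[g] as a probability-weighted sum over the sample space."""
    return sum(Fraction(g(w), len(omega)) for w in omega)

def var(g):
    """Var(g) = E[g^2] - (E[g])^2."""
    return expect(lambda w: g(w) ** 2) - expect(g) ** 2

def cov(g, h):
    """Cov(g, h) = E[g h] - E[g] E[h]."""
    return expect(lambda w: g(w) * h(w)) - expect(g) * expect(h)

def X(w):
    return w[0]            # first die

def Y(w):
    return w[0] + w[1]     # sum of both dice, correlated with X

print(var(X))                                  # 35/12
print(var(lambda w: 3 * X(w) + 5))             # 105/4, i.e. 9 * Var(X)
print(var(lambda w: X(w) + Y(w)))              # 175/12
print(var(X) + var(Y) + 2 * cov(X, Y))         # 175/12
```

The shift by 5 has no effect on the spread, while the factor 3 enters squared, matching \(\text{Var}(aX + b) = a^2 \text{Var}(X)\).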

Linear Transformations of Portfolios

A quantitative analyst models the daily return of two technology stocks, A and B. Both stocks have an expected daily return of 2% and a standard deviation of 4%. The two returns are uncorrelated. The analyst constructs a portfolio that heavily weights stock A: they hold $3 worth of Stock A and -$1 worth of Stock B (a short position) to hedge.

What is the variance of the daily return of this portfolio P = 3A - 1B?
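
A short calculation under the stated assumptions: the returns are uncorrelated, so the covariance term vanishes, and the weights enter squared. The sketch below works in percentage points, so the variance comes out in squared percentage points; the variable names are illustrative.

```python
import math

w_a, w_b = 3, -1          # portfolio weights: long 3 units of A, short 1 unit of B
sd_a = sd_b = 4.0         # standard deviations in percentage points
cov_ab = 0.0              # uncorrelated returns, so Cov(A, B) = 0

# Var(3A - B) = 3^2 Var(A) + (-1)^2 Var(B) + 2 * 3 * (-1) * Cov(A, B)
var_p = w_a**2 * sd_a**2 + w_b**2 * sd_b**2 + 2 * w_a * w_b * cov_ab
sd_p = math.sqrt(var_p)

print(var_p)   # 160.0
print(sd_p)
```

The short position does not reduce the variance here: its weight of \(-1\) is squared, so it adds \(16\) to the \(144\) contributed by stock A, for a standard deviation of roughly 12.65%.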

Limits and Asymptotic Theorems

Measures and expectations become far more useful once we consider sequences of random variables \(X_1, X_2, \dots\) Often, we are concerned with sums of independent and identically distributed (i.i.d.) random variables.

Two foundational theorems act as the bedrock for modern statistics.

  1. Law of Large Numbers (LLN): Let \(X_1, X_2, \dots\) be an i.i.d. sequence of random variables with finite expectation \(\mu\). The sample average \(\bar{X}_n = \frac{1}{n} \sum_{i=1}^n X_i\) converges to the expected value \(\mu\). The Strong Law ensures almost sure convergence (\(P(\lim_{n \to \infty} \bar{X}_n = \mu) = 1\)), whereas the Weak Law guarantees convergence in probability.

  2. Central Limit Theorem (CLT): If the sequence also possesses a finite variance \(\sigma^2 > 0\), the standardized sample average converges in distribution to the standard normal distribution \(\mathcal{N}(0,1)\): \(\lim_{n \to \infty} P \left( \frac{\bar{X}_n - \mu}{\sigma / \sqrt{n}} \le z \right) = \Phi(z)\), where \(\Phi(z)\) is the CDF of the standard normal distribution.

The power of the CLT stems from how little it assumes about the underlying distribution: whether the original variable \(X\) is discrete, highly skewed, or uniform, as long as its variance is finite, the standardized sums are approximately normal for large \(n\). This underpins much of large-scale modeling and many parametric tests.
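
Both theorems can be illustrated with a small simulation. The sketch below assumes i.i.d. Exponential(1) draws (a strongly skewed distribution with \(\mu = 1\) and \(\sigma = 1\)); the sample sizes, the seed, and the threshold \(z = 1\) are arbitrary choices made for illustration, and the results are approximate by nature.

```python
import math
import random

random.seed(0)   # fixed seed so the run is reproducible

def sample_mean(n):
    """Average of n i.i.d. Exponential(1) draws (mean 1, variance 1)."""
    return sum(random.expovariate(1.0) for _ in range(n)) / n

# Law of Large Numbers: the sample average approaches mu = 1.
lln_estimate = sample_mean(100_000)

# Central Limit Theorem: standardized sample means behave like N(0, 1).
n, reps, z = 200, 2_000, 1.0
mu, sigma = 1.0, 1.0
standardized = [(sample_mean(n) - mu) / (sigma / math.sqrt(n)) for _ in range(reps)]
empirical = sum(s <= z for s in standardized) / reps

# Standard normal CDF expressed through the error function.
phi_z = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

print(lln_estimate, empirical, phi_z)
```

The empirical fraction of standardized means at or below \(z\) should land close to \(\Phi(z)\), even though individual exponential draws look nothing like a bell curve.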
