Markov Chains

A Markov chain is a stochastic process that moves from one state to another on a state space and satisfies the Markov property: the conditional probability distribution of future states depends only on the present state, not on the sequence of states that preceded it.

Formally, a stochastic process $\{X_n : n \in \mathbb{N}_0\}$ is a Markov chain if, for all $n \ge 0$ and any sequence of states $i_0, i_1, \dots, i_{n-1}, i, j$, the following equality holds:

$\mathbb{P}(X_{n+1} = j \mid X_n = i, X_{n-1} = i_{n-1}, \dots, X_0 = i_0) = \mathbb{P}(X_{n+1} = j \mid X_n = i)$

This property states that everything the history of the process says about its future is captured by the current state $X_n = i$. It greatly simplifies the study of complex systems, because a dependence on the entire past reduces to a one-step conditional probability. Discrete-time and continuous-time variants form the backbone of modern stochastic modeling, with applications ranging from simple queuing systems to financial models and molecular dynamics.

Discrete-Time Markov Chains (DTMC)

A Discrete-Time Markov Chain operates with a discrete time parameter $n \in \{0, 1, 2, \dots\}$. The set of possible values for the random variables $X_n$ forms a countable set $S$, called the state space. The probability of moving from state $i$ to state $j$ in one time step is given by the transition probability $p_{ij}$, defined as:

$p_{ij} = \mathbb{P}(X_{n+1} = j \mid X_n = i)$

When these transition probabilities do not depend on the time step $n$, the Markov chain is said to be time-homogeneous. We focus exclusively on time-homogeneous chains, whose structure makes the analysis of long-run behavior tractable.

Transition Matrices

For a finite or countably infinite state space, the one-step transition probabilities $p_{ij}$ are arranged in a matrix $P$, called the transition matrix:

$P = \begin{pmatrix} p_{00} & p_{01} & p_{02} & \cdots \\ p_{10} & p_{11} & p_{12} & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{pmatrix}$

This matrix has two vital properties:

$p_{ij} \ge 0$ for all $i, j \in S$.
$\sum_{j \in S} p_{ij} = 1$ for all $i \in S$.

Every row describes a probability distribution, making $P$ a stochastic matrix. If the initial distribution of the chain is a row vector $\pi^{(0)}$ (where $\pi_i^{(0)} = \mathbb{P}(X_0 = i)$), the distribution after one step is $\pi^{(1)} = \pi^{(0)} P$. By induction, the distribution of the state after $n$ steps is $\pi^{(n)} = \pi^{(0)} P^n$. Each entry of $P^n$ is the sum, over all paths of length $n$ between two states, of the probability of that path.
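The relation $\pi^{(n)} = \pi^{(0)} P^n$ can be illustrated with a minimal sketch. The 3-state matrix `P` and the helper names `step` and `distribution_after` are hypothetical and chosen only for illustration; the code uses plain Python lists rather than a numerical library.

```python
def step(dist, P):
    """Return the row vector dist * P for a row-stochastic matrix P."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


def distribution_after(dist0, P, n_steps):
    """Apply the one-step update n_steps times, i.e. compute dist0 * P^n."""
    dist = list(dist0)
    for _ in range(n_steps):
        dist = step(dist, P)
    return dist


# Hypothetical 3-state chain (each row sums to 1).
P = [
    [0.5, 0.5, 0.0],
    [0.25, 0.5, 0.25],
    [0.0, 0.5, 0.5],
]
pi0 = [1.0, 0.0, 0.0]                 # start in state 0 with certainty
pi1 = distribution_after(pi0, P, 1)   # equals the first row of P
pi2 = distribution_after(pi0, P, 2)
```

Starting from a point mass on state 0, one step reproduces row 0 of `P`, and every later distribution still sums to 1 because each row of `P` does.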

$n$-Step Transition Probabilities

The $n$-step transition probability is the probability that a process currently in state $i$ will be in state $j$ exactly $n$ steps later:

$p_{ij}^{(n)} = \mathbb{P}(X_{n+k} = j \mid X_k = i)$

For $n=1$, $p_{ij}^{(1)} = p_{ij}$. For $n=0$, $p_{ij}^{(0)}$ is $1$ if $i=j$ and $0$ otherwise.

If the transition matrix $P$ of a 3-state Markov chain has row sums of 1, what must be true about the row sums of $P^2$?

Chapman-Kolmogorov Equations

The $n$-step transition probabilities satisfy the Chapman-Kolmogorov equations. These give the probability of moving from state $i$ to state $j$ in $n+m$ steps by conditioning on the intermediate state $k$ occupied after $n$ steps:

$p_{ij}^{(n+m)} = \sum_{k \in S} p_{ik}^{(n)} p_{kj}^{(m)}$

In matrix notation, this is exactly multiplication of powers of the transition matrix. Let $P^{(n)}$ be the matrix whose entries are $p_{ij}^{(n)}$. Then $P^{(n+m)} = P^{(n)} P^{(m)}$, and since $P^{(1)} = P$, it follows by induction that $P^{(n)} = P^n$: the transition matrix for $n$ steps is the $n$-th power of the one-step transition matrix.
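The identity $P^{(n+m)} = P^{(n)} P^{(m)}$ can be checked numerically. The following is a minimal sketch using exact rational arithmetic so that equality can be tested without rounding; the matrix `P` and the helpers `matmul` and `matpow` are hypothetical names introduced for this example.

```python
from fractions import Fraction as F


def matmul(A, B):
    """Plain matrix product of two list-of-lists matrices."""
    return [
        [sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]


def matpow(P, n):
    """n-th power of a square matrix by repeated multiplication (P^0 = I)."""
    size = len(P)
    result = [[F(int(i == j)) for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = matmul(result, P)
    return result


# Hypothetical 3-state stochastic matrix with exact rational entries.
P = [
    [F(1, 2), F(1, 2), F(0)],
    [F(1, 4), F(1, 2), F(1, 4)],
    [F(0), F(1, 2), F(1, 2)],
]

lhs = matpow(P, 5)                         # P^(2+3)
rhs = matmul(matpow(P, 2), matpow(P, 3))   # P^(2) * P^(3)
row_sums_P2 = [sum(row) for row in matpow(P, 2)]
```

Here `lhs` and `rhs` agree entry by entry, and the rows of $P^2$ again sum to 1, as expected for a product of stochastic matrices.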

Classification of States

The long-term behavior of a Markov chain depends heavily on the communication structure of its state space, that is, on which states can be reached from which others.

Accessibility and Communication

State $j$ is accessible from state $i$ (denoted $i \to j$) if there exists an integer $n \ge 0$ such that $p_{ij}^{(n)} > 0$. Simply put, there is a path of non-zero probability from $i$ to $j$.
States $i$ and $j$ communicate (denoted $i \leftrightarrow j$) if $i \to j$ and $j \to i$.

Communication is an equivalence relation (it is reflexive, symmetric, and transitive), which partitions the state space into disjoint communication classes. If a Markov chain has only one communication class—meaning every state is accessible from every other state—it is called irreducible.
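For a finite chain, the communication classes can be found by a graph search on the positive entries of $P$. The sketch below is one simple way to do this, not an optimized algorithm; the function names and the 4-state example matrix are hypothetical.

```python
def reachable_from(P, start):
    """Set of states j with start -> j (start itself is included, since n = 0 is allowed)."""
    seen = {start}
    stack = [start]
    while stack:
        i = stack.pop()
        for j, p in enumerate(P[i]):
            if p > 0 and j not in seen:
                seen.add(j)
                stack.append(j)
    return seen


def communication_classes(P):
    """Partition the states into classes of mutually accessible states."""
    n = len(P)
    reach = [reachable_from(P, i) for i in range(n)]
    classes, assigned = [], set()
    for i in range(n):
        if i in assigned:
            continue
        cls = sorted(j for j in reach[i] if i in reach[j])
        classes.append(cls)
        assigned.update(cls)
    return classes


def is_irreducible(P):
    """A chain is irreducible when it has exactly one communication class."""
    return len(communication_classes(P)) == 1


# Hypothetical chain: states 0 and 1 communicate, state 2 can only leave,
# and state 3 is absorbing.
P = [
    [0.5, 0.5, 0.0, 0.0],
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.3, 0.4, 0.3],
    [0.0, 0.0, 0.0, 1.0],
]
classes = communication_classes(P)
```

In this example the classes are `[0, 1]`, `[2]`, and `[3]`, so the chain is not irreducible.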

Recurrent and Transient States

Let $f_{ij}^{(n)}$ denote the probability that the first transition into state $j$ (starting from $i$) occurs exactly at step $n$:

$f_{ij}^{(n)} = \mathbb{P}(X_n = j, X_k \neq j \text{ for } k = 1, \dots, n-1 \mid X_0 = i)$

Let $f_{ij} = \sum_{n=1}^\infty f_{ij}^{(n)}$ be the probability of ever reaching state $j$ given that the chain started in state $i$. The parameter $f_{ii}$ is therefore the probability of ever returning to state $i$ given that the chain started in state $i$.

A state $i$ is recurrent if $f_{ii} = 1$. A recurrent state will be visited infinitely many times with probability $1$.
A state $i$ is transient if $f_{ii} < 1$. A transient state will be visited only a finite number of times with probability $1$.

A state is recurrent if and only if the expected number of returns to that state is infinite: $\sum_{n=1}^\infty p_{ii}^{(n)} = \infty$. It is transient if and only if $\sum_{n=1}^\infty p_{ii}^{(n)} < \infty$. Every finite Markov chain has at least one recurrent state, though an infinite state space may consist entirely of transient states (e.g., a simple random walk on $\mathbb{Z}^3$).

Periodicity

The period $d(i)$ of a state $i$ is defined as the greatest common divisor (GCD) of the set of numbers of steps $n$ for which a return to state $i$ is possible:

$d(i) = \gcd \{ n \ge 1 : p_{ii}^{(n)} > 0 \}$

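If $d(i) = 1$, the state is aperiodic. Returns are not confined to multiples of a fixed step count; in fact, $p_{ii}^{(n)} > 0$ for all sufficiently large $n$.
If $d(i) > 1$, the state is periodic with period $d(i)$: returns are possible only at multiples of $d(i)$.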

Periodicity is a class property: all states in the same communication class have the same period. In particular, all states of an irreducible chain share a single period.
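For a small finite chain, the period of a state can be estimated directly from the definition by tracking at which step counts a return is possible. This is a minimal sketch: the function name `period` and the default horizon `max_steps` are assumptions of this example (the horizon is a generous heuristic cutoff for small chains, not a bound taken from the text).

```python
from math import gcd


def period(P, i, max_steps=None):
    """gcd of all n in 1..max_steps with p_ii^(n) > 0; returns 0 if no return is seen."""
    n_states = len(P)
    if max_steps is None:
        max_steps = 2 * n_states * n_states  # heuristic horizon (assumption)
    # support[j] is True when state j can be reached from i in exactly n steps.
    support = [j == i for j in range(n_states)]
    d = 0
    for n in range(1, max_steps + 1):
        support = [
            any(support[k] and P[k][j] > 0 for k in range(n_states))
            for j in range(n_states)
        ]
        if support[i]:
            d = gcd(d, n)  # gcd(0, n) == n, so the first return initializes d
    return d


# Hypothetical examples.
two_cycle = [[0.0, 1.0], [1.0, 0.0]]                 # deterministic flip-flop
lazy_chain = [[0.5, 0.5], [1.0, 0.0]]                # self-loop at state 0
three_cycle = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]
```

The two-state flip-flop has period 2 and the three-cycle has period 3, while the self-loop in `lazy_chain` makes both of its states aperiodic, which is consistent with the period being shared across a communication class.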

Ergodic States

A state $i$ is positive recurrent if it is recurrent and its expected return time $m_i$ is finite:

$m_i = \sum_{n=1}^\infty n f_{ii}^{(n)} < \infty$

If a state is positive recurrent and aperiodic, it is classified as ergodic. A Markov chain is ergodic if all its states are ergodic. For an irreducible chain, ergodicity is the property that guarantees the chain eventually “forgets” its initial state and settles into a stable equilibrium distribution.

Stationary Distributions

When an irreducible ergodic Markov chain runs for a sufficiently long time, the distribution of its state converges to a limit that does not depend on the starting state. This limiting distribution is also a stationary distribution of the chain, denoted by a row vector $\pi$.

A row vector $\pi$ is a stationary distribution if:

$\pi_j \ge 0$ for all $j \in S$.
$\sum_{j \in S} \pi_j = 1$.
$\pi P = \pi$.

The condition $\pi P = \pi$ means that if the initial state is drawn according to the distribution $\pi$, the state distribution at every subsequent step remains exactly $\pi$.

For an irreducible, aperiodic, and positive recurrent (i.e., ergodic) Markov chain, a unique stationary distribution $\pi$ exists, and the fundamental limit theorem applies:

$\lim_{n \to \infty} p_{ij}^{(n)} = \pi_j \quad \text{for all } i, j \in S$

Furthermore, the stationary probability of a state is the reciprocal of its expected return time: $\pi_j = 1/m_j$. This links the limits of the transition probabilities to the timing of the chain's returns: a state that is revisited every $m_j$ steps on average is occupied a fraction $1/m_j$ of the time in the long run.
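The limit theorem suggests a simple way to approximate $\pi$: apply $P$ repeatedly to any starting distribution until it stops changing. The sketch below assumes an irreducible, aperiodic finite chain; the function name, tolerance, and 3-state example are hypothetical. For this example, solving $\pi P = \pi$ by hand gives $\pi = (1/4, 1/2, 1/4)$.

```python
def stationary_by_iteration(P, tol=1e-12, max_iter=10_000):
    """Approximate the stationary distribution by iterating dist <- dist * P."""
    n = len(P)
    dist = [1.0 / n] * n  # any starting distribution works for an ergodic chain
    for _ in range(max_iter):
        new = [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(new, dist)) < tol:
            return new
        dist = new
    raise RuntimeError("no convergence; the chain may be periodic or reducible")


# Hypothetical irreducible, aperiodic 3-state chain.
P = [
    [0.5, 0.5, 0.0],
    [0.25, 0.5, 0.25],
    [0.0, 0.5, 0.5],
]
pi = stationary_by_iteration(P)
mean_return_times = [1.0 / p for p in pi]  # m_j = 1 / pi_j
```

With $\pi = (1/4, 1/2, 1/4)$, the expected return times are approximately 4, 2, and 4 steps respectively.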

The Gambler's Ruin

A gambler plays a fair game in which, at each step, they win \$1 with probability $0.5$ and lose \$1 with probability $0.5$. The gambler starts with $\$a$, and the game ends when their capital reaches $0$ (ruin) or a predetermined target value $\$N$ (success). This process can be modeled as a discrete-time Markov chain with state space $S = \{0, 1, 2, \dots, N\}$, where states $0$ and $N$ represent the termination of the game.

In terms of the classification of states: is the chain guaranteed to leave the transient states forever, and how are states $0$ and $N$ classified?
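The gambler's ruin chain can be explored numerically by building its transition matrix and propagating a point mass for many steps. In this sketch, states $0$ and $N$ are modeled as absorbing (each keeps itself with probability 1), which is the usual way to encode "the game ends"; the function names and the values $N = 4$, $a = 1$ are hypothetical. For the fair game, the standard closed form for the ruin probability is $1 - a/N$.

```python
def gamblers_ruin_matrix(N, p=0.5):
    """Transition matrix on {0, ..., N}; 0 and N are absorbing, p is the win probability."""
    P = [[0.0] * (N + 1) for _ in range(N + 1)]
    P[0][0] = 1.0
    P[N][N] = 1.0
    for i in range(1, N):
        P[i][i + 1] = p
        P[i][i - 1] = 1.0 - p
    return P


def distribution_after(start, P, n_steps):
    """Distribution of X_n given X_0 = start, computed by repeated multiplication."""
    size = len(P)
    dist = [0.0] * size
    dist[start] = 1.0
    for _ in range(n_steps):
        dist = [sum(dist[i] * P[i][j] for i in range(size)) for j in range(size)]
    return dist


N, a = 4, 1
dist = distribution_after(a, gamblers_ruin_matrix(N), 200)
ruin, success = dist[0], dist[N]
transient_mass = sum(dist[1:N])  # probability of still playing after 200 steps
```

After many steps essentially all probability sits on the two absorbing states: `ruin` is close to $1 - a/N = 0.75$, `success` is close to $a/N = 0.25$, and the mass remaining on the transient states $1, \dots, N-1$ is negligible.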

Deep Dive into Continuous-Time Markov Chains (CTMC)

While discrete-time Markov chains describe systems that change state at fixed time steps, many real-world stochastic processes change state at random times on the continuous axis $t \in [0, \infty)$. Such processes are modeled as Continuous-Time Markov Chains (CTMC).

A stochastic process $\{X(t) : t \ge 0\}$ defined on a discrete state space $S$ is a CTMC if it satisfies the continuous-time Markov property: for all $s, t \ge 0$, all states $i, j$, and every past trajectory $x(u)$,

$\mathbb{P}(X(t+s) = j \mid X(s) = i, X(u) = x(u) \text{ for } 0 \le u < s) = \mathbb{P}(X(t+s) = j \mid X(s) = i)$

For a time-homogeneous CTMC, the transition probability depends only on the length $t$ of the time interval:

$p_{ij}(t) = \mathbb{P}(X(s+t) = j \mid X(s) = i)$

Holding Times and Transition Rates

When a CTMC enters a state $i$, the amount of time it spends there before its next transition (called the holding time or sojourn time) follows an exponential distribution with rate parameter $q_i$ (often denoted $v_i$ or $\lambda_i$).

Why an exponential distribution? The exponential distribution is the only continuous probability distribution with the memoryless property, $\mathbb{P}(T > s + t \mid T > s) = \mathbb{P}(T > t)$. The Markov assumption requires exactly this: the time already spent in a state must carry no information about the remaining time to be spent there.

When the process leaves state $i$ (assuming $q_i > 0$, so that it does leave), the state it jumps to is independent of the holding time. The probability that it jumps to state $j$ is denoted by the transition probability $p_{ij}$, where $\sum_{j \neq i} p_{ij} = 1$ and $p_{ii} = 0$.

Equivalently, one specifies the transition rates $q_{ij}$, the rates at which the process moves from state $i$ to state $j$:

$q_{ij} = q_i p_{ij} \quad \text{for } i \neq j$

These transition rates are arranged in the generator matrix (or infinitesimal generator) $Q$, whose elements are given by:

$Q_{ij} = q_{ij}$ for $i \neq j$
$Q_{ii} = -q_i = -\sum_{j \neq i} q_{ij}$

Because each diagonal entry is the negative of the sum of the off-diagonal entries in its row, every row of the generator matrix $Q$ sums to $0$:

$\sum_{j \in S} Q_{ij} = 0$
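The construction of $Q$ from the holding rates $q_i$ and the jump probabilities $p_{ij}$ can be sketched as follows. The function name `generator_from_rates` and the numerical rates are hypothetical values chosen for illustration.

```python
def generator_from_rates(holding_rates, jump_probs):
    """Build Q with Q[i][j] = q_i * p_ij for i != j and Q[i][i] = -sum of the rest of row i."""
    n = len(holding_rates)
    Q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                Q[i][j] = holding_rates[i] * jump_probs[i][j]
        Q[i][i] = -sum(Q[i][j] for j in range(n) if j != i)
    return Q


# Hypothetical 3-state example: rates q_i and jump probabilities p_ij (p_ii = 0).
q = [2.0, 1.0, 4.0]
jump = [
    [0.0, 0.5, 0.5],
    [1.0, 0.0, 0.0],
    [0.25, 0.75, 0.0],
]
Q = generator_from_rates(q, jump)
row_sums = [sum(row) for row in Q]
```

For these inputs the off-diagonal entries of each row add up to $q_i$, so the diagonal is $-q_i$ and every row sums to zero.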

The Kolmogorov Forward and Backward Equations

In discrete time, the $n$-step transition matrix is simply a matrix power, $P^{(n)} = P^n$. In continuous time, the transition matrices $P(t) = \{p_{ij}(t)\}$ instead satisfy systems of coupled linear differential equations, which link the finite-time transition probabilities to the instantaneous transition rates encoded in the matrix $Q$.

Kolmogorov Backward Equations:

$\frac{d}{dt} P(t) = Q P(t)$

Component-wise, this reads $\frac{d}{dt} p_{ij}(t) = \sum_k q_{ik} p_{kj}(t)$, where $q_{ii} = Q_{ii} = -q_i$. These equations arise from conditioning on the first transition out of the initial state.

Kolmogorov Forward Equations:

$\frac{d}{dt} P(t) = P(t) Q$

Component-wise, this reads $\frac{d}{dt} p_{ij}(t) = \sum_k p_{ik}(t) q_{kj}$. The forward equations arise from conditioning on the final transition immediately preceding time $t$.

Under suitable regularity conditions (which always hold when the state space is finite), the solution of both systems with initial condition $P(0) = I$, the identity matrix, is the matrix exponential:

$P(t) = e^{Qt} = \sum_{n=0}^\infty \frac{(Qt)^n}{n!}$
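For a small generator, the series $\sum_n (Qt)^n / n!$ can be summed directly. The sketch below truncates the series after a fixed number of terms, which is adequate only when the entries of $Qt$ are small; it is not a robust general-purpose matrix exponential. The helper names and the two-state rates `a` and `b` are hypothetical. For the two-state generator with rows $(-a, a)$ and $(b, -b)$, the known closed form is $p_{00}(t) = \frac{b}{a+b} + \frac{a}{a+b} e^{-(a+b)t}$, which serves as a check.

```python
import math


def matmul(A, B):
    """Plain matrix product of two list-of-lists matrices."""
    return [
        [sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]


def expm_series(Q, t, terms=40):
    """Truncated Taylor series for exp(Q t); suitable only for small Q * t."""
    n = len(Q)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]  # holds (Q t)^k / k!, starting at k = 0
    for k in range(1, terms):
        term = [[x * t / k for x in row] for row in matmul(term, Q)]
        result = [
            [r + x for r, x in zip(r_row, t_row)]
            for r_row, t_row in zip(result, term)
        ]
    return result


# Hypothetical two-state chain: leave state 0 at rate a, leave state 1 at rate b.
a, b, t = 2.0, 1.0, 0.5
Q = [[-a, a], [b, -b]]
P_t = expm_series(Q, t)
closed_form_p00 = b / (a + b) + a / (a + b) * math.exp(-(a + b) * t)
```

The computed `P_t[0][0]` matches the closed form to within floating-point error, and each row of `P_t` sums to 1, as a transition matrix must.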

Stationary Distributions in CTMCs

As in the discrete-time case, an irreducible, positive recurrent continuous-time Markov chain has a stationary distribution $\pi$, which gives the long-run proportion of time the process spends in each state.

However, the condition $\pi = \pi P$ is replaced by a condition of zero net probability flux:

$\pi Q = 0$

Here, $\pi$ remains a normalized probability vector with $\sum_i \pi_i = 1$. The matrix equation $\pi Q = 0$ is equivalent to the global balance equations, which state that the total probability flux leaving state $j$ equals the total probability flux entering state $j$ from all other states:

$\pi_j q_j = \sum_{i \neq j} \pi_i q_{ij}$
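For a finite irreducible chain, $\pi Q = 0$ together with the normalization $\sum_i \pi_i = 1$ is a small linear system. The sketch below solves it by Gauss-Jordan elimination in exact rational arithmetic, replacing one (redundant) balance equation with the normalization; the function name and the 3-state generator are hypothetical, and the method assumes the chain is irreducible so that the system has a unique solution.

```python
from fractions import Fraction


def ctmc_stationary(Q):
    """Solve pi Q = 0 with sum(pi) = 1 for a finite irreducible generator Q."""
    n = len(Q)
    # Rows 0..n-2: balance equation for state j, sum_i pi_i * Q[i][j] = 0.
    A = [[Fraction(Q[i][j]) for i in range(n)] + [Fraction(0)] for j in range(n - 1)]
    # Last row: normalization sum_i pi_i = 1.
    A.append([Fraction(1)] * n + [Fraction(1)])
    for col in range(n):
        pivot = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        pivot_value = A[col][col]
        A[col] = [x / pivot_value for x in A[col]]
        for r in range(n):
            if r != col and A[r][col] != 0:
                factor = A[r][col]
                A[r] = [x - factor * y for x, y in zip(A[r], A[col])]
    return [A[i][n] for i in range(n)]


# Hypothetical generator (rows sum to zero).
Q = [
    [-2, 1, 1],
    [1, -1, 0],
    [1, 3, -4],
]
pi = ctmc_stationary(Q)

# Global balance check: flux out of j equals flux into j.
flux_out = [pi[j] * -Q[j][j] for j in range(3)]
flux_in = [sum(pi[i] * Q[i][j] for i in range(3) if i != j) for j in range(3)]
```

For this generator the solution is $\pi = (1/3, 7/12, 1/12)$, and `flux_out` equals `flux_in` state by state.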

This flux-balance principle is foundational to queuing theory, stochastic chemical reaction networks, and biological population models, where it is used to compute long-run performance measures of systems evolving in continuous time.
