Singular Value Decomposition (SVD)

The Singular Value Decomposition (SVD) is arguably the most important result in applied linear algebra. Eigendecomposition works only for square matrices, and then only for diagonalizable ones. SVD works for any matrix: tall, wide, square, singular, or non-singular. It provides a way to “see” the underlying structure of data by decomposing it into its most significant components.

The Geometric Idea

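Every matrix $A$ represents a linear map. SVD says that any such map can be broken down into three simple steps, applied to an input vector $x$ in this order (each “rotation” may also include a reflection):

A rotation in the input space ($V^T$).
A scaling along the principal axes ($\Sigma$), which also changes the dimension when $A$ is not square.
A rotation in the output space ($U$).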

Mathematically: $A = U \Sigma V^T$
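The three steps can be traced on a concrete vector. Below is a minimal sketch, assuming NumPy: `np.linalg.svd` returns $U$, the singular values as a 1-D array `s`, and $V^T$ (not $V$) as `Vt`. The matrix and vector are arbitrary choices for illustration.

```python
import numpy as np

# An arbitrary non-symmetric 2x2 matrix and input vector, chosen for illustration.
A = np.array([[3.0, 1.0],
              [0.0, 2.0]])
x = np.array([1.0, -1.0])

# NumPy returns V^T (as Vt) and the singular values as a 1-D array s.
U, s, Vt = np.linalg.svd(A)

step1 = Vt @ x      # rotate/reflect x into the basis of right singular vectors
step2 = s * step1   # stretch each coordinate by its singular value
step3 = U @ step2   # rotate/reflect the result into the output space

print(step3)        # matches A @ x up to floating-point rounding
print(A @ x)        # [ 2. -2.]
```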

$V$: Its columns are the “right singular vectors.” They form an orthonormal basis of the input space.
$\Sigma$: A diagonal matrix of “singular values” $\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_p \ge 0$, where $p = \min(m, n)$ for an $m \times n$ matrix $A$ (when $A$ is not square, $\Sigma$ is a rectangular $m \times n$ matrix with these values on its main diagonal). These tell you the “strength” or “gain” of the matrix in each direction.
$U$: Its columns are the “left singular vectors.” They form an orthonormal basis of the output space.
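These properties are easy to check numerically. A minimal sketch, assuming NumPy and a randomly generated tall $5 \times 3$ matrix (the seed is arbitrary); `full_matrices=False` requests the compact SVD, so $U$ is $5 \times 3$ and there are $\min(5, 3) = 3$ singular values.

```python
import numpy as np

rng = np.random.default_rng(0)       # arbitrary seed
A = rng.standard_normal((5, 3))      # a tall 5x3 matrix

# Compact SVD: U is 5x3, s holds min(5, 3) = 3 values, Vt is 3x3.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Columns of U and of V (rows of Vt) are orthonormal.
print(np.allclose(U.T @ U, np.eye(3)))     # True
print(np.allclose(Vt @ Vt.T, np.eye(3)))   # True

# Singular values are non-negative and sorted in descending order.
print(np.all(s >= 0) and np.all(np.diff(s) <= 0))  # True

# s is a 1-D array, so np.diag(s) rebuilds the diagonal Sigma for reconstruction.
print(np.allclose(U @ np.diag(s) @ Vt, A))  # True
```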

Data Compression: The Best Low-Rank Approximation

The real magic of SVD is the Eckart-Young theorem. It states that if you want the best possible “summary” of a matrix $A$ using only $k$ dimensions (where $k$ is less than the rank of $A$), the answer is to keep only the $k$ largest singular values and their corresponding singular vectors. Here “best” means closest to $A$, in both the spectral and the Frobenius norm, among all matrices of rank at most $k$.

$A_k = \sum_{i=1}^k \sigma_i u_i v_i^T$
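The formula translates directly into code. A minimal sketch, assuming NumPy; `rank_k_approx` is a hypothetical helper written for illustration, and the test matrix is random. It also checks the Eckart-Young error formulas: the spectral-norm error equals $\sigma_{k+1}$ and the Frobenius-norm error equals $\sqrt{\sum_{i > k} \sigma_i^2}$.

```python
import numpy as np

def rank_k_approx(A, k):
    """A_k = sum_{i=1}^{k} sigma_i u_i v_i^T (hypothetical helper; assumes k >= 1)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Python indices are 0-based, so term i uses the (i+1)-th singular triplet.
    return sum(s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(k))

rng = np.random.default_rng(1)
A = rng.standard_normal((8, 6))             # arbitrary test matrix
s = np.linalg.svd(A, compute_uv=False)      # singular values only, descending

k = 2
A_k = rank_k_approx(A, k)
print(np.linalg.matrix_rank(A_k))           # 2

# Eckart-Young: the error is determined entirely by the discarded singular values.
spectral_err = np.linalg.norm(A - A_k, 2)       # 2-norm = largest singular value
frobenius_err = np.linalg.norm(A - A_k, "fro")
print(np.isclose(spectral_err, s[k]))                          # True (sigma_{k+1})
print(np.isclose(frobenius_err, np.sqrt(np.sum(s[k:] ** 2))))  # True
```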

This is the principle behind SVD-based image compression and noise reduction. By discarding the small singular values, we lose “noise” or “unimportant detail” but keep the overall structure.
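A small synthetic experiment illustrates both uses. This is a sketch under stated assumptions, using NumPy: the “clean” data is an exactly rank-2 $50 \times 40$ matrix, and the noise is Gaussian with standard deviation 0.1 (both choices are arbitrary). Truncating to the two largest singular values should land closer to the clean matrix than the noisy input does.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical "clean" data: an exactly rank-2 matrix of size 50x40.
clean = rng.standard_normal((50, 2)) @ rng.standard_normal((2, 40))
noisy = clean + 0.1 * rng.standard_normal((50, 40))

U, s, Vt = np.linalg.svd(noisy, full_matrices=False)
print(np.round(s[:5], 2))  # two large values, then a much smaller noise floor

# Keep only the two dominant singular triplets.
k = 2
denoised = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

noisy_err = np.linalg.norm(noisy - clean)        # Frobenius norm by default
denoised_err = np.linalg.norm(denoised - clean)
print(noisy_err, denoised_err)  # the truncated version is closer to the clean data

# Storage: the rank-k factors need k*(m + n + 1) numbers instead of m*n.
m, n = noisy.shape
print(m * n, k * (m + n + 1))   # 2000 182
```

The storage count assumes we keep the first $k$ columns of $U$ and $V$ plus the $k$ singular values, that is $k(m + n + 1)$ numbers for an $m \times n$ matrix.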

The Pseudoinverse: Solving the Solvable and Unsolvable

When a matrix $A$ is not invertible (e.g., it is not square, or it is square but singular), we can still “solve” $Ax = b$ using the Moore-Penrose pseudoinverse $A^\dagger$.

Once the SVD is known, the pseudoinverse is straightforward to compute: $A^\dagger = V \Sigma^\dagger U^T$, where $\Sigma^\dagger$ is formed by transposing $\Sigma$ and replacing every non-zero $\sigma_i$ with $1/\sigma_i$ (zero singular values stay zero).
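A minimal sketch of this recipe, assuming NumPy; `pinv_via_svd` and its relative cutoff `rtol` are illustrative choices, not a standard API. In floating point, a singular value that is zero in exact arithmetic usually comes out as a tiny positive number, so “non-zero” has to mean “above a small threshold.” The result is checked against the four Penrose conditions that define $A^\dagger$, and against NumPy's own `np.linalg.pinv` on a full-rank matrix.

```python
import numpy as np

def pinv_via_svd(A, rtol=1e-12):
    """A^+ = V Sigma^+ U^T from the compact SVD (hypothetical helper)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Sigma^+: invert singular values above a relative cutoff, set the rest to zero.
    # The cutoff rtol is an assumption; round-off turns exact zeros into tiny numbers.
    s_inv = np.zeros_like(s)
    keep = s > rtol * s[0]          # s[0] is the largest singular value
    s_inv[keep] = 1.0 / s[keep]
    # With the compact SVD, Sigma is square, so transposing it is a no-op.
    return Vt.T @ np.diag(s_inv) @ U.T

# Rank-deficient 3x2 matrix: the second column is twice the first.
A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0]])
A_pinv = pinv_via_svd(A)
print(A_pinv.shape)  # (2, 3)

# The four Penrose conditions characterize the pseudoinverse uniquely.
print(np.allclose(A @ A_pinv @ A, A),
      np.allclose(A_pinv @ A @ A_pinv, A_pinv),
      np.allclose((A @ A_pinv).T, A @ A_pinv),
      np.allclose((A_pinv @ A).T, A_pinv @ A))  # True True True True

# For a full-rank matrix, the result agrees with NumPy's np.linalg.pinv.
B = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
print(np.allclose(pinv_via_svd(B), np.linalg.pinv(B)))  # True
```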

The solution $x = A^\dagger b$ is the “best” solution in two senses:

It minimizes the squared error $\|Ax - b\|^2$ (least squares).
If there are many such minimizers, it picks the one with the smallest length $\|x\|$.
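Both properties can be observed directly. A minimal sketch, assuming NumPy, with small hand-picked systems: for an overdetermined system the pseudoinverse solution agrees with `np.linalg.lstsq`, and for the single equation $x_1 + x_2 = 2$ it returns the minimum-norm solution $(1, 1)$ rather than, say, $(2, 0)$.

```python
import numpy as np

# Overdetermined: 3 equations, 2 unknowns, no exact solution (hand-picked data).
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
b = np.array([1.0, 1.0, 0.0])

x_pinv = np.linalg.pinv(A) @ b
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(x_pinv, x_lstsq))   # True: both minimize ||Ax - b||^2

# Underdetermined: the single equation x1 + x2 = 2 has infinitely many solutions.
C = np.array([[1.0, 1.0]])
d = np.array([2.0])

x_min = np.linalg.pinv(C) @ d
x_other = np.array([2.0, 0.0])        # also satisfies C x = d exactly
print(x_min)                          # approximately [1. 1.]
print(np.linalg.norm(x_min), np.linalg.norm(x_other))  # about 1.414 vs 2.0
```

In the first case $A$ has full column rank, so the least-squares solution is unique; the minimum-norm rule only makes a difference when $A$ has a non-trivial null space, as in the second case.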

Exercises

In the SVD $A = U \Sigma V^T$, what do the values in $\Sigma$ represent?

If a matrix has singular values [100, 50, 0.01, 0.0001], which singular values should we keep for a good low-rank approximation?

What is the relationship between the singular values of $A$ and the eigenvalues of $A^T A$?