Vectors, matrices, and the geometry of linear spaces.
May 2026
In elementary physics, a vector is often described as a directed line segment. While intuitive, this definition is insufficient for higher mathematics. Modern linear algebra treats a Vector Space as an abstract algebraic structure—a “playground” where elements can be added together and scaled by numbers.
A Vector Space over a field $\mathbb{F}$ (typically $\mathbb{R}$ or $\mathbb{C}$) is a set $V$ equipped with two operations: vector addition ($\mathbf{u} + \mathbf{v}$) and scalar multiplication ($c\mathbf{v}$). Instead of memorizing axioms as dry rules, we can view them as the “laws of physics” for our data.
The most common example is $\mathbb{R}^n$, where addition and scaling are performed component-wise. Let’s verify the Commutativity ($\mathbf{u} + \mathbf{v} = \mathbf{v} + \mathbf{u}$) and Distributivity ($c(\mathbf{u} + \mathbf{v}) = c\mathbf{u} + c\mathbf{v}$) properties using NumPy.
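Here is a minimal sketch of that check; the specific vectors and the scalar below are arbitrary choices for illustration.

```python
import numpy as np

u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, -1.0, 0.5])
c = 2.5

# Commutativity of addition: u + v should equal v + u
print(np.allclose(u + v, v + u))                 # True

# Distributivity of scalar multiplication over addition
print(np.allclose(c * (u + v), c * u + c * v))   # True
```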
For any set $V$ to be a formal Vector Space, these eight rules must hold for all vectors $\mathbf{u}, \mathbf{v}, \mathbf{w} \in V$ and scalars $a, b \in \mathbb{F}$: commutativity and associativity of addition, the existence of a zero vector $\mathbf{0}$ and of additive inverses $-\mathbf{v}$, compatibility of scalar multiplication ($a(b\mathbf{v}) = (ab)\mathbf{v}$), the identity $1\mathbf{v} = \mathbf{v}$, and the two distributive laws $a(\mathbf{u} + \mathbf{v}) = a\mathbf{u} + a\mathbf{v}$ and $(a + b)\mathbf{v} = a\mathbf{v} + b\mathbf{v}$.
Consider the set of all points in the first quadrant: $Q = \{(x, y) : x \ge 0, y \ge 0\}$. Why does this fail to be a vector space?
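One way to see the problem numerically; the point and the scalar below are arbitrary examples.

```python
import numpy as np

def in_first_quadrant(p):
    """True if every coordinate is non-negative."""
    return bool(np.all(p >= 0))

p = np.array([3.0, 2.0])       # a point in the first quadrant
q = -1.5 * p                   # scale by a negative number

print(in_first_quadrant(p))    # True
print(q, in_first_quadrant(q)) # [-4.5 -3. ] False -- we have left the set
```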
The failure shown above violates Closure under Scalar Multiplication. If we scale a “positive” vector by a negative number, we leave the set. Thus, the first quadrant is not a vector space.
The beauty of these axioms is that “vectors” don’t have to be arrows. They can be functions or polynomials. The set of polynomials of degree at most $n$ forms a vector space because adding two polynomials yields another polynomial, and the axioms hold.
In advanced applications like Fourier analysis, we treat signals (functions) as vectors. If $f$ and $g$ are continuous functions, then $f + g$ is also continuous.
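A quick sketch of “polynomials as vectors”: each polynomial of degree at most 2 is stored as its coefficient array, and addition and scaling are exactly the component-wise operations from before (the particular coefficients are made up).

```python
import numpy as np

# p(x) = 1 + 2x + 3x^2 and q(x) = 4 - x, stored as coefficient vectors (constant term first)
p = np.array([1.0, 2.0, 3.0])
q = np.array([4.0, -1.0, 0.0])

r = p + 2.0 * q      # the "vector" p + 2q is again a polynomial of degree <= 2
print(r)             # [9. 0. 3.]  ->  9 + 0x + 3x^2
```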
Once we have a vector space, we need a way to describe its contents efficiently. If a vector space is a “playground,” then Linear Independence and Span are the rules for how many “tools” (vectors) you actually need to build everything in that playground.
The Span of a set of vectors is the set of all possible linear combinations of those vectors.
Intuition: If you have two non-parallel arrows in a 2D plane, their span is the entire plane because you can reach any point by scaling and adding those two arrows.
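A small sketch of that intuition: with two non-parallel vectors as the columns of a matrix, np.linalg.solve finds the scaling coefficients that reach any chosen target (the vectors and the target are arbitrary).

```python
import numpy as np

v1 = np.array([1.0, 0.0])
v2 = np.array([1.0, 1.0])       # not parallel to v1
target = np.array([3.0, -2.0])  # any point in the plane

A = np.column_stack([v1, v2])
coeffs = np.linalg.solve(A, target)
print(coeffs)                                            # [ 5. -2.]
print(np.allclose(coeffs[0]*v1 + coeffs[1]*v2, target))  # True
```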
A set of vectors is linearly independent if none of the vectors can be written as a linear combination of the others. In other words, they are all “original” and provide “new directions.”
Vectors $\mathbf{v}_1, \dots, \mathbf{v}_n$ are linearly independent if the equation $c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_n\mathbf{v}_n = \mathbf{0}$ has only the trivial solution $c_1 = c_2 = \cdots = c_n = 0$.
If a vector can be built from others, the set is Dependent. We can test this by checking the Rank of the matrix formed by these vectors. If $\operatorname{rank}(A) < n$ (the number of vectors), they are dependent.
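A sketch of the rank test, using vectors chosen so that the third is deliberately the sum of the first two.

```python
import numpy as np

v1 = np.array([1.0, 0.0, 2.0])
v2 = np.array([0.0, 1.0, 1.0])
v3 = v1 + v2                    # deliberately dependent

A = np.column_stack([v1, v2, v3])
rank = np.linalg.matrix_rank(A)
print(rank)                     # 2 < 3 columns -> the set is linearly dependent
```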
Imagine three vectors in 3D space. If one is a combination of the others, they all lie on a single plane (2D), even though there are three vectors.
Linear independence tells us if our data is redundant. If you have 100 sensors measuring the same physical phenomenon (e.g., temperature) and they are perfectly correlated, your “feature matrix” will be rank-deficient. You have 100 numbers, but effectively only 1 “dimension” of information.
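As a sketch of that scenario, here 100 simulated “sensors” all report a rescaled copy of the same temperature signal, so the feature matrix has rank 1 (the numbers are synthetic).

```python
import numpy as np

rng = np.random.default_rng(0)
temperature = rng.normal(20.0, 2.0, size=50)   # one underlying signal, 50 time steps

# Each sensor is a rescaled copy of that signal (perfect correlation)
gains = rng.uniform(0.5, 1.5, size=100)
readings = np.outer(temperature, gains)        # shape (50, 100): 100 "features"

print(readings.shape)                          # (50, 100)
print(np.linalg.matrix_rank(readings))         # 1 -- only one dimension of information
```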
The concepts of basis and dimension provide a way to “measure” the size and complexity of a vector space.
A set of vectors $\{\mathbf{v}_1, \dots, \mathbf{v}_n\}$ is linearly independent if the only solution to $c_1\mathbf{v}_1 + \cdots + c_n\mathbf{v}_n = \mathbf{0}$ is $c_i = 0$ for all $i$. The span of a set is the set of all possible linear combinations.
A basis for a vector space $V$ is a set of vectors that spans $V$ and is linearly independent.
The dimension of a vector space $V$, denoted $\dim(V)$, is the number of vectors in any basis for $V$.
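A minimal sketch of a basis check in $\mathbb{R}^3$: three candidate vectors form a basis exactly when the matrix having them as columns has rank 3 (the vectors are arbitrary).

```python
import numpy as np

candidates = np.column_stack([
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 1.0],
    [1.0, 1.0, 0.0],
])

rank = np.linalg.matrix_rank(candidates)
print(rank == 3)   # True -> independent and spanning, so a basis; dim(R^3) = 3
```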
Matrices are the numerical engines of Linear Algebra. While a “Vector Space” is a theoretical playground, a Matrix is the specific blueprint that tells us how to manipulate that space.
A matrix of size $m \times n$ is a grid of numbers that represents a mapping from $\mathbb{R}^n$ to $\mathbb{R}^m$. Each column of the matrix tells us where one of the basis vectors of $\mathbb{R}^n$ “lands” in $\mathbb{R}^m$.
Let’s visualize a Shear Transformation matrix, for example $S = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$. This matrix leaves the $x$-axis alone but shifts the $y$-direction.
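A sketch applying that shear to the standard basis vectors; the columns of the matrix are exactly where $\mathbf{e}_1$ and $\mathbf{e}_2$ land.

```python
import numpy as np

S = np.array([[1.0, 1.0],
              [0.0, 1.0]])      # shear matrix

e1 = np.array([1.0, 0.0])
e2 = np.array([0.0, 1.0])

print(S @ e1)   # [1. 0.]  -- the x-axis is untouched
print(S @ e2)   # [1. 1.]  -- the old "up" direction now leans to the right
```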
A system of linear equations $A\mathbf{x} = \mathbf{b}$ asks: “Which vector $\mathbf{x}$ lands on $\mathbf{b}$ when we apply the transformation $A$?”
If the matrix squashes space into a lower dimension (i.e., it is Rank-Deficient or Singular), then $\mathbf{b}$ might be unreachable, or there might be infinitely many paths to reach it.
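A brief sketch: np.linalg.solve finds the preimage when the matrix is invertible, and raises an error for a singular matrix (the matrices and right-hand side are arbitrary examples).

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([5.0, 10.0])
x = np.linalg.solve(A, b)
print(x, np.allclose(A @ x, b))   # the unique vector that lands on b

# A singular matrix squashes the plane onto a line; most b are unreachable
A_singular = np.array([[1.0, 2.0],
                       [2.0, 4.0]])
try:
    np.linalg.solve(A_singular, b)
except np.linalg.LinAlgError as err:
    print("singular:", err)
```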
Multiplying two matrices $AB$ represents applying transformation $B$ first, then $A$. Because the order of geometric transformations (like rotating then shifting) matters, matrix multiplication is not commutative ($AB \neq BA$ in general).
Reduced Row Echelon Form (RREF) is the “simplest” version of a matrix that still represents the same linear system. It allows us to read off the solutions directly. In RREF, the leading entry of each nonzero row is 1, and all other entries in that column are 0.
Most vectors change direction when a linear transformation is applied. However, some special vectors keep their direction and are only stretched or shrunk. These are eigenvectors, and their scaling factor is the eigenvalue.
For a linear operator $T$, a nonzero vector $\mathbf{v}$ is an eigenvector if $T(\mathbf{v}) = \lambda\mathbf{v}$, where $\lambda$ is a scalar (the eigenvalue).
Intuition: In a 2D rotation (by anything other than 0° or 180°), no real vector keeps its direction. But in a scaling transformation, the axes are eigenvectors because points on them move only along the line.
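A sketch comparing the two cases with np.linalg.eig: a diagonal scaling has real eigenvectors along the axes, while a 90° rotation has only complex eigenvalues.

```python
import numpy as np

scale = np.diag([2.0, 0.5])                  # stretch x, shrink y
rotate = np.array([[0.0, -1.0],
                   [1.0,  0.0]])             # rotate by 90 degrees

vals_s, vecs_s = np.linalg.eig(scale)
vals_r, vecs_r = np.linalg.eig(rotate)

print(vals_s)          # [2.  0.5] -- real eigenvalues, eigenvectors along the axes
print(vecs_s)
print(vals_r)          # complex eigenvalues +i and -i: no real direction survives
```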
To find $\lambda$, we must solve $\det(A - \lambda I) = 0$. This gives us the Characteristic Polynomial.
In high-dimensional spaces (like Google’s PageRank), we don’t calculate determinants. Instead, we use the Power Method: repeatedly apply to a random vector until it converges to the dominant eigenvector.
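A minimal power-method sketch on a small symmetric matrix (the matrix and iteration count are arbitrary; PageRank-scale problems use sparse matrices and convergence tests).

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])

rng = np.random.default_rng(1)
v = rng.normal(size=2)

for _ in range(50):
    v = A @ v
    v = v / np.linalg.norm(v)        # renormalize to avoid overflow

eigenvalue = v @ A @ v               # Rayleigh quotient estimate
print(eigenvalue)                    # close to the dominant eigenvalue of A
print(np.linalg.eigvalsh(A).max())   # reference value from the direct solver
```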
If a matrix has enough eigenvectors, we can rotate our coordinate system so the transformation is just axis-aligned scaling. This is the foundation of Principal Component Analysis (PCA).
A linear transformation is a mapping between vector spaces that preserves the operations of addition and scalar multiplication.
$T: V \to W$ is linear if for all $\mathbf{u}, \mathbf{v} \in V$ and scalars $c$: $T(\mathbf{u} + \mathbf{v}) = T(\mathbf{u}) + T(\mathbf{v})$ and $T(c\mathbf{v}) = cT(\mathbf{v})$.
Every linear transformation between finite-dimensional vector spaces can be represented as a matrix.
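As one illustration (not tied to any particular example in the text), differentiation of polynomials of degree at most 3 is linear, so it can be written as a matrix acting on coefficient vectors.

```python
import numpy as np

# Coefficient vectors are ordered [a0, a1, a2, a3] for a0 + a1*x + a2*x^2 + a3*x^3.
# Differentiation sends this to [a1, 2*a2, 3*a3, 0].
D = np.array([[0.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 2.0, 0.0],
              [0.0, 0.0, 0.0, 3.0],
              [0.0, 0.0, 0.0, 0.0]])

p = np.array([5.0, -1.0, 4.0, 2.0])   # 5 - x + 4x^2 + 2x^3
print(D @ p)                          # [-1.  8.  6.  0.]  ->  -1 + 8x + 6x^2
```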
If $S: U \to V$ and $T: V \to W$ are linear maps, their composition $T \circ S$ is also linear. The matrix representing $T \circ S$ is the product of the matrices representing $T$ and $S$.
The matrix representation of a linear map depends on the choice of bases for $V$ and $W$. If $A$ is the matrix in basis $\mathcal{B}$, and $P$ is the transition matrix from basis $\mathcal{B}'$ to $\mathcal{B}$, then: $A' = P^{-1} A P$.
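A sketch verifying the change-of-basis formula numerically: we apply the map either directly in the old coordinates or by converting to the new coordinates, applying $A' = P^{-1}AP$, and converting back (the matrices are arbitrary, with $P$ chosen invertible).

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 3.0]])          # matrix of T in the old basis
P = np.array([[1.0, 1.0],
              [0.0, 1.0]])          # columns: new basis vectors written in the old basis

A_prime = np.linalg.inv(P) @ A @ P  # matrix of the same map in the new basis

x_old = np.array([4.0, 2.0])        # a vector in old coordinates
x_new = np.linalg.solve(P, x_old)   # the same vector in new coordinates

# Apply T in each coordinate system; both describe the same output vector
print(A @ x_old)                    # old coordinates of T(x)
print(P @ (A_prime @ x_new))        # new-coordinate answer converted back: identical
```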
Up to this point, our vector spaces have been “bare.” We can add vectors and scale them, but we have no notion of how long a vector is, or what the angle between two vectors might be. In bare linear algebra, there is no “perpendicular.”
Inner product spaces equip vector spaces with this geometric structure. This allows us to define limits, identify the “closest” approximation to a signal, and decompose data into independent components.
An inner product is a function that takes two vectors and returns a scalar. For $\mathbb{R}^n$, the standard inner product is the dot product $\langle \mathbf{x}, \mathbf{y} \rangle = \sum_i x_i y_i$. To generalize this to any vector space $V$ over $\mathbb{R}$ or $\mathbb{C}$, we define an inner product as a map $\langle \cdot, \cdot \rangle : V \times V \to \mathbb{F}$ satisfying conjugate symmetry ($\langle \mathbf{u}, \mathbf{v} \rangle = \overline{\langle \mathbf{v}, \mathbf{u} \rangle}$), linearity in the first argument, and positive-definiteness ($\langle \mathbf{v}, \mathbf{v} \rangle > 0$ for $\mathbf{v} \neq \mathbf{0}$).
The norm (length) of a vector is defined as $\|\mathbf{v}\| = \sqrt{\langle \mathbf{v}, \mathbf{v} \rangle}$. This is the generalized Pythagorean theorem. The distance between two points is then simply $d(\mathbf{u}, \mathbf{v}) = \|\mathbf{u} - \mathbf{v}\|$.
One of the most powerful results in mathematics is the Cauchy-Schwarz inequality: $|\langle \mathbf{u}, \mathbf{v} \rangle| \le \|\mathbf{u}\|\,\|\mathbf{v}\|$. This ensures that the “angle” defined by $\cos\theta = \frac{\langle \mathbf{u}, \mathbf{v} \rangle}{\|\mathbf{u}\|\,\|\mathbf{v}\|}$ is always between -1 and 1 for real spaces.
Two vectors are orthogonal if $\langle \mathbf{u}, \mathbf{v} \rangle = 0$. In terms of data, orthogonal vectors are completely “unrelated” or “uncorrelated” in the geometry defined by that inner product.
A set of vectors is orthonormal if they are all orthogonal to each other and all have length 1. Working with an orthonormal basis $\{\mathbf{e}_1, \dots, \mathbf{e}_n\}$ is trivial compared to a general basis because the coefficients of any vector are just the inner products: $\mathbf{v} = \sum_i \langle \mathbf{v}, \mathbf{e}_i \rangle \mathbf{e}_i$.
The Gram-Schmidt process is a recipe for turning any basis into an orthonormal one. It works by taking the first vector, then taking the second and subtracting the part that “leaks” into the first, and so on.
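A compact Gram-Schmidt sketch (the classical version, with no re-orthogonalization, so it is illustrative rather than numerically robust).

```python
import numpy as np

def gram_schmidt(vectors):
    """Turn a list of linearly independent vectors into an orthonormal list."""
    basis = []
    for v in vectors:
        w = v.astype(float)
        for q in basis:
            w = w - (w @ q) * q       # subtract the part that "leaks" into earlier directions
        basis.append(w / np.linalg.norm(w))
    return np.array(basis)

Q = gram_schmidt([np.array([1.0, 1.0, 0.0]),
                  np.array([1.0, 0.0, 1.0]),
                  np.array([0.0, 1.0, 1.0])])

print(np.round(Q @ Q.T, 10))          # identity matrix: the rows are orthonormal
```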
The most significant application of inner products is the Orthogonal Projection. Given a subspace $W$ and a vector $\mathbf{v}$ not in $W$, the “closest” vector in $W$ to $\mathbf{v}$ is the projection $\operatorname{proj}_W(\mathbf{v})$.
This is the engine behind Least Squares Regression. If we have a system $A\mathbf{x} = \mathbf{b}$ that has no solution, we look for the $\hat{\mathbf{x}}$ that minimizes the error $\|A\mathbf{x} - \mathbf{b}\|$. This happens when $A\hat{\mathbf{x}}$ is the projection of $\mathbf{b}$ onto the column space of $A$.
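A sketch of that idea with np.linalg.lstsq: an overdetermined system has no exact solution, and the returned $\hat{\mathbf{x}}$ makes $A\hat{\mathbf{x}}$ the projection of $\mathbf{b}$ onto the column space (the data points are synthetic).

```python
import numpy as np

# Fit a line y = m*t + c to three points that are not exactly collinear
t = np.array([0.0, 1.0, 2.0])
y = np.array([1.0, 2.0, 2.5])

A = np.column_stack([t, np.ones_like(t)])     # columns span the "reachable" plane in R^3
x_hat, *_ = np.linalg.lstsq(A, y, rcond=None)
residual = y - A @ x_hat

print(x_hat)                                  # best-fit slope and intercept
print(np.round(A.T @ residual, 10))           # ~[0. 0.]: the error is orthogonal to the column space
```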
Calculus students often see Fourier series as a specialized topic. In reality, it is just linear algebra on an infinite-dimensional inner product space of functions, with $\langle f, g \rangle = \int f(x)\,g(x)\,dx$. The functions $\sin(nx)$ and $\cos(nx)$ form an orthogonal basis. Calculating Fourier coefficients is exactly the same as calculating coordinates in $\mathbb{R}^n$ using dot products!
A vector space can contain smaller vector spaces called subspaces. We can also “divide” a space by a subspace to create a quotient space.
A subset $W$ of a vector space $V$ is a subspace if it contains the zero vector and is closed under vector addition and scalar multiplication.
Given a subspace $W \subseteq V$, the quotient space $V/W$ is the set of all cosets $\mathbf{v} + W$. Intuitively, the quotient space “ignores” all differences that lie within $W$.
The dimension of a quotient space is: $\dim(V/W) = \dim(V) - \dim(W)$.
For a linear map $T: V \to W$, the First Isomorphism Theorem gives $V/\ker(T) \cong \operatorname{im}(T)$, and taking dimensions yields the Rank-Nullity Theorem: $\dim(\ker T) + \dim(\operatorname{im} T) = \dim(V)$.
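A numerical sketch of rank-nullity for a rank-deficient matrix (the matrix is an arbitrary example; the nullity is computed as number of columns minus rank).

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [1.0, 0.0, 1.0]])      # second row is twice the first -> rank deficient

n_cols = A.shape[1]
rank = np.linalg.matrix_rank(A)      # dim(im T)
nullity = n_cols - rank              # dim(ker T)

print(rank, nullity, rank + nullity == n_cols)   # 2 1 True
```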
The Singular Value Decomposition (SVD) is arguably the most important result in applied linear algebra. While eigendecomposition works only for special square matrices, SVD works for any matrix—tall, wide, square, singular, or non-singular. It provides a way to “see” the underlying structure of data by decomposing it into its most significant components.
Every matrix represents a linear map. SVD says that any such map can be broken down into three simple steps: a rotation (or reflection) of the input space, an axis-aligned scaling, and a rotation (or reflection) of the output space.
Mathematically: $A = U \Sigma V^T$, where $U$ and $V$ are orthogonal matrices and $\Sigma$ is a diagonal matrix of non-negative singular values.
The real magic of SVD is the Eckart-Young Theorem. It states that if you want the best possible “summary” of a matrix using only $k$ dimensions (where $k$ is less than the rank of $A$), the answer is to keep only the $k$ largest singular values and their corresponding vectors.
This is how image compression and noise reduction work. By throwing away small singular values, we lose “noise” or “unimportant detail” but keep the overall structure.
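A sketch of rank-$k$ truncation with np.linalg.svd on a small synthetic matrix; keeping only the largest singular values reproduces the matrix up to an error determined by the discarded ones.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(8, 6))

U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2                                        # keep only the 2 largest singular values
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

print(np.linalg.matrix_rank(A_k))            # 2
print(np.linalg.norm(A - A_k))               # Frobenius error of the approximation...
print(np.sqrt(np.sum(s[k:] ** 2)))           # ...equals the size of the discarded singular values
```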
When a matrix is not invertible (e.g., it is not square or is singular), we can still “solve” $A\mathbf{x} = \mathbf{b}$ using the Moore-Penrose Pseudoinverse $A^{+}$.
Using SVD, the pseudoinverse is trivial to compute: $A^{+} = V \Sigma^{+} U^T$, where $\Sigma^{+}$ is formed by transposing $\Sigma$ and replacing every non-zero $\sigma_i$ with $1/\sigma_i$.
The solution $\hat{\mathbf{x}} = A^{+}\mathbf{b}$ is the “best” solution in two senses: it minimizes the residual $\|A\mathbf{x} - \mathbf{b}\|$, and among all minimizers it has the smallest norm $\|\mathbf{x}\|$.
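A short sketch with np.linalg.pinv on an underdetermined system, where infinitely many exact solutions exist and the pseudoinverse picks the one with the smallest norm.

```python
import numpy as np

A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])               # 2 equations, 3 unknowns
b = np.array([2.0, 3.0])

x_hat = np.linalg.pinv(A) @ b
print(x_hat)                                   # minimum-norm solution
print(np.allclose(A @ x_hat, b))               # True: it is an exact solution here

x_other = x_hat + np.array([1.0, -1.0, 1.0])   # another exact solution (null-space shift)
print(np.allclose(A @ x_other, b),
      np.linalg.norm(x_hat) < np.linalg.norm(x_other))   # True True
```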
Sometimes, a matrix is “ugly”—it is filled with dense numbers that obscure the underlying physics or logic of the system. Canonical forms are the “simplest” possible representations of a linear operator. By changing our basis, we can reveal the true nature of the transformation.
If you are studying a physical system, like a vibrating string or a chemical reaction, the equations are often coupled (everything depends on everything else). A canonical form decouples the system.
The most famous canonical form is the Diagonal Form. If a matrix is diagonalizable, it means there exists a basis where the operator simply scales each axis independently.
What happens if a matrix is not diagonalizable? This occurs when there are not enough eigenvectors (the matrix is “defective”).
The Jordan Normal Form (JNF) is the best we can do for any square matrix. It decomposes the operator into Jordan Blocks on the diagonal, each of the form $J_k(\lambda) = \begin{pmatrix} \lambda & 1 & & \\ & \lambda & \ddots & \\ & & \ddots & 1 \\ & & & \lambda \end{pmatrix}$.
Everything off the block is zero. Inside the block, we have the eigenvalue on the diagonal and 1s just above it. These 1s represent “coupling” that cannot be removed.
In control theory, we look at the JNF to determine if a system will explode or settle.
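A sketch using SymPy's jordan_form on a defective matrix (NumPy has no Jordan routine, since the form is numerically unstable; SymPy works exactly on small rational matrices).

```python
import sympy as sp

# A defective matrix: eigenvalue 2 repeated, but only one independent eigenvector
A = sp.Matrix([[3, 1],
               [-1, 1]])

P, J = A.jordan_form()
sp.pprint(J)                        # [[2, 1], [0, 2]] -- a single 2x2 Jordan block
print(len(A.eigenvects()[0][2]))    # 1 eigenvector for eigenvalue 2 -> not diagonalizable
```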
The Jordan form requires complex numbers (to find all roots of the characteristic polynomial). If we want to stay within the field of rational numbers $\mathbb{Q}$ or real numbers $\mathbb{R}$, we use the Rational Canonical Form (also known as the Frobenius normal form).
Instead of eigenvalues, this form uses Invariant Factors—polynomials that divide each other. This is deeply connected to the structure theory of modules over a Principal Ideal Domain (PID).
A scalar is a “rank-0” tensor (a single number). A vector is a “rank-1” tensor (an array). A matrix is a “rank-2” tensor (a grid). Beyond these lie higher-rank tensors—multi-dimensional arrays that follow specific transformation rules. Tensors are the natural language of General Relativity, Quantum Mechanics, and modern Deep Learning.
While computer scientists often define a tensor as “a multidimensional array,” mathematicians define it by how it transforms or what it does.
A tensor is a multilinear map. Just as a matrix represents a linear map $V \to W$, a tensor takes multiple vectors as input and produces a scalar (or another vector). For example, a rank-2 tensor is a function $T(\mathbf{u}, \mathbf{v})$ that is linear in $\mathbf{u}$ AND linear in $\mathbf{v}$.
In physics, we distinguish between contravariant vectors (written with upper indices, like $v^i$), which transform like displacements, and covariant vectors (written with lower indices, like $w_i$), which transform like gradients.
A general tensor can have both types of indices. This distinction is crucial for ensuring that physical laws remain the same regardless of the units or axes we choose.
The tensor product is a way to combine two vector spaces $V$ and $W$ into a larger space $V \otimes W$. If $\mathbf{v} \in V$ and $\mathbf{w} \in W$, then $\mathbf{v} \otimes \mathbf{w}$ is an element of the product space. If $\dim(V) = m$ and $\dim(W) = n$, the tensor product has dimension $m \cdot n$. You can think of this as the space of all possible $m \times n$ matrices.
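A sketch of the simplest tensor product: the outer product of two coordinate vectors lives in a space of dimension $m \cdot n$ and can be laid out as an $m \times n$ grid.

```python
import numpy as np

v = np.array([1.0, 2.0, 3.0])        # element of a 3-dimensional space
w = np.array([10.0, 20.0])           # element of a 2-dimensional space

vw = np.outer(v, w)                  # v (tensor) w, arranged as a 3x2 grid
print(vw)
print(vw.shape, vw.size)             # (3, 2) 6  -- dimension m*n = 3*2
```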
The determinant is the most famous example of a multilinear form. It is an alternating $n$-tensor. “Alternating” means that if you swap two input vectors, the sign of the output flips. This property is what allows the determinant to measure signed volume.
In tensor calculus, we often drop the summation sign $\sum$. If an index appears twice (once up, once down), it is summed over. Instead of $y^i = \sum_j A^i_{\ j} x^j$, we write $y^i = A^i_{\ j} x^j$. This compact notation is the standard in engineering and theoretical physics.
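NumPy's einsum mirrors this convention: repeated indices in the subscript string are summed over (the matrix and vector below are arbitrary).

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
x = np.array([5.0, 6.0])

y = np.einsum('ij,j->i', A, x)   # the repeated index j is summed: y_i = A_ij x_j
print(y)                          # [17. 39.]
print(np.allclose(y, A @ x))      # True
```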
Group theory studies symmetry in the abstract. Linear algebra studies matrices acting on vectors. Representation Theory is the bridge between them: it studies how abstract groups can be represented as matrices. This allows us to use the powerful tools of linear algebra (trace, determinant, eigenvalues) to solve problems in abstract algebra and physics.
A collection of symmetries (a Group $G$) can often be represented by a set of linear transformations on a vector space $V$. A representation is a homomorphism $\rho: G \to GL(V)$. This means that for every group element $g$, there is an invertible matrix $\rho(g)$ such that the group composition is preserved: $\rho(gh) = \rho(g)\rho(h)$.
Just as an integer can be broken down into prime factors, a representation can often be broken down into smaller, simpler representations. If a representation cannot be broken down further, it is called irreducible.
Maschke’s Theorem states that for finite groups (over fields like $\mathbb{C}$ or $\mathbb{R}$), every representation is a direct sum of irreducible ones. This is effectively the “fundamental theorem of arithmetic” for representations.
Working with full matrices for every group element is computationally expensive. Character Theory simplifies this by focusing only on the trace of the matrices. The character of a representation $\rho$ is the function $\chi: G \to \mathbb{C}$ defined by: $\chi(g) = \operatorname{tr}(\rho(g))$.
Characters are “class functions”—they are the same for elements in the same conjugacy class. This remarkably compact representation contains almost all the information about the representation.
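A small sketch: the permutation representation of $S_3$ sends each permutation to a $3 \times 3$ permutation matrix, and the character is the trace, i.e. the number of fixed points; elements with the same cycle type (conjugacy class) give the same value.

```python
import numpy as np
from itertools import permutations

def perm_matrix(p):
    """3x3 matrix sending basis vector e_i to e_{p(i)}."""
    M = np.zeros((3, 3))
    for i, j in enumerate(p):
        M[j, i] = 1.0
    return M

for p in permutations(range(3)):
    chi = np.trace(perm_matrix(p))   # the character of this group element
    print(p, chi)                    # identity -> 3, transpositions -> 1, 3-cycles -> 0
```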
In quantum mechanics, particles are described by wavefunctions. If a physical system has a certain symmetry (like rotating a crystal), the wavefunction must transform according to a representation of that symmetry group.