© 2026 LIBREUNI PROJECT

Non-Parametric Statistics

Statistical inference often relies on parametric assumptions, specifically that the population from which the sample is drawn follows a known probability distribution, typically the normal distribution, characterized by a set of parameters (e.g., mean $\mu$ and variance $\sigma^2$). Non-parametric statistics, in contrast, provide procedures for inferring properties of populations that do not rely on restrictive assumptions regarding the underlying parameterized probability distributions.

These methods are essential when sample sizes are small, data are ordinal or nominal, or severe departures from normality are evident. While non-parametric tests are more robust to distributional violations, they generally possess less statistical power compared to their parametric counterparts when the parametric assumptions are actually met.

The Sign Test

The sign test is one of the simplest non-parametric tests, used to assess whether the median of a continuous distribution equals a hypothesized value $M_0$. It is the non-parametric alternative to the one-sample t-test.

Let $X_1, X_2, \dots, X_n$ be a random sample from a continuous distribution with median $M$. We wish to test the null hypothesis $H_0: M = M_0$.

The test statistic $S$ is defined as the number of sample observations strictly greater than $M_0$. Under $H_0$, each observation has a 0.5 probability of being greater than $M_0$, assuming continuity. Thus, $S$ follows a binomial distribution:

$$S \sim \text{Binomial}(N, p = 0.5)$$

where $N$ is the effective sample size, discarding any ties where $X_i = M_0$.

For large $N$ (typically $N > 20$), a normal approximation can be used:

$$Z = \frac{S - \frac{N}{2}}{\sqrt{\frac{N}{4}}} \sim \mathcal{N}(0, 1)$$

A continuity correction of $0.5$ is often applied to $S$ for greater accuracy.
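The procedure above can be sketched in a few lines of standard-library Python. The function name `sign_test`, the example data, and the choice of a two-sided p-value (twice the smaller binomial tail, capped at 1) are illustrative assumptions rather than part of the definition; the exact binomial distribution is used instead of the normal approximation.

```python
from math import comb

def sign_test(sample, m0):
    """Exact two-sided sign test of H0: median == m0 (illustrative sketch)."""
    # Discard observations tied with the hypothesized median
    diffs = [x - m0 for x in sample if x != m0]
    n_eff = len(diffs)                      # effective sample size N
    s = sum(1 for d in diffs if d > 0)      # S = count of observations above m0
    # Under H0, S ~ Binomial(N, 0.5); double the smaller tail for a two-sided test
    k = min(s, n_eff - s)
    tail = sum(comb(n_eff, i) for i in range(k + 1)) / 2 ** n_eff
    return s, n_eff, min(1.0, 2 * tail)

# Hypothetical data: one value equals m0 and is dropped
data = [1.2, 2.5, 3.0, 3.1, 3.8, 4.4, 5.0, 5.9, 6.3, 7.7, 8.1]
s, n_eff, p_value = sign_test(data, m0=3.0)
print(s, n_eff, p_value)
```

Here 8 of the 10 non-tied observations lie above 3.0, so the two-sided p-value is $2 \cdot (1 + 10 + 45)/2^{10} \approx 0.109$.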

Why might the sign test discard observations equal to the hypothesized median $M_0$?

Wilcoxon Signed-Rank Test

The sign test ignores the magnitude of the differences between the observations and the hypothesized median. The Wilcoxon signed-rank test incorporates this magnitude, requiring the assumption that the underlying continuous distribution is symmetric about its median. It serves as a more powerful non-parametric alternative to the paired Student’s t-test or the one-sample t-test.

Given pairs of observations $(X_i, Y_i)$ for $i = 1, \dots, n$, compute the differences $D_i = X_i - Y_i$.

  1. Discard pairs where $D_i = 0$. Let $N$ be the reduced sample size.
  2. Rank the absolute differences $|D_i|$ from smallest to largest. Ties are assigned the average of the ranks they would have received. Let $R_i$ be the rank of $|D_i|$.
  3. Calculate the test statistic $W$, which is the sum of the signed ranks: $$W = \sum_{i=1}^{N} \text{sgn}(D_i) R_i$$ Alternatively, calculate the sum of ranks for positive differences ($T^+$) and negative differences ($T^-$). The test statistic is often defined as $T = \min(T^+, T^-)$.

Under $H_0$ (symmetric distribution about 0), the expected value and variance of $W$ are:

$$\mathbb{E}[W] = 0, \qquad \text{Var}(W) = \frac{N(N+1)(2N+1)}{6}$$

For large $N$, $W$ is approximately normally distributed, permitting the use of a $Z$-test.
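The three steps and the normal approximation can be sketched as follows. The helper `average_ranks`, the function name `wilcoxon_signed_rank`, and the paired measurements are hypothetical; the variance used is the untied formula above, and with only seven non-zero differences the normal approximation is rough, so the p-value is indicative only.

```python
from math import erfc, sqrt

def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    positions = {}
    for pos, v in enumerate(sorted(values), start=1):
        positions.setdefault(v, []).append(pos)
    return [sum(positions[v]) / len(positions[v]) for v in values]

def wilcoxon_signed_rank(x, y):
    # Step 1: differences, discarding zeros
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    # Step 2: rank the absolute differences
    ranks = average_ranks([abs(d) for d in diffs])
    # Step 3: signed-rank sum W, plus T+ and T-
    t_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    t_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    w = t_plus - t_minus
    # Normal approximation with E[W] = 0 and Var(W) = N(N+1)(2N+1)/6
    z = w / sqrt(n * (n + 1) * (2 * n + 1) / 6)
    p_two_sided = erfc(abs(z) / sqrt(2))
    return w, t_plus, t_minus, z, p_two_sided

# Hypothetical before/after measurements on eight subjects
before = [125, 130, 118, 140, 135, 128, 122, 138]
after = [120, 126, 119, 131, 135, 121, 116, 130]
w, t_plus, t_minus, z, p = wilcoxon_signed_rank(before, after)
print(w, t_plus, t_minus, round(z, 3), round(p, 3))
```

Note that $T^+ + T^- = N(N+1)/2$ and $W = T^+ - T^-$, so the two formulations of the statistic carry the same information.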

Mann-Whitney U Test (Wilcoxon Rank-Sum Test)

When comparing two independent samples to determine if they originate from the same population, the Mann-Whitney U test (or Wilcoxon rank-sum test) offers a non-parametric alternative to the independent two-sample t-test. It assumes the two distributions are identical in shape but potentially shifted in location.

Let $X_1, \dots, X_m$ and $Y_1, \dots, Y_n$ be independent samples.

  1. Combine all $m+n$ observations and rank them from $1$ to $m+n$.
  2. Compute the sum of the ranks for sample 1 ($R_1$) and sample 2 ($R_2$).
  3. The $U$ statistics are calculated as: $$U_1 = R_1 - \frac{m(m+1)}{2}, \qquad U_2 = R_2 - \frac{n(n+1)}{2}$$ Note that $U_1 + U_2 = mn$. The test statistic is $U = \min(U_1, U_2)$.

Under the null hypothesis that $X$ and $Y$ have the same distribution, the expectation and variance of $U_1$ (and, by symmetry, $U_2$), written $U$ here, are:

$$\mathbb{E}[U] = \frac{mn}{2}, \qquad \text{Var}(U) = \frac{mn(m+n+1)}{12}$$

Ties in the data require an adjustment to the variance formula:

$$\text{Var}(U) = \frac{mn}{12} \left( (m+n+1) - \sum_{i=1}^k \frac{t_i^3 - t_i}{(m+n)(m+n-1)} \right)$$

where $k$ is the number of tied groups and $t_i$ is the number of observations in the $i$-th tied group.
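As a rough illustration of the rank sums, the two $U$ statistics, and the tie-adjusted variance, consider the sketch below. The function names and the two samples are made up for the example, and tied observations are given average ranks (an assumption consistent with the signed-rank procedure earlier in this module).

```python
from collections import Counter
from math import sqrt

def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    positions = {}
    for pos, v in enumerate(sorted(values), start=1):
        positions.setdefault(v, []).append(pos)
    return [sum(positions[v]) / len(positions[v]) for v in values]

def mann_whitney_u(x, y):
    m, n = len(x), len(y)
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    r1, r2 = sum(ranks[:m]), sum(ranks[m:])   # rank sums R1 and R2
    u1 = r1 - m * (m + 1) / 2
    u2 = r2 - n * (n + 1) / 2
    # Tie-adjusted variance; the tie term is zero when all values are distinct
    total = m + n
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values()) / (total * (total - 1))
    var_u = m * n / 12 * ((total + 1) - tie_term)
    z = (min(u1, u2) - m * n / 2) / sqrt(var_u)
    return u1, u2, var_u, z

# Hypothetical samples; the value 22 appears in both, creating one tied pair
x = [12, 15, 17, 19, 22]
y = [14, 18, 22, 25, 27, 30]
u1, u2, var_u, z = mann_whitney_u(x, y)
print(u1, u2, round(var_u, 3), round(z, 3))
```

For these data $R_1 = 21.5$ and $R_2 = 44.5$, giving $U_1 = 6.5$, $U_2 = 23.5$, and $U_1 + U_2 = 30 = mn$ as expected.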

What condition reduces the power of the Mann-Whitney U test relative to an independent two-sample t-test?

Kruskal-Wallis One-Way Analysis of Variance

The Kruskal-Wallis H test extends the Mann-Whitney U test to more than two independent groups. It is the non-parametric equivalent of the one-way ANOVA, testing whether $k$ independent samples originate from the same distribution.

Given $k$ groups with sample sizes $n_1, n_2, \dots, n_k$ and total observations $N = \sum_{i=1}^k n_i$:

  1. Rank all $N$ observations jointly from $1$ to $N$.
  2. Compute the sum of ranks $R_i$ for each group $i$.
  3. The test statistic $H$ is: $$H = \frac{12}{N(N+1)} \sum_{i=1}^k \frac{R_i^2}{n_i} - 3(N+1)$$

If the null hypothesis is true (all samples come from the same population) and the sample sizes are sufficiently large (typically $n_i \geq 5$), $H$ is approximately distributed as a chi-square distribution with $k-1$ degrees of freedom:

$$H \sim \chi^2_{k-1}$$

If the null hypothesis is rejected, post-hoc procedures such as Dunn’s test are used for pairwise comparisons to identify which groups differ.
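A minimal sketch of the $H$ computation follows, assuming no ties (no tie correction is applied) and using three hypothetical groups of five observations. Because $k = 3$ here, the chi-square distribution has 2 degrees of freedom, for which the upper-tail probability has the closed form $e^{-H/2}$; that shortcut is specific to this example and does not hold for other values of $k$.

```python
from math import exp

def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    positions = {}
    for pos, v in enumerate(sorted(values), start=1):
        positions.setdefault(v, []).append(pos)
    return [sum(positions[v]) / len(positions[v]) for v in values]

def kruskal_wallis_h(groups):
    pooled = [v for g in groups for v in g]
    n_total = len(pooled)
    ranks = average_ranks(pooled)            # joint ranking of all N observations
    weighted = 0.0
    start = 0
    for g in groups:
        r_i = sum(ranks[start:start + len(g)])   # rank sum R_i for this group
        weighted += r_i ** 2 / len(g)
        start += len(g)
    return 12 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)

# Hypothetical measurements for three independent groups
groups = [
    [27, 31, 35, 38, 42],
    [29, 33, 40, 44, 47],
    [45, 49, 52, 55, 58],
]
h = kruskal_wallis_h(groups)
p_value = exp(-h / 2)   # chi-square upper tail, valid only for 2 degrees of freedom
print(round(h, 2), round(p_value, 4))
```

The rank sums are 23, 33, and 64, so $H = 0.05 \cdot 1142.8 - 48 = 9.14$, which exceeds the 5% critical value of about 5.99 for $\chi^2_2$.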

Spearman’s Rank Correlation Coefficient

Spearman’s rank correlation coefficient ($\rho$ or $r_s$) measures the strength and direction of association between two continuous or ordinal variables without assuming linearity. It evaluates the monotonic relationship between two variables, in contrast with Pearson’s correlation, which evaluates linear relationships.

For $n$ pairs of observations $(X_i, Y_i)$, convert the raw scores to ranks $R(X_i)$ and $R(Y_i)$. Spearman’s $\rho$ is computed analogously to Pearson’s correlation coefficient, but applied to the ranks:

$$\rho = 1 - \frac{6 \sum_{i=1}^n d_i^2}{n(n^2 - 1)}$$

where $d_i = R(X_i) - R(Y_i)$ is the difference between the ranks of corresponding variables.

If there are identical values (ties), the simplified formula utilizing $d_i^2$ becomes inaccurate, and the standard Pearson correlation formula must be applied directly to the ranked variables.

$$\rho = \frac{\sum_i (R(X_i) - \bar{R}(X))(R(Y_i) - \bar{R}(Y))}{\sqrt{\sum_i (R(X_i) - \bar{R}(X))^2 \sum_i (R(Y_i) - \bar{R}(Y))^2}}$$

Values of $\rho$ vary from $-1$ to $+1$, indicating perfect negative or positive monotonic associations, respectively.
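Both formulas can be compared directly in a short sketch. The function names and data are hypothetical; `spearman_rho` applies Pearson's formula to average ranks (safe with ties), while `spearman_rho_shortcut` uses the $d_i^2$ formula and is only appropriate when there are no ties.

```python
from math import sqrt

def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    positions = {}
    for pos, v in enumerate(sorted(values), start=1):
        positions.setdefault(v, []).append(pos)
    return [sum(positions[v]) / len(positions[v]) for v in values]

def spearman_rho(x, y):
    """Pearson correlation applied to the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_rx, mean_ry = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_rx) * (b - mean_ry) for a, b in zip(rx, ry))
    ss_x = sum((a - mean_rx) ** 2 for a in rx)
    ss_y = sum((b - mean_ry) ** 2 for b in ry)
    return cov / sqrt(ss_x * ss_y)

def spearman_rho_shortcut(x, y):
    """Shortcut formula based on squared rank differences; assumes no ties."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

x = [10, 20, 30, 40, 50]
y_cubic = [v ** 3 for v in x]      # non-linear but perfectly monotonic
y_swapped = [1, 8, 27, 64, 60]     # last two values out of order

rho_cubic = spearman_rho(x, y_cubic)
rho_full = spearman_rho(x, y_swapped)
rho_short = spearman_rho_shortcut(x, y_swapped)
print(rho_cubic, rho_full, rho_short)
```

The cubic relationship yields $\rho = 1$ even though it is far from linear, and for the second data set both formulas agree at $\rho = 0.9$ because no values are tied.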

Bootstrap and Resampling Methods

Modern computational power enables simulation-based non-parametric approaches, most notably bootstrapping. Introduced by Bradley Efron, bootstrapping relies on random sampling with replacement from the original dataset.

If we possess a sample $X = \{x_1, \dots, x_n\}$ drawn from an unknown distribution $F$, we construct an empirical distribution function $\hat{F}$. By drawing repeated samples of size $n$, with replacement, from $X$, we generate $B$ bootstrap samples $X^{*1}, X^{*2}, \dots, X^{*B}$.

For a sample statistic $\hat{\theta} = s(X)$ estimating a parameter $\theta$, we compute the statistic for each bootstrap sample: $\hat{\theta}^{*b} = s(X^{*b})$. The distribution of $\hat{\theta}^{*b}$ approximates the sampling distribution of $\hat{\theta}$, enabling the construction of confidence intervals and hypothesis tests without assuming a parametric form.

The bootstrap standard error is the standard deviation of the bootstrap replicates:

$$\widehat{\text{SE}}(\hat{\theta}) = \sqrt{\frac{1}{B-1} \sum_{b=1}^B \left( \hat{\theta}^{*b} - \bar{\hat{\theta}}^* \right)^2 }$$

where $\bar{\hat{\theta}}^*$ is the mean of the bootstrap estimates. Resampling procedures reduce reliance on asymptotic normality assumptions, providing robust inferences that are particularly suitable for complex estimators or for cases where small sample sizes limit conventional asymptotic theory.
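The resampling loop and the standard error formula can be sketched with the standard library alone. The function names, the sample, the number of replicates (`n_boot=2000`), and the fixed seed are illustrative assumptions; the percentile interval shown is one common way to build a bootstrap confidence interval, not the only one.

```python
import random
from statistics import mean, stdev

def bootstrap_replicates(sample, statistic, n_boot=2000, seed=0):
    """Compute `statistic` on n_boot resamples drawn with replacement."""
    rng = random.Random(seed)   # fixed seed only so the sketch is reproducible
    n = len(sample)
    return [statistic(rng.choices(sample, k=n)) for _ in range(n_boot)]

def bootstrap_se(replicates):
    # statistics.stdev divides by B - 1, matching the formula above
    return stdev(replicates)

def percentile_interval(replicates, level=0.95):
    """Simple percentile interval from the sorted replicates."""
    ordered = sorted(replicates)
    b = len(ordered)
    low = ordered[int((1 - level) / 2 * b)]
    high = ordered[int((1 + level) / 2 * b) - 1]
    return low, high

# Hypothetical sample of 12 measurements
sample = [4.1, 5.3, 2.8, 6.7, 5.9, 3.4, 7.2, 4.8, 5.1, 6.0, 3.9, 5.5]
reps = bootstrap_replicates(sample, mean)
se_mean = bootstrap_se(reps)
ci_low, ci_high = percentile_interval(reps)
print(f"bootstrap SE of the mean: {se_mean:.3f}")
print(f"95% percentile interval: ({ci_low:.3f}, {ci_high:.3f})")
```

Passing a different function as `statistic` (for example `statistics.median`) reuses the same machinery for estimators whose standard error has no simple closed form.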

Kernel Density Estimation

Kernel density estimation (KDE) is a non-parametric method for estimating the probability density function of a continuous random variable. Parametric estimation fits a predetermined family (e.g., normal, gamma) indexed by a small number of parameters, whereas KDE estimates the density directly from the data.

Let $(x_1, x_2, \dots, x_n)$ be independent and identically distributed samples drawn from some distribution with an unknown density $f$. The kernel density estimator is:

$$\hat{f}_h(x) = \frac{1}{nh} \sum_{i=1}^n K\left(\frac{x - x_i}{h}\right)$$

where $K$ is the kernel (a non-negative function integrating to one) and $h > 0$ is a smoothing parameter known as the bandwidth. The bandwidth heavily influences the estimator. Small $h$ causes undersmoothing, yielding high variance (spurious fluctuations), whereas large $h$ causes oversmoothing, yielding high bias (obscuring structural features of the distribution). Standard choices for $K$ include the Gaussian, Epanechnikov, and uniform kernels.
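The estimator translates almost directly into code. The sketch below uses a Gaussian kernel; the function names, the bimodal example data, and the bandwidths are hypothetical choices made to show how $h$ changes the estimate in the gap between two clusters.

```python
from math import exp, pi, sqrt

def gaussian_kernel(u):
    """Standard normal density: non-negative and integrates to one."""
    return exp(-0.5 * u * u) / sqrt(2 * pi)

def kde(x, data, h, kernel=gaussian_kernel):
    """Evaluate the kernel density estimate at a single point x."""
    if h <= 0:
        raise ValueError("bandwidth h must be positive")
    return sum(kernel((x - xi) / h) for xi in data) / (len(data) * h)

# Hypothetical data with two clusters and a gap around 3.6
data = [1.0, 1.3, 1.9, 2.4, 4.8, 5.1, 5.5]
gap_point = 3.6
narrow = kde(gap_point, data, h=0.2)   # small h: the gap shows as near-zero density
wide = kde(gap_point, data, h=1.5)     # large h: the gap is smoothed over

# Riemann-sum check that the estimate integrates to about one
step = 0.01
grid = [-5 + step * i for i in range(1701)]   # covers -5 to 12
area = sum(kde(x, data, h=0.5) for x in grid) * step
print(narrow, wide, area)
```

With $h = 0.2$ the estimate at the gap is essentially zero, while $h = 1.5$ assigns it substantial density, which is the undersmoothing versus oversmoothing trade-off described above.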

KDE vs. Histogram

Histograms and KDEs both attempt to model data density non-parametrically. Consider a dataset of highly clustered continuous physical measurements. A histogram forces boundaries at arbitrary bin edges. A KDE smooths out data without fixed bins.

Why is standard continuous kernel density estimation often superior to a histogram for continuous distributions?
