Analysis of Variance (ANOVA) is a collection of statistical models and their associated estimation procedures used to analyze the differences among group means in a sample. ANOVA was developed by statistician and evolutionary biologist Ronald Fisher. In its simplest form, ANOVA provides a statistical test of whether two or more population means are equal, and therefore generalizes the $t$-test beyond two means.
While the $t$-test is limited to comparing two groups, running multiple $t$-tests across several groups rapidly inflates the Type I error rate (false positives). ANOVA controls this error rate by evaluating the entire set of groups in a single test, partitioning the observed variance in a particular variable into components attributable to different sources of variation.
The Logic of Variance Partitioning
The fundamental mechanism of ANOVA is the partitioning of total variance into two primary components:
- Between-Group Variance: The variance of the group means around the grand mean. This reflects the effect of the independent variable(s) plus error.
- Within-Group Variance: The variance of individual scores around their respective group means. This reflects pure error (unexplained variance).
If the between-group variance is significantly larger than the within-group variance, it indicates that the independent variable has a significant effect on the dependent variable.
Assumptions of ANOVA
The validity of ANOVA relies on three core assumptions:
- Independence of Observations: The residuals must be mutually independent. This is fundamentally a design issue handled through random sampling and random assignment.
- Normality: The residuals of the model are normally distributed. While ANOVA is robust to moderate violations of normality (especially with large, equal sample sizes due to the Central Limit Theorem), severe skewness or outliers can compromise the $F$-test.
- Homogeneity of Variances (Homoscedasticity): The variances of the populations from which the samples are drawn are equal. This is tested using Levene’s Test or Bartlett’s Test. Welch’s ANOVA can be used if this assumption is heavily violated.
One-Way ANOVA
A One-Way ANOVA involves a single independent variable (factor) with three or more categorical levels. The model for an observation $Y_{ij}$ (the $i$-th observation in the $j$-th group) is given by:
$$Y_{ij} = \mu + \tau_j + \epsilon_{ij}$$
Where:
- $\mu$ is the grand mean.
- $\tau_j$ is the treatment effect for the $j$-th group (where $\sum_j \tau_j = 0$).
- $\epsilon_{ij}$ is the random error associated with the $i$-th observation in the $j$-th group, assumed to be $N(0, \sigma^2)$.
Hypotheses
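The null hypothesis ($H_0$) states that all group population means are equal (or equivalently, all treatment effects are zero):
$$H_0: \mu_1 = \mu_2 = \cdots = \mu_k$$
The alternative hypothesis ($H_1$) states that at least one population mean is different:
$$H_1: \text{at least one } \mu_j \text{ differs from the others}$$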
Sums of Squares
The Total Sum of Squares ($SST$) is partitioned into the Sum of Squares Between ($SSB$) and the Sum of Squares Within ($SSW$, also known as Error Sum of Squares, $SSE$):
$$SST = SSB + SSW$$
Total Sum of Squares (SST) measures the total variation in the data:
$$SST = \sum_{j=1}^{k} \sum_{i=1}^{n_j} (Y_{ij} - \bar{Y})^2$$
where $\bar{Y}$ is the grand mean.
Sum of Squares Between (SSB) measures the variation of group means around the grand mean:
$$SSB = \sum_{j=1}^{k} n_j (\bar{Y}_j - \bar{Y})^2$$
where $\bar{Y}_j$ is the mean of the $j$-th group and $n_j$ is the number of observations in the $j$-th group.
Sum of Squares Within (SSW) measures the variation of individual observations around their respective group means:
$$SSW = \sum_{j=1}^{k} \sum_{i=1}^{n_j} (Y_{ij} - \bar{Y}_j)^2$$
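The partition of the total sum of squares into between-group and within-group parts can be checked numerically. The following is a minimal sketch using only the Python standard library; the function name `sums_of_squares` and the three small groups are made up for illustration.

```python
from statistics import mean

def sums_of_squares(groups):
    """Return (SST, SSB, SSW) for a list of groups of observations."""
    all_values = [y for g in groups for y in g]
    grand_mean = mean(all_values)
    # Total: every observation around the grand mean.
    sst = sum((y - grand_mean) ** 2 for y in all_values)
    # Between: each group mean around the grand mean, weighted by group size.
    ssb = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within: every observation around its own group mean.
    ssw = sum((y - mean(g)) ** 2 for g in groups for y in g)
    return sst, ssb, ssw

# Hypothetical data: three groups of three observations each.
groups = [[4, 5, 6], [7, 8, 9], [10, 11, 12]]
sst, ssb, ssw = sums_of_squares(groups)
print(f"SST={sst:.1f} SSB={ssb:.1f} SSW={ssw:.1f}")  # SST=60.0 SSB=54.0 SSW=6.0
```

In this toy data set almost all of the variation sits between the groups (54 of 60), which is the pattern that produces a large test statistic.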
Degrees of Freedom and Mean Squares
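Degrees of freedom ($df$) are required to convert sums of squares into variances (mean squares). Let $N$ be the total sample size and $k$ be the number of groups.
$$df_{\text{between}} = k - 1, \quad df_{\text{within}} = N - k, \quad df_{\text{total}} = N - 1$$
The Mean Squares ($MS$) are calculated by dividing the Sum of Squares by their respective degrees of freedom:
$$MSB = \frac{SSB}{k - 1}, \quad MSW = \frac{SSW}{N - k}$$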
The F-Statistic
The test statistic for ANOVA is the ratio of the Mean Square Between to the Mean Square Within. Under the null hypothesis, both $MSB$ and $MSW$ are independent estimates of the population variance $\sigma^2$, so their ratio follows an $F$-distribution with $k - 1$ and $N - k$ degrees of freedom.
$$F = \frac{MSB}{MSW}$$
If the $F$-statistic is significantly larger than 1 (specifically, greater than the critical value from the $F$-distribution for a given alpha level), the null hypothesis is rejected.
In a One-Way ANOVA with 4 groups and 40 total participants, what are the degrees of freedom for the F-statistic (numerator and denominator)?
A university aims to determine if three different teaching methods (Standard Lecture, Flipped Classroom, Problem-Based Learning) result in different final exam scores. 90 students are randomly assigned to the three methods (30 per method). The resulting Sum of Squares Between (SSB) is calculated as 450, and the Sum of Squares Within (SSW) is 2610.
Calculate the Mean Square Between (MSB) and Mean Square Within (MSW).
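As a worked check of the teaching-methods example, the sketch below divides each sum of squares by its degrees of freedom and forms the ratio of the two mean squares. The helper name `one_way_f` is hypothetical; the inputs (SSB = 450, SSW = 2610, 3 groups, 90 students) come from the problem statement.

```python
def one_way_f(ssb, ssw, k, n_total):
    """Return (df_between, df_within, MSB, MSW, F) for a one-way ANOVA."""
    df_between = k - 1        # number of groups minus one
    df_within = n_total - k   # total sample size minus number of groups
    msb = ssb / df_between
    msw = ssw / df_within
    return df_between, df_within, msb, msw, msb / msw

df_b, df_w, msb, msw, f_stat = one_way_f(ssb=450, ssw=2610, k=3, n_total=90)
print(df_b, df_w, msb, msw, f_stat)  # 2 87 225.0 30.0 7.5
```

Whether an F of 7.5 is significant still depends on the critical value of the F-distribution with 2 and 87 degrees of freedom at the chosen alpha level, which this sketch does not look up.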
Two-Way ANOVA
A Two-Way ANOVA analyzes the effect of two independent categorical variables (factors) on a continuous dependent variable. It fundamentally differs from running two independent One-Way ANOVAs because it evaluates the interaction effect between the two variables.
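The statistical model for a Two-Way ANOVA with factors $A$ and $B$, fixed effects, and with replication ($n$ observations per cell) is:
$$Y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \epsilon_{ijk}$$
Where:
- $Y_{ijk}$ is the $k$-th observation in the $i$-th level of factor $A$ and $j$-th level of factor $B$.
- $\mu$ is the overall population grand mean.
- $\alpha_i$ is the main effect of factor A at level $i$.
- $\beta_j$ is the main effect of factor B at level $j$.
- $(\alpha\beta)_{ij}$ is the interaction effect between level $i$ of A and level $j$ of B.
- $\epsilon_{ijk}$ is the random error term, $\epsilon_{ijk} \sim N(0, \sigma^2)$.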
Interaction Effects
An interaction effect occurs when the effect of one independent variable on the dependent variable changes depending on the level of the other independent variable. Graphically, this is observed when the lines representing the means across levels of factors are not parallel (they may cross or diverge).
If the interaction effect is significant, the main effects (the individual effects of factor $A$ and factor $B$) must be interpreted with caution, because they no longer fully describe the relationship: the effect of one factor has to be described separately at each level of the other.
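For a 2 × 2 design, the "non-parallel lines" idea reduces to a difference of differences between cell means. The sketch below is illustrative only: the function name `interaction_contrast` and both tables of cell means are invented, and a non-zero contrast in sample means still has to be tested against error variance before it counts as a significant interaction.

```python
def interaction_contrast(cell_means):
    """Difference of differences for a 2x2 table of cell means.

    cell_means[i][j] is the mean at level i of factor A and level j of factor B.
    A value of zero means the two lines in an interaction plot are parallel.
    """
    (a1b1, a1b2), (a2b1, a2b2) = cell_means
    return (a1b2 - a1b1) - (a2b2 - a2b1)

parallel = [[10, 15], [20, 25]]  # effect of B is +5 at both levels of A
crossing = [[10, 20], [20, 10]]  # effect of B reverses between levels of A

print(interaction_contrast(parallel))  # 0
print(interaction_contrast(crossing))  # 20
```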
Sums of Squares for Two-Way ANOVA
In a balanced design (equal sample sizes in all cells), the total variance is partitioned into four orthogonal components:
$$SST = SSA + SSB + SSAB + SSE$$
Where:
- SSA: Sum of Squares for Factor A
- SSB: Sum of Squares for Factor B
- SSAB: Sum of Squares for the Interaction
- SSE: Sum of Squares for Error (Within)
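Degrees of freedom are similarly partitioned. Let $a$ be the number of levels of Factor A, $b$ be the number of levels of Factor B, and $n$ the number of replicates per cell. Total observations $N = abn$.
$$df_A = a - 1, \quad df_B = b - 1, \quad df_{AB} = (a - 1)(b - 1), \quad df_E = ab(n - 1), \quad df_T = abn - 1$$
Three distinct $F$-tests are performed by dividing the corresponding Mean Square ($MS$) by the Mean Square Error ($MSE$):
$$F_A = \frac{MSA}{MSE}, \quad F_B = \frac{MSB}{MSE}, \quad F_{AB} = \frac{MSAB}{MSE}$$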
In a Two-Way ANOVA, you are studying the effects of Diet (3 levels) and Exercise (2 levels) on weight loss. You have 10 participants per cell (6 cells total). What are the degrees of freedom for the interaction effect (Diet × Exercise)?
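The degrees of freedom for a balanced two-way design follow directly from the number of levels and the number of replicates per cell. A small sketch, with the hypothetical helper `two_way_df`, applied to the Diet × Exercise example:

```python
def two_way_df(a, b, n):
    """Degrees of freedom for a balanced two-way ANOVA.

    a, b: number of levels of factors A and B; n: replicates per cell.
    """
    return {
        "A": a - 1,
        "B": b - 1,
        "AB": (a - 1) * (b - 1),
        "error": a * b * (n - 1),
        "total": a * b * n - 1,
    }

df = two_way_df(a=3, b=2, n=10)  # Diet (3 levels) x Exercise (2 levels)
print(df)  # {'A': 2, 'B': 1, 'AB': 2, 'error': 54, 'total': 59}
```

The four component values sum to the total (2 + 1 + 2 + 54 = 59), mirroring the partition of the sums of squares.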
Post-Hoc Tests
A significant ANOVA only tells you that at least two means differ, not which means differ. To identify specific pairwise differences, post-hoc tests are required. Conducting multiple standard $t$-tests inflates the family-wise error rate (the probability of making at least one Type I error across all tests):
$$\alpha_{FW} = 1 - (1 - \alpha)^m$$
where $m$ is the number of comparisons. For 5 groups, there are $\binom{5}{2} = 10$ comparisons. If $\alpha = 0.05$ per test, the family-wise error rate jumps to $1 - 0.95^{10} \approx 0.40$ (assuming independence, which is an oversimplification but illustrates the inflation).
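The inflation of the family-wise error rate is easy to reproduce. The sketch below assumes independent tests (the same simplification the text makes) and a per-test alpha of 0.05, chosen here as the conventional illustrative value; `familywise_error_rate` is a made-up helper name.

```python
from math import comb

def familywise_error_rate(alpha, m):
    """Probability of at least one Type I error across m independent tests."""
    return 1 - (1 - alpha) ** m

m = comb(5, 2)  # number of pairwise comparisons among 5 groups
fwer = familywise_error_rate(0.05, m)
print(m, round(fwer, 3))  # 10 0.401
```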
Common Post-Hoc Adjustments
- Tukey’s Honestly Significant Difference (HSD): Compares all possible pairs of means. It is based on the studentized range distribution ($q$) and provides tight control over the family-wise error rate when sample sizes are equal.
- Bonferroni Correction: A simple and conservative method. It divides the desired family-wise alpha level by the number of comparisons $m$: $\alpha_{\text{adjusted}} = \alpha / m$. While it guarantees that the family-wise error rate does not exceed $\alpha$, it reduces statistical power (increasing Type II errors), especially when the number of comparisons is large.
- Scheffé’s Method: Used for all possible linear contrasts, not just pairwise comparisons. It is the most conservative post-hoc test when performing purely pairwise comparisons, but is highly flexible.
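Of the adjustments listed above, the Bonferroni correction is simple enough to sketch in a few lines. The helper name `bonferroni_reject` and the four p-values are invented for illustration; Tukey's HSD and Scheffé's method need the studentized range and F distributions and are not shown.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Flag which comparisons stay significant after a Bonferroni correction."""
    threshold = alpha / len(p_values)  # per-comparison alpha
    return threshold, [p <= threshold for p in p_values]

# Hypothetical p-values from four pairwise comparisons.
p_values = [0.001, 0.004, 0.03, 0.2]
threshold, decisions = bonferroni_reject(p_values)
print(threshold, decisions)  # 0.0125 [True, True, False, False]
```

The third comparison (p = 0.03) would pass an unadjusted 0.05 cutoff but not the adjusted one, which is the loss of power the correction trades for error control.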
Which of the following correction methods is considered the most conservative and provides the lowest statistical power for detecting genuine differences?
Effect Size
The $p$-value from an $F$-test indicates statistical significance but not practical significance. Effect size metrics quantify the magnitude of the differences between groups.
Eta-Squared ($\eta^2$)
Eta-squared represents the proportion of total variance in the dependent variable that is associated with membership in the different groups defined by the independent variable.
$$\eta^2 = \frac{SS_{\text{effect}}}{SS_{\text{total}}}$$
While intuitive, $\eta^2$ is an upwardly biased estimator of the population effect size (it tends to overestimate).
Partial Eta-Squared ($\eta_p^2$)
In multi-factor designs (like Two-Way ANOVA), $\eta^2$ can be misleading because the effects of one factor reduce the variance available to be explained by another. Partial eta-squared isolates the variance explained by a specific factor relative to the unexplained variance (error) and the variance of that specific factor.
$$\eta_p^2 = \frac{SS_{\text{effect}}}{SS_{\text{effect}} + SS_{\text{error}}}$$
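Omega-Squared ($\omega^2$)
Omega-squared is a more complex but less biased estimator of the population variance explained. It corrects for the bias present in $\eta^2$ by incorporating degrees of freedom and Mean Square terms.
$$\omega^2 = \frac{SS_{\text{effect}} - df_{\text{effect}} \cdot MS_{\text{error}}}{SS_{\text{total}} + MS_{\text{error}}}$$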
A researcher conducts a Two-Way ANOVA assessing the impact of Drug Dosage (A) and Therapy (B) on symptom reduction. The output yields the following sums of squares: SSA = 400, SSB = 100, SSAB = 50, SSE = 450. Total SST = 1000.
Calculate the eta-squared (η²) for Drug Dosage (A).
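The example above can be worked through in a few lines, and partial eta-squared for the same factor is shown alongside for comparison. The function names are hypothetical; the sums of squares are the ones given in the problem.

```python
def eta_squared(ss_effect, ss_total):
    """Share of total variation attributed to the effect."""
    return ss_effect / ss_total

def partial_eta_squared(ss_effect, ss_error):
    """Share of (effect + error) variation attributed to the effect."""
    return ss_effect / (ss_effect + ss_error)

ssa, ssb, ssab, sse = 400, 100, 50, 450
sst = ssa + ssb + ssab + sse  # 1000, matching the stated total

eta_a = eta_squared(ssa, sst)
partial_eta_a = partial_eta_squared(ssa, sse)
print(eta_a, round(partial_eta_a, 3))  # 0.4 0.471
```

Partial eta-squared is larger here because the variation due to Therapy and the interaction is left out of its denominator.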
Repeated Measures ANOVA
Repeated Measures ANOVA is the equivalent of the one-way ANOVA, but for related, not independent groups. It is the extension of the dependent (paired) $t$-test. Examples include measuring the same participants across multiple time points (e.g., blood pressure at baseline, week 1, and week 2) or exposing the same participants to all conditions in an experiment.
The key advantage of Repeated Measures ANOVA is that it removes variance attributable to individual differences from the Error Sum of Squares. This typically makes the analysis much more powerful (higher probability of detecting a true effect) than a standard independent-samples ANOVA.
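The gain in power can be seen by computing the sums of squares directly. The following is a minimal sketch for a one-way repeated measures layout, assuming complete data (every subject measured under every condition); the function name and the scores are invented for illustration.

```python
from statistics import mean

def repeated_measures_anova(scores):
    """scores[s][c] is the score of subject s under condition c."""
    n, k = len(scores), len(scores[0])
    grand = mean(x for row in scores for x in row)
    cond_means = [mean(row[c] for row in scores) for c in range(k)]
    subj_means = [mean(row) for row in scores]
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    # Variation due to stable individual differences between subjects.
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    # Error is what remains after removing condition and subject variation.
    ss_error = ss_total - ss_cond - ss_subj
    df_cond, df_error = k - 1, (k - 1) * (n - 1)
    f_stat = (ss_cond / df_cond) / (ss_error / df_error)
    return ss_cond, ss_subj, ss_error, f_stat

# Hypothetical data: 4 subjects measured at 3 time points.
scores = [[10, 12, 14], [20, 21, 25], [30, 33, 33], [40, 42, 44]]
ss_cond, ss_subj, ss_error, f_stat = repeated_measures_anova(scores)
print(f"{ss_cond:.1f} {ss_subj:.1f} {ss_error:.1f} {f_stat:.2f}")  # 32.0 1500.0 4.0 24.00
```

In this toy data set, subject differences account for 1500 of the 1504 units of variation that an independent-samples analysis would treat as error, so the repeated measures error term is far smaller and the test statistic far larger.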
The Assumption of Sphericity
Repeated measures designs require the assumption of Sphericity. Sphericity requires that the variances of the differences between all pairs of related groups are equal. It is evaluated using Mauchly’s Test of Sphericity.
If the assumption of sphericity is violated (Mauchly’s Test $p < .05$), the Type I error rate inflates. To correct this, the degrees of freedom are adjusted downwards by a factor $\epsilon$ (epsilon). Common corrections include:
- Greenhouse-Geisser Correction: The more conservative of the two corrections. Used when sphericity is severely violated (epsilon $\epsilon < 0.75$).
- Huynh-Feldt Correction: Less conservative, used when sphericity violation is mild (epsilon $\epsilon > 0.75$).
If $\epsilon$ equals 1, the sphericity assumption holds exactly; the further $\epsilon$ falls below 1, the more severe the violation. The corrections effectively increase the critical $F$-value required for significance by reducing the degrees of freedom.
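The adjustment itself is a simple multiplication: both the numerator and denominator degrees of freedom of the test are scaled by the epsilon estimate. The sketch below assumes a one-way repeated measures design with k conditions and n subjects; the helper name `corrected_df` and the epsilon value of 0.70 are hypothetical, and estimating epsilon from data is not shown.

```python
def corrected_df(k, n, epsilon):
    """Scale the repeated measures degrees of freedom by epsilon.

    Uncorrected values are (k - 1) and (k - 1) * (n - 1).
    """
    df_effect = epsilon * (k - 1)
    df_error = epsilon * (k - 1) * (n - 1)
    return df_effect, df_error

df_effect, df_error = corrected_df(k=3, n=20, epsilon=0.70)
print(round(df_effect, 2), round(df_error, 2))  # 1.4 26.6
```

The F-statistic itself is unchanged; only the reference distribution, and therefore the critical value and p-value, shifts.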