EZ Statistics

ANCOVA (Analysis of Covariance)

Analysis of Covariance (ANCOVA)

Definition

ANCOVA combines ANOVA with regression to compare group means on a dependent variable while controlling for one or more covariates: continuous variables that are not the main focus of the study but could influence the dependent variable. Adjusting for these covariates removes the variation they explain from the error term, which reduces error variance and gives a more precise comparison of the groups.
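To make the error-variance point concrete, here is a minimal sketch in plain Python that uses the teaching-method data from the Code Examples section. The helper name `within_group_sums` is an illustrative choice, not part of any library. It compares the error sum of squares of a one-way ANOVA on the post-test scores with the error that remains once the pooled within-group regression on the pre-test covariate is removed.

```python
# Pre-test (covariate) and post-test (outcome) scores by teaching method,
# copied from the Code Examples section.
pre = {
    "A": [78, 65, 82, 73, 68, 75, 70, 85, 77, 60],
    "B": [80, 77, 85, 82, 70, 65, 78, 72, 68, 74],
    "C": [60, 72, 65, 68, 75, 70, 80, 67, 73, 62],
}
post = {
    "A": [85, 70, 88, 78, 72, 80, 75, 90, 82, 65],
    "B": [90, 85, 95, 92, 80, 75, 88, 82, 78, 84],
    "C": [65, 75, 70, 72, 80, 74, 85, 71, 78, 68],
}


def within_group_sums(x_groups, y_groups):
    """Pooled within-group (Sxx, Sxy, Syy), each taken around the group's own means."""
    sxx = sxy = syy = 0.0
    for name in x_groups:
        xs, ys = x_groups[name], y_groups[name]
        mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
        sxx += sum((x - mx) ** 2 for x in xs)
        sxy += sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        syy += sum((y - my) ** 2 for y in ys)
    return sxx, sxy, syy


sxx, sxy, syy = within_group_sums(pre, post)
anova_error_ss = syy                     # error SS when the covariate is ignored
ancova_error_ss = syy - sxy ** 2 / sxx   # error SS after adjusting for the covariate

print(f"ANOVA error SS:  {anova_error_ss:.2f}")
print(f"ANCOVA error SS: {ancova_error_ss:.2f}")
```

On this data the covariate accounts for nearly all of the within-group variation, so the error term shrinks from roughly 1275 to roughly 15.5.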

Formulas

Sum of Squares Decomposition:

$$SS_{Total} = SS_{Treatment(adj)} + SS_{Covariate} + SS_{Error(adj)}$$

$$SS_{Total} = \sum_{i=1}^{n} (Y_i - \bar{Y})^2$$

where $\bar{Y}$ is the grand mean

$$SS_{Covariate} = \frac{\left[\sum (X_i - \bar{X})(Y_i - \bar{Y})\right]^2}{\sum (X_i - \bar{X})^2}$$

where $X_i$ is the covariate value and $\bar{X}$ is the covariate mean

$$SS_{Treatment(adj)} = \sum n_i (\bar{Y}'_i - \bar{Y}')^2$$

where $\bar{Y}'_i$ is the adjusted group mean

$$SS_{Error(adj)} = SS_{Total} - SS_{Treatment(adj)} - SS_{Covariate}$$

Where:

  • $SS_{Treatment(adj)}$ = Adjusted Sum of Squares for Treatment, $df = k - 1$, where $k$ is the number of groups
  • $SS_{Covariate}$ = Sum of Squares for Covariate, $df = 1$
  • $SS_{Error(adj)}$ = Adjusted Error Sum of Squares, $df = N - k - 1$
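The decomposition can be checked numerically. The following is a sketch in plain Python on the teaching-method data from the Code Examples section; the helper `deviation_sums` is a hypothetical name. It makes two assumptions that go beyond the formulas above: the adjusted error is computed from pooled within-group sums, and the adjusted treatment term is then obtained by subtraction from the total.

```python
pre = {
    "A": [78, 65, 82, 73, 68, 75, 70, 85, 77, 60],
    "B": [80, 77, 85, 82, 70, 65, 78, 72, 68, 74],
    "C": [60, 72, 65, 68, 75, 70, 80, 67, 73, 62],
}
post = {
    "A": [85, 70, 88, 78, 72, 80, 75, 90, 82, 65],
    "B": [90, 85, 95, 92, 80, 75, 88, 82, 78, 84],
    "C": [65, 75, 70, 72, 80, 74, 85, 71, 78, 68],
}


def deviation_sums(xs, ys):
    """Return (Sxx, Sxy, Syy) of deviations around the means of xs and ys."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    syy = sum((y - my) ** 2 for y in ys)
    return sxx, sxy, syy


# Total sums use deviations from the grand means.
all_x = [x for g in pre for x in pre[g]]
all_y = [y for g in post for y in post[g]]
sxx_t, sxy_t, ss_total = deviation_sums(all_x, all_y)
ss_covariate = sxy_t ** 2 / sxx_t

# Within-group sums use deviations from each group's own means, then pool.
within = [deviation_sums(pre[g], post[g]) for g in pre]
sxx_w = sum(w[0] for w in within)
sxy_w = sum(w[1] for w in within)
syy_w = sum(w[2] for w in within)
ss_error_adj = syy_w - sxy_w ** 2 / sxx_w

# Treatment term by subtraction, so the three parts add up to the total.
ss_treatment_adj = ss_total - ss_covariate - ss_error_adj

k = len(pre)
n_total = len(all_y)
print(f"SS_Total          = {ss_total:.2f}")
print(f"SS_Covariate      = {ss_covariate:.2f} (df = 1)")
print(f"SS_Treatment(adj) = {ss_treatment_adj:.2f} (df = {k - 1})")
print(f"SS_Error(adj)     = {ss_error_adj:.2f} (df = {n_total - k - 1})")
```

The treatment and error values agree with the ANCOVA table in the Practical Example (146.66 and 15.53). The covariate value here is larger than the 1259.47 in that table because `Anova()` from the car package reports Type II sums of squares by default, in which the covariate is adjusted for the groups. Summing squared deviations of the adjusted means reproduces the treatment value only approximately when the groups differ on the covariate.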

Adjusted Group Means:

$$\bar{Y}'_i = \bar{Y}_i - b(\bar{X}_i - \bar{X})$$

where $b$ is the pooled within-group regression coefficient
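As an illustration of the adjustment, the sketch below computes the pooled within-group slope and the adjusted means in plain Python for the teaching-method data from the Code Examples section. The local helper `mean` and the variable names are illustrative assumptions.

```python
pre = {
    "A": [78, 65, 82, 73, 68, 75, 70, 85, 77, 60],
    "B": [80, 77, 85, 82, 70, 65, 78, 72, 68, 74],
    "C": [60, 72, 65, 68, 75, 70, 80, 67, 73, 62],
}
post = {
    "A": [85, 70, 88, 78, 72, 80, 75, 90, 82, 65],
    "B": [90, 85, 95, 92, 80, 75, 88, 82, 78, 84],
    "C": [65, 75, 70, 72, 80, 74, 85, 71, 78, 68],
}


def mean(values):
    return sum(values) / len(values)


# Pooled within-group slope b: pooled cross-products divided by pooled
# covariate sum of squares, each taken around the group's own means.
sxx_w = 0.0
sxy_w = 0.0
for g in pre:
    mx, my = mean(pre[g]), mean(post[g])
    sxx_w += sum((x - mx) ** 2 for x in pre[g])
    sxy_w += sum((x - mx) * (y - my) for x, y in zip(pre[g], post[g]))
b = sxy_w / sxx_w

# Shift each group's post-test mean to the grand covariate mean.
grand_x = mean([x for g in pre for x in pre[g]])
adjusted_means = {
    g: mean(post[g]) - b * (mean(pre[g]) - grand_x) for g in pre
}

print(f"pooled slope b = {b:.4f}")
for g, value in adjusted_means.items():
    print(f"Method {g}: raw mean {mean(post[g]):.1f}, adjusted mean {value:.2f}")
```

Groups that started above the grand pre-test mean (A and B) are adjusted downward, and the group that started below it (C) is adjusted upward. The results agree with the adjusted means in the Practical Example to within rounding.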

F-Statistics:

$$F_{Treatment} = \frac{MS_{Treatment(adj)}}{MS_{Error(adj)}}$$

$$F_{Covariate} = \frac{MS_{Covariate}}{MS_{Error(adj)}}$$

where $MS = SS/df$ for each source of variation
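A small sketch of the arithmetic, using the rounded sums of squares and degrees of freedom reported in the Practical Example table. The function name `f_statistic` is hypothetical. Because the inputs are rounded, the results match the table only to about one decimal place.

```python
def f_statistic(ss_effect, df_effect, ss_error, df_error):
    """F = MS_effect / MS_error, where each MS is SS divided by its df."""
    ms_effect = ss_effect / df_effect
    ms_error = ss_error / df_error
    return ms_effect / ms_error


# Rounded values from the ANCOVA table in the Practical Example.
ss_treatment, df_treatment = 146.66, 2
ss_covariate, df_covariate = 1259.47, 1
ss_error, df_error = 15.53, 26

f_treatment = f_statistic(ss_treatment, df_treatment, ss_error, df_error)
f_covariate = f_statistic(ss_covariate, df_covariate, ss_error, df_error)

print(f"F_Treatment({df_treatment}, {df_error}) = {f_treatment:.1f}")
print(f"F_Covariate({df_covariate}, {df_error}) = {f_covariate:.1f}")
```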

Key Assumptions

Independence: Observations are independent
Normality: Residuals are normally distributed
Homogeneity of Regression Slopes: Relationship between covariate and dependent variable is similar across all treatment groups (parallel slopes)
Homogeneity of Variances: Equal variances across groups
Linearity: Linear relationship between covariate and dependent variable
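For the homogeneity-of-slopes assumption, a quick informal check is to fit a separate least-squares slope in each group and see whether the slopes are close. The sketch below does this in plain Python for the teaching-method data from the Code Examples section; `group_slope` is a hypothetical helper. It is a descriptive check only: the formal test is the group-by-covariate interaction term, as fitted in the R code.

```python
pre = {
    "A": [78, 65, 82, 73, 68, 75, 70, 85, 77, 60],
    "B": [80, 77, 85, 82, 70, 65, 78, 72, 68, 74],
    "C": [60, 72, 65, 68, 75, 70, 80, 67, 73, 62],
}
post = {
    "A": [85, 70, 88, 78, 72, 80, 75, 90, 82, 65],
    "B": [90, 85, 95, 92, 80, 75, 88, 82, 78, 84],
    "C": [65, 75, 70, 72, 80, 74, 85, 71, 78, 68],
}


def group_slope(xs, ys):
    """Least-squares slope of ys on xs within a single group."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


slopes = {g: group_slope(pre[g], post[g]) for g in pre}
slope_range = max(slopes.values()) - min(slopes.values())

for g, s in slopes.items():
    print(f"Method {g}: slope = {s:.3f}")
print(f"Range of slopes = {slope_range:.3f}")
```

Here the three slopes all lie close to 1, which is consistent with roughly parallel regression lines.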

Practical Example

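Step 1: Data Summary

| Method | N  | Mean Pre-Test | Mean Post-Test | Adj. Mean |
|--------|----|---------------|----------------|-----------|
| A      | 10 | 73.3          | 78.5           | 77.72     |
| B      | 10 | 75.1          | 84.9           | 82.31     |
| C      | 10 | 69.2          | 73.8           | 77.15     |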

For the raw data, please refer to the R code in the Code Examples section.

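Step 2: ANCOVA Results

| Source               | SS      | df | MS      | F       | p-value |
|----------------------|---------|----|---------|---------|---------|
| Teaching Method      | 146.66  | 2  | 73.33   | 122.78  | < 0.001 |
| Pre-Test (Covariate) | 1259.47 | 1  | 1259.47 | 2108.75 | < 0.001 |
| Error                | 15.53   | 26 | 0.597   |         |         |
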
Step 3: Key Findings

  • Strong covariate effect (Pre-Test): $F(1, 26) = 2108.75$, $p < 0.001$
  • Significant teaching method effect: $F(2, 26) = 122.78$, $p < 0.001$
  • Method B shows the highest adjusted mean ($82.31$), followed by A ($77.72$) and C ($77.15$)
  • Partial $\eta^2$ for teaching method = $0.904$ (large effect)

Step 4: Pairwise Comparisons

| Comparison | Mean Diff. | p-value  |
|------------|------------|----------|
| A - B      | -4.586     | < 0.0001 |
| A - C      | 0.569      | 0.2665   |
| B - C      | 5.155      | < 0.0001 |

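The mean differences in Step 4 are differences between adjusted means. The sketch below reproduces them in plain Python from the data in the Code Examples section and attaches a standard error and t ratio to each. It assumes the usual ANCOVA standard error for a difference of adjusted means, $\sqrt{MS_{Error(adj)}\,[1/n_i + 1/n_j + (\bar{X}_i - \bar{X}_j)^2 / S_{xx,within}]}$. All names are illustrative, and no Tukey-adjusted p-values are computed here.

```python
from itertools import combinations
from math import sqrt

pre = {
    "A": [78, 65, 82, 73, 68, 75, 70, 85, 77, 60],
    "B": [80, 77, 85, 82, 70, 65, 78, 72, 68, 74],
    "C": [60, 72, 65, 68, 75, 70, 80, 67, 73, 62],
}
post = {
    "A": [85, 70, 88, 78, 72, 80, 75, 90, 82, 65],
    "B": [90, 85, 95, 92, 80, 75, 88, 82, 78, 84],
    "C": [65, 75, 70, 72, 80, 74, 85, 71, 78, 68],
}


def mean(values):
    return sum(values) / len(values)


# Pooled within-group sums around each group's own means.
sxx_w = sxy_w = syy_w = 0.0
for g in pre:
    mx, my = mean(pre[g]), mean(post[g])
    sxx_w += sum((x - mx) ** 2 for x in pre[g])
    sxy_w += sum((x - mx) * (y - my) for x, y in zip(pre[g], post[g]))
    syy_w += sum((y - my) ** 2 for y in post[g])

b = sxy_w / sxx_w                                  # pooled within-group slope
n_total = sum(len(v) for v in post.values())
df_error = n_total - len(pre) - 1                  # N - k - 1
ms_error = (syy_w - sxy_w ** 2 / sxx_w) / df_error

results = {}
for g1, g2 in combinations(pre, 2):
    x_gap = mean(pre[g1]) - mean(pre[g2])
    # The difference of adjusted means does not depend on the grand covariate mean.
    diff = (mean(post[g1]) - mean(post[g2])) - b * x_gap
    se = sqrt(ms_error * (1 / len(pre[g1]) + 1 / len(pre[g2]) + x_gap ** 2 / sxx_w))
    results[(g1, g2)] = (diff, se, diff / se)

for (g1, g2), (diff, se, t) in results.items():
    print(f"{g1} - {g2}: diff = {diff:.3f}, SE = {se:.3f}, t = {t:.2f}")
```

The A - B and B - C ratios are large in absolute value while the A - C ratio is small, which matches the pattern of p-values in the table.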
Step 5: Conclusion

After controlling for pre-test scores, teaching method B was significantly more effective than both methods A and C, while methods A and C did not differ significantly ($p = 0.2665$). The large effect size (partial $\eta^2 = 0.904$) suggests these differences are practically meaningful.

Effect Size

Partial Eta-squared:

$$\eta^2_p = \frac{SS_{Treatment(adj)}}{SS_{Treatment(adj)} + SS_{Error(adj)}}$$

where both terms are the adjusted sums of squares from the ANCOVA table

Interpretation guidelines:

  • Small effect: $\approx 0.01$
  • Medium effect: $\approx 0.06$
  • Large effect: $\approx 0.14$
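As a worked instance, the sketch below applies the formula to the rounded sums of squares from the Practical Example and maps the result onto the guideline values. The function names are hypothetical. Treating the guideline values as lower cut-offs, and the "negligible" label for values below 0.01, are assumptions made for this illustration.

```python
def partial_eta_squared(ss_treatment, ss_error):
    """Share of treatment-plus-error variation attributable to treatment."""
    return ss_treatment / (ss_treatment + ss_error)


def effect_label(eta_sq):
    """Map a partial eta-squared value onto the guideline labels."""
    if eta_sq >= 0.14:
        return "large"
    if eta_sq >= 0.06:
        return "medium"
    if eta_sq >= 0.01:
        return "small"
    return "negligible"  # assumed label; not part of the guideline list


# Rounded adjusted sums of squares from the ANCOVA table in the example.
eta_sq = partial_eta_squared(ss_treatment=146.66, ss_error=15.53)
label = effect_label(eta_sq)
print(f"Partial eta-squared = {eta_sq:.3f} ({label} effect)")
```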

Code Examples

R
# Load required libraries
library(car)
library(tidyverse)
library(emmeans)
library(effectsize)

# Create data frame
data <- tibble(
  Student_ID = 1:30,
  Teaching_Method = factor(rep(c("A","B","C"), each=10)),
  Pre_Test = c(78,65,82,73,68,75,70,85,77,60,
               80,77,85,82,70,65,78,72,68,74,
               60,72,65,68,75,70,80,67,73,62),
  Post_Test = c(85,70,88,78,72,80,75,90,82,65,
                90,85,95,92,80,75,88,82,78,84,
                65,75,70,72,80,74,85,71,78,68)
)

# Unadjusted group means
data |>
  group_by(Teaching_Method) |>
  summarize(mean_pre = mean(Pre_Test), mean_post = mean(Post_Test))

# Fit ANCOVA model
model <- aov(Post_Test ~ Teaching_Method + Pre_Test, data=data)

# ANCOVA table (car::Anova uses Type II sums of squares by default)
ancova_table <- Anova(model)
ancova_table

# Adjusted means
emmeans_result <- emmeans(model, "Teaching_Method")
print(emmeans_result)

# Pairwise comparisons
pairs(emmeans_result, adjust="tukey")

# Effect size, based on the Type II table so that the treatment effect
# is adjusted for the covariate (aov alone gives sequential sums of squares)
eta_squared(ancova_table, partial=TRUE)

# Checking assumptions
# Homogeneity of regression slopes: the interaction term should be non-significant
model_interaction <- aov(Post_Test ~ Teaching_Method * Pre_Test, data=data)
summary(model_interaction)

# Normality of residuals
shapiro.test(residuals(model))

# Homogeneity of variances
leveneTest(Post_Test ~ Teaching_Method, data=data)

# Visual check of linearity and parallel slopes
ggplot(data, aes(x=Pre_Test, y=Post_Test, color=Teaching_Method)) +
  geom_point() +
  geom_smooth(method="lm", se=FALSE) +
  theme_minimal() +
  labs(title="Post-Test vs Pre-Test by Teaching Method",
       x="Pre-Test Score",
       y="Post-Test Score")

Python
# Using pandas and statsmodels
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Read and prepare data
data = pd.DataFrame({
    'Student_ID': range(1, 31),
    'Teaching_Method': ['A']*10 + ['B']*10 + ['C']*10,
    'Pre_Test': [78,65,82,73,68,75,70,85,77,60,
                 80,77,85,82,70,65,78,72,68,74,
                 60,72,65,68,75,70,80,67,73,62],
    'Post_Test': [85,70,88,78,72,80,75,90,82,65,
                  90,85,95,92,80,75,88,82,78,84,
                  65,75,70,72,80,74,85,71,78,68]
})

# Fit ANCOVA model
model = ols('Post_Test ~ C(Teaching_Method) + Pre_Test', data=data).fit()

# Get ANCOVA table (Type II sums of squares)
aov_table = sm.stats.anova_lm(model, typ=2)
print("ANCOVA Results:")
print(aov_table)

# Calculate adjusted means
grand_mean_pretest = data['Pre_Test'].mean()
beta = model.params['Pre_Test']  # pooled within-group slope for the covariate
adjusted_means = {}
adjusted_scores = []

# Find adjusted means and adjusted scores for each group
for method in ['A', 'B', 'C']:
    group_data = data[data['Teaching_Method'] == method]
    group_mean_pretest = group_data['Pre_Test'].mean()
    group_mean_posttest = group_data['Post_Test'].mean()
    adj_mean = group_mean_posttest - beta * (group_mean_pretest - grand_mean_pretest)
    # Adjust each student's own post-test score for their pre-test score
    group_adj_scores = group_data['Post_Test'] - beta * (group_data['Pre_Test'] - grand_mean_pretest)
    adjusted_scores.extend(group_adj_scores)
    adjusted_means[method] = adj_mean

# Convert adjusted scores to a pandas Series
adjusted_scores = pd.Series(adjusted_scores)

print("Adjusted Means:")
for method, mean in adjusted_means.items():
    print(f"Method {method}: {mean:.2f}")

# Pairwise comparisons using adjusted scores.
# This is an approximation: Tukey's HSD treats the adjusted scores as raw
# data and does not account for the degree of freedom used by the covariate.
posthoc = pairwise_tukeyhsd(adjusted_scores, data['Teaching_Method'])
print("Pairwise Comparisons (Using Adjusted Scores):")
print(posthoc)

# Effect size (partial eta squared)
ss_treatment = aov_table.loc['C(Teaching_Method)', 'sum_sq']
ss_error = aov_table.loc['Residual', 'sum_sq']
partial_eta_sq = ss_treatment / (ss_treatment + ss_error)
print(f"Partial η²: {partial_eta_sq:.3f}")

Alternative Approaches

Consider these alternatives:

  • MANCOVA: When there are multiple dependent variables
  • Repeated Measures ANCOVA: For longitudinal data
  • Robust ANCOVA: When assumptions are violated
