ANCOVA (Analysis of Covariance)

Definition

ANCOVA combines ANOVA with regression to compare group means on a dependent variable while controlling for one or more continuous covariates. Covariates are variables that are not the main focus of the study but could influence the dependent variable. Adjusting for them reduces error variance, which gives a more precise comparison of the groups.

Formulas

Sum of Squares Decomposition:

$SS_{Total} = SS_{Treatment(adj)} + SS_{Covariate} + SS_{Error(adj)}$

$SS_{Total} = \sum_{i=1}^{n} (Y_i - \bar{Y})^2$ where $\bar{Y}$ is the grand mean

$SS_{Covariate} = \frac{[\sum(X_i - \bar{X})(Y_i - \bar{Y})]^2}{\sum(X_i - \bar{X})^2}$ where $X_i$ is the covariate value and $\bar{X}$ is the covariate mean

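$SS_{Error(adj)} = \sum_i \sum_j (Y_{ij} - \bar{Y}_i)^2 - \frac{[\sum_i \sum_j (X_{ij} - \bar{X}_i)(Y_{ij} - \bar{Y}_i)]^2}{\sum_i \sum_j (X_{ij} - \bar{X}_i)^2}$ where $\bar{X}_i$ and $\bar{Y}_i$ are the means of group $i$, so all deviations are taken within groups

$SS_{Treatment(adj)} = SS_{Total} - SS_{Covariate} - SS_{Error(adj)}$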

Where:

$SS_{Treatment(adj)}$ = Adjusted Sum of Squares for Treatment, $df = k - 1$, where $k$ is the number of groups
$SS_{Covariate}$ = Sum of Squares for Covariate, $df = 1$
$SS_{Error(adj)}$ = Adjusted Error Sum of Squares, $df = N - k - 1$, where $N$ is the total number of observations
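
As a rough numerical check of the decomposition, the sketch below recomputes the sums of squares for the data used in the Practical Example, in plain Python. The helper names (`mean`, `cross`) and variable names are illustrative, not part of any library. It assumes the adjusted error is computed from pooled within-group sums and that the adjusted treatment term is what remains after removing the covariate and adjusted error terms from the total. It also computes a within-group covariate sum of squares, which is the value a Type II ANCOVA table reports for the covariate (adjusted for group), so it differs from the $SS_{Covariate}$ in the decomposition.

```python
# Example data (same values as in the Code Examples section)
pre = {
    "A": [78, 65, 82, 73, 68, 75, 70, 85, 77, 60],
    "B": [80, 77, 85, 82, 70, 65, 78, 72, 68, 74],
    "C": [60, 72, 65, 68, 75, 70, 80, 67, 73, 62],
}
post = {
    "A": [85, 70, 88, 78, 72, 80, 75, 90, 82, 65],
    "B": [90, 85, 95, 92, 80, 75, 88, 82, 78, 84],
    "C": [65, 75, 70, 72, 80, 74, 85, 71, 78, 68],
}


def mean(values):
    return sum(values) / len(values)


def cross(x, y):
    """Sum of cross-products of deviations from the respective means."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y))


all_x = [v for g in pre for v in pre[g]]
all_y = [v for g in post for v in post[g]]

# Sums of squares and cross-products around the grand means
sxx_t, sxy_t, syy_t = cross(all_x, all_x), cross(all_x, all_y), cross(all_y, all_y)

# Pooled within-group sums (deviations from each group's own means)
sxx_w = sum(cross(pre[g], pre[g]) for g in pre)
sxy_w = sum(cross(pre[g], post[g]) for g in pre)
syy_w = sum(cross(post[g], post[g]) for g in pre)

ss_total = syy_t
ss_covariate = sxy_t ** 2 / sxx_t            # covariate SS from the decomposition
ss_error_adj = syy_w - sxy_w ** 2 / sxx_w    # adjusted error SS
ss_treatment_adj = ss_total - ss_covariate - ss_error_adj
ss_covariate_within = sxy_w ** 2 / sxx_w     # covariate SS adjusted for group (Type II)

print(f"SS_Total            = {ss_total:.2f}")
print(f"SS_Covariate        = {ss_covariate:.2f}")
print(f"SS_Treatment(adj)   = {ss_treatment_adj:.2f}")
print(f"SS_Error(adj)       = {ss_error_adj:.2f}")
print(f"SS_Covariate (Type II, within groups) = {ss_covariate_within:.2f}")
```

The adjusted treatment and error values should agree with the ANCOVA table in the Practical Example; the covariate row of that table corresponds to the within-group value.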

Adjusted Group Means:

$\bar{Y}'_i = \bar{Y}_i - b(\bar{X}_i - \bar{X})$

where $b$ is the pooled within-group regression coefficient
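
The adjustment can be sketched in a few lines of plain Python, using the data from the Practical Example. This is a minimal illustration that assumes $b$ is estimated as the pooled within-group cross-product sum divided by the pooled within-group sum of squares of the covariate; the helper and variable names are made up for the example.

```python
# Example data (same values as in the Code Examples section)
pre = {
    "A": [78, 65, 82, 73, 68, 75, 70, 85, 77, 60],
    "B": [80, 77, 85, 82, 70, 65, 78, 72, 68, 74],
    "C": [60, 72, 65, 68, 75, 70, 80, 67, 73, 62],
}
post = {
    "A": [85, 70, 88, 78, 72, 80, 75, 90, 82, 65],
    "B": [90, 85, 95, 92, 80, 75, 88, 82, 78, 84],
    "C": [65, 75, 70, 72, 80, 74, 85, 71, 78, 68],
}


def mean(values):
    return sum(values) / len(values)


def cross(x, y):
    """Sum of cross-products of deviations from the respective means."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y))


# Pooled within-group regression coefficient b
sxx_w = sum(cross(pre[g], pre[g]) for g in pre)
sxy_w = sum(cross(pre[g], post[g]) for g in pre)
b = sxy_w / sxx_w

# Grand mean of the covariate across all observations
grand_mean_x = mean([x for g in pre for x in pre[g]])

# Adjusted mean: group mean of Y shifted to the grand mean of X
adjusted = {g: mean(post[g]) - b * (mean(pre[g]) - grand_mean_x) for g in pre}

print(f"b = {b:.4f}")
for g, value in adjusted.items():
    print(f"Method {g}: adjusted mean = {value:.2f}")
```

Groups whose covariate mean lies above the grand mean (here A and B) are adjusted downward, and groups below it (here C) are adjusted upward.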

F-Statistics:

$F_{Treatment} = \frac{MS_{Treatment(adj)}}{MS_{Error(adj)}}$

$F_{Covariate} = \frac{MS_{Covariate}}{MS_{Error(adj)}}$

where $MS = SS/df$ for each source of variation
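
To make the arithmetic concrete, the following sketch turns sums of squares into mean squares, F-statistics, and p-values. It plugs in the rounded values from the ANCOVA table in the Practical Example, so the results can differ slightly from the table in the last digits. It assumes SciPy is available and uses `scipy.stats.f.sf` for the upper-tail probability; the function name `f_test` is illustrative.

```python
from scipy.stats import f


def f_test(ss_effect, df_effect, ss_error, df_error):
    """Return (F, p) for one source of variation."""
    ms_effect = ss_effect / df_effect
    ms_error = ss_error / df_error
    f_value = ms_effect / ms_error
    p_value = f.sf(f_value, df_effect, df_error)  # upper-tail probability
    return f_value, p_value


# Rounded sums of squares from the Practical Example (k = 3 groups, N = 30)
ss_treatment, df_treatment = 146.66, 2      # df = k - 1
ss_covariate, df_covariate = 1259.47, 1
ss_error, df_error = 15.53, 26              # df = N - k - 1

f_treatment, p_treatment = f_test(ss_treatment, df_treatment, ss_error, df_error)
f_covariate, p_covariate = f_test(ss_covariate, df_covariate, ss_error, df_error)

print(f"Treatment: F({df_treatment},{df_error}) = {f_treatment:.2f}, p = {p_treatment:.3g}")
print(f"Covariate: F({df_covariate},{df_error}) = {f_covariate:.2f}, p = {p_covariate:.3g}")
```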

Key Assumptions

Independence: Observations are independent

Normality: Residuals are normally distributed

Homogeneity of Regression Slopes: Relationship between covariate and dependent variable is similar across all treatment groups (parallel slopes)

Homogeneity of Variances: Equal variances across groups

Linearity: Linear relationship between covariate and dependent variable

Practical Example

Step 1: Data Summary

Method	N	Mean Pre-Test	Mean Post-Test	Adjusted Post-Test Mean
A	10	73.3	78.5	77.73
B	10	75.1	84.9	82.31
C	10	69.2	73.8	77.16

For the raw data, please refer to the R code in the Code Examples section.

Step 2: ANCOVA Results

Source	SS	df	MS	F	p-value
Teaching Method	146.66	2	73.33	122.78	< 0.001
Pre-Test (Covariate)	1259.47	1	1259.47	2108.75	< 0.001
Error	15.53	26	0.597

Step 3: Key Findings

Strong covariate effect (Pre-Test): $F(1,26) = 2108.75$ , $p < 0.001$
Significant teaching method effect: $F(2,26) = 122.78$ , $p < 0.001$
Method B shows the highest adjusted mean ( $82.3$ ), followed by A ( $77.7$ ) and C ( $77.2$ )
Partial $\eta^2$ for teaching method = $146.66 / (146.66 + 15.53) = 0.904$ (large effect)

Step 4: Pairwise Comparisons

Comparison	Mean Diff.	p-value
A - B	-4.586	< 0.0001
A - C	0.569	0.2665
B - C	5.155	< 0.0001
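
The comparisons in this table can be approximated by hand. The sketch below assumes the usual standard error for a difference of adjusted means with one covariate, $\sqrt{MS_{Error(adj)}\,[1/n_i + 1/n_j + (\bar{X}_i - \bar{X}_j)^2 / S_{xx(w)}]}$ with $S_{xx(w)}$ the pooled within-group sum of squares of the covariate, and a Tukey adjustment based on the studentized range distribution (`scipy.stats.studentized_range`, available in SciPy 1.7 and later). Helper and variable names are illustrative.

```python
from itertools import combinations
from math import sqrt

from scipy.stats import studentized_range

pre = {
    "A": [78, 65, 82, 73, 68, 75, 70, 85, 77, 60],
    "B": [80, 77, 85, 82, 70, 65, 78, 72, 68, 74],
    "C": [60, 72, 65, 68, 75, 70, 80, 67, 73, 62],
}
post = {
    "A": [85, 70, 88, 78, 72, 80, 75, 90, 82, 65],
    "B": [90, 85, 95, 92, 80, 75, 88, 82, 78, 84],
    "C": [65, 75, 70, 72, 80, 74, 85, 71, 78, 68],
}


def mean(values):
    return sum(values) / len(values)


def cross(x, y):
    """Sum of cross-products of deviations from the respective means."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y))


k = len(pre)
n_total = sum(len(v) for v in pre.values())
df_error = n_total - k - 1

# Pooled within-group quantities
sxx_w = sum(cross(pre[g], pre[g]) for g in pre)
sxy_w = sum(cross(pre[g], post[g]) for g in pre)
syy_w = sum(cross(post[g], post[g]) for g in pre)
b = sxy_w / sxx_w
ms_error = (syy_w - sxy_w ** 2 / sxx_w) / df_error

grand_mean_x = mean([x for g in pre for x in pre[g]])
adjusted = {g: mean(post[g]) - b * (mean(pre[g]) - grand_mean_x) for g in pre}

results = {}
for g1, g2 in combinations(sorted(pre), 2):
    diff = adjusted[g1] - adjusted[g2]
    se = sqrt(ms_error * (1 / len(pre[g1]) + 1 / len(pre[g2])
                          + (mean(pre[g1]) - mean(pre[g2])) ** 2 / sxx_w))
    q = abs(diff) / se * sqrt(2)                     # studentized range statistic
    p = float(studentized_range.sf(q, k, df_error))  # Tukey-adjusted p-value
    results[(g1, g2)] = (diff, se, p)
    print(f"{g1} - {g2}: diff = {diff:.3f}, SE = {se:.3f}, p = {p:.4f}")
```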

Step 5: Conclusion

After controlling for pre-test scores, teaching method B was significantly more effective than both methods A and C. Methods A and C did not differ significantly from each other ( $p = 0.2665$ ). The large effect size (partial $\eta^2 = 0.904$ ) suggests these differences are practically meaningful.

Effect Size

Partial Eta-squared:

$\eta^2_p = \frac{SS_{Treatment(adj)}}{SS_{Treatment(adj)} + SS_{Error(adj)}}$

Interpretation guidelines:

Small effect: $\approx 0.01$
Medium effect: $\approx 0.06$
Large effect: $\approx 0.14$
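
As a small worked example, the snippet below applies the formula to the rounded sums of squares from the Practical Example and attaches a label. Treating the guideline values as hard cutoffs, and the "negligible" label for anything under 0.01, are simplifying assumptions made only for this illustration; both function names are hypothetical.

```python
def partial_eta_squared(ss_effect, ss_error):
    """Partial eta-squared for one effect."""
    return ss_effect / (ss_effect + ss_error)


def effect_label(value):
    """Rough label using the guideline values as cutoffs (an assumption)."""
    if value >= 0.14:
        return "large"
    if value >= 0.06:
        return "medium"
    if value >= 0.01:
        return "small"
    return "negligible"


# Adjusted treatment and error sums of squares from the Practical Example
eta_p2 = partial_eta_squared(146.66, 15.53)
print(f"Partial eta-squared = {eta_p2:.3f} ({effect_label(eta_p2)})")
# prints: Partial eta-squared = 0.904 (large)
```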

Code Examples

R

# Load required libraries
library(car)
library(tidyverse)
library(emmeans)
library(effectsize)

# Create data frame
data <- tibble(
  Student_ID = 1:30,
  Teaching_Method = factor(rep(c("A","B","C"), each=10)),
  Pre_Test = c(78,65,82,73,68,75,70,85,77,60,
               80,77,85,82,70,65,78,72,68,74,
               60,72,65,68,75,70,80,67,73,62),
  Post_Test = c(85,70,88,78,72,80,75,90,82,65,
                90,85,95,92,80,75,88,82,78,84,
                65,75,70,72,80,74,85,71,78,68)
)

# Unadjusted group means of the covariate and the dependent variable
data |>
  group_by(Teaching_Method) |>
  summarize(mean_pre = mean(Pre_Test), mean_post = mean(Post_Test))

# Fit ANCOVA model
model <- aov(Post_Test ~ Teaching_Method + Pre_Test, data=data)

# ANCOVA table (car::Anova uses Type II sums of squares by default)
Anova(model)

# Adjusted means
emmeans_result <- emmeans(model, "Teaching_Method")
print(emmeans_result)

# Pairwise comparisons
pairs(emmeans_result, adjust="tukey")

# Effect size: computed from the Type II table so the treatment effect is
# adjusted for the covariate (eta_squared(model) would use sequential SS)
eta_squared(Anova(model), partial=TRUE)

# Checking assumptions
# Homogeneity of regression slopes: the interaction term should not be significant
model_interaction <- aov(Post_Test ~ Teaching_Method * Pre_Test, data=data)
summary(model_interaction)

# Normality of residuals
shapiro.test(residuals(model))

# Homogeneity of variances
leveneTest(Post_Test ~ Teaching_Method, data=data)

# Visual check of linearity and parallel slopes
ggplot(data, aes(x=Pre_Test, y=Post_Test, color=Teaching_Method)) +
  geom_point() +
  geom_smooth(method="lm", se=FALSE) +
  theme_minimal() +
  labs(title="Post-Test vs Pre-Test by Teaching Method",
       x="Pre-Test Score",
       y="Post-Test Score")

Python

# Using pandas and statsmodels
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Read and prepare data
data = pd.DataFrame({
    'Student_ID': range(1, 31),
    'Teaching_Method': ['A']*10 + ['B']*10 + ['C']*10,
    'Pre_Test': [78,65,82,73,68,75,70,85,77,60,
                 80,77,85,82,70,65,78,72,68,74,
                 60,72,65,68,75,70,80,67,73,62],
    'Post_Test': [85,70,88,78,72,80,75,90,82,65,
                  90,85,95,92,80,75,88,82,78,84,
                  65,75,70,72,80,74,85,71,78,68]
})

# Fit ANCOVA model
model = ols('Post_Test ~ C(Teaching_Method) + Pre_Test', data=data).fit()

# Get ANCOVA table (Type II sums of squares)
aov_table = sm.stats.anova_lm(model, typ=2)
print("ANCOVA Results:")
print(aov_table)

# Calculate adjusted means
grand_mean_pretest = data['Pre_Test'].mean()
beta = model.params['Pre_Test']  # common (pooled within-group) slope
adjusted_means = {}
adjusted_scores = []

# Find adjusted means and adjusted scores for each group
for method in ['A', 'B', 'C']:
    group_data = data[data['Teaching_Method'] == method]
    group_mean_pretest = group_data['Pre_Test'].mean()
    group_mean_posttest = group_data['Post_Test'].mean()
    adj_mean = group_mean_posttest - beta * (group_mean_pretest - grand_mean_pretest)
    # Adjust each individual post-test score to the grand mean of the covariate
    group_adj_scores = group_data['Post_Test'] - beta * (group_data['Pre_Test'] - grand_mean_pretest)
    adjusted_scores.extend(group_adj_scores)
    adjusted_means[method] = adj_mean

# Convert adjusted scores to a pandas Series (same row order as data)
adjusted_scores = pd.Series(adjusted_scores)

print("Adjusted Means:")
for method, mean in adjusted_means.items():
    print(f"Method {method}: {mean:.2f}")

# Pairwise comparisons using adjusted scores.
# This is an approximation: it treats the slope as known, so p-values can
# differ slightly from the emmeans-based comparisons in the R example.
posthoc = pairwise_tukeyhsd(adjusted_scores, data['Teaching_Method'])
print("Pairwise Comparisons (Using Adjusted Scores):")
print(posthoc)

# Effect size (partial eta squared)
ss_treatment = aov_table.loc['C(Teaching_Method)', 'sum_sq']
ss_error = aov_table.loc['Residual', 'sum_sq']
partial_eta_sq = ss_treatment / (ss_treatment + ss_error)
print(f"Partial η²: {partial_eta_sq:.3f}")

Alternative Approaches

Consider these alternatives:

MANCOVA: When there are multiple dependent variables
Repeated Measures ANCOVA: For longitudinal data
Robust ANCOVA: When assumptions are violated
