One-Way ANOVA with Python

Welcome to our hands-on tutorial on performing one-way ANOVA (Analysis of Variance) with Python. In this guide, we'll walk through a practical example analyzing how customer engagement varies across three marketing channels: TV, Social Media, and Print advertising. New to ANOVA? Start with our comprehensive ANOVA guide for a solid theoretical foundation before diving into the implementation.

The analysis will be shown using two approaches:

Using statistical libraries (pandas, SciPy)
Manual calculations with detailed explanations

Looking for an R implementation? Check out our One-Way ANOVA with R guide.

Sample Data and Assumptions

Note on Assumptions:

For this tutorial, we take the following ANOVA assumptions as satisfied:

The observations are independent
The data within each group is normally distributed
The groups have homogeneous variances
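The tutorial takes these assumptions as given, but an informal look at the equal-variance assumption is cheap. The sketch below uses only the standard library and the engagement scores from the dataset that follows. The threshold it refers to (largest group standard deviation less than about twice the smallest) is a common textbook rule of thumb, not something this tutorial prescribes, and the variable names are illustrative.

```python
from statistics import stdev

# Engagement scores per channel, copied from the tutorial's dataset
groups = {
    "TV": [85, 90, 88, 84, 87],
    "Social Media": [88, 92, 89, 85, 91],
    "Print": [78, 80, 82, 79, 81],
}

# Sample standard deviation (n - 1 denominator) of each group
std_devs = {name: stdev(scores) for name, scores in groups.items()}

# Ratio of the largest to the smallest group standard deviation
sd_ratio = max(std_devs.values()) / min(std_devs.values())

for name, sd in std_devs.items():
    print(f"{name}: std = {sd:.2f}")
print(f"Largest/smallest std ratio = {sd_ratio:.2f}")
```

A ratio comfortably under 2 is usually read as no obvious sign of unequal variances. For a formal check, `scipy.stats.levene` (equal variances) and `scipy.stats.shapiro` (normality within each group) are the usual tools.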

Dataset:

Csv

Channel, Customer Engagement
TV, 85
TV, 90
TV, 88
TV, 84
TV, 87
Social Media, 88
Social Media, 92
Social Media, 89
Social Media, 85
Social Media, 91
Print, 78
Print, 80
Print, 82
Print, 79
Print, 81
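Note that the CSV above has a space after each comma. As a minimal sketch, assuming the text has been pasted into a string (the names `csv_text` and `scores` are illustrative), the standard library's `csv` module can read it with `skipinitialspace=True`, so that the second column is named `Customer Engagement` rather than ` Customer Engagement`:

```python
import csv
import io
from collections import defaultdict

# The dataset exactly as listed above, including the space after each comma
csv_text = """Channel, Customer Engagement
TV, 85
TV, 90
TV, 88
TV, 84
TV, 87
Social Media, 88
Social Media, 92
Social Media, 89
Social Media, 85
Social Media, 91
Print, 78
Print, 80
Print, 82
Print, 79
Print, 81
"""

# skipinitialspace=True drops the blank that follows each delimiter
reader = csv.DictReader(io.StringIO(csv_text), skipinitialspace=True)

# Collect the engagement scores for each channel
scores = defaultdict(list)
for row in reader:
    scores[row["Channel"]].append(int(row["Customer Engagement"]))

for channel, values in scores.items():
    print(channel, values)
```

If you prefer pandas, `pd.read_csv` accepts the same `skipinitialspace=True` option.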

Try it yourself:

Want to analyze this data without coding? Copy the data above and paste it into our One-Way ANOVA Calculator to get instant results.

Approach 1: Using Statistical Libraries

Python Implementation:

Python

import pandas as pd
import numpy as np
from scipy import stats

# Step 1: Create the dataset
data = {
    'Channel': ['TV', 'TV', 'TV', 'TV', 'TV',
                'Social Media', 'Social Media', 'Social Media', 'Social Media', 'Social Media',
                'Print', 'Print', 'Print', 'Print', 'Print'],
    'Customer_Engagement': [85, 90, 88, 84, 87,
                          88, 92, 89, 85, 91,
                          78, 80, 82, 79, 81]
}

# Step 2: Create a DataFrame
df = pd.DataFrame(data)

# Step 3: Calculate descriptive statistics
descriptive_stats = df.groupby('Channel')['Customer_Engagement'].agg([
    'count', 'mean', 'std', 'min', 'max'
]).round(2)

print("Descriptive Statistics:")
print(descriptive_stats)

# Step 4: Perform one-way ANOVA
# Get the engagement scores for each channel
tv_scores = df[df['Channel'] == 'TV']['Customer_Engagement']
social_scores = df[df['Channel'] == 'Social Media']['Customer_Engagement']
print_scores = df[df['Channel'] == 'Print']['Customer_Engagement']

# Perform one-way ANOVA
f_statistic, p_value = stats.f_oneway(tv_scores, social_scores, print_scores)

print("One-way ANOVA Results:")
print(f"F-statistic: {f_statistic:.4f}")
print(f"p-value: {p_value:.4f}")

# Step 5: Calculate Effect Size (Eta-squared)
def calculate_eta_squared(df, dv, between):
    """Return eta-squared: between-group SS divided by total SS."""
    grand_mean = df[dv].mean()

    # Total sum of squares (SST)
    ss_total = np.sum((df[dv] - grand_mean) ** 2)

    # Between-group sum of squares (SSB): group size times the squared
    # deviation of each group mean from the grand mean
    grouped = df.groupby(between)[dv]
    ss_between = np.sum(grouped.size() * (grouped.mean() - grand_mean) ** 2)

    return ss_between / ss_total

eta_squared = calculate_eta_squared(df, 'Customer_Engagement', 'Channel')
print(f"Effect Size (η²): {eta_squared:.4f}")

Results:

Descriptive Statistics:

Channel	Mean	Std	N
Print	80.0	1.58	5
Social Media	89.0	2.74	5
TV	86.8	2.39	5

F-statistic: F = 21.0318

p-value: p = 0.0001

Effect Size: η² = 0.7780

Approach 2: Manual Calculations

Python Implementation and Results:

Python

import pandas as pd
import numpy as np

# Step 1: Create the dataset
print("Step 1: Create the dataset")
data = {
    'Channel': ['TV', 'TV', 'TV', 'TV', 'TV',
                'Social Media', 'Social Media', 'Social Media', 'Social Media', 'Social Media',
                'Print', 'Print', 'Print', 'Print', 'Print'],
    'Engagement': [85, 90, 88, 84, 87,
                  88, 92, 89, 85, 91,
                  78, 80, 82, 79, 81]
}
df = pd.DataFrame(data)

# Step 2: Calculate Grand Mean
grand_mean = df['Engagement'].mean()
print(f"Step 2: Grand Mean = {grand_mean:.2f}")

# Step 3: Calculate Group Means
group_means = df.groupby('Channel')['Engagement'].mean()
print("Step 3: Group Means:")
print(group_means)

Step 1: Create the dataset
Step 2: Grand Mean = 85.27
Step 3: Group Means:
Channel
Print           80.0
Social Media    89.0
TV              86.8
Name: Engagement, dtype: float64

Python

# Steps 4-6: Calculate Sum of Squares
n_groups = len(df['Channel'].unique())
n_total = len(df)
n_per_group = df.groupby('Channel').size()

ssb = sum(n_per_group * (group_means - grand_mean)**2)
print(f"Step 4: Sum of Squares Between Groups (SSB) = {ssb:.2f}")

ssw = 0
for channel in df['Channel'].unique():
    group_data = df[df['Channel'] == channel]['Engagement']
    group_mean = group_means[channel]
    ssw += sum((group_data - group_mean)**2)
print(f"Step 5: Sum of Squares Within Groups (SSW) = {ssw:.2f}")

sst = sum((df['Engagement'] - grand_mean)**2)
print(f"Step 6: Total Sum of Squares (SST) = {sst:.2f}")
print(f"Verification: SST ({sst:.2f}) ≈ SSB ({ssb:.2f}) + SSW ({ssw:.2f})")

Step 4: Sum of Squares Between Groups (SSB) = 220.13
Step 5: Sum of Squares Within Groups (SSW) = 62.80
Step 6: Total Sum of Squares (SST) = 282.93
Verification: SST (282.93) ≈ SSB (220.13) + SSW (62.80)
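For reference, the three quantities computed in Steps 4 to 6 can be written as follows. The notation is introduced here for illustration and does not appear in the code: k is the number of groups, n_j the size of group j, x̄_j its mean, x_ij the i-th observation in group j, and x̄ the grand mean.

```latex
\begin{aligned}
SS_B &= \sum_{j=1}^{k} n_j\,(\bar{x}_j - \bar{x})^2 = 220.13,\\
SS_W &= \sum_{j=1}^{k}\sum_{i=1}^{n_j} (x_{ij} - \bar{x}_j)^2 = 62.80,\\
SS_T &= \sum_{j=1}^{k}\sum_{i=1}^{n_j} (x_{ij} - \bar{x})^2 = SS_B + SS_W = 282.93.
\end{aligned}
```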

Python

# Steps 7-9: Calculate df, MS, and F-statistic
df_between = n_groups - 1
df_within = n_total - n_groups
df_total = n_total - 1

print("Step 7: Degrees of Freedom:")
print(f"Between Groups (df_b) = {df_between}")
print(f"Within Groups (df_w) = {df_within}")
print(f"Total (df_t) = {df_total}")

ms_between = ssb / df_between
ms_within = ssw / df_within

# The leading "\n" separates each step's output with a blank line
print("\nStep 8: Mean Squares:")
print(f"Mean Square Between (MSB) = {ms_between:.2f}")
print(f"Mean Square Within (MSW) = {ms_within:.2f}")

f_statistic = ms_between / ms_within
print(f"\nStep 9: F-statistic = {f_statistic:.4f}")

Step 7: Degrees of Freedom:
Between Groups (df_b) = 2
Within Groups (df_w) = 12
Total (df_t) = 14

Step 8: Mean Squares:
Mean Square Between (MSB) = 110.07
Mean Square Within (MSW) = 5.23

Step 9: F-statistic = 21.0318
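The manual steps stop at the F-statistic. The p-value is the upper-tail probability of the F distribution with (2, 12) degrees of freedom. As a sketch that avoids extra libraries: when the between-groups degrees of freedom equal 2, as they do here, that tail probability has the closed form (1 + 2F/df_w)^(-df_w/2). This shortcut holds only for df_b = 2; in general you would call something like `scipy.stats.f.sf(F, df_b, df_w)`. The block recomputes the sums of squares in plain Python so that it runs on its own.

```python
# Engagement scores per channel, as in the dataset above
groups = [
    [85, 90, 88, 84, 87],   # TV
    [88, 92, 89, 85, 91],   # Social Media
    [78, 80, 82, 79, 81],   # Print
]

all_scores = [x for g in groups for x in g]
grand_mean = sum(all_scores) / len(all_scores)

# Sums of squares, matching Steps 4 and 5
ssb = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
ssw = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

# Degrees of freedom and F-statistic, matching Steps 7 to 9
df_between = len(groups) - 1
df_within = len(all_scores) - len(groups)
f_statistic = (ssb / df_between) / (ssw / df_within)

# Closed-form upper tail of the F distribution; valid ONLY when df_between == 2
assert df_between == 2
p_value = (1 + df_between * f_statistic / df_within) ** (-df_within / 2)

print(f"F-statistic = {f_statistic:.4f}")
print(f"p-value = {p_value:.6f}")
```

The result is roughly 0.00012, which rounds to the p = 0.0001 reported by `stats.f_oneway` in Approach 1.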

ANOVA Summary Table:

Source	SS	df	MS	F
Between	220.13	2	110.07	21.0318
Within	62.80	12	5.23	-
Total	282.93	14	-	-

Effect Size: η² = 0.7780

Interpreting the Results:

The one-way ANOVA shows a statistically significant difference in customer engagement across the three marketing channels (TV, Social Media, Print): F(2, 12) = 21.03, p = 0.0001, well below the conventional 0.05 threshold. We therefore reject the hypothesis that all three channels have the same mean engagement; at least one channel mean differs from the others. The effect size (η²) of 0.778 means that marketing channel accounts for 77.8% of the variance in customer engagement in this sample.

Post-hoc Analysis:

A significant ANOVA result tells us that at least one group mean differs, but not which ones. Post-hoc tests such as Tukey HSD, Bonferroni, Scheffé's test, or Fisher's LSD compare the groups pairwise to identify which specific channels differ significantly from each other. You can use our Tukey HSD calculator to perform this analysis.
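To make the idea concrete, here is a rough sketch of the Tukey HSD comparison done by hand for this balanced design (5 observations per group). It reuses the group means and SSW from the results above. The critical value `q_critical = 3.77` is an assumption read from a standard studentized range table (3 groups, 12 error degrees of freedom, α = 0.05); check it against your own table or software before relying on it.

```python
from itertools import combinations
from math import sqrt

# Group means, within-group mean square, and group size from the results above
group_means = {"TV": 86.8, "Social Media": 89.0, "Print": 80.0}
ms_within = 62.80 / 12      # MSW = SSW / df_within
n_per_group = 5

# ASSUMPTION: studentized range critical value for k = 3, df = 12, alpha = 0.05,
# taken from a standard table rather than computed here
q_critical = 3.77

# Honestly significant difference: the smallest gap between two means
# that counts as significant at the chosen alpha
hsd = q_critical * sqrt(ms_within / n_per_group)
print(f"HSD = {hsd:.2f}")

# Compare every pair of channels against the HSD threshold
results = {}
for a, b in combinations(group_means, 2):
    diff = abs(group_means[a] - group_means[b])
    results[(a, b)] = diff > hsd
    verdict = "significant" if results[(a, b)] else "not significant"
    print(f"{a} vs {b}: |difference| = {diff:.1f} -> {verdict}")
```

Under these assumptions, Print differs from both TV and Social Media, while the gap between TV and Social Media falls below the threshold. In practice, statsmodels' `pairwise_tukeyhsd(endog, groups, alpha)` performs the same comparison without a table lookup and also reports confidence intervals and adjusted p-values.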
