EZ Statistics

One-Way ANOVA with Python

Welcome to our hands-on tutorial on performing one-way ANOVA (Analysis of Variance) with Python. In this guide, we'll walk through a practical example analyzing how customer engagement varies across three marketing channels: TV, Social Media, and Print advertising. New to ANOVA? Start with our comprehensive ANOVA guide for a solid theoretical foundation before diving into the implementation.

The analysis will be shown using two approaches:

  • Using statistical libraries (scipy, statsmodels)
  • Manual calculations with detailed explanations
Looking for an R implementation? Check out our One-Way ANOVA with R guide.

Sample Data and Assumptions

Note on Assumptions:

For this tutorial, we take the standard one-way ANOVA assumptions as satisfied:

  • The observations are independent
  • The data within each group is normally distributed
  • The groups have homogeneous variances
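
The tutorial takes these assumptions as given, but two of them can be checked roughly from the data. The following is a minimal sketch, assuming the Shapiro-Wilk test for normality and Levene's test for equal variances are acceptable checks here; the `groups` dictionary is an illustrative name, not part of the original code. Independence depends on how the data were collected and cannot be tested this way.

```python
from scipy import stats

# Same engagement scores as the tutorial dataset
groups = {
    "TV": [85, 90, 88, 84, 87],
    "Social Media": [88, 92, 89, 85, 91],
    "Print": [78, 80, 82, 79, 81],
}

# Shapiro-Wilk test per group (H0: the group comes from a normal distribution)
shapiro_p = {}
for name, values in groups.items():
    _, p = stats.shapiro(values)
    shapiro_p[name] = p
    print(f"Shapiro-Wilk, {name}: p = {p:.4f}")

# Levene's test across groups (H0: all groups have equal variances)
levene_stat, levene_p = stats.levene(*groups.values())
print(f"Levene: W = {levene_stat:.4f}, p = {levene_p:.4f}")
```

A p-value above the chosen significance level means the test found no evidence against the assumption. With only five observations per group these tests have little power, so treat the output as a sanity check rather than proof.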

Dataset:

CSV
Channel, Customer Engagement
TV, 85
TV, 90
TV, 88
TV, 84
TV, 87
Social Media, 88
Social Media, 92
Social Media, 89
Social Media, 85
Social Media, 91
Print, 78
Print, 80
Print, 82
Print, 79
Print, 81
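
If you would rather load the CSV text directly than type the values into a dictionary, something like the following should work. It is a sketch that assumes the data is pasted into a string (`csv_text` is an illustrative name); `skipinitialspace=True` drops the space that follows each comma.

```python
import io

import pandas as pd

csv_text = """Channel, Customer Engagement
TV, 85
TV, 90
TV, 88
TV, 84
TV, 87
Social Media, 88
Social Media, 92
Social Media, 89
Social Media, 85
Social Media, 91
Print, 78
Print, 80
Print, 82
Print, 79
Print, 81
"""

# Parse the CSV text; skipinitialspace strips the blank after each comma
df = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)

print(df.head())
print(df.groupby("Channel")["Customer Engagement"].mean())
```

Note that the column is named `Customer Engagement` (with a space) when read this way, whereas the code in this tutorial uses `Customer_Engagement` and `Engagement`.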

Try it yourself:

Want to analyze this data without coding? Copy the data above and paste it into our One-Way ANOVA Calculator to get instant results.

Approach 1: Using Statistical Libraries

Python Implementation:

Python
import pandas as pd
import numpy as np
from scipy import stats

# Step 1: Create the dataset
data = {
    'Channel': ['TV', 'TV', 'TV', 'TV', 'TV',
                'Social Media', 'Social Media', 'Social Media', 'Social Media', 'Social Media',
                'Print', 'Print', 'Print', 'Print', 'Print'],
    'Customer_Engagement': [85, 90, 88, 84, 87,
                            88, 92, 89, 85, 91,
                            78, 80, 82, 79, 81]
}

# Step 2: Create a DataFrame
df = pd.DataFrame(data)

# Step 3: Calculate descriptive statistics
descriptive_stats = df.groupby('Channel')['Customer_Engagement'].agg([
    'count', 'mean', 'std', 'min', 'max'
]).round(2)

print("Descriptive Statistics:")
print(descriptive_stats)

# Step 4: Perform one-way ANOVA
# Get the engagement scores for each channel
tv_scores = df[df['Channel'] == 'TV']['Customer_Engagement']
social_scores = df[df['Channel'] == 'Social Media']['Customer_Engagement']
print_scores = df[df['Channel'] == 'Print']['Customer_Engagement']

# Perform one-way ANOVA
f_statistic, p_value = stats.f_oneway(tv_scores, social_scores, print_scores)

print("One-way ANOVA Results:")
print(f"F-statistic: {f_statistic:.4f}")
print(f"p-value: {p_value:.4f}")

# Step 5: Calculate Effect Size (Eta-squared)
def calculate_eta_squared(df, dv, between):
    """Calculate eta-squared effect size (SS between / SS total)."""
    groups = df[between].unique()
    grand_mean = df[dv].mean()

    # Calculate SSt (Total Sum of Squares)
    ss_total = np.sum((df[dv] - grand_mean) ** 2)

    # Calculate SSb (Between Sum of Squares)
    ss_between = np.sum([
        len(df[df[between] == group]) *
        (df[df[between] == group][dv].mean() - grand_mean) ** 2
        for group in groups
    ])

    return ss_between / ss_total

eta_squared = calculate_eta_squared(df, 'Customer_Engagement', 'Channel')
print(f"Effect Size (η²): {eta_squared:.4f}")

Results:

Descriptive Statistics:

Channel         Mean    Std     N
Print           80.0    1.58    5
Social Media    89.0    2.74    5
TV              86.8    2.39    5

F-statistic: F = 21.0318

p-value: p = 0.0001

Effect Size: η² = 0.7780

Approach 2: Manual Calculations

Python Implementation and Results:

Python
import pandas as pd
import numpy as np

# Step 1: Create the dataset
print("Step 1: Create the dataset")
data = {
    'Channel': ['TV', 'TV', 'TV', 'TV', 'TV',
                'Social Media', 'Social Media', 'Social Media', 'Social Media', 'Social Media',
                'Print', 'Print', 'Print', 'Print', 'Print'],
    'Engagement': [85, 90, 88, 84, 87,
                   88, 92, 89, 85, 91,
                   78, 80, 82, 79, 81]
}
df = pd.DataFrame(data)

# Step 2: Calculate Grand Mean
grand_mean = df['Engagement'].mean()
print(f"Step 2: Grand Mean = {grand_mean:.2f}")

# Step 3: Calculate Group Means
group_means = df.groupby('Channel')['Engagement'].mean()
print("Step 3: Group Means:")
print(group_means)

Step 1: Create the dataset
Step 2: Grand Mean = 85.27
Step 3: Group Means:
Channel
Print           80.0
Social Media    89.0
TV              86.8
Name: Engagement, dtype: float64

Python
# Steps 4-6: Calculate Sum of Squares
n_groups = len(df['Channel'].unique())
n_total = len(df)
n_per_group = df.groupby('Channel').size()

# Between-group variation: group size times squared distance of each group mean from the grand mean
ssb = sum(n_per_group * (group_means - grand_mean)**2)
print(f"Step 4: Sum of Squares Between Groups (SSB) = {ssb:.2f}")

# Within-group variation: squared distance of each observation from its own group mean
ssw = 0
for channel in df['Channel'].unique():
    group_data = df[df['Channel'] == channel]['Engagement']
    group_mean = group_means[channel]
    ssw += sum((group_data - group_mean)**2)
print(f"Step 5: Sum of Squares Within Groups (SSW) = {ssw:.2f}")

# Total variation: squared distance of each observation from the grand mean
sst = sum((df['Engagement'] - grand_mean)**2)
print(f"Step 6: Total Sum of Squares (SST) = {sst:.2f}")
print(f"Verification: SST ({sst:.2f}) ≈ SSB ({ssb:.2f}) + SSW ({ssw:.2f})")

Step 4: Sum of Squares Between Groups (SSB) = 220.13
Step 5: Sum of Squares Within Groups (SSW) = 62.80
Step 6: Total Sum of Squares (SST) = 282.93
Verification: SST (282.93) ≈ SSB (220.13) + SSW (62.80)
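
For reference, the three sums of squares computed above can be written out as follows. The notation is an assumption for illustration: y_ij is observation i in group j, n_j is the size of group j, ȳ_j is the group mean, ȳ is the grand mean, and k = 3 is the number of groups. The numbers use the rounded grand mean 85.27, with groups in the order TV, Social Media, Print.

```latex
\begin{aligned}
\mathrm{SSB} &= \sum_{j=1}^{k} n_j \,(\bar{y}_j - \bar{y})^2
  = 5\left[(86.8 - 85.27)^2 + (89.0 - 85.27)^2 + (80.0 - 85.27)^2\right] \approx 220.13 \\
\mathrm{SSW} &= \sum_{j=1}^{k} \sum_{i=1}^{n_j} (y_{ij} - \bar{y}_j)^2
  = 22.80 + 30.00 + 10.00 = 62.80 \\
\mathrm{SST} &= \sum_{j=1}^{k} \sum_{i=1}^{n_j} (y_{ij} - \bar{y})^2
  = \mathrm{SSB} + \mathrm{SSW} \approx 282.93
\end{aligned}
```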

Python
# Steps 7-9: Calculate df, MS, and F-statistic
df_between = n_groups - 1
df_within = n_total - n_groups
df_total = n_total - 1

print("Step 7: Degrees of Freedom:")
print(f"Between Groups (df_b) = {df_between}")
print(f"Within Groups (df_w) = {df_within}")
print(f"Total (df_t) = {df_total}")

# Mean squares are sums of squares divided by their degrees of freedom
ms_between = ssb / df_between
ms_within = ssw / df_within

print("Step 8: Mean Squares:")
print(f"Mean Square Between (MSB) = {ms_between:.2f}")
print(f"Mean Square Within (MSW) = {ms_within:.2f}")

# F is the ratio of between-group to within-group mean squares
f_statistic = ms_between / ms_within
print(f"Step 9: F-statistic = {f_statistic:.4f}")

Step 7: Degrees of Freedom:
Between Groups (df_b) = 2
Within Groups (df_w) = 12
Total (df_t) = 14

Step 8: Mean Squares:
Mean Square Between (MSB) = 110.07
Mean Square Within (MSW) = 5.23

Step 9: F-statistic = 21.0318
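
The manual steps stop at the F-statistic, so the p-value still has to be read from the F distribution. The sketch below recomputes F from the raw scores to stay self-contained, then uses `scipy.stats.f`. The significance level `alpha = 0.05` is an assumption for illustration, not something fixed by the tutorial.

```python
import numpy as np
from scipy import stats

groups = [
    np.array([85, 90, 88, 84, 87]),  # TV
    np.array([88, 92, 89, 85, 91]),  # Social Media
    np.array([78, 80, 82, 79, 81]),  # Print
]
all_scores = np.concatenate(groups)
grand_mean = all_scores.mean()

# Sums of squares, degrees of freedom and F, as in Steps 4-9
ssb = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ssw = sum(((g - g.mean()) ** 2).sum() for g in groups)
df_between = len(groups) - 1
df_within = len(all_scores) - len(groups)
f_statistic = (ssb / df_between) / (ssw / df_within)

# p-value: upper-tail probability of the F distribution at the observed F
p_value = stats.f.sf(f_statistic, df_between, df_within)

# Critical value: F must exceed this to be significant at the chosen alpha
alpha = 0.05  # assumed significance level
f_critical = stats.f.ppf(1 - alpha, df_between, df_within)

print(f"F({df_between}, {df_within}) = {f_statistic:.4f}")
print(f"p-value = {p_value:.6f}")
print(f"Critical F at alpha = {alpha}: {f_critical:.4f}")
```

Comparing `f_statistic` with `f_critical` and comparing `p_value` with `alpha` are two views of the same decision rule.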

ANOVA Summary Table:

Source     SS        df    MS        F
Between    220.13    2     110.07    21.0318
Within     62.80     12    5.23      -
Total      282.93    14    -         -

Effect Size: η² = 0.7780

Interpreting the Results:

The one-way ANOVA indicates a statistically significant difference in mean customer engagement across the three marketing channels (TV, Social Media, Print): F(2, 12) = 21.03, p ≈ 0.0001, well below the conventional 0.05 significance level. We therefore reject the null hypothesis that all three channel means are equal. The effect size (η² = 0.778) means that about 77.8% of the variance in customer engagement in this sample is associated with the marketing channel, which is a large effect. The ANOVA itself does not say which channels differ from one another.

Post-hoc Analysis:

A significant ANOVA result only tells us that at least one channel mean differs from the others. Post-hoc tests such as Tukey's HSD, Bonferroni-corrected comparisons, Scheffé's test, or Fisher's LSD identify which specific pairs of groups differ. You can use our Tukey HSD calculator to perform this analysis.
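
In Python, one way to run a Tukey HSD test is `pairwise_tukeyhsd` from statsmodels. The following is a minimal sketch on the tutorial data, assuming a family-wise significance level of 0.05; the variable name `tukey` is illustrative.

```python
import pandas as pd
from statsmodels.stats.multicomp import pairwise_tukeyhsd

df = pd.DataFrame({
    'Channel': ['TV'] * 5 + ['Social Media'] * 5 + ['Print'] * 5,
    'Customer_Engagement': [85, 90, 88, 84, 87,
                            88, 92, 89, 85, 91,
                            78, 80, 82, 79, 81],
})

# Compare every pair of channels while controlling the family-wise error rate
tukey = pairwise_tukeyhsd(
    endog=df['Customer_Engagement'],  # response values
    groups=df['Channel'],             # group labels
    alpha=0.05,                       # assumed family-wise significance level
)

print(tukey.summary())
```

The summary lists one row per pair of channels, with the mean difference, an adjusted p-value, a confidence interval, and a `reject` flag. For these data, Print should differ from both Social Media and TV, while the 2.2-point gap between Social Media and TV should not reach significance.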
