One-Way ANOVA with Python
Welcome to our hands-on tutorial on performing one-way ANOVA (Analysis of Variance) with Python. In this guide, we'll walk through a practical example analyzing how customer engagement varies across three marketing channels: TV, Social Media, and Print advertising. New to ANOVA? Start with our comprehensive ANOVA guide for a solid theoretical foundation before diving into the implementation.
The analysis will be shown using two approaches:
- Using statistical libraries (scipy, statsmodels)
- Manual calculations with detailed explanations
Sample Data and Assumptions
Note on Assumptions:
For this tutorial, we assume that the standard one-way ANOVA assumptions are satisfied:
- The observations are independent of one another
- The data within each group are approximately normally distributed
- The groups have equal (homogeneous) variances
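The tutorial takes these assumptions as given. If you want a quick sanity check, one common approach (not part of the original walkthrough) is a Shapiro-Wilk test on each group for normality and Levene's test for equal variances, both available in `scipy.stats`. This is a minimal sketch that assumes the same 15 observations listed below. With only five points per group these tests have very little power, so treat them as rough screens rather than proof.

```python
from scipy import stats

# Engagement scores per channel, copied from the dataset in this tutorial
groups = {
    "TV": [85, 90, 88, 84, 87],
    "Social Media": [88, 92, 89, 85, 91],
    "Print": [78, 80, 82, 79, 81],
}

# Normality screen: Shapiro-Wilk test on each group separately
shapiro_p = {}
for name, scores in groups.items():
    stat, p = stats.shapiro(scores)
    shapiro_p[name] = p
    print(f"Shapiro-Wilk {name}: W = {stat:.4f}, p = {p:.4f}")

# Equal-variance screen: Levene's test across all three groups
# (scipy's default center='median' gives the Brown-Forsythe variant)
levene_stat, levene_p = stats.levene(*groups.values())
print(f"Levene: W = {levene_stat:.4f}, p = {levene_p:.4f}")

# A large p-value means no evidence against the assumption, not proof that it holds
```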
Dataset:
```
Channel, Customer Engagement
TV, 85
TV, 90
TV, 88
TV, 84
TV, 87
Social Media, 88
Social Media, 92
Social Media, 89
Social Media, 85
Social Media, 91
Print, 78
Print, 80
Print, 82
Print, 79
Print, 81
```
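If you keep the dataset as text (for example, pasted from above), pandas can parse it directly instead of typing it into a dictionary. A minimal sketch, assuming the data is held in a string; the name `raw` is just for illustration. `skipinitialspace=True` drops the blank after each comma, and the column is renamed to the `Customer_Engagement` name used in Approach 1.

```python
import io

import pandas as pd

# The dataset exactly as listed above (each comma is followed by a space)
raw = """Channel, Customer Engagement
TV, 85
TV, 90
TV, 88
TV, 84
TV, 87
Social Media, 88
Social Media, 92
Social Media, 89
Social Media, 85
Social Media, 91
Print, 78
Print, 80
Print, 82
Print, 79
Print, 81
"""

# skipinitialspace=True strips the space after each comma in the header and rows
df = pd.read_csv(io.StringIO(raw), skipinitialspace=True)
df = df.rename(columns={"Customer Engagement": "Customer_Engagement"})

print(df.groupby("Channel")["Customer_Engagement"].mean())
```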
Try it yourself:
Want to analyze this data without coding? Copy the data above and paste it into our One-Way ANOVA Calculator to get instant results.
Approach 1: Using Statistical Libraries
Python Implementation:
```python
import pandas as pd
import numpy as np
from scipy import stats

# Step 1: Create the dataset
data = {
    'Channel': ['TV', 'TV', 'TV', 'TV', 'TV',
                'Social Media', 'Social Media', 'Social Media', 'Social Media', 'Social Media',
                'Print', 'Print', 'Print', 'Print', 'Print'],
    'Customer_Engagement': [85, 90, 88, 84, 87,
                            88, 92, 89, 85, 91,
                            78, 80, 82, 79, 81]
}

# Step 2: Create a DataFrame
df = pd.DataFrame(data)

# Step 3: Calculate descriptive statistics
descriptive_stats = df.groupby('Channel')['Customer_Engagement'].agg([
    'count', 'mean', 'std', 'min', 'max'
]).round(2)

print("Descriptive Statistics:")
print(descriptive_stats)

# Step 4: Perform one-way ANOVA
# Get the engagement scores for each channel
tv_scores = df[df['Channel'] == 'TV']['Customer_Engagement']
social_scores = df[df['Channel'] == 'Social Media']['Customer_Engagement']
print_scores = df[df['Channel'] == 'Print']['Customer_Engagement']

# Perform one-way ANOVA
f_statistic, p_value = stats.f_oneway(tv_scores, social_scores, print_scores)

print("One-way ANOVA Results:")
print(f"F-statistic: {f_statistic:.4f}")
print(f"p-value: {p_value:.4f}")

# Step 5: Calculate Effect Size (Eta-squared)
def calculate_eta_squared(df, dv, between):
    """Calculate eta-squared (SS_between / SS_total) for a one-way design."""
    groups = df[between].unique()
    grand_mean = df[dv].mean()

    # Calculate SSt (Total Sum of Squares)
    ss_total = np.sum((df[dv] - grand_mean) ** 2)

    # Calculate SSb (Between Sum of Squares)
    ss_between = np.sum([
        len(df[df[between] == group]) *
        (df[df[between] == group][dv].mean() - grand_mean) ** 2
        for group in groups
    ])

    return ss_between / ss_total

eta_squared = calculate_eta_squared(df, 'Customer_Engagement', 'Channel')
print(f"Effect Size (η²): {eta_squared:.4f}")
```
Results:
Descriptive Statistics:
Channel | Mean | Std | N |
---|---|---|---|
Print | 80.0 | 1.58 | 5 |
Social Media | 89.0 | 2.74 | 5 |
TV | 86.8 | 2.39 | 5 |
F-statistic: F = 21.0318
p-value: p = 0.0001
Effect Size: η² = 0.7780
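If you want a full ANOVA table (sums of squares, degrees of freedom, F and p) rather than just F and p, statsmodels can fit the same one-way model through its formula interface. A sketch, assuming statsmodels is installed: `C(Channel)` tells the formula parser to treat `Channel` as categorical, and `typ=2` requests Type II sums of squares (with a single factor, Type I and Type II coincide).

```python
import pandas as pd
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm

# Same data as in Approach 1
df = pd.DataFrame({
    "Channel": ["TV"] * 5 + ["Social Media"] * 5 + ["Print"] * 5,
    "Customer_Engagement": [85, 90, 88, 84, 87,
                            88, 92, 89, 85, 91,
                            78, 80, 82, 79, 81],
})

# Fit a linear model with Channel treated as a categorical factor
model = ols("Customer_Engagement ~ C(Channel)", data=df).fit()

# ANOVA table with sum_sq, df, F and PR(>F) columns
anova_table = anova_lm(model, typ=2)
print(anova_table)
```

The `C(Channel)` row should agree with the `stats.f_oneway` result above, and the `Residual` row holds the within-group sum of squares.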
Approach 2: Manual Calculations
Python Implementation and Results:
```python
import pandas as pd

# Step 1: Create the dataset
print("Step 1: Create the dataset")
data = {
    'Channel': ['TV', 'TV', 'TV', 'TV', 'TV',
                'Social Media', 'Social Media', 'Social Media', 'Social Media', 'Social Media',
                'Print', 'Print', 'Print', 'Print', 'Print'],
    'Engagement': [85, 90, 88, 84, 87,
                   88, 92, 89, 85, 91,
                   78, 80, 82, 79, 81]
}
df = pd.DataFrame(data)

# Step 2: Calculate Grand Mean (mean of all 15 observations)
grand_mean = df['Engagement'].mean()
print(f"Step 2: Grand Mean = {grand_mean:.2f}")

# Step 3: Calculate Group Means (one mean per channel)
group_means = df.groupby('Channel')['Engagement'].mean()
print("Step 3: Group Means:")
print(group_means)
```
```
Step 1: Create the dataset
Step 2: Grand Mean = 85.27
Step 3: Group Means:
Channel
Print           80.0
Social Media    89.0
TV              86.8
Name: Engagement, dtype: float64
```
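The printed means can be checked by hand from the group totals in the dataset:

$$
\bar{x}_{\text{TV}} = \frac{85+90+88+84+87}{5} = \frac{434}{5} = 86.8, \qquad
\bar{x}_{\text{Social}} = \frac{445}{5} = 89.0, \qquad
\bar{x}_{\text{Print}} = \frac{400}{5} = 80.0
$$

$$
\bar{x} = \frac{434 + 445 + 400}{15} = \frac{1279}{15} \approx 85.27
$$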
```python
# Steps 4-6: Calculate Sums of Squares
n_groups = df['Channel'].nunique()
n_total = len(df)
n_per_group = df.groupby('Channel').size()

# SSB: group size times the squared distance of each group mean from the grand mean
ssb = (n_per_group * (group_means - grand_mean) ** 2).sum()
print(f"Step 4: Sum of Squares Between Groups (SSB) = {ssb:.2f}")

# SSW: squared deviations of each observation from its own group mean
ssw = 0
for channel in df['Channel'].unique():
    group_data = df[df['Channel'] == channel]['Engagement']
    group_mean = group_means[channel]
    ssw += ((group_data - group_mean) ** 2).sum()
print(f"Step 5: Sum of Squares Within Groups (SSW) = {ssw:.2f}")

# SST: squared deviations of every observation from the grand mean
sst = ((df['Engagement'] - grand_mean) ** 2).sum()
print(f"Step 6: Total Sum of Squares (SST) = {sst:.2f}")
print(f"Verification: SST ({sst:.2f}) ≈ SSB ({ssb:.2f}) + SSW ({ssw:.2f})")
```
```
Step 4: Sum of Squares Between Groups (SSB) = 220.13
Step 5: Sum of Squares Within Groups (SSW) = 62.80
Step 6: Total Sum of Squares (SST) = 282.93
Verification: SST (282.93) ≈ SSB (220.13) + SSW (62.80)
```
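Written out as formulas, with $n_j = 5$ observations per group, the three sums of squares are (the within-group terms are listed in the order TV, Social Media, Print):

$$
SSB = \sum_{j} n_j (\bar{x}_j - \bar{x})^2 = 5\left[(86.8 - 85.27)^2 + (89.0 - 85.27)^2 + (80.0 - 85.27)^2\right] \approx 220.13
$$

$$
SSW = \sum_{j} \sum_{i} (x_{ij} - \bar{x}_j)^2 = 22.8 + 30.0 + 10.0 = 62.80
$$

$$
SST = \sum_{j} \sum_{i} (x_{ij} - \bar{x})^2 = SSB + SSW \approx 282.93
$$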
```python
# Steps 7-9: Calculate df, MS, and F-statistic
df_between = n_groups - 1
df_within = n_total - n_groups
df_total = n_total - 1

print("Step 7: Degrees of Freedom:")
print(f"Between Groups (df_b) = {df_between}")
print(f"Within Groups (df_w) = {df_within}")
print(f"Total (df_t) = {df_total}")

# Mean squares: each sum of squares divided by its degrees of freedom
ms_between = ssb / df_between
ms_within = ssw / df_within

print("Step 8: Mean Squares:")
print(f"Mean Square Between (MSB) = {ms_between:.2f}")
print(f"Mean Square Within (MSW) = {ms_within:.2f}")

# F-statistic: ratio of between-group to within-group mean squares
f_statistic = ms_between / ms_within
print(f"Step 9: F-statistic = {f_statistic:.4f}")
```
```
Step 7: Degrees of Freedom:
Between Groups (df_b) = 2
Within Groups (df_w) = 12
Total (df_t) = 14
Step 8: Mean Squares:
Mean Square Between (MSB) = 110.07
Mean Square Within (MSW) = 5.23
Step 9: F-statistic = 21.0318
```
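With $k = 3$ groups and $N = 15$ observations, Steps 7 to 9 work out as follows. Dividing the rounded mean squares (110.07 / 5.23) would give about 21.05, so the last line keeps exact fractions: $SSB = 3302/15$ and $SSW = 314/5$, hence $MSB = 3302/30$ and $MSW = 157/30$.

$$
df_b = k - 1 = 2, \qquad df_w = N - k = 12, \qquad df_t = N - 1 = 14
$$

$$
MSB = \frac{SSB}{df_b} = \frac{220.13}{2} \approx 110.07, \qquad
MSW = \frac{SSW}{df_w} = \frac{62.80}{12} \approx 5.23
$$

$$
F = \frac{MSB}{MSW} = \frac{3302/30}{157/30} = \frac{3302}{157} \approx 21.0318
$$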
ANOVA Summary Table:
Source | SS | df | MS | F |
---|---|---|---|---|
Between | 220.13 | 2 | 110.07 | 21.0318 |
Within | 62.80 | 12 | 5.23 | - |
Total | 282.93 | 14 | - | - |
Effect Size: η² = 0.7780
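The manual walkthrough stops at the F-statistic. One way to finish it, assuming scipy is available, is to take the p-value from the survival function of the F distribution, $P(F_{2,12} > F)$, and to compute η² as SSB / SST. The sums of squares are entered as exact values here so the results match the summary table without rounding drift.

```python
from scipy import stats

# Sums of squares and degrees of freedom from the ANOVA summary table
ssb = 3302 / 15        # ≈ 220.13
ssw = 62.8
sst = ssb + ssw        # ≈ 282.93
df_between, df_within = 2, 12

# F = MSB / MSW
f_statistic = (ssb / df_between) / (ssw / df_within)

# p-value: probability of an F at least this large if all group means were equal
p_value = stats.f.sf(f_statistic, df_between, df_within)

# Eta-squared: share of total variability attributable to channel
eta_squared = ssb / sst

print(f"F = {f_statistic:.4f}, p = {p_value:.4f}, eta^2 = {eta_squared:.4f}")
# prints: F = 21.0318, p = 0.0001, eta^2 = 0.7780
```

The unrounded p-value is about 0.00012, which the four-decimal format shows as 0.0001, matching Approach 1.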
Interpreting the Results:
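The one-way ANOVA shows a statistically significant difference in mean customer engagement across the three marketing channels (TV, Social Media, Print): F(2, 12) = 21.03, p ≈ 0.0001. Because the p-value is far below the conventional 0.05 threshold, we reject the null hypothesis that all three channel means are equal; at least one channel differs from the others. The F-test alone does not say which channels differ. The effect size η² = 0.778 means that about 77.8% of the total variability in engagement scores in this sample is accounted for by marketing channel, which is large by conventional benchmarks (η² ≥ 0.14).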
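The same decision can be reached with the critical-value approach: reject the null hypothesis of equal means when F exceeds the upper 5% point of the F(2, 12) distribution. A small sketch, assuming α = 0.05 (the tutorial does not state a significance level explicitly); 3302 / 157 is MSB / MSW without rounding.

```python
from scipy import stats

alpha = 0.05                   # assumed significance level
df_between, df_within = 2, 12
f_statistic = 3302 / 157       # MSB / MSW from the summary table, unrounded

# Critical value: the F value with upper-tail probability alpha
f_critical = stats.f.ppf(1 - alpha, df_between, df_within)

reject_h0 = f_statistic > f_critical
print(f"F = {f_statistic:.4f}, F_crit = {f_critical:.4f}, reject H0: {reject_h0}")
```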
Post-hoc Analysis:
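A significant F-test only tells us that the channel means are not all equal. To find out which specific pairs of channels differ, follow up with a post-hoc test such as Tukey's HSD, the Bonferroni correction, Scheffé's test, or Fisher's LSD. These procedures limit, to varying degrees, the inflated Type I error rate that comes from making several pairwise comparisons, and they are normally run only after a significant ANOVA result. You can use our Tukey HSD calculator to perform this analysis.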
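The Tukey step can also be run in Python. statsmodels provides `pairwise_tukeyhsd`, which applies Tukey's HSD to every pair of groups. A minimal sketch, assuming statsmodels is installed and α = 0.05:

```python
import pandas as pd
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Same data as in the tutorial
df = pd.DataFrame({
    "Channel": ["TV"] * 5 + ["Social Media"] * 5 + ["Print"] * 5,
    "Customer_Engagement": [85, 90, 88, 84, 87,
                            88, 92, 89, 85, 91,
                            78, 80, 82, 79, 81],
})

# Compare every pair of channels while controlling the family-wise error rate
tukey = pairwise_tukeyhsd(endog=df["Customer_Engagement"],
                          groups=df["Channel"],
                          alpha=0.05)
print(tukey.summary())
```

Given the group means (80.0, 86.8, 89.0) and MSW ≈ 5.23, the two comparisons involving Print should be the ones that reject, while the 2.2-point gap between TV and Social Media should not; check the `reject` column of the printed table to confirm.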