
Two-Sample T-Test (Student's T-Test or Welch's T-Test)

Definition

The two-sample t-test is a statistical test used to determine whether there is a significant difference between the means of two independent groups. It is particularly useful for comparing two different treatments, methods, or groups.

Formula

Test Statistic:

t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}

Degrees of freedom:

For equal variances (Student's t-test):

df = n_1 + n_2 - 2

For unequal variances (Welch's t-test):

df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1-1} + \frac{(s_2^2/n_2)^2}{n_2-1}}

Confidence Interval:

CI = (\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2} \cdot SE

Standard Error (SE) for equal variances:

SE = s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}

where pooled standard deviation:

s_p = \sqrt{\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1 + n_2 - 2}}

Standard Error (SE) for unequal variances:

SE = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}

Where:

  • \bar{x}_1, \bar{x}_2 = sample means
  • s_1^2, s_2^2 = sample variances
  • n_1, n_2 = sample sizes
  • t_{\alpha/2} = critical value from the t-distribution
  • \alpha = significance level
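
The formulas above can be applied directly to summary statistics. A minimal sketch in Python (the function name is illustrative, not from any library):

```python
import math

def two_sample_t(mean1, s1, n1, mean2, s2, n2, equal_var=False):
    """t statistic and degrees of freedom from summary statistics."""
    if equal_var:
        # Student's t-test: pooled standard deviation
        sp = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
        se = sp * math.sqrt(1 / n1 + 1 / n2)
        df = n1 + n2 - 2
    else:
        # Welch's t-test: variances kept separate
        v1, v2 = s1**2 / n1, s2**2 / n2
        se = math.sqrt(v1 + v2)
        # Welch–Satterthwaite degrees of freedom
        df = (v1 + v2)**2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return (mean1 - mean2) / se, df

t, df = two_sample_t(75, 8, 30, 70, 10, 35)  # Welch by default
print(f"t = {t:.2f}, df = {df:.1f}")
```

Passing `equal_var=True` switches to the pooled-variance formulas.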

Welch's T-Test vs. Student's T-Test

While both Welch's t-test and Student's t-test are used to compare means between two groups, they differ in their assumptions and applications:

| Aspect | Welch's T-Test | Student's T-Test |
| --- | --- | --- |
| Variance assumption | Does not assume equal variances | Assumes equal variances |
| Degrees of freedom | Calculated using the Welch–Satterthwaite equation above | n_1 + n_2 - 2 |
| Robustness | More robust when variances are unequal | Less robust when variances are unequal |
| Sample size sensitivity | Less sensitive to unequal sample sizes | More sensitive to unequal sample sizes |
| Use case | Preferred when variances or sample sizes are unequal | Used when variances are assumed to be equal |

Key Distinction: The primary difference lies in the assumption of equal variances. Welch's t-test does not require this assumption, making it more appropriate for comparing groups with unequal variances.

Both tests share the following assumptions:

  • Independence: Observations in each sample should be independent.
  • Normality: Data should be approximately normally distributed (though both tests are somewhat robust to violations of this assumption, especially for larger sample sizes).
  • Random Sampling: Samples should be randomly selected from their respective populations.

In practice, Welch's t-test is often recommended as the default choice for comparing two means, as it maintains good control over Type I error rates and statistical power across a wider range of scenarios compared to Student's t-test.
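
A quick simulation illustrates this point: when the null hypothesis is true but the smaller sample has the larger variance, Student's t-test rejects far more often than the nominal 5%, while Welch's stays close to it. A sketch using scipy (the distribution parameters below are arbitrary choices for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_sim = 2000
reject_welch = reject_student = 0
for _ in range(n_sim):
    # H0 is true (both means 0), but the small sample has the large variance
    a = rng.normal(0, 5, 10)
    b = rng.normal(0, 1, 100)
    _, p_w = stats.ttest_ind(a, b, equal_var=False)  # Welch
    _, p_s = stats.ttest_ind(a, b, equal_var=True)   # Student
    reject_welch += p_w < 0.05
    reject_student += p_s < 0.05

print(f"Welch Type I error rate:   {reject_welch / n_sim:.3f}")  # near nominal 0.05
print(f"Student Type I error rate: {reject_student / n_sim:.3f}")  # inflated
```

Reversing which group has the larger variance makes Student's test overly conservative instead; Welch's test behaves reasonably in both directions.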

Practical Example

We want to compare two teaching methods by examining test scores:

Given Data:

  • Method A: \bar{x}_1 = 75, s_1 = 8, n_1 = 30
  • Method B: \bar{x}_2 = 70, s_2 = 10, n_2 = 35
  • Equal variances are not assumed, so Welch's t-test is used
  • \alpha = 0.05 (two-tailed test)

Hypotheses:

Null Hypothesis (H_0): \mu_1 = \mu_2 (no difference between methods)

Alternative Hypothesis (H_1): \mu_1 \neq \mu_2 (there is a difference between methods)

Step-by-Step Calculation:

  1. Calculate standard errors: SE_1 = \frac{8}{\sqrt{30}} = 1.46, SE_2 = \frac{10}{\sqrt{35}} = 1.69
  2. Calculate combined standard error: SE = \sqrt{SE_1^2 + SE_2^2} = \sqrt{1.46^2 + 1.69^2} = 2.23
  3. Calculate t-statistic: t = \frac{75 - 70}{2.23} = 2.24
  4. Calculate degrees of freedom (Welch–Satterthwaite): df \approx 62.7
  5. Find critical value: t_{0.025} = \pm 2.00
  6. Construct confidence interval: CI = (75 - 70) \pm 2.00 \cdot 2.23 = 5 \pm 4.46 = (0.54, 9.46)

Conclusion:

Since |2.24| > 2.00, we reject the null hypothesis. There is sufficient evidence to conclude that there is a significant difference between the two teaching methods (p < 0.05). We are 95% confident that the true difference in means lies between 0.54 and 9.46.
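
The worked example can be reproduced from the summary statistics alone with scipy's `ttest_ind_from_stats`:

```python
from scipy import stats

# Summary statistics from the example above (Method A vs. Method B)
t_stat, p_value = stats.ttest_ind_from_stats(
    mean1=75, std1=8, nobs1=30,
    mean2=70, std2=10, nobs2=35,
    equal_var=False,  # Welch's t-test
)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")  # t matches the 2.24 computed by hand
```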

Effect Size

Cohen's d for two independent samples:

d = \frac{|\bar{x}_1 - \bar{x}_2|}{s_{pooled}}

For unequal variances (preferred when using Welch's t-test):

d = \frac{|\bar{x}_1 - \bar{x}_2|}{\sqrt{\frac{s_1^2 + s_2^2}{2}}}

Interpretation guidelines:

  • Small effect: |d| ≈ 0.2
  • Medium effect: |d| ≈ 0.5
  • Large effect: |d| ≈ 0.8
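
Applying the average-variance form to the worked example gives a medium effect:

```python
import math

# Method A vs. Method B from the worked example
mean1, s1 = 75, 8
mean2, s2 = 70, 10
d = abs(mean1 - mean2) / math.sqrt((s1**2 + s2**2) / 2)
print(round(d, 2))  # prints 0.55, a medium effect by the guidelines above
```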

Power Analysis

To determine required sample size per group (n) for desired power (1-β):

n = 2\left(\frac{(z_{1-\alpha/2} + z_{1-\beta})\sigma}{|\mu_1-\mu_2|}\right)^2

Where:

  • \alpha = significance level
  • \beta = probability of Type II error
  • |\mu_1-\mu_2| = minimum detectable difference
  • \sigma = population standard deviation
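
For example, to detect the 5-point difference from the worked example with 80% power at \alpha = 0.05, taking \sigma = 9 as a rough average of the two sample SDs (an assumption made here for illustration):

```python
import math
from scipy.stats import norm

def n_per_group(diff, sigma, alpha=0.05, power=0.80):
    # z-approximation from the formula above
    z_alpha = norm.ppf(1 - alpha / 2)  # z_{1-alpha/2}
    z_beta = norm.ppf(power)           # z_{1-beta}
    return math.ceil(2 * ((z_alpha + z_beta) * sigma / abs(diff))**2)

print(n_per_group(diff=5, sigma=9))  # 51 per group
```

The z-approximation slightly underestimates the exact t-based answer for small samples; tools such as statsmodels' `TTestIndPower` solve the t-based version iteratively.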

Decision Rules

Reject H_0 if:

  • Two-sided test: |t| > t_{critical}
  • Left-tailed test: t < -t_{critical}
  • Right-tailed test: t > t_{critical}
  • Or if p-value < \alpha

Where t_{critical} is:

  • t_{\alpha/2, df} for two-sided tests
  • t_{\alpha, df} for one-sided tests
  • df calculated using the appropriate formula for Student's or Welch's test
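
The critical-value and p-value rules always give the same decision; checking with approximate values from the worked example:

```python
from scipy import stats

alpha, df, t_obs = 0.05, 62.7, 2.24  # approximate values from the worked example
t_crit = stats.t.ppf(1 - alpha / 2, df)   # two-sided critical value
p_value = 2 * stats.t.sf(abs(t_obs), df)  # two-sided p-value
print(abs(t_obs) > t_crit, p_value < alpha)  # both rules say: reject H0
```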

Reporting Results

Standard format for scientific reporting:

"An independent-samples t-test was conducted to compare [variable] between [group1] and [group2]. A [significant/non-significant] difference was found between [group1] (M = [mean1], SD = [sd1], n = [n1]) and [group2] (M = [mean2], SD = [sd2], n = [n2]); t([df]) = [t-value], p = [p-value], d = [Cohen's d]. The 95% CI for the difference in means ranged from [lower] to [upper]."

Remember to report whether Welch's or Student's t-test was used and justify the choice based on the equality of variances.

Code Examples

R
library(tidyverse)
library(car)
library(effsize)

# Simulate sample data
set.seed(42)
group1 <- rnorm(30, mean = 75, sd = 8)  # Method A
group2 <- rnorm(35, mean = 70, sd = 10) # Method B

# Combine data
data <- tibble(
  score = c(group1, group2),
  method = factor(c(rep("A", 30), rep("B", 35)))
)

# Basic summary statistics
summary_stats <- data %>%
  group_by(method) %>%
  summarise(
    n = n(),
    mean = mean(score),
    sd = sd(score)
  )

# Levene's test for equality of variances
car::leveneTest(score ~ method, data = data)

# Welch's t-test (R's default)
t_test_result <- t.test(score ~ method, data = data)

# Student's t-test (if equal variances are assumed)
t_test_equal_var <- t.test(score ~ method, data = data, var.equal = TRUE)

# Effect size (Cohen's d)
cohens_d <- effsize::cohen.d(score ~ method, data = data)

# Visualization
ggplot(data, aes(x = method, y = score, fill = method)) +
  geom_boxplot(alpha = 0.5) +
  geom_jitter(width = 0.2, alpha = 0.5) +
  theme_minimal() +
  labs(title = "Comparison of Test Scores by Method")
Python
import numpy as np
import pandas as pd
import scipy.stats as stats
import matplotlib.pyplot as plt
import seaborn as sns

# Generate sample data
np.random.seed(42)
group1 = np.random.normal(75, 8, 30)  # Method A
group2 = np.random.normal(70, 10, 35) # Method B

# Create a DataFrame for easier plotting with seaborn
df = pd.DataFrame({
    'Score': np.concatenate([group1, group2]),
    'Method': ['A'] * 30 + ['B'] * 35
})

# Basic summary statistics
def get_summary(data):
    return {
        'n': len(data),
        'mean': np.mean(data),
        'std': np.std(data, ddof=1),
        'se': stats.sem(data)
    }

summary1 = get_summary(group1)
summary2 = get_summary(group2)

# Test for equal variances
_, levene_p = stats.levene(group1, group2)

# Welch's t-test (unequal variances)
t_stat, p_value = stats.ttest_ind(group1, group2, equal_var=False)

# Cohen's d with the average-variance denominator
pooled_sd = np.sqrt((summary1['std']**2 + summary2['std']**2) / 2)
cohens_d = abs(summary1['mean'] - summary2['mean']) / pooled_sd

# Visualization
plt.figure(figsize=(12, 5))

# Subplot 1: Boxplot
plt.subplot(1, 2, 1)
sns.boxplot(data=df, x='Method', y='Score')
plt.title('Score Distribution by Method')

# Subplot 2: Distribution
plt.subplot(1, 2, 2)
sns.histplot(data=df, x='Score', hue='Method', element="step",
             stat="density", common_norm=False)
plt.title('Score Distribution Density')

plt.tight_layout()
plt.show()

# Print results
print("Summary Statistics:")
print(f"Method A: Mean = {summary1['mean']:.2f}, SD = {summary1['std']:.2f}, n = {summary1['n']}")
print(f"Method B: Mean = {summary2['mean']:.2f}, SD = {summary2['std']:.2f}, n = {summary2['n']}")
print(f"Levene's test p-value: {levene_p:.4f}")
print(f"Welch's t-test: t = {t_stat:.4f}, p = {p_value:.4f}")
print(f"Cohen's d: {cohens_d:.4f}")

Alternative Tests

Consider these alternatives when assumptions are violated:

  • Mann-Whitney U Test: When normality is violated or data is ordinal
  • Paired t-test: When samples are dependent/matched
