Two-Sample T-Test (Student's T-Test or Welch's T-Test)
Definition
The two-sample t-test is a statistical test used to determine whether there is a significant difference between the means of two independent groups. It is particularly useful for comparing two different treatments, methods, or populations.
Formula
Test Statistic:

$$t = \frac{\bar{x}_1 - \bar{x}_2}{SE}$$

Degrees of freedom:

For equal variances (Student's t-test):

$$df = n_1 + n_2 - 2$$

For unequal variances (Welch's t-test):

$$df = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}$$

Confidence Interval:

$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2,\,df} \cdot SE$$

Standard Error (SE) for equal variances:

$$SE = s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

where pooled standard deviation:

$$s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$$

Standard Error (SE) for unequal variances:

$$SE = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

Where:
- $\bar{x}_1, \bar{x}_2$ = sample means
- $s_1^2, s_2^2$ = sample variances
- $n_1, n_2$ = sample sizes
- $t_{\alpha/2,\,df}$ = critical value from the t-distribution
- $\alpha$ = significance level
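To make these formulas concrete, here is a minimal NumPy sketch (the function name `two_sample_t` is our own) that computes the test statistic, standard error, and degrees of freedom for both variants:

```python
import numpy as np

def two_sample_t(x1, x2, equal_var=False):
    """t-statistic, SE, and df for Student's (equal_var=True) or Welch's test."""
    n1, n2 = len(x1), len(x2)
    m1, m2 = np.mean(x1), np.mean(x2)
    v1, v2 = np.var(x1, ddof=1), np.var(x2, ddof=1)
    if equal_var:
        # Pooled variance and Student's df
        sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
        se = np.sqrt(sp2 * (1 / n1 + 1 / n2))
        df = n1 + n2 - 2
    else:
        # Per-group variances and the Welch-Satterthwaite df
        se = np.sqrt(v1 / n1 + v2 / n2)
        df = (v1 / n1 + v2 / n2) ** 2 / (
            (v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1)
        )
    return (m1 - m2) / se, se, df
```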
Welch's T-Test vs. Student's T-Test
While both Welch's t-test and Student's t-test are used to compare means between two groups, they differ in their assumptions and applications:
Aspect | Welch's T-Test | Student's T-Test |
---|---|---|
Variance Assumption | Does not assume equal variances | Assumes equal variances |
Degrees of Freedom | Calculated using the Welch–Satterthwaite equation above | $n_1 + n_2 - 2$ |
Robustness | More robust when variances are unequal | Less robust when variances are unequal |
Sample Size Sensitivity | Less sensitive to unequal sample sizes | More sensitive to unequal sample sizes |
Use Case | Preferred when variances or sample sizes are unequal | Used when variances are assumed to be equal |
Key Distinction: The primary difference lies in the assumption of equal variances. Welch's t-test does not require this assumption, making it more appropriate for comparing groups with unequal variances.
Both tests share the following assumptions:
- Independence: Observations in each sample should be independent.
- Normality: Data should be approximately normally distributed (though both tests are somewhat robust to violations of this assumption, especially for larger sample sizes).
- Random Sampling: Samples should be randomly selected from their respective populations.
In practice, Welch's t-test is often recommended as the default choice for comparing two means, as it maintains good control over Type I error rates and statistical power across a wider range of scenarios compared to Student's t-test.
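As an illustration of that recommendation, the sketch below (with made-up data) shows how the two tests can give noticeably different p-values when both the variances and the sample sizes are unequal:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
small_tight = rng.normal(0.0, 1.0, 12)  # small sample, small variance
large_wide = rng.normal(0.8, 4.0, 80)   # large sample, large variance

# Student's t-test (assumes equal variances)
t_s, p_s = stats.ttest_ind(small_tight, large_wide, equal_var=True)
# Welch's t-test (no equal-variance assumption)
t_w, p_w = stats.ttest_ind(small_tight, large_wide, equal_var=False)

print(f"Student: t = {t_s:.3f}, p = {p_s:.4f}")
print(f"Welch:   t = {t_w:.3f}, p = {p_w:.4f}")
```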
Practical Example
We want to compare two teaching methods by examining test scores:
Given Data:
- Method A: $\bar{x}_1 = 75$, $s_1 = 8$, $n_1 = 30$
- Method B: $\bar{x}_2 = 70$, $s_2 = 10$, $n_2 = 35$
- Variances are not assumed equal, so we use Welch's t-test
- $\alpha = 0.05$ (two-tailed test)
Hypotheses:
Null Hypothesis ($H_0$): $\mu_1 = \mu_2$ (no difference between methods)
Alternative Hypothesis ($H_1$): $\mu_1 \neq \mu_2$ (there is a difference between methods)
Step-by-Step Calculation:
- Calculate standard errors:
$$SE_1 = \frac{s_1}{\sqrt{n_1}} = \frac{8}{\sqrt{30}} \approx 1.461, \qquad SE_2 = \frac{s_2}{\sqrt{n_2}} = \frac{10}{\sqrt{35}} \approx 1.690$$
- Calculate combined standard error:
$$SE = \sqrt{SE_1^2 + SE_2^2} = \sqrt{2.133 + 2.857} \approx 2.234$$
- Calculate t-statistic:
$$t = \frac{\bar{x}_1 - \bar{x}_2}{SE} = \frac{75 - 70}{2.234} \approx 2.238$$
- Calculate degrees of freedom (Welch):
$$df = \frac{(2.133 + 2.857)^2}{\dfrac{2.133^2}{29} + \dfrac{2.857^2}{34}} \approx 62.7$$
- Find critical value: $t_{0.025,\,62.7} \approx 1.999$
- Construct confidence interval:
$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2,\,df} \cdot SE = 5 \pm 1.999 \times 2.234 \approx (0.54,\ 9.46)$$
Conclusion:
Since $|t| = 2.238 > t_{0.025,\,62.7} = 1.999$, we reject the null hypothesis. There is sufficient evidence to conclude that there is a significant difference between the two teaching methods ($p \approx 0.029 < 0.05$). We are 95% confident that the true difference in means lies between 0.54 and 9.46.
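The whole calculation can be verified from the summary statistics alone with `scipy.stats.ttest_ind_from_stats`:

```python
from scipy import stats

# Worked example above, reproduced from summary statistics only
t, p = stats.ttest_ind_from_stats(
    mean1=75, std1=8, nobs1=30,   # Method A
    mean2=70, std2=10, nobs2=35,  # Method B
    equal_var=False,              # Welch's t-test
)
print(f"t = {t:.3f}, p = {p:.4f}")  # t ≈ 2.238, p ≈ 0.029
```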
Effect Size
Cohen's d for two independent samples:

$$d = \frac{\bar{x}_1 - \bar{x}_2}{s_p}$$

where $s_p$ is the pooled standard deviation defined above.

For unequal variances (preferred when using Welch's t-test):

$$d = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2 + s_2^2}{2}}}$$
Interpretation guidelines:
- Small effect: |d| ≈ 0.2
- Medium effect: |d| ≈ 0.5
- Large effect: |d| ≈ 0.8
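A small sketch of both denominators, applied to the summary statistics from the worked example (the helper names are ours):

```python
import numpy as np

def cohens_d_pooled(m1, s1, n1, m2, s2, n2):
    # Pooled-SD denominator (matches Student's t-test)
    sp = np.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    return (m1 - m2) / sp

def cohens_d_avg(m1, s1, m2, s2):
    # Root-mean-square of the two SDs (companion to Welch's t-test)
    return (m1 - m2) / np.sqrt((s1**2 + s2**2) / 2)

print(cohens_d_pooled(75, 8, 30, 70, 10, 35))  # ≈ 0.55
print(cohens_d_avg(75, 8, 70, 10))             # ≈ 0.55
```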
Power Analysis
To determine the required sample size per group ($n$) for desired power $(1 - \beta)$:

$$n = \frac{2\,(z_{1-\alpha/2} + z_{1-\beta})^2\,\sigma^2}{\delta^2}$$

Where:
- $\alpha$ = significance level
- $\beta$ = probability of Type II error
- $\delta$ = minimum detectable difference
- $\sigma$ = population standard deviation
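`statsmodels` can solve this directly. A sketch for a medium effect ($d = 0.5$), 80% power, and $\alpha = 0.05$:

```python
from statsmodels.stats.power import TTestIndPower

# Required n per group for d = 0.5, alpha = 0.05, power = 0.80
n = TTestIndPower().solve_power(effect_size=0.5, alpha=0.05, power=0.80,
                                ratio=1.0, alternative='two-sided')
print(f"n per group ≈ {n:.1f}")  # ≈ 63.8, so 64 participants per group
```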
Decision Rules
Reject $H_0$ if:
- Two-sided test: $|t| > t_{\alpha/2,\,df}$
- Left-tailed test: $t < -t_{\alpha,\,df}$
- Right-tailed test: $t > t_{\alpha,\,df}$
- Or if $p < \alpha$
Where the critical value is:
- $t_{\alpha/2,\,df}$ for two-sided tests
- $t_{\alpha,\,df}$ for one-sided tests
- with $df$ calculated using the appropriate formula for Student's or Welch's test
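These rules translate directly into code; a sketch using the worked example's values:

```python
from scipy import stats

t_obs, df, alpha = 2.238, 62.7, 0.05
t_crit = stats.t.ppf(1 - alpha / 2, df)   # two-sided critical value
p_value = 2 * stats.t.sf(abs(t_obs), df)  # two-sided p-value
print(f"t_crit = {t_crit:.3f}, p = {p_value:.4f}")
print("reject H0:", abs(t_obs) > t_crit)  # equivalently: p_value < alpha
```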
Reporting Results
Standard format for scientific reporting:

$t(df) = \text{t-value},\ p = \text{p-value},\ d = \text{effect size}$

For the worked example above: $t(62.7) = 2.24,\ p = .029,\ d = 0.55$.
Remember to report whether Welch's or Student's t-test was used and justify the choice based on the equality of variances.
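A tiny formatting helper (`report` is a hypothetical name) that fills in the template above:

```python
def report(t, df, p, d, test="Welch's"):
    # Hypothetical helper: formats results in the reporting template above
    return f"{test} t-test: t({df:.1f}) = {t:.2f}, p = {p:.3f}, d = {d:.2f}"

print(report(2.24, 62.7, 0.029, 0.55))
# Welch's t-test: t(62.7) = 2.24, p = 0.029, d = 0.55
```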
Code Examples
```r
library(tidyverse)
library(car)
library(effsize)

set.seed(42)
group1 <- rnorm(30, mean = 75, sd = 8)   # Method A
group2 <- rnorm(35, mean = 70, sd = 10)  # Method B

# Combine data
data <- tibble(
  score = c(group1, group2),
  method = factor(c(rep("A", 30), rep("B", 35)))
)

# Basic summary statistics
summary_stats <- data %>%
  group_by(method) %>%
  summarise(
    n = n(),
    mean = mean(score),
    sd = sd(score)
  )

# Levene's test for equality of variances
car::leveneTest(score ~ method, data = data)

# Welch's t-test (default)
t_test_result <- t.test(score ~ method, data = data)

# Student's t-test (if equal variances assumed)
t_test_equal_var <- t.test(score ~ method, data = data, var.equal = TRUE)

# Effect size
cohens_d <- effsize::cohen.d(score ~ method, data = data)

# Visualization
ggplot(data, aes(x = method, y = score, fill = method)) +
  geom_boxplot(alpha = 0.5) +
  geom_jitter(width = 0.2, alpha = 0.5) +
  theme_minimal() +
  labs(title = "Comparison of Test Scores by Method")
```
```python
import numpy as np
import pandas as pd
import scipy.stats as stats
import matplotlib.pyplot as plt
import seaborn as sns

# Generate sample data
np.random.seed(42)
group1 = np.random.normal(75, 8, 30)   # Method A
group2 = np.random.normal(70, 10, 35)  # Method B

# Create a DataFrame for easier plotting with seaborn
df = pd.DataFrame({
    'Score': np.concatenate([group1, group2]),
    'Method': ['A'] * 30 + ['B'] * 35
})

# Basic summary statistics
def get_summary(data):
    return {
        'n': len(data),
        'mean': np.mean(data),
        'std': np.std(data, ddof=1),
        'se': stats.sem(data)
    }

summary1 = get_summary(group1)
summary2 = get_summary(group2)

# Test for equal variances
_, levene_p = stats.levene(group1, group2)

# Welch's t-test (unequal variances)
t_stat, p_value = stats.ttest_ind(group1, group2, equal_var=False)

# Cohen's d with the average-variance denominator (suits Welch's test)
avg_sd = np.sqrt((summary1['std']**2 + summary2['std']**2) / 2)
cohens_d = abs(summary1['mean'] - summary2['mean']) / avg_sd

# Create visualization
plt.figure(figsize=(12, 5))

# Subplot 1: Boxplot
plt.subplot(1, 2, 1)
sns.boxplot(data=df, x='Method', y='Score')
plt.title('Score Distribution by Method')

# Subplot 2: Distribution
plt.subplot(1, 2, 2)
sns.histplot(data=df, x='Score', hue='Method', element="step",
             stat="density", common_norm=False)
plt.title('Score Distribution Density')

plt.tight_layout()
plt.show()

# Print results
print("Summary Statistics:")
print(f"Method A: Mean = {summary1['mean']:.2f}, SD = {summary1['std']:.2f}, n = {summary1['n']}")
print(f"Method B: Mean = {summary2['mean']:.2f}, SD = {summary2['std']:.2f}, n = {summary2['n']}")
print(f"Levene's test p-value: {levene_p:.4f}")
print(f"Welch's t-test: t = {t_stat:.4f}, p = {p_value:.4f}")
print(f"Cohen's d: {cohens_d:.4f}")
```
Alternative Tests
Consider these alternatives when assumptions are violated:
- Mann-Whitney U Test: When normality is violated or data is ordinal
- Paired t-test: When samples are dependent/matched
Related Calculators
One-Sample T-Test Calculator
Paired T-Test Calculator
One-Way ANOVA Calculator
Z-Score Calculator