Two-Sample T-Test (Student's T-Test or Welch's T-Test)
Definition
The two-sample t-test is a statistical test used to determine whether there is a significant difference between the means of two independent groups. It is particularly useful for comparing two different treatments, methods, or populations.
Formula
Test Statistic:

$$t = \frac{\bar{x}_1 - \bar{x}_2}{SE}$$

Degrees of freedom:

For equal variances (Student's t-test):

$$df = n_1 + n_2 - 2$$

For unequal variances (Welch's t-test):

$$df = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}$$

Confidence Interval:

$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2,\,df} \cdot SE$$

Standard Error (SE) for equal variances:

$$SE = s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

where pooled standard deviation:

$$s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$$

Standard Error (SE) for unequal variances:

$$SE = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

Where:
- $\bar{x}_1, \bar{x}_2$ = sample means
- $s_1^2, s_2^2$ = sample variances
- $n_1, n_2$ = sample sizes
- $t_{\alpha/2,\,df}$ = critical value from the t-distribution
- $\alpha$ = significance level
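To make these formulas concrete, here is a minimal NumPy sketch (the function name `two_sample_t` is our own) that computes the test statistic, standard error, and degrees of freedom for both variants:

```python
import numpy as np

def two_sample_t(x1, x2, equal_var=False):
    """t-statistic, SE, and df for Student's (equal_var=True) or Welch's test."""
    n1, n2 = len(x1), len(x2)
    m1, m2 = np.mean(x1), np.mean(x2)
    v1, v2 = np.var(x1, ddof=1), np.var(x2, ddof=1)
    if equal_var:
        # Pooled variance and Student's df
        sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
        se = np.sqrt(sp2 * (1 / n1 + 1 / n2))
        df = n1 + n2 - 2
    else:
        # Per-group variances and the Welch-Satterthwaite df
        se = np.sqrt(v1 / n1 + v2 / n2)
        df = (v1 / n1 + v2 / n2) ** 2 / (
            (v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1)
        )
    return (m1 - m2) / se, se, df
```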
Welch's T-Test vs. Student's T-Test
While both Welch's t-test and Student's t-test are used to compare means between two groups, they differ in their assumptions and applications:
Aspect | Welch's T-Test | Student's T-Test |
---|---|---|
Variance Assumption | Does not assume equal variances | Assumes equal variances |
Degrees of Freedom | Calculated using the Welch–Satterthwaite equation above | $n_1 + n_2 - 2$ |
Robustness | More robust when variances are unequal | Less robust when variances are unequal |
Sample Size Sensitivity | Less sensitive to unequal sample sizes | More sensitive to unequal sample sizes |
Use Case | Preferred when variances or sample sizes are unequal | Used when variances are assumed to be equal |
Key Distinction: The primary difference lies in the assumption of equal variances. Welch's t-test does not require this assumption, making it more appropriate for comparing groups with unequal variances.
Both tests share the following assumptions:
- Independence: Observations in each sample should be independent.
- Normality: Data should be approximately normally distributed (though both tests are somewhat robust to violations of this assumption, especially for larger sample sizes).
- Random Sampling: Samples should be randomly selected from their respective populations.
In practice, Welch's t-test is often recommended as the default choice for comparing two means, as it maintains good control over Type I error rates and statistical power across a wider range of scenarios compared to Student's t-test.
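As an illustration of that recommendation, the sketch below (with made-up data) shows how the two tests can give noticeably different p-values when both the variances and the sample sizes are unequal:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
small_tight = rng.normal(0.0, 1.0, 12)  # small sample, small variance
large_wide = rng.normal(0.8, 4.0, 80)   # large sample, large variance

# Student's t-test (assumes equal variances)
t_s, p_s = stats.ttest_ind(small_tight, large_wide, equal_var=True)
# Welch's t-test (no equal-variance assumption)
t_w, p_w = stats.ttest_ind(small_tight, large_wide, equal_var=False)

print(f"Student: t = {t_s:.3f}, p = {p_s:.4f}")
print(f"Welch:   t = {t_w:.3f}, p = {p_w:.4f}")
```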
Practical Example
We want to compare two teaching methods by examining test scores:
Given Data:
- Method A: $\bar{x}_1 = 75$, $s_1 = 8$, $n_1 = 30$
- Method B: $\bar{x}_2 = 70$, $s_2 = 10$, $n_2 = 35$
- Variances are not assumed equal, so we use Welch's t-test
- $\alpha = 0.05$ (two-tailed test)
Hypotheses:
Null Hypothesis ($H_0$): $\mu_1 = \mu_2$ (no difference between methods)
Alternative Hypothesis ($H_1$): $\mu_1 \neq \mu_2$ (there is a difference between methods)
Step-by-Step Calculation:
- Calculate standard errors:
$$SE_1 = \frac{s_1}{\sqrt{n_1}} = \frac{8}{\sqrt{30}} \approx 1.461, \qquad SE_2 = \frac{s_2}{\sqrt{n_2}} = \frac{10}{\sqrt{35}} \approx 1.690$$
- Calculate combined standard error:
$$SE = \sqrt{SE_1^2 + SE_2^2} = \sqrt{2.133 + 2.857} \approx 2.234$$
- Calculate t-statistic:
$$t = \frac{\bar{x}_1 - \bar{x}_2}{SE} = \frac{75 - 70}{2.234} \approx 2.238$$
- Calculate degrees of freedom (Welch):
$$df = \frac{(2.133 + 2.857)^2}{\dfrac{2.133^2}{29} + \dfrac{2.857^2}{34}} \approx 62.7$$
- Find critical value: $t_{0.025,\,62.7} \approx 1.999$
- Construct confidence interval:
$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2,\,df} \cdot SE = 5 \pm 1.999 \times 2.234 \approx (0.54,\ 9.46)$$
Conclusion:
Since $|t| = 2.238 > t_{0.025,\,62.7} = 1.999$, we reject the null hypothesis. There is sufficient evidence to conclude that there is a significant difference between the two teaching methods ($p \approx 0.029 < 0.05$). We are 95% confident that the true difference in means lies between 0.54 and 9.46.
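The whole calculation can be verified from the summary statistics alone with `scipy.stats.ttest_ind_from_stats`:

```python
from scipy import stats

# Worked example above, reproduced from summary statistics only
t, p = stats.ttest_ind_from_stats(
    mean1=75, std1=8, nobs1=30,   # Method A
    mean2=70, std2=10, nobs2=35,  # Method B
    equal_var=False,              # Welch's t-test
)
print(f"t = {t:.3f}, p = {p:.4f}")  # t ≈ 2.238, p ≈ 0.029
```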
Effect Size
Cohen's d for two independent samples:

$$d = \frac{\bar{x}_1 - \bar{x}_2}{s_p}$$

where $s_p$ is the pooled standard deviation defined above.

For unequal variances (preferred when using Welch's t-test):

$$d = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2 + s_2^2}{2}}}$$
Interpretation guidelines:
- Small effect: |d| ≈ 0.2
- Medium effect: |d| ≈ 0.5
- Large effect: |d| ≈ 0.8
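A small sketch of both denominators, applied to the summary statistics from the worked example (the helper names are ours):

```python
import numpy as np

def cohens_d_pooled(m1, s1, n1, m2, s2, n2):
    # Pooled-SD denominator (matches Student's t-test)
    sp = np.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    return (m1 - m2) / sp

def cohens_d_avg(m1, s1, m2, s2):
    # Root-mean-square of the two SDs (companion to Welch's t-test)
    return (m1 - m2) / np.sqrt((s1**2 + s2**2) / 2)

print(cohens_d_pooled(75, 8, 30, 70, 10, 35))  # ≈ 0.55
print(cohens_d_avg(75, 8, 70, 10))             # ≈ 0.55
```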
Power Analysis
To determine the required sample size per group ($n$) for desired power $(1 - \beta)$:

$$n = \frac{2\,(z_{1-\alpha/2} + z_{1-\beta})^2\,\sigma^2}{\delta^2}$$

Where:
- $\alpha$ = significance level
- $\beta$ = probability of Type II error
- $\delta$ = minimum detectable difference
- $\sigma$ = population standard deviation
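`statsmodels` can solve this directly. A sketch for a medium effect ($d = 0.5$), 80% power, and $\alpha = 0.05$:

```python
from statsmodels.stats.power import TTestIndPower

# Required n per group for d = 0.5, alpha = 0.05, power = 0.80
n = TTestIndPower().solve_power(effect_size=0.5, alpha=0.05, power=0.80,
                                ratio=1.0, alternative='two-sided')
print(f"n per group ≈ {n:.1f}")  # ≈ 63.8, so 64 participants per group
```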
Decision Rules
Reject $H_0$ if:
- Two-sided test: $|t| > t_{\alpha/2,\,df}$
- Left-tailed test: $t < -t_{\alpha,\,df}$
- Right-tailed test: $t > t_{\alpha,\,df}$
- Or if $p < \alpha$
Where the critical value is:
- $t_{\alpha/2,\,df}$ for two-sided tests
- $t_{\alpha,\,df}$ for one-sided tests
- with $df$ calculated using the appropriate formula for Student's or Welch's test
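These rules translate directly into code; a sketch using the worked example's values:

```python
from scipy import stats

t_obs, df, alpha = 2.238, 62.7, 0.05
t_crit = stats.t.ppf(1 - alpha / 2, df)   # two-sided critical value
p_value = 2 * stats.t.sf(abs(t_obs), df)  # two-sided p-value
print(f"t_crit = {t_crit:.3f}, p = {p_value:.4f}")
print("reject H0:", abs(t_obs) > t_crit)  # equivalently: p_value < alpha
```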
Reporting Results
Standard format for scientific reporting:

$t(df) = \text{t-value},\ p = \text{p-value},\ d = \text{effect size}$

For the worked example above: $t(62.7) = 2.24,\ p = .029,\ d = 0.55$.
Remember to report whether Welch's or Student's t-test was used and justify the choice based on the equality of variances.
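A tiny formatting helper (`report` is a hypothetical name) that fills in the template above:

```python
def report(t, df, p, d, test="Welch's"):
    # Hypothetical helper: formats results in the reporting template above
    return f"{test} t-test: t({df:.1f}) = {t:.2f}, p = {p:.3f}, d = {d:.2f}"

print(report(2.24, 62.7, 0.029, 0.55))
# Welch's t-test: t(62.7) = 2.24, p = 0.029, d = 0.55
```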
Code Examples
```r
library(tidyverse)
library(car)
library(effsize)

set.seed(42)
group1 <- rnorm(30, mean = 75, sd = 8)   # Method A
group2 <- rnorm(35, mean = 70, sd = 10)  # Method B

# Combine data
data <- tibble(
  score = c(group1, group2),
  method = factor(c(rep("A", 30), rep("B", 35)))
)

# Basic summary statistics
summary_stats <- data %>%
  group_by(method) %>%
  summarise(
    n = n(),
    mean = mean(score),
    sd = sd(score)
  )

# Levene's test for equality of variances
car::leveneTest(score ~ method, data = data)

# Welch's t-test (default)
t_test_result <- t.test(score ~ method, data = data)

# Student's t-test (if equal variances assumed)
t_test_equal_var <- t.test(score ~ method, data = data, var.equal = TRUE)

# Effect size
cohens_d <- effsize::cohen.d(score ~ method, data = data)

# Visualization
ggplot(data, aes(x = method, y = score, fill = method)) +
  geom_boxplot(alpha = 0.5) +
  geom_jitter(width = 0.2, alpha = 0.5) +
  theme_minimal() +
  labs(title = "Comparison of Test Scores by Method")
```
```python
import numpy as np
import pandas as pd
import scipy.stats as stats
import matplotlib.pyplot as plt
import seaborn as sns

# Generate sample data
np.random.seed(42)
group1 = np.random.normal(75, 8, 30)   # Method A
group2 = np.random.normal(70, 10, 35)  # Method B

# Create a DataFrame for easier plotting with seaborn
df = pd.DataFrame({
    'Score': np.concatenate([group1, group2]),
    'Method': ['A'] * 30 + ['B'] * 35
})

# Basic summary statistics
def get_summary(data):
    return {
        'n': len(data),
        'mean': np.mean(data),
        'std': np.std(data, ddof=1),
        'se': stats.sem(data)
    }

summary1 = get_summary(group1)
summary2 = get_summary(group2)

# Test for equal variances
_, levene_p = stats.levene(group1, group2)

# Welch's t-test (unequal variances)
t_stat, p_value = stats.ttest_ind(group1, group2, equal_var=False)

# Cohen's d with the average-variance denominator (suits Welch's test)
avg_sd = np.sqrt((summary1['std']**2 + summary2['std']**2) / 2)
cohens_d = abs(summary1['mean'] - summary2['mean']) / avg_sd

# Create visualization
plt.figure(figsize=(12, 5))

# Subplot 1: Boxplot
plt.subplot(1, 2, 1)
sns.boxplot(data=df, x='Method', y='Score')
plt.title('Score Distribution by Method')

# Subplot 2: Distribution
plt.subplot(1, 2, 2)
sns.histplot(data=df, x='Score', hue='Method', element="step",
             stat="density", common_norm=False)
plt.title('Score Distribution Density')

plt.tight_layout()
plt.show()

# Print results
print("Summary Statistics:")
print(f"Method A: Mean = {summary1['mean']:.2f}, SD = {summary1['std']:.2f}, n = {summary1['n']}")
print(f"Method B: Mean = {summary2['mean']:.2f}, SD = {summary2['std']:.2f}, n = {summary2['n']}")
print(f"Levene's test p-value: {levene_p:.4f}")
print(f"Welch's t-test: t = {t_stat:.4f}, p = {p_value:.4f}")
print(f"Cohen's d: {cohens_d:.4f}")
```
Alternative Tests
Consider these alternatives when assumptions are violated:
- Mann-Whitney U Test: When normality is violated or data is ordinal
- Paired t-test: When samples are dependent/matched
Related Calculators
One-Sample T-Test Calculator
Paired T-Test Calculator
One-Way ANOVA Calculator
Z-Score Calculator