Two-Sample Z-Test

Definition

The two-sample z-test is a statistical test used to determine whether the means of two populations differ significantly when both population standard deviations are known. It is particularly useful with large samples and when the population parameters are known from prior data.

Formula

Test Statistic:

z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}

Where:

$\bar x_1, \bar x_2$ = sample means
$\mu_1, \mu_2$ = population means ($\mu_1 - \mu_2$ is the difference hypothesized under $H_0$, usually 0)
$\sigma_1, \sigma_2$ = known population standard deviations
$n_1, n_2$ = sample sizes
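
The test statistic can be sketched in a few lines of Python. The function name `two_sample_z`, the `null_diff` parameter, and the input values are assumptions made up for this illustration; they are not part of the calculator.

```python
from math import sqrt

def two_sample_z(mean1, mean2, sigma1, sigma2, n1, n2, null_diff=0.0):
    """Return the z statistic for H0: mu1 - mu2 = null_diff (hypothetical helper)."""
    # Standard error of the difference in sample means, using the known sigmas
    standard_error = sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    return ((mean1 - mean2) - null_diff) / standard_error

# Illustrative inputs only (not taken from real data)
z = two_sample_z(mean1=10.0, mean2=9.0, sigma1=2.0, sigma2=2.0, n1=50, n2=50)
print(f"z = {z:.3f}")
```

Here the standard error is $\sqrt{4/50 + 4/50} = 0.4$, so the observed difference of 1.0 is 2.5 standard errors from zero.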

Confidence Interval for Mean Difference:

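\text{Two-sided: }(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2} \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}

\text{One-sided (lower bound): }(\bar{x}_1 - \bar{x}_2) - z_{\alpha} \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}

\text{One-sided (upper bound): }(\bar{x}_1 - \bar{x}_2) + z_{\alpha} \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}

Where $z_{\alpha/2}$ and $z_{\alpha}$ are the critical values for the two-tailed and one-tailed tests, respectively. A one-sided interval uses only one of the two bounds: the lower bound for a right-tailed alternative, the upper bound for a left-tailed alternative.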
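
As a rough sketch, the interval can be computed with Python's standard library. The function name `z_confidence_interval`, the `side` labels, and the inputs are assumptions for illustration. A one-sided interval is treated here as a single bound, with the other end left infinite.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(mean1, mean2, sigma1, sigma2, n1, n2,
                          alpha=0.05, side="two-sided"):
    """Confidence interval for mu1 - mu2 with known sigmas (hypothetical helper)."""
    diff = mean1 - mean2
    se = sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    if side == "two-sided":
        z_crit = NormalDist().inv_cdf(1 - alpha / 2)
        return diff - z_crit * se, diff + z_crit * se
    z_crit = NormalDist().inv_cdf(1 - alpha)
    if side == "lower":   # lower bound only, pairs with a right-tailed test
        return diff - z_crit * se, float("inf")
    if side == "upper":   # upper bound only, pairs with a left-tailed test
        return float("-inf"), diff + z_crit * se
    raise ValueError("side must be 'two-sided', 'lower', or 'upper'")

# Illustrative inputs only
two_sided = z_confidence_interval(10.0, 9.0, 2.0, 2.0, 50, 50)
lower_only = z_confidence_interval(10.0, 9.0, 2.0, 2.0, 50, 50, side="lower")
print(f"Two-sided 95% CI: ({two_sided[0]:.3f}, {two_sided[1]:.3f})")
print(f"One-sided 95% lower bound: {lower_only[0]:.3f}")
```

The one-sided bound sits closer to the point estimate than the matching end of the two-sided interval, because $z_{\alpha} < z_{\alpha/2}$.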

Key Assumptions

Known Population Standard Deviations: Both σ₁ and σ₂ must be known

Independent Samples: The two samples must be independent

Random Sampling: Both samples should be randomly selected

Normality: Both populations should be normally distributed, or both samples should be large ($n > 30$)

Practical Example

Comparing the efficiency of two production lines with known process variations:

Step 1: State the Data

Line 1: $n_1$ = 50, $\bar x_1$ = 95.2 units/hour, $\sigma_1$ = 4.0
Line 2: $n_2$ = 45, $\bar x_2$ = 93.8 units/hour, $\sigma_2$ = 3.8

Step 2: State Hypotheses

$H_0: \mu_1 - \mu_2 = 0$ (no difference)
$H_a: \mu_1 - \mu_2 \neq 0$ (there is a difference)
$\alpha = 0.05$

Step 3: Calculate Test Statistic

Z-statistic:

z = \frac{(95.2 - 93.8) - 0}{\sqrt{\frac{4.0^2}{50} + \frac{3.8^2}{45}}} = \frac{1.4}{0.8006} \approx 1.75

Step 4: Calculate P-value

For two-tailed test:

p\text{-value} = 2(1 - \Phi(|z|)) = 2(1 - \Phi(1.75)) \approx 0.080

Step 5: Calculate Confidence Interval

\text{CI: }(95.2 - 93.8) \pm 1.96 \sqrt{\frac{4.0^2}{50} + \frac{3.8^2}{45}} = 1.4 \pm 1.57 = [-0.17, 2.97]

Step 6: Draw Conclusion

Critical value at 5% significance level:

z_{\alpha/2} = 1.96

Since $|z| < z_{\alpha/2}$ and $p\text{-value} > 0.05$, we fail to reject $H_0$. At the 5% significance level, the data do not provide sufficient evidence of a difference in mean output between the two production lines.
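
Steps 3 to 6 can be recomputed directly from the summary statistics in Step 1. The following is a minimal sketch using only the Python standard library; the variable names are chosen for this illustration.

```python
from math import sqrt
from statistics import NormalDist

# Summary statistics from Step 1
n1, mean1, sigma1 = 50, 95.2, 4.0
n2, mean2, sigma2 = 45, 93.8, 3.8
alpha = 0.05

norm = NormalDist()
diff = mean1 - mean2
se = sqrt(sigma1**2 / n1 + sigma2**2 / n2)

# Step 3: test statistic under H0: mu1 - mu2 = 0
z = (diff - 0) / se

# Step 4: two-tailed p-value
p_value = 2 * (1 - norm.cdf(abs(z)))

# Step 5: two-sided confidence interval for mu1 - mu2
z_crit = norm.inv_cdf(1 - alpha / 2)
ci_low, ci_high = diff - z_crit * se, diff + z_crit * se

# Step 6: decision by the critical-value rule
reject_h0 = abs(z) > z_crit

print(f"z = {z:.2f}, p = {p_value:.3f}")
print(f"95% CI: [{ci_low:.2f}, {ci_high:.2f}]")
print("Reject H0" if reject_h0 else "Fail to reject H0")
```

The critical-value rule, the p-value rule, and the confidence interval always agree for a two-sided test: the interval contains 0 exactly when $H_0$ is not rejected.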

Effect Size

Cohen's d for two-sample z-test:

d = \frac{|\bar{x}_1 - \bar{x}_2|}{\sqrt{\frac{\sigma_1^2 + \sigma_2^2}{2}}}

Interpretation guidelines:

Small effect: $|d| \approx 0.2$
Medium effect: $|d| \approx 0.5$
Large effect: $|d| \approx 0.8$
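
A small sketch of the effect size calculation follows. The function names are hypothetical, and the cutoffs used for labeling (0.2, 0.5, 0.8 as lower edges of each band) are an assumption; the guidelines above give approximate reference points rather than strict boundaries.

```python
from math import sqrt

def cohens_d(mean1, mean2, sigma1, sigma2):
    """Cohen's d using the root mean square of the two known sigmas."""
    return abs(mean1 - mean2) / sqrt((sigma1**2 + sigma2**2) / 2)

def effect_label(d):
    """Rough label for |d|; the band edges are an assumption of this sketch."""
    d = abs(d)
    if d < 0.2:
        return "below small"
    if d < 0.5:
        return "small"
    if d < 0.8:
        return "medium"
    return "large"

# Values from the production line example
d = cohens_d(95.2, 93.8, 4.0, 3.8)
print(f"d = {d:.2f} ({effect_label(d)})")
```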

Power Analysis

Required sample size per group for equal sample sizes:

\text{Two-sided: } n = \frac{(z_{1-\alpha/2} + z_{1-\beta})^2(\sigma_1^2 + \sigma_2^2)}{(\mu_1-\mu_2)^2}

\text{One-sided: } n = \frac{(z_{1-\alpha} + z_{1-\beta})^2(\sigma_1^2 + \sigma_2^2)}{(\mu_1-\mu_2)^2}

Where:

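$\alpha$ = significance level
$\beta$ = probability of Type II error (power = $1 - \beta$)
$\mu_1 - \mu_2$ = minimum detectable difference
$z_{1-\alpha/2}, z_{1-\alpha}, z_{1-\beta}$ = standard normal quantiles ($z_{1-\alpha/2}$ is the same critical value written $z_{\alpha/2}$ in the formulas above)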
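
The sample size follows from requiring the standardized difference $|\mu_1-\mu_2| / \sqrt{(\sigma_1^2+\sigma_2^2)/n}$ to equal the sum of the two quantiles, which gives $n = (z_{1-\alpha/2} + z_{1-\beta})^2(\sigma_1^2+\sigma_2^2)/(\mu_1-\mu_2)^2$ for a two-sided test. The sketch below implements that relation and then checks the power actually achieved. The function names, the 80% power target, and the choice of 1.4 as the minimum detectable difference are assumptions for illustration.

```python
from math import ceil, sqrt
from statistics import NormalDist

def sample_size_per_group(delta, sigma1, sigma2, alpha=0.05, power=0.80,
                          two_sided=True):
    """Smallest equal group size n reaching the target power (hypothetical helper)."""
    norm = NormalDist()
    z_alpha = norm.inv_cdf(1 - alpha / 2 if two_sided else 1 - alpha)
    z_beta = norm.inv_cdf(power)
    n = (z_alpha + z_beta) ** 2 * (sigma1**2 + sigma2**2) / delta**2
    return ceil(n)

def achieved_power(n, delta, sigma1, sigma2, alpha=0.05, two_sided=True):
    """Approximate power at group size n, ignoring wrong-tail rejections."""
    norm = NormalDist()
    z_alpha = norm.inv_cdf(1 - alpha / 2 if two_sided else 1 - alpha)
    shift = abs(delta) / sqrt((sigma1**2 + sigma2**2) / n)
    return norm.cdf(shift - z_alpha)

# Assumed scenario: detect a 1.4 units/hour difference with sigmas 4.0 and 3.8
n = sample_size_per_group(delta=1.4, sigma1=4.0, sigma2=3.8)
print(f"n per group: {n}")
print(f"power at n: {achieved_power(n, 1.4, 4.0, 3.8):.3f}")
print(f"power at n - 1: {achieved_power(n - 1, 1.4, 4.0, 3.8):.3f}")
```

Rounding up with `ceil` guarantees the target power is met; one fewer observation per group falls just short.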

Decision Rules

Reject $H_0$ if:

Two-sided test: $|z| > z_{\alpha/2}$
Left-tailed test: $z < -z_{\alpha}$
Right-tailed test: $z > z_{\alpha}$
Or if $p\text{-value} < \alpha$
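
The decision rules can be sketched as a single function. The name `z_test_decision`, the labels for the `alternative` argument, and the value z = 1.75 are assumptions chosen for illustration.

```python
from statistics import NormalDist

def z_test_decision(z, alpha=0.05, alternative="two-sided"):
    """Return (reject_h0, p_value) for a z statistic (hypothetical helper)."""
    norm = NormalDist()
    if alternative == "two-sided":
        reject = abs(z) > norm.inv_cdf(1 - alpha / 2)
        p_value = 2 * (1 - norm.cdf(abs(z)))
    elif alternative == "less":       # left-tailed test
        reject = z < -norm.inv_cdf(1 - alpha)
        p_value = norm.cdf(z)
    elif alternative == "greater":    # right-tailed test
        reject = z > norm.inv_cdf(1 - alpha)
        p_value = 1 - norm.cdf(z)
    else:
        raise ValueError("alternative must be 'two-sided', 'less', or 'greater'")
    return reject, p_value

# The same z statistic can lead to different decisions
for alt in ("two-sided", "less", "greater"):
    reject, p = z_test_decision(1.75, alternative=alt)
    print(f"{alt:>9}: reject H0 = {reject}, p = {p:.4f}")
```

With z = 1.75 at the 5% level, only the right-tailed test rejects $H_0$, which is why the alternative hypothesis should be fixed before looking at the data.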

Reporting Results

Standard format:

"A two-sample z-test was conducted to compare [variable] between [group 1] (M₁ = [mean₁], σ₁ = [std₁], n₁ = [n₁]) and [group 2] (M₂ = [mean₂], σ₂ = [std₂], n₂ = [n₂]). Results indicated [significant/no significant] difference between the groups, z = [z-value], p = [p-value], d = [Cohen's d]."

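The reporting template can be filled programmatically. This is only a sketch: the function name `z_test_report` and its parameters are hypothetical, it assumes a two-sided test, and the number formatting (two decimals for z and d, three for p) is a choice made here rather than a requirement.

```python
from math import sqrt
from statistics import NormalDist

def z_test_report(variable, group1, group2, mean1, mean2,
                  sigma1, sigma2, n1, n2, alpha=0.05):
    """Fill the standard report sentence for a two-sided two-sample z-test."""
    se = sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    z = (mean1 - mean2) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    d = abs(mean1 - mean2) / sqrt((sigma1**2 + sigma2**2) / 2)
    verdict = "a significant" if p < alpha else "no significant"
    return (
        f"A two-sample z-test was conducted to compare {variable} between "
        f"{group1} (M₁ = {mean1:.2f}, σ₁ = {sigma1:.2f}, n₁ = {n1}) and "
        f"{group2} (M₂ = {mean2:.2f}, σ₂ = {sigma2:.2f}, n₂ = {n2}). "
        f"Results indicated {verdict} difference between the groups, "
        f"z = {z:.2f}, p = {p:.3f}, d = {d:.2f}."
    )

# Values from the production line example
report = z_test_report("units per hour", "Line 1", "Line 2",
                       95.2, 93.8, 4.0, 3.8, 50, 45)
print(report)
```
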
Code Examples

R

library(tidyverse)

set.seed(42)

# Known population standard deviations
line1_pop_sd <- 4.0
line2_pop_sd <- 3.8

# Production Line 1 data (known σ₁ = 4.0)
line1 <- tibble(
  line = "Line 1",
  units = rnorm(50, mean = 95.2, sd = line1_pop_sd)
)

# Production Line 2 data (known σ₂ = 3.8)
line2 <- tibble(
  line = "Line 2",
  units = rnorm(45, mean = 93.8, sd = line2_pop_sd)
)

# Combine data
production_data <- bind_rows(line1, line2)

# Summarize the data and attach the known standard deviations
summary_stats <- production_data |>
  group_by(line) |>
  summarise(
    n = n(),
    mean = mean(units),
    .groups = "drop"
  ) |>
  mutate(known_sd = if_else(line == "Line 1", line1_pop_sd, line2_pop_sd))

# Perform two-sample Z-test
line1_stats <- summary_stats |> filter(line == "Line 1")
line2_stats <- summary_stats |> filter(line == "Line 2")

# Difference in sample means and its standard error
mean_diff <- line1_stats$mean - line2_stats$mean
std_error <- sqrt((line1_stats$known_sd^2 / line1_stats$n) +
                  (line2_stats$known_sd^2 / line2_stats$n))

# Calculate z-statistic
z_stat <- mean_diff / std_error
print(str_glue("Z-statistic: {round(z_stat, 3)}"))

# 95% confidence interval
alpha <- 0.05
z_alpha <- qnorm(1 - alpha / 2)
margin_of_error <- z_alpha * std_error
ci_lower <- mean_diff - margin_of_error
ci_upper <- mean_diff + margin_of_error
print(str_glue("95% CI: [{round(ci_lower, 2)}, {round(ci_upper, 2)}]"))

# Calculate p-value (two-sided test)
p_value <- 2 * (1 - pnorm(abs(z_stat)))
print(str_glue("P-value: {round(p_value, 4)}"))

# Calculate effect size (Cohen's d)
pooled_sd <- sqrt((line1_pop_sd^2 + line2_pop_sd^2) / 2)
cohens_d <- abs(mean_diff) / pooled_sd
print(str_glue("Effect size (Cohen's d): {round(cohens_d, 3)}"))

# Visualization
ggplot(production_data, aes(x = line, y = units, fill = line)) +
  geom_boxplot(alpha = 0.5) +
  geom_jitter(width = 0.2, alpha = 0.5) +
  theme_minimal() +
  labs(
    title = "Production Output by Line",
    y = "Units per Hour",
    x = "Production Line"
  )

Python

import numpy as np
import scipy.stats as stats
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Set random seed for reproducibility
np.random.seed(42)

# Known population standard deviations and sample sizes
sigma1, sigma2 = 4.0, 3.8
n1, n2 = 50, 45

# Generate sample data
# Production Line 1 (known σ₁ = 4.0)
line1_data = np.random.normal(95.2, sigma1, n1)

# Production Line 2 (known σ₂ = 3.8)
line2_data = np.random.normal(93.8, sigma2, n2)

# Calculate sample means
sample_mean1 = np.mean(line1_data)
sample_mean2 = np.mean(line2_data)

# Calculate z-statistic
z_numerator = sample_mean1 - sample_mean2
z_denominator = np.sqrt((sigma1**2 / n1) + (sigma2**2 / n2))
z_stat = z_numerator / z_denominator
print(f"Z-statistic: {z_stat:.2f}")

# Calculate p-value (two-sided test)
p_value = 2 * (1 - stats.norm.cdf(abs(z_stat)))
print(f"P-value: {p_value:.4f}")

# Calculate 95% Confidence Interval
alpha = 0.05
z_critical = stats.norm.ppf(1 - alpha / 2)
margin_of_error = z_critical * z_denominator
ci_lower = z_numerator - margin_of_error
ci_upper = z_numerator + margin_of_error
print(f"95% Confidence Interval for mean difference: ({ci_lower:.2f}, {ci_upper:.2f})")

# Calculate effect size (Cohen's d)
pooled_sd = np.sqrt((sigma1**2 + sigma2**2) / 2)
cohens_d = abs(sample_mean1 - sample_mean2) / pooled_sd
print(f"Cohen's d: {cohens_d:.2f}")

# Create DataFrame for plotting
df = pd.DataFrame({
    'Production Line': ['Line 1'] * n1 + ['Line 2'] * n2,
    'Units': np.concatenate([line1_data, line2_data])
})

# Create visualization
plt.figure(figsize=(12, 5))

# Subplot 1: Boxplot
plt.subplot(1, 2, 1)
sns.boxplot(data=df, x='Production Line', y='Units')
plt.title('Units per Hour by Production Line')

# Subplot 2: Distribution
plt.subplot(1, 2, 2)
sns.histplot(data=df, x='Units', hue='Production Line',
             element="step", stat="density")
plt.title('Distribution of Units per Hour')

plt.tight_layout()
plt.show()
