Chi-Square Distribution Calculator
Chi-Square Distribution: Definition, Formula, and Applications
Chi-Square Distribution
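Definition: The chi-square distribution with $k$ degrees of freedom is the probability distribution of a sum of $k$ squared independent standard normal random variables. It is widely used in statistical inference, particularly for tests of independence and goodness-of-fit tests. Its probability density function is

$$f(x; k) = \frac{1}{2^{k/2}\,\Gamma(k/2)}\, x^{k/2 - 1} e^{-x/2}, \quad x > 0$$

Where:
- $k$ is the degrees of freedom (shape parameter)
- $x$ is the value of the chi-square statistic
- $\Gamma$ is the gamma function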
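As a minimal sketch, the density can be evaluated with only the Python standard library. The function name `chi2_pdf` and the symbol `k` for the degrees of freedom are illustrative choices, not taken from the calculator itself.

```python
import math


def chi2_pdf(x: float, k: int) -> float:
    """Chi-square density f(x; k) = x^(k/2 - 1) * exp(-x/2) / (2^(k/2) * Gamma(k/2)).

    Returns 0.0 for x <= 0; the value at exactly x = 0 depends on k
    and is ignored in this sketch.
    """
    if k <= 0:
        raise ValueError("degrees of freedom k must be positive")
    if x <= 0:
        return 0.0
    half_k = k / 2
    return x ** (half_k - 1) * math.exp(-x / 2) / (2 ** half_k * math.gamma(half_k))


# For k = 2 the density reduces to an exponential density with rate 1/2.
density_k2 = chi2_pdf(2.0, 2)
exponential_check = 0.5 * math.exp(-1.0)
print(f"f(2; 2) = {density_k2:.6f}, 0.5 * exp(-1) = {exponential_check:.6f}")

# For k = 4 the density simplifies to (x / 4) * exp(-x / 2).
density_k4 = chi2_pdf(4.0, 4)
print(f"f(4; 4) = {density_k4:.6f}")
```

For large `k`, `math.gamma` overflows, so a more robust version would likely work in log space with `math.lgamma`; in practice a library routine such as `scipy.stats.chi2.pdf` is the safer choice.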
Properties
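- Mean: $k$ (equals degrees of freedom)
- Variance: $2k$
- Mode: $\max(k - 2, 0)$
- Support: $x \in [0, \infty)$
- Special cases:
  - $\chi^2_k$ is the sum of $k$ independent standard normal variables squared
  - $\chi^2_1$ is the square of a standard normal
  - For large $k$, $\chi^2_k$ is approximately normal with mean $k$ and variance $2k$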
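The mean (equal to the degrees of freedom) and the variance (twice the degrees of freedom) can be checked empirically by summing squared standard normal draws. The sketch below uses only the standard library; the helper name `simulate_chi_square`, the sample size, and the seed are arbitrary choices for illustration.

```python
import random
import statistics


def simulate_chi_square(k: int, n_samples: int, seed: int = 0) -> list:
    """Draw n_samples values, each the sum of k squared standard normal draws."""
    rng = random.Random(seed)
    return [
        sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))
        for _ in range(n_samples)
    ]


k = 4
samples = simulate_chi_square(k, n_samples=50_000)

sample_mean = statistics.fmean(samples)
sample_var = statistics.variance(samples)

# Theory: mean = k, variance = 2k. The simulated values should be close,
# but not identical, because of sampling noise.
print(f"sample mean     = {sample_mean:.3f} (theory: {k})")
print(f"sample variance = {sample_var:.3f} (theory: {2 * k})")
```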
Applications
1. Goodness of Fit Tests
Chi-square tests are used to determine whether there is a significant difference between the expected frequencies and the observed frequencies in one or more categories. This test helps determine if a sample comes from a population with a specific distribution.
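A small sketch of a goodness-of-fit test, assuming SciPy is available. The die-roll counts are made-up illustration data, and the fair-die null hypothesis is an assumption of the example.

```python
from scipy import stats

# Hypothetical data: counts of each face in 120 rolls of a die.
observed = [25, 17, 15, 23, 24, 16]
# Under the null hypothesis of a fair die, each face is expected 120 / 6 = 20 times.
expected = [20] * 6

# Pearson statistic: sum of (observed - expected)^2 / expected over all categories.
statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# scipy computes the same statistic and takes the p-value from a chi-square
# distribution with (number of categories - 1) = 5 degrees of freedom.
result = stats.chisquare(f_obs=observed, f_exp=expected)

print(f"statistic (by hand) = {statistic:.4f}")
print(f"statistic (scipy)   = {result.statistic:.4f}")
print(f"p-value             = {result.pvalue:.4f}")
```

A large p-value means the observed counts are compatible with the expected frequencies; it does not prove that the die is fair.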
2. Tests of Independence
In contingency table analysis, chi-square tests help determine whether there is a significant relationship between two categorical variables. This is crucial in fields like social sciences, market research, and medical studies.
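A test of independence on a contingency table might look like the following sketch, assuming SciPy and NumPy are available. The 2 x 3 table is invented for illustration; its rows and columns stand for two hypothetical categorical variables.

```python
import numpy as np
from scipy import stats

# Hypothetical 2 x 3 contingency table: rows are two groups,
# columns are three response categories.
table = np.array([[30, 20, 10],
                  [20, 30, 40]])

# chi2_contingency derives the expected counts from the row and column totals
# and uses (rows - 1) * (columns - 1) degrees of freedom.
chi2_stat, p_value, dof, expected = stats.chi2_contingency(table)

print(f"chi-square statistic = {chi2_stat:.4f}")
print(f"degrees of freedom   = {dof}")
print(f"p-value              = {p_value:.6f}")
print("expected counts under independence:")
print(expected)
```

Note that for 2 x 2 tables `chi2_contingency` applies Yates' continuity correction by default (`correction=True`), so its statistic can differ from a hand-computed Pearson statistic in that case.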
3. Quality Control
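In manufacturing and quality control, chi-square tests can be used to monitor process variability. For normally distributed measurements, the scaled sample variance $(n-1)s^2/\sigma_0^2$ follows a chi-square distribution with $n-1$ degrees of freedom, which makes it possible to check whether the variance of a production process remains within acceptable limits.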
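One common way to monitor variability is the chi-square test for a single variance, which assumes the measurements are approximately normally distributed. The sketch below is illustrative only: the measurements, the target variance, and the one-sided alternative are assumptions of the example, not values from the original page.

```python
import statistics
from scipy import stats

# Hypothetical measurements of a part dimension (arbitrary units).
measurements = [10.1, 9.8, 10.4, 9.7, 10.3, 9.6, 10.5, 9.9, 10.2, 9.5]
# Hypothetical target: the process variance should not exceed 0.04.
target_variance = 0.04

n = len(measurements)
sample_variance = statistics.variance(measurements)  # unbiased, divides by n - 1

# Under the null hypothesis (true variance equals the target), this statistic
# follows a chi-square distribution with n - 1 degrees of freedom.
statistic = (n - 1) * sample_variance / target_variance

# One-sided p-value: probability of a statistic at least this large.
p_value = stats.chi2.sf(statistic, n - 1)

print(f"sample variance = {sample_variance:.4f}")
print(f"test statistic  = {statistic:.2f} (df = {n - 1})")
print(f"p-value         = {p_value:.4f}")
```

A small p-value suggests the process variance exceeds the target. The test is sensitive to departures from normality, so the result should be read with that assumption in mind.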
4. Medical Research
The chi-square distribution is used in medical studies to analyze categorical data and assess relationships between various factors, such as treatment outcomes and patient characteristics.
R Code Example
library(tidyverse)
# Parameters
df <- 4 # degrees of freedom
# Calculate probability between two values
x1 <- 2
x2 <- 8
prob <- pchisq(x2, df = df) - pchisq(x1, df = df)
print(str_glue("P({x1} < X < {x2}) = {round(prob, 4)}"))
# Create plot
x <- seq(0, 15, length.out = 1000)
y <- dchisq(x, df = df)
data <- tibble(x = x, y = y)
ggplot(data, aes(x = x, y = y)) +
  geom_line(color = "blue") +
  geom_area(data = subset(data, x >= x1 & x <= x2),
            aes(x = x, y = y),
            fill = "blue",
            alpha = 0.2) +
  labs(title = str_glue("Chi-Square Distribution (df = {df})"),
       x = "x",
       y = "Probability Density",
       caption = str_glue("P({x1} < X < {x2}) = {round(prob, 4)}")) +
  theme_minimal()
Python Code Example
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
# Set parameters
df = 4 # degrees of freedom
# Calculate probability between two values
x1, x2 = 2, 8
prob = stats.chi2.cdf(x2, df) - stats.chi2.cdf(x1, df)
print(f"P({x1} < X < {x2}) = {prob:.4f}")
# Create plot
x = np.linspace(0, 15, 1000)
pdf = stats.chi2.pdf(x, df)
plt.figure(figsize=(10, 6))
plt.plot(x, pdf, 'blue', label='PDF')
# Add shaded area
x_shade = x[(x >= x1) & (x <= x2)]
pdf_shade = stats.chi2.pdf(x_shade, df)
plt.fill_between(x_shade, pdf_shade, alpha=0.2, color='blue')
# Customize plot
plt.title(f'Chi-Square Distribution (df = {df})')
plt.xlabel('x')
plt.ylabel('Probability Density')
# Place the probability label at the upper bound of the shaded region
plt.annotate(f'P({x1} < X < {x2}) = {prob:.4f}',
             xy=(x2, max(pdf)/2),
             xytext=(x2, max(pdf)/2))
plt.grid(True, alpha=0.3)
plt.show()