F Distribution Calculator
F Distribution: Definition, Formula, and Applications
F Distribution
Definition: The F distribution (also known as the Fisher-Snedecor distribution) is a continuous probability distribution used to compare variances and to test hypotheses in analysis of variance (ANOVA). It is the distribution of the ratio of two independent chi-square random variables, each divided by its degrees of freedom.
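Formula: The probability density function (PDF) for $x > 0$ is given by:
$$f(x; d_1, d_2) = \frac{1}{B\left(\frac{d_1}{2}, \frac{d_2}{2}\right)} \left(\frac{d_1}{d_2}\right)^{d_1/2} x^{d_1/2 - 1} \left(1 + \frac{d_1}{d_2}\,x\right)^{-(d_1 + d_2)/2}$$
where $B(a, b) = \frac{\Gamma(a)\,\Gamma(b)}{\Gamma(a + b)}$ is the beta function, and the parameters are:
- $d_1$ is the degrees of freedom for the numerator
- $d_2$ is the degrees of freedom for the denominator
- $\Gamma$ is the gamma function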
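As a sanity check on the density, the sketch below evaluates the standard closed-form F density (written with the beta function) and compares it with `scipy.stats.f.pdf`. The helper name `f_pdf` and the arguments `d1` and `d2` (numerator and denominator degrees of freedom) are illustrative choices, not part of the calculator.

```python
import numpy as np
from scipy import special, stats


def f_pdf(x, d1, d2):
    """Evaluate the F density from its closed form (valid for x > 0)."""
    x = np.asarray(x, dtype=float)
    # Work on the log scale so large degrees of freedom do not overflow.
    log_pdf = (
        (d1 / 2) * np.log(d1 / d2)
        + (d1 / 2 - 1) * np.log(x)
        - ((d1 + d2) / 2) * np.log1p(d1 * x / d2)
        - special.betaln(d1 / 2, d2 / 2)
    )
    return np.exp(log_pdf)


x = np.array([0.5, 1.0, 2.5])
manual = f_pdf(x, 4, 10)
reference = stats.f.pdf(x, 4, 10)
print(manual)
print(np.allclose(manual, reference))
```

Computing the logarithm first and exponentiating at the end is a common way to keep the beta-function term numerically stable.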
Properties
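Key Statistics (with $d_1$ and $d_2$ the numerator and denominator degrees of freedom):
- Mean: $\frac{d_2}{d_2 - 2}$ for $d_2 > 2$
- Variance: $\frac{2\,d_2^2\,(d_1 + d_2 - 2)}{d_1\,(d_2 - 2)^2\,(d_2 - 4)}$ for $d_2 > 4$
- Mode: $\frac{d_1 - 2}{d_1} \cdot \frac{d_2}{d_2 + 2}$ for $d_1 > 2$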
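The summary statistics can be checked numerically. The sketch below uses the standard closed-form expressions for the mean, variance, and mode of an F distribution and compares the first two with `scipy.stats.f.stats`. The values `d1 = 4` and `d2 = 10` are example parameters chosen for illustration.

```python
from scipy import stats

d1, d2 = 4, 10  # example numerator and denominator degrees of freedom

mean = d2 / (d2 - 2)  # defined only for d2 > 2
variance = 2 * d2**2 * (d1 + d2 - 2) / (d1 * (d2 - 2) ** 2 * (d2 - 4))  # d2 > 4
mode = (d1 - 2) / d1 * d2 / (d2 + 2)  # d1 > 2

# SciPy returns the mean and variance when moments="mv" is requested.
ref_mean, ref_var = stats.f.stats(d1, d2, moments="mv")

print(f"mean: {mean:.4f} (scipy: {float(ref_mean):.4f})")
print(f"variance: {variance:.4f} (scipy: {float(ref_var):.4f})")
print(f"mode: {mode:.4f}")
```

Note that the mean depends only on the denominator degrees of freedom, while the variance and mode depend on both.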
Key Properties:
- Always non-negative (defined for x ≥ 0)
- Right-skewed distribution
- Approaches a normal distribution when both degrees of freedom are large
- Arises as the ratio of two independent chi-square variables, each divided by its degrees of freedom
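The link to chi-square variables can be illustrated by simulation. The following sketch draws two independent chi-square samples, forms the scaled ratio, and compares an empirical probability with the F cumulative distribution function. The seed, sample size, and cutoff of 2.5 are arbitrary choices made for the example.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
d1, d2 = 4, 10
n = 200_000

# Independent chi-square draws, each divided by its degrees of freedom.
u = rng.chisquare(d1, size=n)
v = rng.chisquare(d2, size=n)
ratio = (u / d1) / (v / d2)

empirical = np.mean(ratio <= 2.5)
theoretical = stats.f.cdf(2.5, d1, d2)

print(f"empirical P(X <= 2.5):   {empirical:.4f}")
print(f"theoretical P(X <= 2.5): {theoretical:.4f}")
```

The two probabilities should agree to within simulation error, and every simulated ratio is non-negative, consistent with the support of the distribution.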
Applications
1. Analysis of Variance (ANOVA)
The F distribution is fundamental in ANOVA for:
- Testing equality of means across multiple groups
- Assessing treatment effects in experimental design
- Comparing nested statistical models
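A one-way ANOVA F statistic can be computed by hand as the ratio of the between-group mean square to the within-group mean square. The sketch below does this for three small groups and checks the result against `scipy.stats.f_oneway`. The measurements are made-up illustrative data, not results from any study.

```python
import numpy as np
from scipy import stats

# Hypothetical measurements for three treatment groups.
groups = [
    np.array([20.1, 21.3, 19.8, 22.0, 20.7]),
    np.array([22.4, 23.1, 21.9, 24.0, 22.8]),
    np.array([19.5, 20.2, 18.9, 20.8, 19.9]),
]
k = len(groups)
n_total = sum(len(g) for g in groups)
grand_mean = np.concatenate(groups).mean()

# Sums of squares between and within groups.
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

# F statistic with (k - 1, n_total - k) degrees of freedom.
f_stat = (ss_between / (k - 1)) / (ss_within / (n_total - k))
p_value = stats.f.sf(f_stat, k - 1, n_total - k)

ref = stats.f_oneway(*groups)
print(f"F = {f_stat:.4f}, p = {p_value:.6f}")
print(f"scipy: F = {ref.statistic:.4f}, p = {ref.pvalue:.6f}")
```

The p-value is the upper-tail probability of the F distribution, which `stats.f.sf` returns directly.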
2. Regression Analysis
Used in regression analysis for:
- Testing overall significance of regression models
- Comparing nested regression models
- Testing groups of coefficients
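The overall significance test for a regression compares the fitted model with an intercept-only model. The sketch below fits ordinary least squares with NumPy on simulated data and forms the F statistic from the residual and total sums of squares. The coefficients, noise level, seed, and sample size are assumptions made only to generate example data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, p = 50, 2  # observations and number of predictors (excluding the intercept)

# Simulated predictors and response (illustrative data only).
X = rng.normal(size=(n, p))
y = 1.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(size=n)

# Ordinary least squares with an intercept column.
design = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(design, y, rcond=None)
resid = y - design @ beta

rss = np.sum(resid**2)           # residual sum of squares of the full model
tss = np.sum((y - y.mean())**2)  # residual sum of squares of the intercept-only model

# Overall F statistic with (p, n - p - 1) degrees of freedom.
f_stat = ((tss - rss) / p) / (rss / (n - p - 1))
p_value = stats.f.sf(f_stat, p, n - p - 1)

print(f"F = {f_stat:.2f}, p = {p_value:.3g}")
```

The same construction extends to any pair of nested models: replace `tss` with the residual sum of squares of the smaller model and `p` with the number of extra coefficients.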
3. Variance Comparisons
Applied in comparing variances for:
- Testing homogeneity of variances
- Quality control processes
- Measurement system analysis
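A classical two-sample comparison of variances uses the ratio of the sample variances as an F statistic. The sketch below shows one way to compute it, along with one-sided and two-sided p-values. The two samples are made-up data, and the test assumes both samples are independent and approximately normal.

```python
import numpy as np
from scipy import stats

# Hypothetical measurements from two processes.
a = np.array([10.2, 9.8, 10.5, 10.1, 9.6, 10.4, 9.9, 10.3])
b = np.array([10.0, 10.1, 9.9, 10.2, 10.0, 9.8, 10.1, 10.0])

# Unbiased sample variances (ddof=1).
var_a = a.var(ddof=1)
var_b = b.var(ddof=1)

f_stat = var_a / var_b
dfn, dfd = len(a) - 1, len(b) - 1

# One-sided p-value for "variance of a is larger than variance of b".
p_one_sided = stats.f.sf(f_stat, dfn, dfd)
# Two-sided p-value: double the smaller tail probability, capped at 1.
p_two_sided = min(1.0, 2 * min(stats.f.cdf(f_stat, dfn, dfd), p_one_sided))

print(f"F = {f_stat:.3f}")
print(f"one-sided p = {p_one_sided:.4f}, two-sided p = {p_two_sided:.4f}")
```

This test is sensitive to departures from normality, so it is often paired with a robust alternative in practice.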
4. Statistical Process Control
Used in quality control for:
- Process capability analysis
- Control chart construction
- Variance component analysis
R Code Example
library(tidyverse)
# Parameters
df1 <- 4 # numerator degrees of freedom
df2 <- 10 # denominator degrees of freedom
# Calculate probability between two values
x1 <- 0.5
x2 <- 2.5
prob <- pf(x2, df1 = df1, df2 = df2) - pf(x1, df1 = df1, df2 = df2)
print(str_glue("P({x1} < X < {x2}) = {round(prob, 4)}"))
# Create plot
x <- seq(0, 5, length.out = 1000)
y <- df(x, df1 = df1, df2 = df2)
# Named plot_data rather than df so the tibble does not mask the df() density function
plot_data <- tibble(x = x, y = y)
ggplot(plot_data, aes(x = x, y = y)) +
  geom_line(color = "blue") +
  # Shade the region between x1 and x2
  geom_area(data = subset(plot_data, x >= x1 & x <= x2),
            aes(x = x, y = y),
            fill = "blue",
            alpha = 0.2) +
  labs(title = str_glue("F Distribution (df1 = {df1}, df2 = {df2})"),
       x = "x",
       y = "Probability Density",
       caption = str_glue("P({x1} < X < {x2}) = {round(prob, 4)}")) +
  theme_minimal()
Python Code Example
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
# Set parameters
df1 = 4 # numerator degrees of freedom
df2 = 10 # denominator degrees of freedom
# Calculate probability between two values
x1, x2 = 0.5, 2.5
prob = stats.f.cdf(x2, df1, df2) - stats.f.cdf(x1, df1, df2)
print(f"P({x1} < X < {x2}) = {prob:.4f}")
# Create plot
x = np.linspace(0, 5, 1000)
pdf = stats.f.pdf(x, df1, df2)
plt.figure(figsize=(10, 6))
plt.plot(x, pdf, color='blue', label='PDF')
# Add shaded area
x_shade = x[(x >= x1) & (x <= x2)]
pdf_shade = stats.f.pdf(x_shade, df1, df2)
plt.fill_between(x_shade, pdf_shade, alpha=0.2, color='blue')
# Customize plot
plt.title(f'F Distribution (df1 = {df1}, df2 = {df2})')
plt.xlabel('x')
plt.ylabel('Probability Density')
# Place the probability label at the right edge of the shaded region
plt.annotate(f'P({x1} < X < {x2}) = {prob:.4f}',
             xy=(x2, pdf.max() / 2))
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()