Simple Linear Regression

Definition

Simple Linear Regression models the relationship between a predictor variable (X) and a response variable (Y) using a linear equation. It finds the line that minimizes the sum of squared residuals.
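
Formally, the slope and intercept are the values that minimize the least-squares criterion:

\min_{b_0,\, b_1} \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2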

Key Formulas

Regression Line:

\hat{Y} = b_0 + b_1 X

Slope:

b_1 = \frac{\sum(x_i - \bar{x})(y_i - \bar{y})}{\sum(x_i - \bar{x})^2}

Intercept:

b_0 = \bar{y} - b_1\bar{x}

R-squared:

R^2 = 1 - \frac{\sum(y_i - \hat{y}_i)^2}{\sum(y_i - \bar{y})^2}
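
In code, these formulas can be applied directly. The short Python sketch below uses NumPy and the five data points from the practical example that follows; the names b1, b0, and r2 simply mirror the notation above.

import numpy as np

# Example data (the same five points as the practical example below)
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2.1, 3.8, 6.2, 7.8, 9.3])

# Slope and intercept from the deviation formulas
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

# R-squared from the residual and total sums of squares
y_hat = b0 + b1 * x
r2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

print(b1, b0, r2)  # 1.84, 0.32, approximately 0.993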

Key Assumptions

  • Linearity: The relationship between X and Y is linear
  • Independence: Observations are independent
  • Homoscedasticity: The residuals have constant variance
  • Normality: The residuals are normally distributed
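
A practical way to check most of these assumptions is to inspect the residuals of a fitted model. The sketch below is a minimal example in Python, assuming matplotlib is available and reusing the statsmodels fit from the Code Examples section; independence is usually judged from how the data were collected rather than from a plot.

import matplotlib.pyplot as plt
import statsmodels.api as sm

# Fit the example model (same data as in the Code Examples section)
X = sm.add_constant([1, 2, 3, 4, 5])
y = [2.1, 3.8, 6.2, 7.8, 9.3]
model = sm.OLS(y, X).fit()

# Residuals vs. fitted values: a patternless scatter around zero supports linearity and homoscedasticity
plt.scatter(model.fittedvalues, model.resid)
plt.axhline(0, linestyle="--")
plt.xlabel("Fitted values")
plt.ylabel("Residuals")
plt.show()

# Q-Q plot of residuals: points close to the reference line suggest approximate normality
sm.qqplot(model.resid, line="s")
plt.show()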

Practical Example

Step 1: Data
X | Y | X - \bar{X} | Y - \bar{Y} | (X - \bar{X})^2 | (X - \bar{X})(Y - \bar{Y})
1 | 2.1 | -2 | -3.74 | 4 | 7.48
2 | 3.8 | -1 | -2.04 | 1 | 2.04
3 | 6.2 | 0 | 0.36 | 0 | 0
4 | 7.8 | 1 | 1.96 | 1 | 1.96
5 | 9.3 | 2 | 3.46 | 4 | 6.92
Σ = 15 | Σ = 29.2 | Σ = 0 | Σ = 0 | Σ = 10 | Σ = 18.4

Means: \bar{X} = 15/5 = 3, \bar{Y} = 29.2/5 = 5.84

Step 2: Calculate Slope (b_1)
b_1 = \frac{\sum(x_i - \bar{x})(y_i - \bar{y})}{\sum(x_i - \bar{x})^2} = \frac{18.4}{10} = 1.84
Step 3: Calculate Intercept (b_0)
b_0 = \bar{y} - b_1\bar{x} = 5.84 - 1.84(3) = 0.32
Step 4: Regression Equation
\hat{Y} = 0.32 + 1.84X
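
For example, plugging a new predictor value of X = 6 (chosen here purely for illustration) into the fitted equation gives the prediction:

\hat{Y} = 0.32 + 1.84(6) = 11.36
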
Step 5: Calculate R^2

R^2 = 1 - \frac{\sum(y_i - \hat{y}_i)^2}{\sum(y_i - \bar{y})^2} = 1 - \frac{0.236}{34.092} \approx 0.993 (99.3% of the variation in Y is explained by X)
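
As a quick cross-check of the hand calculation, scipy.stats.linregress reproduces the same slope, intercept, and R-squared (SciPy is assumed to be installed; it is not used elsewhere on this page):

from scipy.stats import linregress

x = [1, 2, 3, 4, 5]
y = [2.1, 3.8, 6.2, 7.8, 9.3]

result = linregress(x, y)
print(f"slope = {result.slope:.2f}")            # 1.84
print(f"intercept = {result.intercept:.2f}")    # 0.32
print(f"R-squared = {result.rvalue ** 2:.3f}")  # 0.993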

Code Examples

R
library(tidyverse)

# Example data
data <- tibble(x = c(1, 2, 3, 4, 5),
               y = c(2.1, 3.8, 6.2, 7.8, 9.3))

# Fit linear model
model <- lm(y ~ x, data = data)

# Print summary
summary(model)

# Get confidence intervals
confint(model)

# Plot with ggplot2
ggplot(data, aes(x = x, y = y)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  theme_minimal()

Python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Example data
X = [1, 2, 3, 4, 5]  # predictor variable
y = [2.1, 3.8, 6.2, 7.8, 9.3]  # response variable

# Add constant to X for intercept
X = sm.add_constant(X)

# Fit model
model = sm.OLS(y, X).fit()

# Print summary
print(model.summary())

# Get coefficients
print(f'Intercept: {model.params[0]:.4f}')
print(f'Slope: {model.params[1]:.4f}')
print(f'R-squared: {model.rsquared:.4f}')
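
Continuing the statsmodels example above, predictions for new X values (6 and 7 below are purely illustrative) can be obtained with model.predict:

# Reuses 'model' and 'sm' from the Python example above
new_X = sm.add_constant([6, 7], has_constant='add')  # illustrative new predictor values
print(model.predict(new_X))  # approximately [11.36, 13.20]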

Alternative Methods

  • Robust Regression: When outliers are present
  • Polynomial Regression: For non-linear relationships
  • Quantile Regression: For heteroscedastic data
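
As one illustration of the first alternative, statsmodels offers robust linear models through sm.RLM; a minimal sketch on the same example data (Huber's T is just one common choice of weighting norm) might look like this:

import statsmodels.api as sm

# Same example data as above
X = sm.add_constant([1, 2, 3, 4, 5])
y = [2.1, 3.8, 6.2, 7.8, 9.3]

# Huber's T downweights observations with unusually large residuals instead of removing them
robust_model = sm.RLM(y, X, M=sm.robust.norms.HuberT()).fit()
print(robust_model.params)  # intercept and slope; close to OLS here because this data has no clear outliers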

Related Links

Correlation Coefficient Calculator

F Distribution Calculator

Two Sample T-Test Calculator

One-Way ANOVA Calculator
