EZ Statistics

Multiple Linear Regression


Definition

Multiple Linear Regression models the relationship between a dependent variable and two or more independent variables, assuming a linear relationship. It extends simple linear regression to account for multiple predictors.

Model Equation

$$y = \beta_0 + \beta_1x_1 + \beta_2x_2 + \dots + \beta_kx_k + \epsilon$$

Where:

  • $y$ = dependent variable
  • $x_1, \dots, x_k$ = independent variables (predictors)
  • $\beta_0$ = intercept
  • $\beta_1, \dots, \beta_k$ = regression coefficients
  • $\epsilon$ = error term
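To make the model equation concrete, here is a minimal NumPy sketch that generates data from it and then estimates the coefficients back by least squares. The values in `true_beta`, the sample size, the noise level, and the random seed are arbitrary choices for illustration, not values from this page.

```python
import numpy as np

# Hypothetical "true" parameters: beta_0 (intercept), then beta_1..beta_3
true_beta = np.array([2.0, 1.5, -0.5, 0.8])
n, k = 500, 3

rng = np.random.default_rng(seed=0)
x = rng.normal(size=(n, k))            # predictors x_1..x_k
eps = rng.normal(scale=0.5, size=n)    # error term with mean 0

# y = beta_0 + beta_1 x_1 + ... + beta_k x_k + eps
y = true_beta[0] + x @ true_beta[1:] + eps

# Recover the coefficients by least squares (column of ones = intercept)
X = np.column_stack([np.ones(n), x])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(beta_hat, 2))
```

With 500 simulated observations the estimates land close to `true_beta`; shrinking `n` or raising the noise scale makes them drift further away.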

Key Formulas:

Sum of Squares:

$$SST = \sum(y_i - \bar{y})^2$$
$$SSR = \sum(\hat{y}_i - \bar{y})^2$$
$$SSE = \sum(y_i - \hat{y}_i)^2$$

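where $y_i$ is the observed value, $\hat{y}_i$ the predicted value, and $\bar{y}$ the mean of the observed values, for observations $i = 1, \dots, n$. SST is the total sum of squares, SSR the part explained by the regression, and SSE the residual (error) part. For a least-squares fit that includes an intercept, $SST = SSR + SSE$.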
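A minimal sketch of the three sums for the housing data used in the practical example, assuming NumPy is available. The fit uses `np.linalg.lstsq`, which minimizes SSE; any correct least-squares solver should give the same fitted values.

```python
import numpy as np

# Housing data from the practical example (price in thousands of dollars)
price = np.array([300, 250, 400, 550, 317, 389], dtype=float)
X = np.column_stack([
    np.ones(6),                            # intercept column
    [1500, 1200, 2000, 2400, 1600, 1800],  # sqft
    [15, 20, 10, 5, 12, 8],                # age
    [3, 2, 4, 4, 3, 3],                    # bedrooms
])

beta_hat, *_ = np.linalg.lstsq(X, price, rcond=None)
y_hat = X @ beta_hat        # predicted values
y_bar = price.mean()        # mean of the observed values

sst = np.sum((price - y_bar) ** 2)   # total variation
ssr = np.sum((y_hat - y_bar) ** 2)   # variation explained by the model
sse = np.sum((price - y_hat) ** 2)   # residual variation

print(f"SST={sst:.1f}  SSR={ssr:.1f}  SSE={sse:.1f}")
# With an intercept in the model, SST = SSR + SSE up to floating-point error
print(np.isclose(sst, ssr + sse))
```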

R-squared:

$$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$$

Adjusted R-squared:

$$R^2_{adj} = 1 - (1 - R^2)\frac{n-1}{n-k-1}$$

where $n$ is the number of observations and $k$ the number of predictors.
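The two formulas translate directly into small helper functions. This is a sketch with hypothetical function names (`r_squared`, `adjusted_r_squared`) and made-up inputs; the usage lines show that the same R² is penalized much more when n is small relative to k.

```python
def r_squared(sse: float, sst: float) -> float:
    """R^2 = 1 - SSE/SST (equal to SSR/SST for least squares with an intercept)."""
    return 1.0 - sse / sst


def adjusted_r_squared(r2: float, n: int, k: int) -> float:
    """Adjust R^2 for k predictors fitted on n observations."""
    if n - k - 1 <= 0:
        raise ValueError("need n > k + 1 observations")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


r2 = r_squared(sse=10.0, sst=100.0)                  # hypothetical sums of squares
print(round(r2, 3))                                  # 0.9
print(round(adjusted_r_squared(r2, n=50, k=3), 3))   # 0.893
print(round(adjusted_r_squared(r2, n=6, k=3), 3))    # 0.75
```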

Key Assumptions

Linearity: The mean of the dependent variable is a linear function of the predictors
Independence: Residuals are independent of one another
Homoscedasticity: Residuals have constant variance across fitted values
Normality: Residuals are normally distributed
No Multicollinearity: Predictors are not highly correlated with one another

Practical Example

Step 1: State the Data

Housing prices model:

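| House | Price (thousands USD) | Sqft | Age (years) | Bedrooms |
|---|---|---|---|---|
| 1 | 300 | 1500 | 15 | 3 |
| 2 | 250 | 1200 | 20 | 2 |
| 3 | 400 | 2000 | 10 | 4 |
| 4 | 550 | 2400 | 5 | 4 |
| 5 | 317 | 1600 | 12 | 3 |
| 6 | 389 | 1800 | 8 | 3 |

Step 2: Calculate Matrix Operations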

Design matrix $\mathbf{X}$ (one row per house; the leading column of ones corresponds to the intercept $\beta_0$):

$$\mathbf{X} = \begin{bmatrix} 1 & 1500 & 15 & 3 \\ 1 & 1200 & 20 & 2 \\ \vdots & \vdots & \vdots & \vdots \\ 1 & 1800 & 8 & 3 \end{bmatrix}$$
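A sketch of building the design matrix from the table with NumPy; the variable names are illustrative. It also forms $\mathbf{X}^T\mathbf{X}$, whose first row holds n followed by the column sums of the predictors.

```python
import numpy as np

sqft = [1500, 1200, 2000, 2400, 1600, 1800]
age = [15, 20, 10, 5, 12, 8]
bedrooms = [3, 2, 4, 4, 3, 3]

# One row per house; the leading column of ones multiplies the intercept beta_0
X = np.column_stack([np.ones(len(sqft)), sqft, age, bedrooms])
print(X.shape)   # (6, 4): n = 6 observations, k + 1 = 4 columns

# X^T X is the (k+1) x (k+1) matrix inverted in the coefficient formula
xtx = X.T @ X
print(xtx)
```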

Coefficients calculation:

$$\hat{\boldsymbol{\beta}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}$$

where $\mathbf{y}$ is the vector of the six observed prices.
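The coefficient formula can be evaluated with NumPy as sketched below. Rather than inverting $\mathbf{X}^T\mathbf{X}$ explicitly, the sketch solves the equivalent linear system $(\mathbf{X}^T\mathbf{X})\hat{\boldsymbol{\beta}} = \mathbf{X}^T\mathbf{y}$, which is generally more numerically stable, and cross-checks the result against `np.linalg.lstsq`.

```python
import numpy as np

price = np.array([300, 250, 400, 550, 317, 389], dtype=float)
X = np.column_stack([
    np.ones(6),                            # intercept
    [1500, 1200, 2000, 2400, 1600, 1800],  # sqft
    [15, 20, 10, 5, 12, 8],                # age
    [3, 2, 4, 4, 3, 3],                    # bedrooms
])

# Normal equations: solve (X^T X) beta = X^T y instead of forming the inverse
beta_normal = np.linalg.solve(X.T @ X, X.T @ price)

# Cross-check with a least-squares solver that works on X directly
beta_lstsq, *_ = np.linalg.lstsq(X, price, rcond=None)

print(np.round(beta_normal, 3))
print(np.max(np.abs(beta_normal - beta_lstsq)))  # should be tiny
```

The printed vector lists the intercept followed by the sqft, age, and bedrooms coefficients.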
Step 3: Model Results

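Fitted equation (price in thousands of dollars, coefficients rounded):

$$\hat{y} = -130.86 + 0.388x_{sqft} + 2.54x_{age} - 66.30x_{bedrooms}$$
  • R² ≈ 0.997
  • Adjusted R² ≈ 0.993
  • F-statistic ≈ 238.4 on 3 and 2 degrees of freedom (p-value ≈ 0.004)
Step 4: Interpretation
  • Holding age and bedrooms fixed, each additional square foot is associated with a price about $388 higher
  • Holding square footage and bedrooms fixed, each additional year of age is associated with a price about $2,540 higher
  • Holding square footage and age fixed, each additional bedroom is associated with a price about $66,300 lower
  • The model explains about 99.7% of the variation in price in this sample
  • The positive age and negative bedroom coefficients run against intuition. With only six houses, three strongly correlated predictors, and n − k − 1 = 2 residual degrees of freedom, the individual coefficients are unstable, so they illustrate the mechanics rather than real market effects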
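This sketch recomputes the fit statistics and a prediction directly from the six houses using the Key Formulas, assuming NumPy and SciPy are installed. The p-value comes from `scipy.stats.f.sf`, the upper tail of the F distribution with k and n − k − 1 degrees of freedom; the new house (1800 sqft, 10 years, 3 bedrooms) is the same one used in the code examples.

```python
import numpy as np
from scipy import stats

price = np.array([300, 250, 400, 550, 317, 389], dtype=float)
X = np.column_stack([
    np.ones(6),                            # intercept
    [1500, 1200, 2000, 2400, 1600, 1800],  # sqft
    [15, 20, 10, 5, 12, 8],                # age
    [3, 2, 4, 4, 3, 3],                    # bedrooms
])
n, p = X.shape      # p = k + 1 columns including the intercept
k = p - 1

beta_hat, *_ = np.linalg.lstsq(X, price, rcond=None)
y_hat = X @ beta_hat

sst = np.sum((price - price.mean()) ** 2)
sse = np.sum((price - y_hat) ** 2)
ssr = sst - sse

r2 = ssr / sst
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
f_stat = (ssr / k) / (sse / (n - k - 1))    # overall F-test
p_value = stats.f.sf(f_stat, k, n - k - 1)  # upper-tail probability

# Prediction for a new house: 1800 sqft, 10 years old, 3 bedrooms
x_new = np.array([1.0, 1800, 10, 3])
pred = x_new @ beta_hat

print(f"R^2 = {r2:.3f}, adjusted R^2 = {adj_r2:.3f}")
print(f"F = {f_stat:.1f} on ({k}, {n - k - 1}) df, p = {p_value:.4f}")
print(f"Predicted price: {pred:.1f}K")
```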

Model Diagnostics

Key diagnostic measures:

  • VIF (Variance Inflation Factor):
    $$VIF_j = \frac{1}{1 - R^2_j}$$
    where $R^2_j$ is the R² from regressing predictor $x_j$ on the other predictors
  • Residual Standard Error:
    $$RSE = \sqrt{\frac{SSE}{n - k - 1}}$$
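Both diagnostics can be computed with plain least squares, as in this NumPy sketch (the helper `r_squared_of_fit` is a hypothetical name introduced here). Each VIF comes from an auxiliary regression of one predictor on the others; a common rule of thumb treats values above 5 or 10 as a sign of problematic multicollinearity.

```python
import numpy as np

price = np.array([300, 250, 400, 550, 317, 389], dtype=float)
predictors = {
    "sqft": np.array([1500, 1200, 2000, 2400, 1600, 1800], dtype=float),
    "age": np.array([15, 20, 10, 5, 12, 8], dtype=float),
    "bedrooms": np.array([3, 2, 4, 4, 3, 3], dtype=float),
}
n, k = len(price), len(predictors)


def r_squared_of_fit(y: np.ndarray, cols: list) -> float:
    """R^2 from an OLS fit of y on the given columns plus an intercept."""
    X = np.column_stack([np.ones(len(y))] + cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid @ resid / np.sum((y - y.mean()) ** 2)


# VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing x_j on the others
vif = {}
for name, col in predictors.items():
    others = [c for other, c in predictors.items() if other != name]
    vif[name] = 1 / (1 - r_squared_of_fit(col, others))

# RSE = sqrt(SSE / (n - k - 1)) from the full model
X_full = np.column_stack([np.ones(n)] + list(predictors.values()))
beta_full, *_ = np.linalg.lstsq(X_full, price, rcond=None)
sse = np.sum((price - X_full @ beta_full) ** 2)
rse = np.sqrt(sse / (n - k - 1))

print({name: round(float(v), 1) for name, v in vif.items()})
print(f"RSE = {rse:.2f}")
```

On these six houses all three VIFs exceed 5, with sqft the largest, which is consistent with the strong correlations among the predictors.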

Code Examples

R
```r
library(tidyverse)  # loads tibble and stringr (str_glue), among others
library(broom)

# Example data (price in thousands of dollars)
data <- tibble(
  price = c(300, 250, 400, 550, 317, 389),
  sqft = c(1500, 1200, 2000, 2400, 1600, 1800),
  age = c(15, 20, 10, 5, 12, 8),
  bedrooms = c(3, 2, 4, 4, 3, 3)
)

# Fit model
model <- lm(price ~ sqft + age + bedrooms, data = data)

# Model summary
tidy(model)      # Coefficients
glance(model)    # Model statistics

# Diagnostic plots: residuals vs fitted, Q-Q, scale-location, residuals vs leverage
par(mfrow = c(2, 2))  # Arrange plots in a 2x2 grid
plot(model)

# Predictions for a new house
new_data <- tibble(
  sqft = 1800,
  age = 10,
  bedrooms = 3
)
pred <- predict(model, new_data)
print(str_glue("Predicted price: {round(pred, 1)}K"))
```
Python
```python
import pandas as pd
from statsmodels.formula.api import ols

# Example data (price in thousands of dollars)
df = pd.DataFrame({
    'price': [300, 250, 400, 550, 317, 389],
    'sqft': [1500, 1200, 2000, 2400, 1600, 1800],
    'age': [15, 20, 10, 5, 12, 8],
    'bedrooms': [3, 2, 4, 4, 3, 3]
})

# Fit the model (the formula interface adds an intercept automatically)
model = ols('price ~ sqft + age + bedrooms', data=df).fit()

# Print the full regression summary
print(model.summary())

# For just coefficients and R-squared
print("Coefficients:")
print(model.params)
print("R-squared:", model.rsquared)

# Predictions for a new house
X_new = pd.DataFrame({
    'sqft': [1800],
    'age': [10],
    'bedrooms': [3]
})
predictions = model.predict(X_new)
print("Predicted price:", predictions.iloc[0])  # positional access into the returned Series
```

Alternative Methods

Consider these alternatives:

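  • Ridge Regression: Shrinks coefficients toward zero to stabilize estimates when predictors are highly correlated
  • Lasso Regression: Can shrink some coefficients exactly to zero, which performs feature selection
  • Polynomial Regression: Adds powers of the predictors to capture curved relationships (the model stays linear in its coefficients)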

Related Links

Simple Linear Regression

Correlation Coefficient Calculator
