Multiple Linear Regression
Definition
Multiple Linear Regression models the relationship between a dependent variable and two or more independent variables, assuming a linear relationship. It extends simple linear regression to account for multiple predictors.
Model Equation

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon$$

Where:
- $y$ = dependent variable
- $x_1, x_2, \ldots, x_k$ = independent variables
- $\beta_0$ = intercept
- $\beta_1, \beta_2, \ldots, \beta_k$ = regression coefficients
- $\varepsilon$ = error term
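To make the equation concrete, the short sketch below simulates data from a two-predictor version of the model and then recovers the coefficients by least squares. Every number in it (coefficients, sample size, noise level) is an arbitrary choice for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100

# Arbitrary "true" parameters chosen for this simulation
beta0, beta1, beta2 = 2.0, 0.5, -1.5

x1 = rng.uniform(0, 10, n)
x2 = rng.uniform(0, 5, n)
eps = rng.normal(0, 1, n)  # error term

# y = beta0 + beta1*x1 + beta2*x2 + error
y = beta0 + beta1 * x1 + beta2 * x2 + eps

# Recover the coefficients by least squares
X = np.column_stack([np.ones(n), x1, x2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coef)  # approximately [2.0, 0.5, -1.5]
```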
Key Formulas:

Sum of Squares:

$$SS_{res} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \qquad SS_{tot} = \sum_{i=1}^{n} (y_i - \bar{y})^2$$

Where $\hat{y}_i$ is the predicted value and $\bar{y}$ is the mean of the observed values.

R-squared:

$$R^2 = 1 - \frac{SS_{res}}{SS_{tot}}$$

Adjusted R-squared:

$$R^2_{adj} = 1 - \frac{(1 - R^2)(n - 1)}{n - k - 1}$$

where $n$ is the number of observations and $k$ is the number of predictors.
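As a quick illustration of how these two measures relate, the snippet below applies the R-squared and adjusted R-squared formulas to made-up sums of squares; none of these values come from the housing example that follows.

```python
# Hypothetical values for illustration (not from the housing data below)
ss_res, ss_tot = 20.0, 100.0  # residual and total sums of squares
n, k = 30, 3                  # 30 observations, 3 predictors

r2 = 1 - ss_res / ss_tot
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)

print(f"R-squared: {r2:.3f}")           # 0.800
print(f"Adjusted R-squared: {adj_r2:.3f}")  # 0.777
```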
Key Assumptions
- Linearity: The dependent variable is a linear function of the predictors
- Independence: Residuals are independent of one another
- Homoscedasticity: Residuals have constant variance
- Normality: Residuals are normally distributed
- No Multicollinearity: The independent variables are not highly correlated with each other
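These assumptions are usually checked on the residuals of a fitted model. The sketch below shows one possible set of checks using statsmodels and SciPy on the housing data from the example that follows; with only six observations the tests are purely illustrative, and other checks (such as residual plots) are equally valid.

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols
from statsmodels.stats.diagnostic import het_breuschpagan
from scipy import stats

df = pd.DataFrame({
    'price': [300, 250, 400, 550, 317, 389],
    'sqft': [1500, 1200, 2000, 2400, 1600, 1800],
    'age': [15, 20, 10, 5, 12, 8],
    'bedrooms': [3, 2, 4, 4, 3, 3],
})
model = ols('price ~ sqft + age + bedrooms', data=df).fit()

# Normality of residuals (Shapiro-Wilk; a small p-value suggests non-normality)
print("Shapiro-Wilk:", stats.shapiro(model.resid))

# Homoscedasticity (Breusch-Pagan; a small p-value suggests non-constant variance)
bp_stat, bp_p, _, _ = het_breuschpagan(model.resid, model.model.exog)
print("Breusch-Pagan p-value:", bp_p)

# Independence of residuals (Durbin-Watson; values near 2 suggest no autocorrelation)
print("Durbin-Watson:", sm.stats.durbin_watson(model.resid))
```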
Practical Example
Step 1: State the Data
Housing price model with six observations:
House | Price ($K) | Sqft | Age (years) | Bedrooms |
---|---|---|---|---|
1 | 300 | 1500 | 15 | 3 |
2 | 250 | 1200 | 20 | 2 |
3 | 400 | 2000 | 10 | 4 |
4 | 550 | 2400 | 5 | 4 |
5 | 317 | 1600 | 12 | 3 |
6 | 389 | 1800 | 8 | 3 |
Step 2: Calculate Matrix Operations

Design matrix X (a column of ones for the intercept, followed by the Sqft, Age, and Bedrooms columns):

$$X = \begin{bmatrix} 1 & 1500 & 15 & 3 \\ 1 & 1200 & 20 & 2 \\ 1 & 2000 & 10 & 4 \\ 1 & 2400 & 5 & 4 \\ 1 & 1600 & 12 & 3 \\ 1 & 1800 & 8 & 3 \end{bmatrix}$$

Coefficients calculation (ordinary least squares):

$$\hat{\beta} = (X^T X)^{-1} X^T y$$
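A minimal NumPy sketch of this calculation on the example data; it uses np.linalg.solve rather than an explicit matrix inverse, which is mathematically equivalent but numerically safer.

```python
import numpy as np

# Example data from Step 1 (price in $K)
price    = np.array([300, 250, 400, 550, 317, 389])
sqft     = np.array([1500, 1200, 2000, 2400, 1600, 1800])
age      = np.array([15, 20, 10, 5, 12, 8])
bedrooms = np.array([3, 2, 4, 4, 3, 3])

# Design matrix: column of ones for the intercept, then the predictors
X = np.column_stack([np.ones_like(price, dtype=float), sqft, age, bedrooms])

# Normal equations: beta_hat = (X'X)^-1 X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ price)
print("Intercept and coefficients:", beta_hat)

# Fitted values and R-squared
y_hat  = X @ beta_hat
ss_res = np.sum((price - y_hat) ** 2)
ss_tot = np.sum((price - price.mean()) ** 2)
print("R-squared:", 1 - ss_res / ss_tot)
```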
Step 3: Model Results
Fitted equation (Price in $K, with estimated intercept $\hat{\beta}_0$):

$$\widehat{\text{Price}} = \hat{\beta}_0 + 0.21\,\text{Sqft} - 3.15\,\text{Age} + 25.3\,\text{Bedrooms}$$
- R² = 0.892
- Adjusted R² = 0.821
- F-statistic = 13.78 (p-value = 0.014)
Step 4: Interpretation
Holding the other predictors constant:
- Each additional square foot increases the price by $210
- Each additional year of age decreases the price by $3,150
- Each additional bedroom adds $25,300 to the price
- Overall, the model explains 89.2% of the variation in price
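Because the intercept cancels out of a difference between two predictions, the bedroom effect can be illustrated with simple arithmetic on the reported coefficients. The two houses below are hypothetical, chosen only to isolate the bedroom term.

```python
# Reported coefficients from Step 3/4, in $K per unit of each predictor
b_sqft, b_age, b_bedrooms = 0.21, -3.15, 25.3

# Two hypothetical houses that differ only by one bedroom;
# the intercept cancels out of the difference
house_a = {'sqft': 1800, 'age': 10, 'bedrooms': 3}
house_b = {'sqft': 1800, 'age': 10, 'bedrooms': 4}

diff = (b_sqft * (house_b['sqft'] - house_a['sqft'])
        + b_age * (house_b['age'] - house_a['age'])
        + b_bedrooms * (house_b['bedrooms'] - house_a['bedrooms']))
print("Predicted price difference ($K):", diff)  # 25.3
```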
Model Diagnostics
Key diagnostic measures:
- VIF (Variance Inflation Factor): $VIF_j = \dfrac{1}{1 - R_j^2}$, where $R_j^2$ is the R-squared from regressing predictor $j$ on the remaining predictors
- Residual Standard Error: $RSE = \sqrt{\dfrac{SS_{res}}{n - k - 1}}$
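Both diagnostics can be computed with statsmodels; the sketch below refits the example model and shows one way to obtain them.

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols
from statsmodels.stats.outliers_influence import variance_inflation_factor

df = pd.DataFrame({
    'price': [300, 250, 400, 550, 317, 389],
    'sqft': [1500, 1200, 2000, 2400, 1600, 1800],
    'age': [15, 20, 10, 5, 12, 8],
    'bedrooms': [3, 2, 4, 4, 3, 3],
})
model = ols('price ~ sqft + age + bedrooms', data=df).fit()

# VIF for each predictor (column 0 of the design matrix is the intercept)
exog = model.model.exog
for i, name in enumerate(model.model.exog_names[1:], start=1):
    print(name, "VIF:", variance_inflation_factor(exog, i))

# Residual standard error: sqrt(SS_res / (n - k - 1))
n, k = len(df), 3
rse = np.sqrt(np.sum(model.resid ** 2) / (n - k - 1))
print("Residual standard error ($K):", rse)
```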
Code Examples
R
library(tidyverse)
library(broom)

# Example data
data <- tibble(
  price = c(300, 250, 400, 550, 317, 389),
  sqft = c(1500, 1200, 2000, 2400, 1600, 1800),
  age = c(15, 20, 10, 5, 12, 8),
  bedrooms = c(3, 2, 4, 4, 3, 3)
)

# Fit model
model <- lm(price ~ sqft + age + bedrooms, data = data)

# Model summary
tidy(model)   # Coefficients
glance(model) # Model statistics

# Diagnostic plots
par(mfrow = c(2, 2)) # Arrange plots in a 2x2 grid
plot(model)

# Predictions
new_data <- tibble(
  sqft = 1800,
  age = 10,
  bedrooms = 3
)
pred <- predict(model, new_data)
print(str_glue("Predicted price: {pred}"))
Python
import pandas as pd
import numpy as np
from statsmodels.formula.api import ols
import statsmodels.api as sm

# Example data
df = pd.DataFrame({
    'price': [300, 250, 400, 550, 317, 389],
    'sqft': [1500, 1200, 2000, 2400, 1600, 1800],
    'age': [15, 20, 10, 5, 12, 8],
    'bedrooms': [3, 2, 4, 4, 3, 3]
})

# Fit the model
model = ols('price ~ sqft + age + bedrooms', data=df).fit()

# Print summary
print(model.summary())

# For just coefficients and R-squared
print("Coefficients:")
print(model.params)
print("R-squared:", model.rsquared)

# Predictions
X_new = pd.DataFrame({
    'sqft': [1800],
    'age': [10],
    'bedrooms': [3]
})
predictions = model.predict(X_new)
print("Predicted price:", predictions[0])
Alternative Methods
Consider these alternatives:
- Ridge Regression: For handling multicollinearity
- Lasso Regression: For feature selection
- Polynomial Regression: For non-linear relationships
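For reference, the sketch below shows how Ridge and Lasso could be fit to the same example data with scikit-learn. The alpha penalty values are arbitrary placeholders; in practice they would be chosen by cross-validation.

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# Predictors: sqft, age, bedrooms; response: price ($K)
X = np.array([[1500, 15, 3], [1200, 20, 2], [2000, 10, 4],
              [2400, 5, 4], [1600, 12, 3], [1800, 8, 3]])
y = np.array([300, 250, 400, 550, 317, 389])

# Predictors are standardized first so the penalty treats them comparably
ridge = make_pipeline(StandardScaler(), Ridge(alpha=1.0)).fit(X, y)
lasso = make_pipeline(StandardScaler(), Lasso(alpha=1.0)).fit(X, y)

print("Ridge coefficients:", ridge.named_steps['ridge'].coef_)
print("Lasso coefficients:", lasso.named_steps['lasso'].coef_)
```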
Related Links
Simple Linear Regression
Correlation Coefficient Calculator