Day 10

What is Linear Regression? 💡

Linear Regression is a simple yet powerful model for predicting a numeric response from one or more independent variables. This article focuses on how the method is used in machine learning, so we won't cover other common use cases like causal inference or experimental design. And although linear regression may seem overshadowed in modern machine learning's ever-growing world of complex neural network architectures, the algorithm is still widely used across many domains because it is effective, easy to interpret, and easy to extend. The key ideas in linear regression are recycled everywhere, so understanding the algorithm is a must-have for a strong foundation in machine learning.

Let's Be More Specific

Linear regression is a supervised algorithm that learns to model a dependent variable, y, as a function of some independent variables (aka "features"), xi, by finding a line (or surface) that best "fits" the data. In general, we assume y is numeric, while each xi can be almost anything: numeric, or categorical once suitably encoded. For example: predicting the price of a house using the number of rooms in that house (y: price, x1: number of rooms), or predicting weight from height and age (y: weight, x1: height, x2: age).

The general equation for linear regression is

y = β0 + β1x1 + β2x2 + ... + βpxp + ϵ

where:

  • y: the dependent variable; the thing we are trying to predict.
  • xi: the independent variables; the features our model uses to model y.
  • βi: the coefficients (aka "weights") of our regression model. These are the foundation of our model: they are what our model "learns" during optimization.
  • ϵ: the irreducible error in our model; a term that collects all the unmodeled parts of our data. (A concrete sketch of this equation follows the list.)
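
To make these pieces concrete, here is a minimal sketch (in NumPy) that generates data from exactly this equation. Every number in it is made up purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up "true" parameters, chosen only for illustration:
# y = 2 + 0.5*x1 - 1.3*x2 + noise
beta_0 = 2.0
beta = np.array([0.5, -1.3])

n = 100
X = rng.normal(size=(n, 2))               # two features, x1 and x2
epsilon = rng.normal(scale=0.5, size=n)   # the irreducible error term ϵ

y = beta_0 + X @ beta + epsilon           # the regression equation above
```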

Fitting a linear regression model is all about finding the set of coefficients that best model y as a function of our features. We may never know the true parameters of our model, but we can estimate them (more on this later). Once we've estimated these coefficients, β̂i, we predict future values, ŷ, as:

ŷ = β̂0 + β̂1x1 + β̂2x2 + ... + β̂pxp
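
We'll cover estimation properly later, but as a preview, one standard way to estimate the coefficients is ordinary least squares. A minimal sketch, reusing the simulated X and y from above:

```python
# Preview: estimate the coefficients with ordinary least squares.
# Stack a column of ones onto X so the first entry of beta_hat
# plays the role of the intercept β0.
X_design = np.column_stack([np.ones(n), X])
beta_hat, *_ = np.linalg.lstsq(X_design, y, rcond=None)

print(beta_hat)  # should land near the made-up values [2.0, 0.5, -1.3]
```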

So predicting future values (often called inference) is as simple as plugging the values of our features, xi, into our equation!
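
Concretely, with the β̂ estimated in the sketch above and some hypothetical feature values:

```python
# Prediction is just the equation above with the estimated coefficients.
x_new = np.array([1.0, 0.8, -0.2])  # [1, x1, x2]; the leading 1 multiplies the intercept
y_hat = x_new @ beta_hat
print(y_hat)
```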