When finding the "line of best fit" between one or more variables and a target, we often turn to our favourite linear regression package. If we're feeling adventurous, we might even write our own gradient descent algorithm. But we can also find a solution using linear algebra. Sounds fun, right?
Right?
Consider a situation where there are two features, $x_1$ and $x_2$, and a target variable, $y$. These could be the hours spent on Instagram per day, the hours on TikTok, and the level of brain rot experienced.
| Instagram Hours ($x_1$) | TikTok Hours ($x_2$) | Brain Rot ($y$) |
|---|---|---|
| 2 | 4 | 29 |
| 5 | 1 | 15 |
| 3 | 3 | 24 |
| 4 | 3 | 27 |
| 2 | 5 | 34 |
We want to find the relationship between $x_1$, $x_2$ and $y$. For a linear relationship, that means finding values of the parameters $\theta$ in the equation, $$\hat{y} = \theta_0 + \theta_1 x_1 + \theta_2 x_2$$ such that the sum of squared errors between $y$ and $\hat{y}$ is minimised.
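Plugging each row of the table into this model gives one equation per observation (no single choice of $\theta$ satisfies all five exactly, which is why we minimise the error instead):
$$\begin{aligned} \theta_0 + 2\theta_1 + 4\theta_2 &= 29 \\ \theta_0 + 5\theta_1 + \theta_2 &= 15 \\ \theta_0 + 3\theta_1 + 3\theta_2 &= 24 \\ \theta_0 + 4\theta_1 + 3\theta_2 &= 27 \\ \theta_0 + 2\theta_1 + 5\theta_2 &= 34 \end{aligned}$$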
In matrix form, this system of linear equations is
$$Ax = b$$
Writing out the matrices for our data, they are
$$\underbrace{\begin{bmatrix} 1 & 2 & 4 \\ 1 & 5 & 1 \\ 1 & 3 & 3 \\ 1 & 4 & 3 \\ 1 & 2 & 5 \end{bmatrix}}_{A} \underbrace{\begin{bmatrix} \theta_0 \\ \theta_1 \\ \theta_2 \end{bmatrix}}_{x} = \underbrace{\begin{bmatrix} 29 \\ 15 \\ 24 \\ 27 \\ 34 \end{bmatrix}}_{b}$$
Note that a column of 1s has been inserted as the first column of $A$ so we can get a value for the intercept, $\theta_0$.
If matrix $A$ were invertible (i.e. square and non-singular), we could solve the system by multiplying both sides by the inverse of $A$, i.e.
$$\begin{aligned} Ax &= b \\ A^{-1}Ax &= A^{-1}b \\ x &= A^{-1}b \end{aligned}$$
Unfortunately, not every matrix has an inverse. For starters, the matrix would need to be square, and in our example $A$ has dimensions $5 \times 3$. So we'll turn to what's known as the pseudo-inverse.
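To see this concretely, here's a minimal sketch: asking NumPy to invert a non-square matrix simply raises an error (the array values below are illustrative, not our data).
import numpy as np

# A non-square matrix has no ordinary inverse, and np.linalg.inv refuses it
M = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])   # shape (3, 2)
try:
    np.linalg.inv(M)
except np.linalg.LinAlgError as err:
    print('Cannot invert:', err)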
The Moore-Penrose inverse, often referred to as the pseudo-inverse or generalised inverse, extends the idea of a matrix inverse to matrices that aren't invertible, including non-square ones like our $A$.
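To make that a little more concrete: the pseudo-inverse $A^{\dagger}$ is defined by the four Penrose conditions, two of which are $A A^{\dagger} A = A$ and $A^{\dagger} A A^{\dagger} = A^{\dagger}$. Here's a quick sketch checking those two with NumPy's np.linalg.pinv (which we'll use properly later) on the same illustrative matrix as above.
import numpy as np

# Two of the four Penrose conditions that define the pseudo-inverse
M = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
M_dag = np.linalg.pinv(M)
print(np.allclose(M @ M_dag @ M, M))          # A A† A == A   -> True
print(np.allclose(M_dag @ M @ M_dag, M_dag))  # A† A A† == A† -> True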
If we again take the system $Ax=b$ and now multiply both sides by $A^T$, we get
$$\begin{aligned} Ax &= b \\ A^T A x &= A^T b \\ (A^T A)^{-1} A^T A x &= (A^T A)^{-1} A^T b \\ x &= (A^T A)^{-1} A^T b \end{aligned}$$
This step assumes $A^T A$ is invertible, which holds whenever the columns of $A$ are linearly independent. The term $(A^T A)^{-1} A^T$ is the pseudo-inverse, denoted $A^{\dagger}$, so we can write
$$x = A^{\dagger}b$$
We'll run through the example from above using NumPy to calculate this. First, we need to set up the matrices as NumPy arrays.
import numpy as np
A = np.array([
[2,4],
[5,1],
[3,3],
[4,3],
[2,5],
])
# Add in the column of ones to A
A_ = np.append(np.ones((A.shape[0],1)), A, axis=1)
b = np.array([
[29],
[15],
[24],
[27],
[34],
])
To calculate the pseudo-inverse, $A^{\dagger}$, we need to compute $(A^T A)^{-1} A^T$. The @ operator performs matrix multiplication on NumPy arrays.
A_pinv = np.linalg.inv((A_.T) @ A_) @ A_.T
print('Pseudo-inverse of A:')
print(np.round(A_pinv,2))
We can now find a solution for $x$, using
$$x = A^{\dagger} b$$
x = A_pinv @ b
print(x)
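As an aside, we could get the same numbers without building the inverse explicitly, by handing the normal equations $(A^T A)x = A^T b$ to np.linalg.solve. This is just a sketch of an alternative route, reusing A_ and b from above.
# Solve (A^T A) x = A^T b directly, avoiding an explicit matrix inverse
x_ne = np.linalg.solve(A_.T @ A_, A_.T @ b)
print(np.round(x_ne, 1))
Either way, the coefficients come out the same.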
We have found,
$$\begin{aligned} \theta_0 &= -1.4 \\ \theta_1 &= 2.1 \\ \theta_2 &= 6.4 \end{aligned}$$
So our least squares solution is,
$$\hat{y} = -1.4 + 2.1x_1 + 6.4x_2$$
NumPy provides us with a function for computing the pseudo-inverse, numpy.linalg.pinv(). It uses singular-value decomposition (SVD), which is more robust and also makes the code a lot cleaner.
x = np.linalg.pinv(A_) @ b
print(x)
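For the curious, here's a rough sketch of how $A^{\dagger}$ falls out of the reduced SVD $A = U \Sigma V^T$, namely $A^{\dagger} = V \Sigma^{-1} U^T$. It's simplified: the real pinv also discards singular values that are effectively zero, which we skip here since our $A$ has full column rank.
# Build the pseudo-inverse from the reduced SVD: A = U S V^T  =>  A† = V S^-1 U^T
U, S, Vt = np.linalg.svd(A_, full_matrices=False)
A_pinv_svd = Vt.T @ np.diag(1.0 / S) @ U.T
print(np.allclose(A_pinv_svd, np.linalg.pinv(A_)))  # True for this full-rank A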
Again, we find the same values as before. We can also cross-check them with scikit-learn's LinearRegression.
from sklearn.linear_model import LinearRegression
reg = LinearRegression().fit(A,b)
print(f'Intercept: {reg.intercept_} \
\nCoefficients: {reg.coef_}')
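And since the intro joked about writing our own gradient descent, here's a minimal batch gradient descent sketch on the same design matrix A_ and target b. The learning rate and iteration count are arbitrary picks for this tiny dataset, not tuned values.
def gradient_descent(A, b, lr=0.04, n_iter=50_000):
    # Minimise the mean squared error ||A @ theta - b||^2 / n by batch gradient descent
    theta = np.zeros((A.shape[1], 1))
    for _ in range(n_iter):
        grad = (2 / A.shape[0]) * A.T @ (A @ theta - b)
        theta -= lr * grad
    return theta

# Should land on (roughly) the same coefficients as the pseudo-inverse solution
print(np.round(gradient_descent(A_, b), 1))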
Perfect, all our values match. Finally, we can wrap the pseudo-inverse approach in a function for later use.
def lss(A: np.ndarray, b: np.ndarray) -> np.ndarray:
    """
    Least squares solution for the linear system Ax = b.
    Returns an array of the coefficients; the first value
    is the intercept/bias.
    """
    if not all(isinstance(obj, np.ndarray) for obj in (A, b)):
        raise TypeError('Arguments must be of type np.ndarray')
    # Allow a single feature to be passed as a 1-D array
    if A.ndim == 1:
        A = A.reshape(-1, 1)
    # Prepend the column of ones for the intercept term
    A_ = np.append(np.ones((A.shape[0], 1)), A, axis=1)
    return np.linalg.pinv(A_) @ b
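For example, calling it on the original arrays (A without the column of ones, since lss adds that itself):
print(lss(A, b))  # same coefficients as before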