Least Squares Solution Using the Moore-Penrose Inverse

When finding the "line of best fit" between one or more variables and a target, we often turn to our favourite linear regression package. If we're feeling adventurous, we might even write our own gradient descent algorithm. But we can also find a solution using linear algebra. Sounds fun, right?

Right?

Example

Consider a situation where there are two features, $x_1$ and $x_2$, and a target variable, $y$. These could be the hours spent on Instagram per day, the hours on Tik Tok, and the level of brain rot experienced.

| Instagram Hours ($x_1$) | Tik Tok Hours ($x_2$) | Brain Rot ($y$) |
|---|---|---|
| 2 | 4 | 29 |
| 5 | 1 | 15 |
| 3 | 3 | 24 |
| 4 | 3 | 27 |
| 2 | 5 | 34 |

We want to find the relationship between $x_1$, $x_2$ and $y$. For a linear relationship, we would be finding values for $\theta$ in the equation, $$\hat{y} = \theta_0 + \theta_1 x_1 + \theta_2 x_2$$ such that the error between $y$ and $\hat{y}$ is minimised.
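Concretely, minimising the error means minimising the sum of squared residuals over our five observations - the least squares criterion the title refers to:

$$\min_{\theta_0, \theta_1, \theta_2} \; \sum_{i=1}^{5} \left( y_i - (\theta_0 + \theta_1 x_{i,1} + \theta_2 x_{i,2}) \right)^2$$

where $x_{i,1}$ and $x_{i,2}$ are the Instagram and Tik Tok hours for observation $i$.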

System of Linear Equations

We first need to set up a system of linear equations, which will take the form,

$$Ax = b$$

Writing out the matrices, they would be

$$\begin{bmatrix} 1 & 2 & 4 \\ 1 & 5 & 1 \\ 1 & 3 & 3 \\ 1 & 4 & 3 \\ 1 & 2 & 5 \\ \end{bmatrix} \begin{bmatrix} \theta_0 \\ \theta_1 \\ \theta_2 \\ \end{bmatrix} = \begin{bmatrix} 29 \\ 15 \\ 24 \\ 27 \\ 34 \\ \end{bmatrix}$$

Note that a column of 1's has been inserted at the start of matrix $A$ so we can get a value for the intercept, $\theta_0$.

If matrix $A$ were invertible (that is, square and non-singular), we could solve the system by multiplying both sides by the inverse of $A$, ie

$$\begin{aligned} Ax &= b \\ A^{-1}Ax &= A^{-1}b \\ x &= A^{-1}b \end{aligned}$$

Unfortunately, not every matrix has an inverse. For starters, the matrix would need to be square - and in our example the dimensions of $A$ are 5×3. So we'll turn to what's known as the pseudo-inverse.
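To see this fail in practice, here's a small sketch (not part of the worked example) showing that numpy.linalg.inv refuses a non-square matrix:

import numpy as np

M = np.ones((5, 3))           # same shape as our A - tall, not square
try:
    np.linalg.inv(M)          # only square (and non-singular) matrices can be inverted
except np.linalg.LinAlgError as err:
    print(f'Cannot invert: {err}')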

Moore-Penrose Inverse

The Moore-Penrose inverse, often referred to as the pseudo-inverse or generalised inverse, generalises the idea of a matrix inverse to matrices that aren't invertible - including non-square matrices like ours. (For an invertible matrix it coincides with the ordinary inverse.)

If we again take the system $Ax=b$ and now multiply both sides by $A^T$, we get

$$\begin{aligned} Ax &= b \\ A^T Ax &= A^{T} b \\ x &= (A^T A)^{-1} A^{T} b \\ \end{aligned}$$

The term $(A^T A)^{-1} A^T$ is the pseudo-inverse, denoted $A^{\dagger}$. (This formula requires $A^T A$ to be invertible, which it is whenever the columns of $A$ are linearly independent - true in our example.) So we can write

$$x = A^{\dagger}b $$
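It's worth pausing to note why multiplying by $A^T$ gives the least squares solution rather than just an algebraic rearrangement: setting the gradient of the squared error $\lVert Ax - b \rVert^2$ to zero produces exactly the same ("normal") equations,

$$\begin{aligned} \nabla_x \, \lVert Ax - b \rVert^2 &= 2A^T(Ax - b) = 0 \\ \implies A^T A x &= A^T b \end{aligned}$$

so the $x$ we recover is the one that minimises the error between $y$ and $\hat{y}$.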

We'll run through the example from above using NumPy to calculate this. First, we need to set up the matrices as NumPy arrays.

import numpy as np

A = np.array([
    [2,4],
    [5,1],
    [3,3],
    [4,3],
    [2,5],
])

# Add in the column of ones to A
A_ = np.append(np.ones((A.shape[0],1)), A, axis=1)

b = np.array([
    [29],
    [15],
    [24],
    [27],    
    [34],    
])

If we want to calculate the pseudo-inverse, $A^{\dagger}$, we need to compute $(A^T A)^{-1} A^T$. The @ operator performs matrix multiplication on NumPy arrays.

A_pinv = np.linalg.inv((A_.T) @ A_) @ A_.T
print('Pseudo-inverse of A:')
print(np.round(A_pinv,2))
Pseudo-inverse of A:
[[ 3.4   1.    2.6  -3.8  -2.2 ]
 [-0.6  -0.   -0.4   0.7   0.3 ]
 [-0.4  -0.25 -0.35  0.55  0.45]]
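As a quick sanity check (a small sketch reusing A_ and A_pinv from above): because the columns of $A$ are linearly independent, the pseudo-inverse acts as a left inverse, so $A^{\dagger} A$ should come out as the $3 \times 3$ identity, up to rounding.

# The pseudo-inverse is a left inverse of A_ here: the product should be (numerically) the 3x3 identity
print(np.round(A_pinv @ A_, 2))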

We can now find the solution for $x$ using

$$x = A^{\dagger} b$$
x = A_pinv @ b
print(x)
[[-1.4]
 [ 2.1]
 [ 6.4]]

We have found,

$$\begin{aligned} \theta_0 &= -1.4 \\ \theta_1 &= 2.1 \\ \theta_2 &= 6.4 \\ \end{aligned}$$

So our least squares solution is,

$$\hat{y} = -1.4 + 2.1x_1 + 6.4x_2$$
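To see how well this fits, we can plug the data back in and look at the residuals - a small sketch reusing A_, x and b from above. The residuals should be small but non-zero, since five points won't lie exactly on a single plane.

y_hat = A_ @ x                          # fitted values for each observation
residuals = b - y_hat                   # how far each prediction is from the actual value
print(np.round(y_hat.ravel(), 1))       # roughly [28.4 15.5 24.1 26.2 34.8]
print(np.round(residuals.ravel(), 1))   # roughly [ 0.6 -0.5 -0.1  0.8 -0.8]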

NumPy's Pseudo-Inverse Function

NumPy provides us with a function for computing the pseudo-inverse, numpy.linalg.pinv(). It uses singular value decomposition (SVD), which is more numerically robust - it even works when $A^T A$ isn't invertible - and it also makes the code a lot cleaner.

x = np.linalg.pinv(A_) @ b
print(x)
[[-1.4]
 [ 2.1]
 [ 6.4]]

Again, we find the same values as before. Since pinv works via the SVD, we can also rebuild the pseudo-inverse from the SVD factors ourselves, as sketched below.
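Here is a minimal sketch of what pinv does under the hood, reusing A_ and b from above. numpy.linalg.svd returns the factors $U$, the singular values $s$ and $V^T$ of the reduced SVD $A = U \Sigma V^T$, and since all the singular values here are non-zero, the pseudo-inverse is $A^{\dagger} = V \Sigma^{-1} U^T$.

U, s, Vt = np.linalg.svd(A_, full_matrices=False)  # reduced SVD: A_ = U @ np.diag(s) @ Vt
A_pinv_svd = Vt.T @ np.diag(1 / s) @ U.T           # invert the (non-zero) singular values
print(np.round(A_pinv_svd @ b, 1))                 # should again give roughly [[-1.4], [2.1], [6.4]]

Finally, we can cross-check the coefficients with scikit-learn's LinearRegression. (Note that LinearRegression solves the same ordinary least squares problem with a direct solver rather than gradient descent, so we'd expect an exact match.)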

from sklearn.linear_model import LinearRegression

reg = LinearRegression().fit(A,b)
print(f'Intercept: {reg.intercept_} \
      \nCoefficients: {reg.coef_}')
Intercept: [-1.4]       
Coefficients: [[2.1 6.4]]

Perfect, all our values match. We can put it in a function for later use.

def lss(A: np.ndarray, b: np.ndarray) -> np.ndarray:
    """
    Least squares solution for the linear system Ax=b.
    Returns an array of the coefficients; the first value
    is the intercept/bias.
    """
    if not all(isinstance(obj, np.ndarray) for obj in [A, b]):
        raise TypeError('Arguments must be of type np.ndarray')
    # Ensure A is 2D so a single feature also works
    if A.ndim == 1:
        A = A.reshape(-1, 1)
    # Add the column of ones for the intercept term
    A_ = np.append(np.ones((A.shape[0], 1)), A, axis=1)
    return np.linalg.pinv(A_) @ b
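A quick usage check, reusing the A and b arrays from earlier - the result should match the coefficients we found above:

theta = lss(A, b)
print(np.round(theta, 1))   # roughly [[-1.4], [2.1], [6.4]]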