SES and New Zealand GHG Emissions

In the first tutorial on Simple Exponential Smoothing, we explored two approaches to smoothing a time series. We'll now apply SES on a NZ Greenhouse Gas Emissions dataset, where we will see the effect of the smoothing paramater, alpha, on the resulting forecast. Later, we'll attempt to optimise the selection of alpha using a SciPy minimisation function.

Let's get started!

New Zealand Greenhouse Gas Emissions Data

The dataset we will look at is from Stats NZ. It's a csv file containing the greenhouse gas emissions for different sectors recorded between 2014-2022. You can download the data here.

import pandas as pd
import numpy as np

ghg_emissions = pd.read_csv('data/ghg-emissions.csv')
ghg_emissions

We're just going to extract a single time-series, so we'll slice the dataframe where Anzsic_description=='Primary industries' and Variable=='Seasonally adjusted'. We'll also convert the Period to a Pandas datetime type.

import matplotlib.pyplot as plt

prim_ind = ghg_emissions[(ghg_emissions['Anzsic_description'] == 'Primary industries') &
                     (ghg_emissions['Variable'] == 'Seasonally adjusted')]
prim_ind = prim_ind[['Period','Data_value']].copy()
prim_ind.rename({'Data_value': 'Actual'}, axis=1, inplace=True)
prim_ind['Period'] = pd.to_datetime(prim_ind['Period'])

# Plot Primary Industries emissions
fig, ax = plt.subplots(figsize=(12,4))
prim_ind.plot(x='Period', y='Actual', ax=ax)
plt.ylabel('CO2 Emissions (kilotonnes)')
plt.title('New Zealand Primary Industries GHG Emissions')
plt.show()

Simple Exponential Smoothing Function

Recall that the equation for SES is as follows:

$$F_t = \alpha A_{t-1} + (1 - \alpha)F_{t-1}$$

We'll re-use the function that takes in a time series and a value for alpha and returns an array for the smoothed series.

def ses(ts, alpha):
    """
    Perform simple exponential smoothing on an array and
    return the smoothed series
    
    Parameters
    ----------
    ts (N,) : array_like
        1-D time series
    alpha : float
        Smoothing factor, `0 < alpha < 1`
        
    Returns
    -------
    forecast (N+1,) : ndarray
        1-D forecast array
    """
    n = len(ts) + 1
    forecast = np.zeros(n)
    forecast[0] = ts[0]
    for i in range(1,n):
        forecast[i] = alpha*ts[i-1] + (1-alpha)*forecast[i-1]
    return forecast

We can trial a couple different values for alpha to see their influence on the smoothed series, let's start off with $\alpha = 0.1$ and $\alpha = 0.9$.

# Calculate forecasts for a low and high value of alpha
forecast_01 = ses(prim_ind['Actual'].values, alpha=0.1)[:-1]
forecast_09 = ses(prim_ind['Actual'].values, alpha=0.9)[:-1]

The resulting series can then be plotted.

# Plot results
period = prim_ind['Period'].values
fig, ax = plt.subplots(figsize=(12,4))
plt.plot(period, prim_ind['Actual'], label='Actual')
plt.plot(period, forecast_01, linestyle='--', 
         label='Forecast, alpha=0.1')
plt.plot(period, forecast_09, linestyle='--', 
         label='Forecast, alpha=0.9')
plt.ylabel('CO2 Emissions (kilotonnes)')
plt.title('New Zealand Emissions - Primary Industries')
plt.legend()
plt.show()

Note that for a low value of alpha, we have a "smoother" forecast. Setting a high value for alpha produces an almost identical series but shifted ahead by one step. If we were to set $\alpha = 1$, then we we would essentially be creating a naive forecast, because the second term in the SES formula becomes zero. $$\begin{aligned} \\ F_t &= \alpha*A_{t-1} + (1-\alpha)*F_{t-1} \\ &=1 * A_{t-1} + 0*F_{t-1} \\ &= A_{t-1} \\ \end{aligned}$$

So, this is all well and good, but how do decide on a value for alpha?

Choosing a Value for Alpha

One approach to setting alpha is to minimise the error between the observed series and the forecasted series. For this example, we'll calculate the Mean Squared Error (MSE) between the two series. So let's iterate through a number of values for alpha and plot the resulting MSE.

# Function for calculating the MSE of two numpy arrays
mse = lambda actual, forecast: ((actual - forecast)**2).mean()

# Iterate through different values of alpha, store the MSE in mse_errs
alphas = np.arange(0.3,1,step=0.05)
mse_errs = []
for a in alphas:
    forecast = ses(prim_ind['Actual'].values, alpha=a)
    err = mse(prim_ind['Actual'], forecast[:-1])
    mse_errs.append(err)

# Plot the errors
fig, ax = plt.subplots()
plt.plot(alphas,mse_errs)
plt.xlabel('Alpha')
plt.ylabel('Error (MSE)')
plt.title('Error as a Function of Alpha')
plt.show()

From the plot, we can see that the MSE is at its lowest somewhere around $\alpha \approx 0.7$. Each series is going to be different, there are no universal "best" value for alpha. This means that this process will have to be repeated for each new time series. So perhaps we can cut down on the time but finding the optimal value using the scipy.optimize.minimize function.

Optimising Alpha with scipy.optimize.minimize

The scipy.optimize.minimize function takes another function as its first parameter and attempts to minimise the output. In our case, we need to pass it a function which returns the MSE at a given value for alpha. We'll call this function ses_error, which will take alpha as its first parameter.

The scipy minimize function will check the MSE at different values of alpha, and then store the value of alpha which minimises the MSE.

from scipy.optimize import minimize

def ses_error(alpha, actual):
    """Return the MSE between the actual and smoothed series"""
    forecast = ses(actual, alpha)
    return mse(actual, forecast[:-1])

# Parameters required for minimize function
actual = prim_ind['Actual'].values
alpha_init = 0.7       # Initial value alpha is set to
bnds = [(0.5,0.9)]     # The bounds which minimize searches between
optimised = minimize(ses_error, alpha_init, args=actual, bounds=bnds)

print(f'Optimal Alpha: {optimised.x[0]:.3f}')

Optimal Alpha: 0.683

The result of 0.683 is very close to the 0.7 we determined visually from the graph before, so we can conclude the minimize function has done it's job!

Creating a Single Point Forecast

The last piece of the pie will be using this optimised value of alpha to create a forecast for the future. We'll go back and reuse the ses() function and pass in the optimised value of alpha, which is stored in optimised.x.

forecast = ses(actual, alpha=optimised.x[0])

The prediction for 1-step into the future will be the final value in the forecasted array.

print(f'Prediction: {forecast[-1]:.2f}')

Prediction: 10941.38

Our prediction is 10,941 kilotonnes of CO2 for the start of 2022. We can plot the optimised smoothed series along with the original.

periods = prim_ind['Period'].values
f_periods = np.append(prim_ind['Period'], np.datetime64('2022-01-01'))

# Plot results
fig, ax = plt.subplots(figsize=(12,4))
plt.plot(periods, actual, label='Actual')
plt.plot(f_periods, forecast, linestyle='--', 
         label='Forecast, alpha={:.3f}'.format(a))
plt.ylabel('CO2 Emissions (kilotonnes)')
plt.title('New Zealand Emissions - Primary Industries')
plt.legend()
plt.show()

While SES allows us to forecast one step into the future, this is as far as its predictive powers can go. If we were to forecast into 2023, or 2024, the prediction is going to be the same - 10,941 kilotonnes. So we'll now move on to Double Exponential Smoothing, also known as Holt's method. This method allows us to track an underlying trend component so we can forecast further into the future.

See you there!

	Anzsic	Anzsic_description	Period	Data_value	Variable	Units	Gas	Status
0	ZPZ	Primary industries	2014Q1	11316	Seasonally adjusted	Kilotonnes	Carbon dioxide equivalents	P
1	ZPZ	Primary industries	2014Q2	11179	Seasonally adjusted	Kilotonnes	Carbon dioxide equivalents	P
2	ZPZ	Primary industries	2014Q3	11126	Seasonally adjusted	Kilotonnes	Carbon dioxide equivalents	P
3	ZPZ	Primary industries	2014Q4	11118	Seasonally adjusted	Kilotonnes	Carbon dioxide equivalents	P
4	ZPZ	Primary industries	2015Q1	11064	Seasonally adjusted	Kilotonnes	Carbon dioxide equivalents	P
...	...	...	...	...	...	...	...	...
1019	ZZZ	Total	2020Q4	21307	Actual	Kilotonnes	Carbon dioxide equivalents	P
1020	ZZZ	Total	2021Q1	22198	Actual	Kilotonnes	Carbon dioxide equivalents	P
1021	ZZZ	Total	2021Q2	19013	Actual	Kilotonnes	Carbon dioxide equivalents	P
1022	ZZZ	Total	2021Q3	17781	Actual	Kilotonnes	Carbon dioxide equivalents	P
1023	ZZZ	Total	2021Q4	20307	Actual	Kilotonnes	Carbon dioxide equivalents	P