In the first tutorial on Simple Exponential Smoothing, we explored two approaches to smoothing a time series. We'll now apply SES on a NZ Greenhouse Gas Emissions dataset, where we will see the effect of the smoothing paramater, alpha, on the resulting forecast. Later, we'll attempt to optimise the selection of alpha using a SciPy minimisation function.
Let's get started!
The dataset we will look at is from Stats NZ. It's a csv file containing the greenhouse gas emissions for different sectors recorded between 2014-2022. You can download the data here.
import pandas as pd
import numpy as np
ghg_emissions = pd.read_csv('data/ghg-emissions.csv')
ghg_emissions
We're just going to extract a single time-series, so we'll slice the dataframe where Anzsic_description=='Primary industries' and Variable=='Seasonally adjusted'. We'll also convert the Period to a Pandas datetime type.
import matplotlib.pyplot as plt
prim_ind = ghg_emissions[(ghg_emissions['Anzsic_description'] == 'Primary industries') &
(ghg_emissions['Variable'] == 'Seasonally adjusted')]
prim_ind = prim_ind[['Period','Data_value']].copy()
prim_ind.rename({'Data_value': 'Actual'}, axis=1, inplace=True)
prim_ind['Period'] = pd.to_datetime(prim_ind['Period'])
# Plot Primary Industries emissions
fig, ax = plt.subplots(figsize=(12,4))
prim_ind.plot(x='Period', y='Actual', ax=ax)
plt.ylabel('CO2 Emissions (kilotonnes)')
plt.title('New Zealand Primary Industries GHG Emissions')
plt.show()
Recall that the equation for SES is as follows:
$$F_t = \alpha A_{t-1} + (1 - \alpha)F_{t-1}$$We'll re-use the function that takes in a time series and a value for alpha and returns an array for the smoothed series.
def ses(ts, alpha):
"""
Perform simple exponential smoothing on an array and
return the smoothed series
Parameters
----------
ts (N,) : array_like
1-D time series
alpha : float
Smoothing factor, `0 < alpha < 1`
Returns
-------
forecast (N+1,) : ndarray
1-D forecast array
"""
n = len(ts) + 1
forecast = np.zeros(n)
forecast[0] = ts[0]
for i in range(1,n):
forecast[i] = alpha*ts[i-1] + (1-alpha)*forecast[i-1]
return forecast
We can trial a couple different values for alpha to see their influence on the smoothed series, let's start off with $\alpha = 0.1$ and $\alpha = 0.9$.
# Calculate forecasts for a low and high value of alpha
forecast_01 = ses(prim_ind['Actual'].values, alpha=0.1)[:-1]
forecast_09 = ses(prim_ind['Actual'].values, alpha=0.9)[:-1]
The resulting series can then be plotted.
# Plot results
period = prim_ind['Period'].values
fig, ax = plt.subplots(figsize=(12,4))
plt.plot(period, prim_ind['Actual'], label='Actual')
plt.plot(period, forecast_01, linestyle='--',
label='Forecast, alpha=0.1')
plt.plot(period, forecast_09, linestyle='--',
label='Forecast, alpha=0.9')
plt.ylabel('CO2 Emissions (kilotonnes)')
plt.title('New Zealand Emissions - Primary Industries')
plt.legend()
plt.show()
Note that for a low value of alpha, we have a "smoother" forecast. Setting a high value for alpha produces an almost identical series but shifted ahead by one step. If we were to set $\alpha = 1$, then we we would essentially be creating a naive forecast, because the second term in the SES formula becomes zero. $$\begin{aligned} \\ F_t &= \alpha*A_{t-1} + (1-\alpha)*F_{t-1} \\ &=1 * A_{t-1} + 0*F_{t-1} \\ &= A_{t-1} \\ \end{aligned}$$
So, this is all well and good, but how do decide on a value for alpha?
One approach to setting alpha is to minimise the error between the observed series and the forecasted series. For this example, we'll calculate the Mean Squared Error (MSE) between the two series. So let's iterate through a number of values for alpha and plot the resulting MSE.
# Function for calculating the MSE of two numpy arrays
mse = lambda actual, forecast: ((actual - forecast)**2).mean()
# Iterate through different values of alpha, store the MSE in mse_errs
alphas = np.arange(0.3,1,step=0.05)
mse_errs = []
for a in alphas:
forecast = ses(prim_ind['Actual'].values, alpha=a)
err = mse(prim_ind['Actual'], forecast[:-1])
mse_errs.append(err)
# Plot the errors
fig, ax = plt.subplots()
plt.plot(alphas,mse_errs)
plt.xlabel('Alpha')
plt.ylabel('Error (MSE)')
plt.title('Error as a Function of Alpha')
plt.show()
From the plot, we can see that the MSE is at its lowest somewhere around $\alpha \approx 0.7$. Each series is going to be different, there are no universal "best" value for alpha. This means that this process will have to be repeated for each new time series. So perhaps we can cut down on the time but finding the optimal value using the scipy.optimize.minimize function.
The scipy.optimize.minimize function takes another function as its first parameter and attempts to minimise the output. In our case, we need to pass it a function which returns the MSE at a given value for alpha. We'll call this function ses_error, which will take alpha as its first parameter.
The scipy minimize function will check the MSE at different values of alpha, and then store the value of alpha which minimises the MSE.
from scipy.optimize import minimize
def ses_error(alpha, actual):
"""Return the MSE between the actual and smoothed series"""
forecast = ses(actual, alpha)
return mse(actual, forecast[:-1])
# Parameters required for minimize function
actual = prim_ind['Actual'].values
alpha_init = 0.7 # Initial value alpha is set to
bnds = [(0.5,0.9)] # The bounds which minimize searches between
optimised = minimize(ses_error, alpha_init, args=actual, bounds=bnds)
print(f'Optimal Alpha: {optimised.x[0]:.3f}')
The result of 0.683 is very close to the 0.7 we determined visually from the graph before, so we can conclude the minimize function has done it's job!
The last piece of the pie will be using this optimised value of alpha to create a forecast for the future. We'll go back and reuse the ses() function and pass in the optimised value of alpha, which is stored in optimised.x.
forecast = ses(actual, alpha=optimised.x[0])
The prediction for 1-step into the future will be the final value in the forecasted array.
print(f'Prediction: {forecast[-1]:.2f}')
Our prediction is 10,941 kilotonnes of CO2 for the start of 2022. We can plot the optimised smoothed series along with the original.
periods = prim_ind['Period'].values
f_periods = np.append(prim_ind['Period'], np.datetime64('2022-01-01'))
# Plot results
fig, ax = plt.subplots(figsize=(12,4))
plt.plot(periods, actual, label='Actual')
plt.plot(f_periods, forecast, linestyle='--',
label='Forecast, alpha={:.3f}'.format(a))
plt.ylabel('CO2 Emissions (kilotonnes)')
plt.title('New Zealand Emissions - Primary Industries')
plt.legend()
plt.show()
While SES allows us to forecast one step into the future, this is as far as its predictive powers can go. If we were to forecast into 2023, or 2024, the prediction is going to be the same - 10,941 kilotonnes. So we'll now move on to Double Exponential Smoothing, also known as Holt's method. This method allows us to track an underlying trend component so we can forecast further into the future.
See you there!