WRMSSE and the M5 Dataset

Kaggle's M5 forecasting competition uses a weighted adaptation of the RMSSE as its measure of accuracy. This guide covers why each series is weighted, how the WRMSSE is calculated, and a Python package for calculating it locally - instead of having to upload a submission file to Kaggle.

Weighting the Error

Recall that the RMSSE allows us to compare the accuracy of series at different scales. The WRMSSE builds further on this by assigning a weight to each RMSSE value - but why?

Consider a car dealership that also sells hanging air fresheners over the counter. Now, let's say we have a forecasting model that can accurately predict the sale of air fresheners, but doesn't have much predictive power for the sale of cars. This model isn't going to be of much use to the business, as the majority of their revenue comes from the sale of cars. To help tune the forecasting model, we can assign a weight to each series that reflects its importance. High-value items can have large weights so that their error is amplified - a big red warning that we're on the wrong track.
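To make that concrete, here's a toy sketch with made-up numbers: suppose our model scores an RMSSE of 2.0 on cars (poor) and 0.5 on air fresheners (good).

import numpy as np

rmsse_vals = np.array([2.0, 0.5])   # hypothetical RMSSE: [cars, air fresheners]
weights = np.array([0.95, 0.05])    # made-up revenue shares

print('Unweighted mean: %.3f' % rmsse_vals.mean())            # 1.250
print('Weighted score: %.3f' % (weights * rmsse_vals).sum())  # 1.925

The unweighted mean softens the poor car forecasts, while the revenue-weighted score makes them impossible to ignore.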

As a brief overview, the M5 WRMSSE uses a weighted score that considers the cumulative dollar amount for each product - the higher the sales revenue, the more important it becomes. Additionally, there are 12 levels over which the data is aggregated. For example, one level looks at the total revenue for each store and assigns a weight accordingly (a business will be more interested in accurately predicting sales for a high-revenue mega-store than for one of its tiny corner shops).

It will become clearer as we work through this, but at any point you can refer to the M5 competitors' guide if you need a more detailed explanation.

Calculating the RMSSE

The M5 dataset contains the sales of 30,490 products tracked over 1,941 days. We'll need to use the sales_train_evaluation.csv file and split it into a train/test set. Let's read the CSV file to see what we're working with.

import pandas as pd
import numpy as np

sales = pd.read_csv('data/sales_train_evaluation.csv')
sales.head()
id item_id dept_id cat_id store_id state_id d_1 d_2 d_3 d_4 ... d_1932 d_1933 d_1934 d_1935 d_1936 d_1937 d_1938 d_1939 d_1940 d_1941
0 HOBBIES_1_001_CA_1_evaluation HOBBIES_1_001 HOBBIES_1 HOBBIES CA_1 CA 0 0 0 0 ... 2 4 0 0 0 0 3 3 0 1
1 HOBBIES_1_002_CA_1_evaluation HOBBIES_1_002 HOBBIES_1 HOBBIES CA_1 CA 0 0 0 0 ... 0 1 2 1 1 0 0 0 0 0
2 HOBBIES_1_003_CA_1_evaluation HOBBIES_1_003 HOBBIES_1 HOBBIES CA_1 CA 0 0 0 0 ... 1 0 2 0 0 0 2 3 0 1
3 HOBBIES_1_004_CA_1_evaluation HOBBIES_1_004 HOBBIES_1 HOBBIES CA_1 CA 0 0 0 0 ... 1 1 0 4 0 1 3 0 2 6
4 HOBBIES_1_005_CA_1_evaluation HOBBIES_1_005 HOBBIES_1 HOBBIES CA_1 CA 0 0 0 0 ... 0 0 0 2 1 0 0 2 1 0

5 rows × 1947 columns

The first columns are categorical information: the item identifier and the state, store, department, etc. that it belongs to. The remaining columns are the sales for each day, beginning on day 1 (d_1) and finishing on day 1,941 (d_1941).

We will split the daily sales into train and test sets. The forecast will be simple: use the final 28 days of sales from the train set (a naive forecast).

h = 28     # forecasting horizon
train = sales.iloc[:,:-h]
test = sales.iloc[:,-h:]
forecast = train.iloc[:,-h:]    # 28-day naive forecast

As a quick sanity check, we can have a look at the dimensions of the train, test and forecast sets.

print('Training set: %s\nTest set: %s\nForecast: %s' %
       (train.filter(like='d_').shape, test.shape, forecast.shape))
Training set: (30490, 1913)
Test set: (30490, 28)
Forecast: (30490, 28)

As expected, we have a training set of 30,490 products with sales for 1,913 days. The test and forecast sets both cover 28 days.

We can now calculate the RMSSE. The rmsse() function below is taken from the RMSSE tutorial; it returns the RMSSE for each row.
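As a refresher, for a single series with $n$ training observations and a forecast horizon of $h$ days, the RMSSE is defined as

$$RMSSE = \sqrt{\frac{\frac{1}{h}\sum_{t=n+1}^{n+h}\left(Y_t - \hat{Y}_t\right)^2}{\frac{1}{n-1}\sum_{t=2}^{n}\left(Y_t - Y_{t-1}\right)^2}}$$

The denominator is the mean squared error of a one-step naive forecast over the training period, which is what rescales the error and lets series of very different magnitudes be compared. The code below additionally trims zeros from each training series, so the denominator only covers the period the item was actively sold.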

def rmsse(train, test, forecast):
    # Numerator: MSE of the forecast over the test horizon
    forecast_mse = np.mean((test - forecast)**2, axis=1)
    # Denominator: MSE of a one-step naive forecast over the training
    # period, with leading/trailing zeros trimmed so the scale reflects
    # the period the item was actively sold
    train_mse = [np.mean(np.diff(np.trim_zeros(row))**2) for row in train]
    return np.sqrt(forecast_mse / np.array(train_mse))

rmsse_vals = rmsse(train.filter(like="d_").values, test.values, forecast.values)
print(rmsse_vals)
[1.42857919 0.80052712 1.33151305 ... 1.07916462 1.31252951 0.98403747]
print('Number of values: %d' % len(rmsse_vals))
print('RMSSE mean: %.3f' % np.mean(rmsse_vals))
Number of values: 30490
RMSSE mean: 1.032

We have 30,490 RMSSE values, one for each product/row in the dataset. Out of interest, if we take the mean of these values we end up with a value of 1.032.

However, to calculate the WRMSSE as per the M5 competitors' guide, we compute

$$WRMSSE = \sum_{i=1}^{42,840} w_i \times RMSSE_i$$

From the formula, we need to apply a weight to each RMSSE value and sum over the 42,840 series... but we only have 30,490! So before we can calculate the WRMSSE, we first need to aggregate the dataset over 12 different levels.

The 12 Levels of Aggregation

The table below is taken from the M5 guide and shows the 12 levels of aggregation we need to perform.

Level Aggregation Number of series
1 Unit sales of all products, aggregated for all stores/states 1
2 Unit sales of all products, aggregated for each state 3
3 Unit sales of all products, aggregated for each store 10
4 Unit sales of all products, aggregated for each category 3
5 Unit sales of all products, aggregated for each department 7
6 Unit sales of all products, aggregated for each state and category 9
7 Unit sales of all products, aggregated for each state and department 21
8 Unit sales of all products, aggregated for each store and category 30
9 Unit sales of all products, aggregated for each store and department 70
10 Unit sales of product x, aggregated for all stores/states 3,049
11 Unit sales of product x, aggregated for each state 9,147
12 Unit sales of product x, aggregated for each store 30,490
Total 42,840

The 30,490 series need to be aggregated 12 different ways, based on the categorical information, to give a total of 42,840 series. For example, level 1 is the summation of all product sales across all stores, so we end up with one final series representing total sales. In level 2, we first group the data by state; since there are three states, we end up with three additional time series. And so on.
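If you'd like to sanity-check these counts against the data, a few quick lines will do it (assuming the sales dataframe from earlier is still loaded):

print(sales['state_id'].nunique())                     # 3 states  (level 2)
print(sales['store_id'].nunique())                     # 10 stores (level 3)
print(sales.groupby(['store_id','dept_id']).ngroups)   # 70 series (level 9)
print(sales['item_id'].nunique())                      # 3,049 items (level 10)

# The per-level counts from the table sum to the 42,840 series we need
print(sum([1, 3, 10, 3, 7, 9, 21, 30, 70, 3049, 9147, 30490]))  # 42840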

Let's look at the training set again.

train.head()
id item_id dept_id cat_id store_id state_id d_1 d_2 d_3 d_4 ... d_1904 d_1905 d_1906 d_1907 d_1908 d_1909 d_1910 d_1911 d_1912 d_1913
0 HOBBIES_1_001_CA_1_evaluation HOBBIES_1_001 HOBBIES_1 HOBBIES CA_1 CA 0 0 0 0 ... 1 3 0 1 1 1 3 0 1 1
1 HOBBIES_1_002_CA_1_evaluation HOBBIES_1_002 HOBBIES_1 HOBBIES CA_1 CA 0 0 0 0 ... 0 0 0 0 0 1 0 0 0 0
2 HOBBIES_1_003_CA_1_evaluation HOBBIES_1_003 HOBBIES_1 HOBBIES CA_1 CA 0 0 0 0 ... 2 1 2 1 1 1 0 1 1 1
3 HOBBIES_1_004_CA_1_evaluation HOBBIES_1_004 HOBBIES_1 HOBBIES CA_1 CA 0 0 0 0 ... 1 0 5 4 1 0 1 3 7 2
4 HOBBIES_1_005_CA_1_evaluation HOBBIES_1_005 HOBBIES_1 HOBBIES CA_1 CA 0 0 0 0 ... 2 1 1 0 1 1 2 2 2 4

5 rows × 1919 columns

We have the categorical info required for the aggregation stored in the first few columns (item_id, dept_id, cat_id, etc.). We'll set this up in a dictionary where the keys are the level names and the values are the columns to group by.

agg_levels = {
    'level_1': None,
    'level_2': ['state_id'],
    'level_3': ['store_id'],
    'level_4': ['cat_id'],
    'level_5': ['dept_id'],
    'level_6': ['state_id', 'cat_id'],
    'level_7': ['state_id', 'dept_id'],
    'level_8': ['store_id', 'cat_id'],
    'level_9': ['store_id', 'dept_id'],
    'level_10': ['item_id'],
    'level_11': ['item_id', 'state_id'],
    'level_12': ['id']
}

We need to iterate over each level, group by the relevant columns and then sum. The first level's value is None, meaning we won't group at all - it's a total aggregation.

def aggregation(df):
    df_agg = pd.DataFrame()
    for v in agg_levels.values():
        if v is None:
            # Level 1: sum every series into a single 'Total' row
            agg = df.sum(numeric_only=True)
            agg = pd.DataFrame([agg], index=['Total'])
        else:
            # Levels 2-12: group by the categorical column(s), then sum;
            # numeric_only=True drops the remaining string columns
            agg = df.groupby(by=v).sum(numeric_only=True)
        df_agg = pd.concat([df_agg, agg])
    return df_agg

The training set can simply be passed to the aggregation() function.

train_agg = aggregation(train)
train_agg.head()
d_1 d_2 d_3 d_4 d_5 d_6 d_7 d_8 d_9 d_10 ... d_1904 d_1905 d_1906 d_1907 d_1908 d_1909 d_1910 d_1911 d_1912 d_1913
Total 32631 31749 23783 25412 19146 29211 28010 37932 32736 25572 ... 41789 48362 51640 38059 37570 35343 35033 40517 48962 49795
CA 14195 13805 10108 11047 9925 11322 12251 16610 14696 11822 ... 16255 20564 23032 17052 15784 15148 14488 17095 21834 23187
TX 9438 9630 6778 7381 5912 9006 6226 9440 9376 7319 ... 10800 12460 13709 9868 10922 9600 9602 10615 12266 12282
WI 8998 8314 6897 6984 3309 8883 9533 11882 8664 6431 ... 14734 15338 14899 11139 10864 10595 10943 12807 14862 14326
CA_1 4337 4155 2816 3051 2630 3276 3450 5437 4340 3157 ... 3982 5437 5954 4345 3793 3722 3709 4387 5577 6113

5 rows × 1913 columns

The first row is the total summation (level 1), followed by three rows where the data has been grouped by state (level 2). We can check that we have the 42,840 required rows.

train_agg.shape
(42840, 1913)
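As an optional extra check, the 'Total' row produced by level 1 should equal the column-wise sum of all 30,490 original series:

# Level 1 should match the sum over every product series
assert (train_agg.iloc[0] == train.filter(like='d_').sum()).all()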

We can repeat this aggregation process on the test and forecast sets. First we need to insert the categorical columns and then pass them to the aggregation() function.

# Concatenate with categorical columns
test = pd.concat((sales.loc[:,:'state_id'],test), axis=1)
forecast = pd.concat((sales.loc[:,:'state_id'],forecast), axis=1)

# Perform aggregation over the 12 levels
test_agg = aggregation(test)
forecast_agg = aggregation(forecast)

# Calculate RMSSE with aggregated datasets
rmsse_vals_agg = rmsse(train_agg.values, 
                       test_agg.values, 
                       forecast_agg.values)
print(rmsse_vals_agg)
print('Number of values: %d' % len(rmsse_vals_agg))
print('RMSSE mean: %.3f' % np.mean(rmsse_vals_agg))
[0.62889454 0.54634122 0.69085288 ... 1.2904286  1.24211801 0.49919979]
Number of values: 42840
RMSSE mean: 1.042

Great, we have 42,840 RMSSE values now. Let's have a look at the WRMSSE formula again.

$$WRMSSE = \sum_{i=1}^{42,840} w_i \times RMSSE_i$$

All we need to do is multiply each RMSSE by its corresponding weight, $w_i$.

Calculating the Weights

The weighting for each row is going to be based on its proportion of total revenue. Large weights will be assigned to high-revenue products/series, as a business will be more concerned with accurately predicting those. Going back to the car dealership example, a larger weight would be applied to the error in car sales, and a low weight to the sales of air fresheners.
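In formula form, within a given aggregation level the weight of series $i$ is simply its share of total revenue (we'll see later that each level is also scaled by 1/12):

$$w_i = \frac{\text{revenue}_i}{\sum_j \text{revenue}_j}$$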

To calculate the weights, we need to know the revenue. At the moment we only have unit sales, so each unit sale will need to be multiplied by its corresponding sell price. We need the data stored in calendar.csv and sell_prices.csv. We'll also go back to the original 30,490 series in sales_train_evaluation.csv.

calendar = pd.read_csv('data/calendar.csv')
sales = pd.read_csv('data/sales_train_evaluation.csv')
prices = pd.read_csv('data/sell_prices.csv')

First, let's grab the unit sales for the last month and assign them to sales_lm. The final 28 days in sales_train_evaluation.csv are the test set, so we need the 28 days prior to this period.

h = 28
sales_lm = sales.iloc[:,-2*h:-h].set_index(sales['id'])
sales_lm.head()
d_1886 d_1887 d_1888 d_1889 d_1890 d_1891 d_1892 d_1893 d_1894 d_1895 ... d_1904 d_1905 d_1906 d_1907 d_1908 d_1909 d_1910 d_1911 d_1912 d_1913
id
HOBBIES_1_001_CA_1_evaluation 1 0 0 0 0 0 1 0 4 2 ... 1 3 0 1 1 1 3 0 1 1
HOBBIES_1_002_CA_1_evaluation 1 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 1 0 0 0 0
HOBBIES_1_003_CA_1_evaluation 0 0 0 0 0 0 1 0 0 0 ... 2 1 2 1 1 1 0 1 1 1
HOBBIES_1_004_CA_1_evaluation 0 0 0 0 3 1 2 1 3 1 ... 1 0 5 4 1 0 1 3 7 2
HOBBIES_1_005_CA_1_evaluation 1 0 4 4 0 1 4 0 1 0 ... 2 1 1 0 1 1 2 2 2 4

5 rows × 28 columns

We have the sales figures for each day; now we need to find their corresponding prices. We can get these by combining the information stored in the prices and calendar files.

prices.head()
store_id item_id wm_yr_wk sell_price
0 CA_1 HOBBIES_1_001 11325 9.58
1 CA_1 HOBBIES_1_001 11326 9.58
2 CA_1 HOBBIES_1_001 11327 8.26
3 CA_1 HOBBIES_1_001 11328 8.26
4 CA_1 HOBBIES_1_001 11329 8.26
calendar.head()
date wm_yr_wk weekday wday month year d event_name_1 event_type_1 event_name_2 event_type_2 snap_CA snap_TX snap_WI
0 2011-01-29 11101 Saturday 1 1 2011 d_1 NaN NaN NaN NaN 0 0 0
1 2011-01-30 11101 Sunday 2 1 2011 d_2 NaN NaN NaN NaN 0 0 0
2 2011-01-31 11101 Monday 3 1 2011 d_3 NaN NaN NaN NaN 0 0 0
3 2011-02-01 11101 Tuesday 4 2 2011 d_4 NaN NaN NaN NaN 1 1 0
4 2011-02-02 11101 Wednesday 5 2 2011 d_5 NaN NaN NaN NaN 1 0 1

We have to do a bit of a match up here. The sell price for each product is given by the week, wm_yr_wk. We can find which days correspond to which weeks in the calendar file. For example, day d_1 belongs to wm_yr_wk 11101.

We're only concerned with the days d_1886 to d_1913, which are the column names in sales_lm. So we'll take a slice of the calendar dataframe and assign it to wm_d.

wm_d = calendar[calendar['d'].isin(sales_lm.columns)]
wm_d[['wm_yr_wk','d']].head()
wm_yr_wk d
1885 11609 d_1886
1886 11609 d_1887
1887 11609 d_1888
1888 11609 d_1889
1889 11609 d_1890

Likewise, we can take a slice of the prices dataframe so we're only working with the weeks that correspond to d_1886 to d_1913.

prices_lm = prices[prices['wm_yr_wk'].isin(wm_d['wm_yr_wk'].unique())]
prices_lm.head()
store_id item_id wm_yr_wk sell_price
141 CA_1 HOBBIES_1_001 11609 8.26
142 CA_1 HOBBIES_1_001 11610 8.26
143 CA_1 HOBBIES_1_001 11611 8.38
144 CA_1 HOBBIES_1_001 11612 8.38
145 CA_1 HOBBIES_1_001 11613 8.38

We can take the prices_lm dataframe, merge it with wm_d to get the days, pivot the table so the days become columns, and finally merge with the sales dataframe to align everything on the id index.

prices_lm = (prices_lm
             .merge(wm_d, how='left', on='wm_yr_wk')
             .pivot_table(index=['store_id','item_id'], 
                          columns='d',values='sell_price')
             .merge(sales[['id','item_id', 'store_id']], 
                    how='left', on=['item_id', 'store_id'])
             .set_index('id')
             .drop(['item_id','store_id'], axis=1)
             .reindex(index=sales_lm.index)       
            )
prices_lm.head()
d_1886 d_1887 d_1888 d_1889 d_1890 d_1891 d_1892 d_1893 d_1894 d_1895 ... d_1904 d_1905 d_1906 d_1907 d_1908 d_1909 d_1910 d_1911 d_1912 d_1913
id
HOBBIES_1_001_CA_1_evaluation 8.26 8.26 8.26 8.26 8.26 8.26 8.26 8.26 8.26 8.26 ... 8.38 8.38 8.38 8.38 8.38 8.38 8.38 8.38 8.38 8.38
HOBBIES_1_002_CA_1_evaluation 3.97 3.97 3.97 3.97 3.97 3.97 3.97 3.97 3.97 3.97 ... 3.97 3.97 3.97 3.97 3.97 3.97 3.97 3.97 3.97 3.97
HOBBIES_1_003_CA_1_evaluation 2.97 2.97 2.97 2.97 2.97 2.97 2.97 2.97 2.97 2.97 ... 2.97 2.97 2.97 2.97 2.97 2.97 2.97 2.97 2.97 2.97
HOBBIES_1_004_CA_1_evaluation 4.64 4.64 4.64 4.64 4.64 4.64 4.64 4.64 4.64 4.64 ... 4.64 4.64 4.64 4.64 4.64 4.64 4.64 4.64 4.64 4.64
HOBBIES_1_005_CA_1_evaluation 2.88 2.88 2.88 2.88 2.88 2.88 2.88 2.88 2.88 2.88 ... 2.88 2.88 2.88 2.88 2.88 2.88 2.88 2.88 2.88 2.88

5 rows × 28 columns

Great, we now have the sell prices for each product for each day.

The next step is to do an element-wise multiplication of sales_lm and prices_lm to get the revenue on each day, followed by a row-wise summation to get the total revenue for each id.

revenue_lm = (sales_lm * prices_lm).sum(axis=1)
revenue_lm.head()
id
HOBBIES_1_001_CA_1_evaluation    224.94
HOBBIES_1_002_CA_1_evaluation      7.94
HOBBIES_1_003_CA_1_evaluation     47.52
HOBBIES_1_004_CA_1_evaluation    236.64
HOBBIES_1_005_CA_1_evaluation    109.44
dtype: float64

The weights are going to be the percentage of total revenue, so we take each row and divide it by the sum of all rows.

revenue_pct = revenue_lm / revenue_lm.sum()
revenue_pct.head()
id
HOBBIES_1_001_CA_1_evaluation    0.000060
HOBBIES_1_002_CA_1_evaluation    0.000002
HOBBIES_1_003_CA_1_evaluation    0.000013
HOBBIES_1_004_CA_1_evaluation    0.000063
HOBBIES_1_005_CA_1_evaluation    0.000029
dtype: float64

The last step is to aggregate the weights over the 12 levels. To do this, we need to merge revenue_pct with the categorical data (item_id, state_id, dept_id, etc.).

revenue_pct = (revenue_pct
               .to_frame(name='revenue_pct')
               .reset_index()
               .merge(sales.loc[:,:'state_id'], how='left', on='id')
              )
revenue_pct.head()
id revenue_pct item_id dept_id cat_id store_id state_id
0 HOBBIES_1_001_CA_1_evaluation 0.000060 HOBBIES_1_001 HOBBIES_1 HOBBIES CA_1 CA
1 HOBBIES_1_002_CA_1_evaluation 0.000002 HOBBIES_1_002 HOBBIES_1 HOBBIES CA_1 CA
2 HOBBIES_1_003_CA_1_evaluation 0.000013 HOBBIES_1_003 HOBBIES_1 HOBBIES CA_1 CA
3 HOBBIES_1_004_CA_1_evaluation 0.000063 HOBBIES_1_004 HOBBIES_1 HOBBIES CA_1 CA
4 HOBBIES_1_005_CA_1_evaluation 0.000029 HOBBIES_1_005 HOBBIES_1 HOBBIES CA_1 CA

Now we can perform the aggregation. Each level is going to sum to 1, so we set the weight of the first level to 1. For the remaining levels, we group by the categorical columns and sum.

weights = pd.Series(dtype='float64')
for v in agg_levels.values():
    if v is None:
        # Level 1 covers the entire dataset, so its weight is simply 1
        w = pd.Series([1.0])
    else:
        # Sum the revenue shares within each group; selecting the
        # revenue_pct column first avoids summing the string columns
        w = revenue_pct.groupby(by=v)['revenue_pct'].sum()
    weights = pd.concat([weights, w])
weights.head()
0       1.000000
CA      0.442371
TX      0.269297
WI      0.288332
CA_1    0.110888
dtype: float64

We've done 12 levels of aggregation, with each level summing to a weight of 1. So that the weights across all 12 levels sum to 1 overall, we divide them by 12.

weights = weights / 12
print('Number of weights: %s' % len(weights))
print('Sum of all weights: %.5f' % weights.sum())
Number of weights: 42840
Sum of all weights: 1.00000
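As a quick check, the single level-1 series should now carry exactly 1/12 of the total weight:

print('Level 1 weight: %.5f' % weights.iloc[0])   # 0.08333, i.e. 1/12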

Calculating WRMSSE

We now have 42,840 weights and we previously calculated 42,840 RMSSE values. Almost there!

$$WRMSSE = \sum_{i=1}^{42,840} w_i \times RMSSE_i$$

All that's left to do is multiply the weights and RMSSE arrays element-wise and take a final sum.

wrmsse = (weights * rmsse_vals_agg).sum()
print('WRMSSE: %.3f' % wrmsse)
WRMSSE: 0.838

There it is, a final WRMSSE of 0.838 for the 28-day naive forecast.
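For reference, here's the whole calculation condensed into one function - a minimal sketch that assumes the rmsse() and aggregation() helpers, the categorical-augmented train/test/forecast dataframes and the weights series built above are all still in scope:

def m5_wrmsse(train, test, forecast, weights):
    # Aggregate each set over the 12 levels, score all 42,840 series,
    # then apply the revenue-based weights and sum
    rmsse_vals = rmsse(aggregation(train).values,
                       aggregation(test).values,
                       aggregation(forecast).values)
    return (weights.values * rmsse_vals).sum()

print('WRMSSE: %.3f' % m5_wrmsse(train, test, forecast, weights))

This should reproduce the 0.838 score from above.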

Download and Install the m5-wrmsse Package

I've put together a package to make life easier; the GitHub project can be found here. You can also install it using pip:

pip install m5-wrmsse

Once installed, you can import the wrmsse function.

from m5_wrmsse import wrmsse

The wrmsse function takes a 28-day forecast as a numpy array of shape (30490, 28). We can test this on a naive forecast using the sales_train_validation.csv file.

sales = pd.read_csv('data/sales_train_validation.csv')
naive_forecast = sales.iloc[:,-h:].values
print(naive_forecast.shape)
(30490, 28)

We can now pass the naive_forecast array to the wrmsse function to evaluate the score.

wrmsse(naive_forecast)
0.8377320359321682

Easy!