WRMSSE and the M5 Dataset

Kaggle's M5 forecasting competition uses a weighted adaptation of the RMSSE as its measure of accuracy. This guide covers why each series is weighted, how the WRMSSE is calculated, and a Python package for calculating it locally - instead of having to upload a submission file to Kaggle.

Weighting the Error

Recall that the RMSSE allows us to compare the accuracy of series at different scales. The WRMSSE builds further on this by assigning a weight to each RMSSE value - but why?

Consider a car dealership that also sells hanging air fresheners over the counter. Now, let's say we have a forecasting model that can accurately predict the sale of air fresheners, but doesn't have much predictive power for the sale of cars. This model isn't going to be of much use to the business, as the majority of their revenue comes from the sale of cars. To help tune the forecasting model, we can assign a weight to each series that reflects its importance. High-value items can have large weights so that their error is amplified - a big red warning that we're on the wrong track.
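To make that concrete, here's a toy sketch with made-up numbers: suppose our model scores an RMSSE of 2.0 on cars (poor) and 0.5 on air fresheners (good).

import numpy as np

rmsse_vals = np.array([2.0, 0.5])   # hypothetical RMSSE: [cars, air fresheners]
weights = np.array([0.95, 0.05])    # made-up revenue shares

print('Unweighted mean: %.3f' % rmsse_vals.mean())            # 1.250
print('Weighted score: %.3f' % (weights * rmsse_vals).sum())  # 1.925

The unweighted mean softens the poor car forecasts, while the revenue-weighted score makes them impossible to ignore.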

As a brief overview, the M5 WRMSSE uses a weighted score that considers the cumulative dollar amount for each product - the higher the sales revenue, the more important it becomes. Additionally, there are 12 levels over which the data is aggregated. For example, one level looks at the total revenue for each store and assigns a weight accordingly (a business will be more interested in accurately predicting sales for a high-revenue mega-store than for one of its tiny corner shops).

It will become clearer as we work through this, but at any point you can refer to the M5 competitors' guide if you need a more detailed explanation.

Calculating the RMSSE

The M5 dataset contains the sales of 30,490 products tracked over 1,941 days. We'll need to use the sales_train_evaluation.csv file and split it into a train/test set. Let's read the CSV file to see what we're working with.

import pandas as pd
import numpy as np

sales = pd.read_csv('data/sales_train_evaluation.csv')
sales.head()
id item_id dept_id cat_id store_id state_id d_1 d_2 d_3 d_4 ... d_1932 d_1933 d_1934 d_1935 d_1936 d_1937 d_1938 d_1939 d_1940 d_1941
0 HOBBIES_1_001_CA_1_evaluation HOBBIES_1_001 HOBBIES_1 HOBBIES CA_1 CA 0 0 0 0 ... 2 4 0 0 0 0 3 3 0 1
1 HOBBIES_1_002_CA_1_evaluation HOBBIES_1_002 HOBBIES_1 HOBBIES CA_1 CA 0 0 0 0 ... 0 1 2 1 1 0 0 0 0 0
2 HOBBIES_1_003_CA_1_evaluation HOBBIES_1_003 HOBBIES_1 HOBBIES CA_1 CA 0 0 0 0 ... 1 0 2 0 0 0 2 3 0 1
3 HOBBIES_1_004_CA_1_evaluation HOBBIES_1_004 HOBBIES_1 HOBBIES CA_1 CA 0 0 0 0 ... 1 1 0 4 0 1 3 0 2 6
4 HOBBIES_1_005_CA_1_evaluation HOBBIES_1_005 HOBBIES_1 HOBBIES CA_1 CA 0 0 0 0 ... 0 0 0 2 1 0 0 2 1 0

5 rows × 1947 columns

The first columns are categorical information: the item identifier and the state, store, department, etc. that it belongs to. The remaining columns are the sales for each day, beginning on day 1 (d_1) and finishing on day 1,941 (d_1941).

We will split the daily sales into train and test sets. The forecast will be simple: use the final 28 days of sales from the train set (a naive forecast).

h = 28     # forecasting horizon
train = sales.iloc[:,:-h]
test = sales.iloc[:,-h:]
forecast = train.iloc[:,-h:]    # 28-day naive forecast

As a quick sanity check, we can have a look at the dimensions of the train, test and forecast sets.

print('Training set: %s\nTest set: %s\nForecast: %s' %
       (train.filter(like='d_').shape, test.shape, forecast.shape))
Training set: (30490, 1913)
Test set: (30490, 28)
Forecast: (30490, 28)

As expected, we have a training set of 30,490 products with sales for 1,913 days. The test and forecast sets both cover 28 days.

We can now calculate the RMSSE. The rmsse() function below is taken from the RMSSE tutorial; it returns the RMSSE for each row.
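As a refresher, for a single series with $n$ training observations and a forecast horizon of $h$ days, the RMSSE is defined as

$$RMSSE = \sqrt{\frac{\frac{1}{h}\sum_{t=n+1}^{n+h}\left(Y_t - \hat{Y}_t\right)^2}{\frac{1}{n-1}\sum_{t=2}^{n}\left(Y_t - Y_{t-1}\right)^2}}$$

The denominator is the mean squared error of a one-step naive forecast over the training period, which is what rescales the error and lets series of very different magnitudes be compared. The code below additionally trims zeros from each training series, so the denominator only covers the period the item was actively sold.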

def rmsse(train, test, forecast):
    # Numerator: MSE of the forecast over the test horizon
    forecast_mse = np.mean((test - forecast)**2, axis=1)
    # Denominator: MSE of a one-step naive forecast over the training
    # period, with leading/trailing zeros trimmed so the scale reflects
    # the period the item was actively sold
    train_mse = [np.mean(np.diff(np.trim_zeros(row))**2) for row in train]
    return np.sqrt(forecast_mse / np.array(train_mse))

rmsse_vals = rmsse(train.filter(like="d_").values, test.values, forecast.values)
print(rmsse_vals)
[1.42857919 0.80052712 1.33151305 ... 1.07916462 1.31252951 0.98403747]
print('Number of values: %d' % len(rmsse_vals))
print('RMSSE mean: %.3f' % np.mean(rmsse_vals))
Number of values: 30490
RMSSE mean: 1.032

We have 30,490 RMSSE values, one for each product/row in the dataset. Out of interest, if we take the mean of these values we end up with a value of 1.032.

However, to calculate the WRMSSE as per the M5 competitors' guide, we compute

$$WRMSSE = \sum_{i=1}^{42,840} w_i \times RMSSE_i$$

From the formula, we need to apply a weight to each RMSSE value and sum over the 42,840 series... but we only have 30,490! So before we can calculate the WRMSSE, we first need to aggregate the dataset over 12 different levels.

The 12 Levels of Aggregation

The table below is taken from the M5 guide and shows the 12 levels of aggregation we need to perform.

Level Aggregation Number of series
1 Unit sales of all products, aggregated for all stores/states 1
2 Unit sales of all products, aggregated for each state 3
3 Unit sales of all products, aggregated for each store 10
4 Unit sales of all products, aggregated for each category 3
5 Unit sales of all products, aggregated for each department 7
6 Unit sales of all products, aggregated for each state and category 9
7 Unit sales of all products, aggregated for each state and department 21
8 Unit sales of all products, aggregated for each store and category 30
9 Unit sales of all products, aggregated for each store and department 70
10 Unit sales of product x, aggregated for all stores/states 3,049
11 Unit sales of product x, aggregated for each state 9,147
12 Unit sales of product x, aggregated for each store 30,490
Total 42,840

The 30,490 series need to be aggregated 12 different ways, based on the categorical information, to give a total of 42,840 series. For example, level 1 is the summation of all product sales across all stores, so we end up with one final series representing total sales. In level 2, we first group the data by state; since there are three states, we end up with three additional time series. And so on.
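If you'd like to sanity-check these counts against the data, a few quick lines will do it (assuming the sales dataframe from earlier is still loaded):

print(sales['state_id'].nunique())                     # 3 states  (level 2)
print(sales['store_id'].nunique())                     # 10 stores (level 3)
print(sales.groupby(['store_id','dept_id']).ngroups)   # 70 series (level 9)
print(sales['item_id'].nunique())                      # 3,049 items (level 10)

# The per-level counts from the table sum to the 42,840 series we need
print(sum([1, 3, 10, 3, 7, 9, 21, 30, 70, 3049, 9147, 30490]))  # 42840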

Let's look at the training set again.

train.head()
id item_id dept_id cat_id store_id state_id d_1 d_2 d_3 d_4 ... d_1904 d_1905 d_1906 d_1907 d_1908 d_1909 d_1910 d_1911 d_1912 d_1913
0 HOBBIES_1_001_CA_1_evaluation HOBBIES_1_001 HOBBIES_1 HOBBIES CA_1 CA 0 0 0 0 ... 1 3 0 1 1 1 3 0 1 1
1 HOBBIES_1_002_CA_1_evaluation HOBBIES_1_002 HOBBIES_1 HOBBIES CA_1 CA 0 0 0 0 ... 0 0 0 0 0 1 0 0 0 0
2 HOBBIES_1_003_CA_1_evaluation HOBBIES_1_003 HOBBIES_1 HOBBIES CA_1 CA 0 0 0 0 ... 2 1 2 1 1 1 0 1 1 1
3 HOBBIES_1_004_CA_1_evaluation HOBBIES_1_004 HOBBIES_1 HOBBIES CA_1 CA 0 0 0 0 ... 1 0 5 4 1 0 1 3 7 2
4 HOBBIES_1_005_CA_1_evaluation HOBBIES_1_005 HOBBIES_1 HOBBIES CA_1 CA 0 0 0 0 ... 2 1 1 0 1 1 2 2 2 4

5 rows × 1919 columns

We have the categorical info required for the aggregation stored in the first few columns (item_id, dept_id, cat_id, etc.). We'll set this up in a dictionary where the keys are the level names and the values are the columns to group by.

agg_levels = {
    'level_1': None,
    'level_2': ['state_id'],
    'level_3': ['store_id'],
    'level_4': ['cat_id'],
    'level_5': ['dept_id'],
    'level_6': ['state_id', 'cat_id'],
    'level_7': ['state_id', 'dept_id'],
    'level_8': ['store_id', 'cat_id'],
    'level_9': ['store_id', 'dept_id'],
    'level_10': ['item_id'],
    'level_11': ['item_id', 'state_id'],
    'level_12': ['id']
}

We need to iterate over each level, group by the relevant columns and then sum. The first level's value is None, meaning we won't group at all - it's a total aggregation.

def aggregation(df):
    df_agg = pd.DataFrame()
    for v in agg_levels.values():
        if v is None:
            # Level 1: sum every series into a single 'Total' row
            agg = df.sum(numeric_only=True)
            agg = pd.DataFrame([agg], index=['Total'])
        else:
            # Levels 2-12: group by the categorical column(s), then sum;
            # numeric_only=True drops the remaining string columns
            agg = df.groupby(by=v).sum(numeric_only=True)
        df_agg = pd.concat([df_agg, agg])
    return df_agg

The training set can simply be passed to the aggregation() function.

train_agg = aggregation(train)
train_agg.head()
d_1 d_2 d_3 d_4 d_5 d_6 d_7 d_8 d_9 d_10 ... d_1904 d_1905 d_1906 d_1907 d_1908 d_1909 d_1910 d_1911 d_1912 d_1913
Total 32631 31749 23783 25412 19146 29211 28010 37932 32736 25572 ... 41789 48362 51640 38059 37570 35343 35033 40517 48962 49795
CA 14195 13805 10108 11047 9925 11322 12251 16610 14696 11822 ... 16255 20564 23032 17052 15784 15148 14488 17095 21834 23187
TX 9438 9630 6778 7381 5912 9006 6226 9440 9376 7319 ... 10800 12460 13709 9868 10922 9600 9602 10615 12266 12282
WI 8998 8314 6897 6984 3309 8883 9533 11882 8664 6431 ... 14734 15338 14899 11139 10864 10595 10943 12807 14862 14326
CA_1 4337 4155 2816 3051 2630 3276 3450 5437 4340 3157 ... 3982 5437 5954 4345 3793 3722 3709 4387 5577 6113

5 rows × 1913 columns

The first row is the total summation (level 1), followed by three rows where the data has been grouped by state (level 2). We can check that we have the 42,840 required rows.

train_agg.shape
(42840, 1913)
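As an optional extra check, the 'Total' row produced by level 1 should equal the column-wise sum of all 30,490 original series:

# Level 1 should match the sum over every product series
assert (train_agg.iloc[0] == train.filter(like='d_').sum()).all()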

We can repeat this aggregation process on the test and forecast sets. First we need to insert the categorical columns and then pass them to the aggregation() function.

# Concatenate with categorical columns
test = pd.concat((sales.loc[:,:'state_id'],test), axis=1)
forecast = pd.concat((sales.loc[:,:'state_id'],forecast), axis=1)

# Perform aggregation over the 12 levels
test_agg = aggregation(test)
forecast_agg = aggregation(forecast)

# Calculate RMSSE with aggregated datasets
rmsse_vals_agg = rmsse(train_agg.values, 
                       test_agg.values, 
                       forecast_agg.values)
print(rmsse_vals_agg)
print('Number of values: %d' % len(rmsse_vals_agg))
print('RMSSE mean: %.3f' % np.mean(rmsse_vals_agg))
[0.62889454 0.54634122 0.69085288 ... 1.2904286  1.24211801 0.49919979]
Number of values: 42840
RMSSE mean: 1.042

Great, we have 42,840 RMSSE values now. Let's have a look at the WRMSSE formula again.

$$WRMSSE = \sum_{i=1}^{42,840} w_i \times RMSSE_i$$

All we need to do is multiply each RMSSE by its corresponding weight, $w_i$.

Calculating the Weights

The weighting for each row is going to be based on its proportion of total revenue. Large weights will be assigned to high-revenue products/series, as a business will be more concerned with accurately predicting those. Going back to the car dealership example, a larger weight would be applied to the error in car sales, and a low weight to the sales of air fresheners.
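In formula form, within a given aggregation level the weight of series $i$ is simply its share of total revenue (we'll see later that each level is also scaled by 1/12):

$$w_i = \frac{\text{revenue}_i}{\sum_j \text{revenue}_j}$$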

To calculate the weights, we need to know the revenue. At the moment we only have unit sales, so each unit sale will need to be multiplied by its corresponding sell price. We need the data stored in calendar.csv and sell_prices.csv. We'll also go back to the original 30,490 series in sales_train_evaluation.csv.

calendar = pd.read_csv('data/calendar.csv')
sales = pd.read_csv('data/sales_train_evaluation.csv')
prices = pd.read_csv('data/sell_prices.csv')

First, let's grab the unit sales for the last month and assign them to sales_lm. The final 28 days in sales_train_evaluation.csv are the test set, so we need the 28 days prior to this period.

h = 28
sales_lm = sales.iloc[:,-2*h:-h].set_index(sales['id'])
sales_lm.head()
d_1886 d_1887 d_1888 d_1889 d_1890 d_1891 d_1892 d_1893 d_1894 d_1895 ... d_1904 d_1905 d_1906 d_1907 d_1908 d_1909 d_1910 d_1911 d_1912 d_1913
id
HOBBIES_1_001_CA_1_evaluation 1 0 0 0 0 0 1 0 4 2 ... 1 3 0 1 1 1 3 0 1 1
HOBBIES_1_002_CA_1_evaluation 1 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 1 0 0 0 0
HOBBIES_1_003_CA_1_evaluation 0 0 0 0 0 0 1 0 0 0 ... 2 1 2 1 1 1 0 1 1 1
HOBBIES_1_004_CA_1_evaluation 0 0 0 0 3 1 2 1 3 1 ... 1 0 5 4 1 0 1 3 7 2
HOBBIES_1_005_CA_1_evaluation 1 0 4 4 0 1 4 0 1 0 ... 2 1 1 0 1 1 2 2 2 4

5 rows × 28 columns

We have the sales figures for each day; now we need to find their corresponding prices. We can get these by combining the information stored in the prices and calendar files.

prices.head()
store_id item_id wm_yr_wk sell_price
0 CA_1 HOBBIES_1_001 11325 9.58
1 CA_1 HOBBIES_1_001 11326 9.58
2 CA_1 HOBBIES_1_001 11327 8.26
3 CA_1 HOBBIES_1_001 11328 8.26
4 CA_1 HOBBIES_1_001 11329 8.26
calendar.head()
date wm_yr_wk weekday wday month year d event_name_1 event_type_1 event_name_2 event_type_2 snap_CA snap_TX snap_WI
0 2011-01-29 11101 Saturday 1 1 2011 d_1 NaN NaN NaN NaN 0 0 0
1 2011-01-30 11101 Sunday 2 1 2011 d_2 NaN NaN NaN NaN 0 0 0
2 2011-01-31 11101 Monday 3 1 2011 d_3 NaN NaN NaN NaN 0 0 0
3 2011-02-01 11101 Tuesday 4 2 2011 d_4 NaN NaN NaN NaN 1 1 0
4 2011-02-02 11101 Wednesday 5 2 2011 d_5 NaN NaN NaN NaN 1 0 1

We have to do a bit of a match up here. The sell price for each product is given by the week, wm_yr_wk. We can find which days correspond to which weeks in the calendar file. For example, day d_1 belongs to wm_yr_wk 11101.

We're only concerned with the days d_1886 to d_1913, which are the column names in sales_lm. So we'll take a slice of the calendar dataframe and assign it to wm_d.

wm_d = calendar[calendar['d'].isin(sales_lm.columns)]
wm_d[['wm_yr_wk','d']].head()
wm_yr_wk d
1885 11609 d_1886
1886 11609 d_1887
1887 11609 d_1888
1888 11609 d_1889
1889 11609 d_1890

Likewise, we can take a slice of the prices dataframe so we're only working with the weeks that correspond to d_1886 to d_1913.

prices_lm = prices[prices['wm_yr_wk'].isin(wm_d['wm_yr_wk'].unique())]
prices_lm.head()
store_id item_id wm_yr_wk sell_price
141 CA_1 HOBBIES_1_001 11609 8.26
142 CA_1 HOBBIES_1_001 11610 8.26
143 CA_1 HOBBIES_1_001 11611 8.38
144 CA_1 HOBBIES_1_001 11612 8.38
145 CA_1 HOBBIES_1_001 11613 8.38

We can take the prices_lm dataframe, merge it with wm_d to get the days, pivot the table so the days become columns, and finally merge with the sales dataframe to align everything on the id index.

prices_lm = (prices_lm
             .merge(wm_d, how='left', on='wm_yr_wk')
             .pivot_table(index=['store_id','item_id'], 
                          columns='d',values='sell_price')
             .merge(sales[['id','item_id', 'store_id']], 
                    how='left', on=['item_id', 'store_id'])
             .set_index('id')
             .drop(['item_id','store_id'], axis=1)
             .reindex(index=sales_lm.index)       
            )
prices_lm.head()
d_1886 d_1887 d_1888 d_1889 d_1890 d_1891 d_1892 d_1893 d_1894 d_1895 ... d_1904 d_1905 d_1906 d_1907 d_1908 d_1909 d_1910 d_1911 d_1912 d_1913
id
HOBBIES_1_001_CA_1_evaluation 8.26 8.26 8.26 8.26 8.26 8.26 8.26 8.26 8.26 8.26 ... 8.38 8.38 8.38 8.38 8.38 8.38 8.38 8.38 8.38 8.38
HOBBIES_1_002_CA_1_evaluation 3.97 3.97 3.97 3.97 3.97 3.97 3.97 3.97 3.97 3.97 ... 3.97 3.97 3.97 3.97 3.97 3.97 3.97 3.97 3.97 3.97
HOBBIES_1_003_CA_1_evaluation 2.97 2.97 2.97 2.97 2.97 2.97 2.97 2.97 2.97 2.97 ... 2.97 2.97 2.97 2.97 2.97 2.97 2.97 2.97 2.97 2.97
HOBBIES_1_004_CA_1_evaluation 4.64 4.64 4.64 4.64 4.64 4.64 4.64 4.64 4.64 4.64 ... 4.64 4.64 4.64 4.64 4.64 4.64 4.64 4.64 4.64 4.64
HOBBIES_1_005_CA_1_evaluation 2.88 2.88 2.88 2.88 2.88 2.88 2.88 2.88 2.88 2.88 ... 2.88 2.88 2.88 2.88 2.88 2.88 2.88 2.88 2.88 2.88

5 rows × 28 columns

Great, we now have the sell prices for each product for each day.

The next step is to do an element-wise multiplication of sales_lm and prices_lm to get the revenue on each day, followed by a row-wise summation to get the total revenue for each id.

revenue_lm = (sales_lm * prices_lm).sum(axis=1)
revenue_lm.head()
id
HOBBIES_1_001_CA_1_evaluation    224.94
HOBBIES_1_002_CA_1_evaluation      7.94
HOBBIES_1_003_CA_1_evaluation     47.52
HOBBIES_1_004_CA_1_evaluation    236.64
HOBBIES_1_005_CA_1_evaluation    109.44
dtype: float64

The weights are going to be the percentage of total revenue, so we take each row and divide it by the sum of all rows.

revenue_pct = revenue_lm / revenue_lm.sum()
revenue_pct.head()
id
HOBBIES_1_001_CA_1_evaluation    0.000060
HOBBIES_1_002_CA_1_evaluation    0.000002
HOBBIES_1_003_CA_1_evaluation    0.000013
HOBBIES_1_004_CA_1_evaluation    0.000063
HOBBIES_1_005_CA_1_evaluation    0.000029
dtype: float64

The last step is to aggregate the weights over the 12 levels. To do this, we need to merge revenue_pct with the categorical data (item_id, state_id, dept_id, etc.).

revenue_pct = (revenue_pct
               .to_frame(name='revenue_pct')
               .reset_index()
               .merge(sales.loc[:,:'state_id'], how='left', on='id')
              )
revenue_pct.head()
id revenue_pct item_id dept_id cat_id store_id state_id
0 HOBBIES_1_001_CA_1_evaluation 0.000060 HOBBIES_1_001 HOBBIES_1 HOBBIES CA_1 CA
1 HOBBIES_1_002_CA_1_evaluation 0.000002 HOBBIES_1_002 HOBBIES_1 HOBBIES CA_1 CA
2 HOBBIES_1_003_CA_1_evaluation 0.000013 HOBBIES_1_003 HOBBIES_1 HOBBIES CA_1 CA
3 HOBBIES_1_004_CA_1_evaluation 0.000063 HOBBIES_1_004 HOBBIES_1 HOBBIES CA_1 CA
4 HOBBIES_1_005_CA_1_evaluation 0.000029 HOBBIES_1_005 HOBBIES_1 HOBBIES CA_1 CA

Now we can perform the aggregation. Each level is going to sum to 1, so we set the weight of the first level to 1. For the remaining levels, we group by the categorical columns and sum.

weights = pd.Series(dtype='float64')
for v in agg_levels.values():
    if v is None:
        # Level 1 covers the entire dataset, so its weight is simply 1
        w = pd.Series([1.0])
    else:
        # Sum the revenue shares within each group; selecting the
        # revenue_pct column first avoids summing the string columns
        w = revenue_pct.groupby(by=v)['revenue_pct'].sum()
    weights = pd.concat([weights, w])
weights.head()
0       1.000000
CA      0.442371
TX      0.269297
WI      0.288332
CA_1    0.110888
dtype: float64

We've done 12 levels of aggregation, with each level summing to a weight of 1. So that the weights across all 12 levels sum to 1 overall, we divide them by 12.

weights = weights / 12
print('Number of weights: %s' % len(weights))
print('Sum of all weights: %.5f' % weights.sum())
Number of weights: 42840
Sum of all weights: 1.00000
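As a quick check, the single level-1 series should now carry exactly 1/12 of the total weight:

print('Level 1 weight: %.5f' % weights.iloc[0])   # 0.08333, i.e. 1/12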

Calculating WRMSSE

We now have 42,840 weights and we previously calculated 42,840 RMSSE values. Almost there!

$$WRMSSE = \sum_{i=1}^{42,840} w_i \times RMSSE_i$$

All that's left to do is multiply the weights and RMSSE arrays element-wise and take a final sum.

wrmsse = (weights * rmsse_vals_agg).sum()
print('WRMSSE: %.3f' % wrmsse)
WRMSSE: 0.838

There it is, a final WRMSSE of 0.838 for the 28-day naive forecast.
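For reference, here's the whole calculation condensed into one function - a minimal sketch that assumes the rmsse() and aggregation() helpers, the categorical-augmented train/test/forecast dataframes and the weights series built above are all still in scope:

def m5_wrmsse(train, test, forecast, weights):
    # Aggregate each set over the 12 levels, score all 42,840 series,
    # then apply the revenue-based weights and sum
    rmsse_vals = rmsse(aggregation(train).values,
                       aggregation(test).values,
                       aggregation(forecast).values)
    return (weights.values * rmsse_vals).sum()

print('WRMSSE: %.3f' % m5_wrmsse(train, test, forecast, weights))

This should reproduce the 0.838 score from above.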

Download and Install the m5-wrmsse Package

I've put together a package to make life easier; the GitHub project can be found here. You can also install it using pip:

pip install m5-wrmsse

Once installed, you can import the wrmsse function.

from m5_wrmsse import wrmsse

The wrmsse function takes a 28-day forecast as a numpy array of shape (30490, 28). We can test this on a naive forecast using the sales_train_validation.csv file.

sales = pd.read_csv('data/sales_train_validation.csv')
naive_forecast = sales.iloc[:,-h:].values
print(naive_forecast.shape)
(30490, 28)

We can now pass the naive_forecast array to the wrmsse function to evaluate the score.

wrmsse(naive_forecast)
0.8377320359321682

Easy!