Combining SKU-level sales forecasts from models and experts
Introduction
There is abundant literature on the relative performance of model forecasts, expert forecasts and their combination; see Lawrence et al. (2006), Fildes et al. (2009), and the earlier work of Blattberg and Hoch (1990). The most common findings are that expert forecasts can improve on model forecasts and that a linear combination of the model forecast and an expert forecast is often better still. The literature so far mainly considers single-product, single-horizon and single-expert cases.
In the present paper we aim to extend the available literature by considering various products in various product categories, 12 different forecast horizons and about 50 experts. A key additional feature of our analysis is that we know several characteristics of these experts and also observe their behaviour. This allows us to correlate the optimal balance between the model and the experts with their characteristics and their behaviour, which in turn yields guidelines from a managerial perspective; this is new to the literature.
In this paper we empirically analyze a unique and very large database with model forecasts, expert forecasts and realizations concerning monthly SKU-level sales of a range of pharmaceutical products for a large Netherlands-based firm. At the headquarters office, the model forecasts are created automatically by a statistical package; each month the program allows for a re-specification of the model and re-estimates all parameters. The experts, located in local offices in 37 countries, receive these forecasts and then create their own forecasts using their own expertise. We will see that expert forecasts often differ from the model forecasts, which is perhaps not unexpected given that the automatic program takes only lagged monthly sales values as input, and that this fact is known to the experts; see Goodwin (2000, 2002).
The question the firm faces is whether the model forecasts and the expert forecasts can be improved by taking a linear combination of the two. A related question is whether this linear combination should follow an unconditional 50–50% rule, or whether the weights should depend on the characteristics of the experts.
The literature on combining forecasts (see, for example, Clemen, 1989; Timmermann, 2006) suggests that linear combinations of forecasts may improve on each of their contributors. So the first question we consider in this paper is whether there are optimal weights for each of the experts. And, if so, are these weights robust across forecast horizons, and do they differ across experts?
The second question that we try to answer is whether these optimal weights can be explained by characteristics of the experts. This question is very relevant from a managerial perspective, as it facilitates the training of experts and also their selection prior to appointment. Blattberg and Hoch (1990) claim that a 50–50% rule would be best, but this claim corresponds with unconditional weights, as it does not involve experts’ characteristics. Lamont (2002) demonstrated that age (experience) has a positive effect on the quality of an expert, but also that this effect is parabolic. There are also studies, such as Barber and Odean (2001) and Beyer and Bowden (1997), which find gender differences in (over-)confidence levels, so perhaps there are also such differences across the relative weights of the experts in the combined forecasts. Finally, the degree of bracketing should matter for the quality of the combined forecast: Larrick and Soll (2006, p. 112) state that when the rate of bracketing increases, so does the power of averaging forecasts. Their findings were based on experiments; in the present study we seek empirical evidence for this statement based on factual data.
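The bracketing idea can be made concrete with a small numeric sketch. This is purely illustrative and not taken from the paper: the data and function names are hypothetical, and "bracketing" is counted as the two forecasts falling on opposite sides of the realization.

```python
def bracketing_rate(forecasts_a, forecasts_b, actuals):
    """Share of cases where the two forecasts fall on opposite
    sides of the realization, i.e. 'bracket' it."""
    hits = sum(1 for fa, fb, y in zip(forecasts_a, forecasts_b, actuals)
               if (fa - y) * (fb - y) < 0)
    return hits / len(actuals)

def mae(forecasts, actuals):
    """Mean absolute error."""
    return sum(abs(f - y) for f, y in zip(forecasts, actuals)) / len(actuals)

# Hypothetical monthly sales: the pair brackets the actual in 3 of 4 months
model_fc  = [104, 96, 104, 112]
expert_fc = [96, 104, 96, 111]
actuals   = [100, 100, 100, 110]
avg_fc    = [(m + e) / 2 for m, e in zip(model_fc, expert_fc)]

rate = bracketing_rate(model_fc, expert_fc, actuals)  # 3 of 4 months bracket
# With frequent bracketing, the simple average beats both inputs:
# mae(avg_fc) is far below both mae(model_fc) and mae(expert_fc)
```

When the forecasts bracket the truth, their errors have opposite signs and partially cancel in the average, which is exactly why a higher bracketing rate strengthens the case for combining.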
The outline of our paper is as follows. In Section 2 we outline the main features of our unique database. Section 3 deals with the methodology and gives the details of our empirical findings. Section 4 concludes with various implications for managers who need to evaluate the qualities of the experts.
Data
Our data concern a firm that creates model forecasts and whose almost 50 experts, located in 37 countries, are allowed to report their own forecasts in addition to the model forecasts they receive from the headquarters office. Average characteristics of these experts are available. The question the firm has is whether specific combinations of these two sets of forecasts are better than each of the individual forecasts.
Methodology and results
To address the managerial questions of the firm, which are typical of any firm that has to manage a range of experts, we aim to compute the optimal value of the weight in a combined forecast. This combined forecast for each expert $i$ at horizon $h$ is given by

$$\hat{y}^{C}_{i,h} = a_i\,\hat{y}^{M}_{i,h} + (1 - a_i)\,\hat{y}^{E}_{i,h},$$

where $\hat{y}^{M}_{i,h}$ is the model forecast, $\hat{y}^{E}_{i,h}$ is the expert forecast, and we compute the value of $a_i$ across all products within an expert-horizon combination. To achieve this aim, we compute the root mean squared prediction error (RMSPE) as

$$\mathrm{RMSPE} = \sqrt{\frac{1}{N}\sum_{j=1}^{N}\bigl(y_j - \hat{y}^{C}_{j}\bigr)^2},$$

where $y_j$ is the realization for product $j$ and $N$ is the number of products.
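A minimal sketch of this computation follows. It is not the authors' code: the grid search, function names and data are our own illustration, assuming a combined forecast of the form a·model + (1 − a)·expert evaluated by RMSPE.

```python
import numpy as np

def rmspe(actuals, forecasts):
    """Root mean squared prediction error."""
    actuals = np.asarray(actuals, dtype=float)
    forecasts = np.asarray(forecasts, dtype=float)
    return float(np.sqrt(np.mean((actuals - forecasts) ** 2)))

def optimal_weight(model_fc, expert_fc, actuals, step=0.01):
    """Grid-search the weight a in a*model + (1-a)*expert that
    minimizes RMSPE across all products."""
    model_fc = np.asarray(model_fc, dtype=float)
    expert_fc = np.asarray(expert_fc, dtype=float)
    grid = np.arange(0.0, 1.0 + step, step)
    errors = [rmspe(actuals, a * model_fc + (1 - a) * expert_fc) for a in grid]
    best = int(np.argmin(errors))
    return float(grid[best]), errors[best]

# Hypothetical data: model and expert errors offset each other exactly,
# so the optimal weight sits at 0.5, where the combined errors cancel
model_fc  = [104, 96, 104, 96]
expert_fc = [96, 104, 96, 104]
actuals   = [100, 100, 100, 100]
a_opt, err = optimal_weight(model_fc, expert_fc, actuals)
```

In practice one would run this per expert-horizon combination, pooling the products within it, which mirrors how the weight $a_i$ is described above.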
Discussion
Our paper analyzed a very large and unique database with model forecasts and expert forecasts to see if combining these forecasts would be beneficial. Blattberg and Hoch (1990) predicted that unconditional weights of 50–50% would be best. One of the novelties of our study is that we examined if these weights could be predicted by experts’ characteristics and actual behaviour or performance, that is, whether there are perhaps conditional weights.
References

Clemen, R.T. (1989). Combining forecasts: A review and annotated bibliography (with discussion). International Journal of Forecasting.
Fildes, R., et al. (2009). Effective forecasting and judgmental adjustments: An empirical evaluation and strategies for improvement in supply-chain planning. International Journal of Forecasting.
Goodwin, P. (2000). Improving the voluntary integration of statistical forecasts and judgement. International Journal of Forecasting.
Goodwin, P. (2002). Integrating management judgement with statistical methods to improve short-term forecasts. Omega.
Lamont, O. (2002). Macroeconomic forecasts and microeconomic forecasters. Journal of Economic Behavior & Organization.