The main conclusions of the M3 competition were derived from analyses of descriptive statistics, with no formal statistical testing; one of the commentaries noted that the results had not been tested for statistical significance. This paper undertakes such an analysis by examining the primary findings of that competition. We introduce a methodology that has not previously been used to evaluate economic forecasts: multiple comparisons. We use this technique to compare each method against the best and against the mean. We conclude that the accuracy of the various methods does differ significantly and that some methods are significantly more accurate than others. We confirm that there is no relationship between the complexity of a method and its accuracy, but show that there is a significant relationship among the various accuracy measures. Finally, we find that the M3 conclusion that a combination of methods is more accurate than the individual methods being combined was not proven.
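To illustrate the kind of analysis the abstract describes, the sketch below implements a generic rank-based "multiple comparisons with the best" (MCB) check. It is not the paper's exact procedure: the function name `mcb_intervals`, the simulated error data, and the critical quantile `q=3.0` (a placeholder standing in for a tabulated studentized-range value) are all illustrative assumptions.

```python
import numpy as np

def mcb_intervals(errors, q=3.0):
    """Multiple comparisons with the best (MCB) on average ranks.

    errors : (n_series, n_methods) array of forecast errors (e.g. APEs).
    q      : critical quantile; 3.0 is a placeholder, not a tabulated value.
    Returns the average rank of each method, the half-width of the
    simultaneous interval on mean ranks, and a boolean mask of methods
    whose interval overlaps that of the best (lowest-ranked) method.
    """
    n, k = errors.shape
    # Rank methods within each series, 1 = most accurate.
    # (Double argsort; assumes continuous errors, so ties are negligible.)
    ranks = errors.argsort(axis=1).argsort(axis=1) + 1
    avg = ranks.mean(axis=0)
    # Half-width of the simultaneous confidence interval on mean ranks.
    half = q * np.sqrt(k * (k + 1) / (12.0 * n))
    best = avg.min()
    not_worse = (avg - half) <= (best + half)
    return avg, half, not_worse

# Hypothetical usage: 4 methods over 500 series, methods 2 and 3
# simulated to be less accurate via larger error scales.
rng = np.random.default_rng(0)
errors = rng.gamma(2.0, size=(500, 4)) * np.array([1.0, 1.0, 1.3, 1.6])
avg, half, not_worse = mcb_intervals(errors)
```

A method is declared significantly worse than the best when its interval `avg ± half` does not overlap the best method's interval; this is the sense in which the paper can conclude that "some methods are significantly better than others" from ranks alone.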

Additional Metadata
Keywords analysis of ranks, competitions, forecasting, multiple comparisons
Persistent URL dx.doi.org/10.1016/j.ijforecast.2004.10.003, hdl.handle.net/1765/13752
Journal International Journal of Forecasting
Citation
Koning, A. J., Franses, Ph. H. B. F., Hibon, M., & Stekler, H. O. (2005). The M3 Competition: Statistical Tests of the Results. International Journal of Forecasting, 21(3), 397–409. doi:10.1016/j.ijforecast.2004.10.003