In this paper we build on previous work on predicting the MSCI EURO index from a content analysis of European Central Bank (ECB) statements. Our focus is on using feature selection to reduce the number of features needed for prediction. For this purpose we rely on two methodologies: (stepwise) linear regression and greedy forward feature subset selection. The original dataset consists of 13 features (General Inquirer content categories). Both methodologies improve the overall accuracy of the model while reducing the number of features employed. With linear regression we achieve an accuracy of 67.58% on the test set using six features, while greedy forward selection achieves an accuracy of 69.50% on the test set using eight features.
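Greedy forward feature subset selection, as named in the abstract, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The `score` callback stands in for whatever evaluation the model uses (for instance, accuracy on a validation set). The stopping rule, which halts when no remaining feature improves the score, is our assumption. The feature names and weights in the usage example are hypothetical toy values, not the 13 General Inquirer categories.

```python
from __future__ import annotations

from typing import Callable, Sequence


def greedy_forward_selection(
    features: Sequence[str],
    score: Callable[[list[str]], float],
    max_features: int | None = None,
) -> tuple[list[str], float]:
    """Greedy forward feature subset selection (illustrative sketch).

    Starts from the empty subset and repeatedly adds the single feature whose
    inclusion gives the highest score. Stops when no remaining feature raises
    the score (an assumed stopping rule) or when max_features is reached.
    """
    selected: list[str] = []
    remaining = list(features)
    best_score = float("-inf")
    limit = len(remaining) if max_features is None else max_features
    while remaining and len(selected) < limit:
        # Score every one-feature extension of the current subset.
        scores = {f: score(selected + [f]) for f in remaining}
        top_feature = max(remaining, key=lambda f: scores[f])
        if scores[top_feature] <= best_score:
            break  # no candidate improves on the current subset
        selected.append(top_feature)
        remaining.remove(top_feature)
        best_score = scores[top_feature]
    return selected, best_score


# Hypothetical toy scorer: in practice this would train the prediction model
# on the candidate subset and return its validation accuracy.
TOY_WEIGHTS = {"A": 0.05, "B": 0.03, "C": 0.01, "D": -0.02}


def toy_score(subset: list[str]) -> float:
    return 0.5 + sum(TOY_WEIGHTS[f] for f in subset)


if __name__ == "__main__":
    chosen, acc = greedy_forward_selection(list(TOY_WEIGHTS), toy_score)
    print(chosen, round(acc, 2))  # ['A', 'B', 'C'] 0.59
```

Each step evaluates every remaining feature, so selecting k of n features costs at most about n·k model evaluations. An exhaustive search over all subsets of 13 features would need 2^13 = 8192 evaluations.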

doi.org/10.1109/CIFER.2011.5953571, hdl.handle.net/1765/31235
Symposium Series on Computational Intelligence, IEEE SSCI 2011 - 2011 IEEE Symposium on Computational Intelligence for Financial Engineering and Economics, CIFEr 2011
Erasmus School of Economics

Milea, V., Almeida e Santos Nogueira, R. J., Kaymak, U., & Frasincar, F. (2011). A fuzzy model of a European index based on automatically extracted content information. Presented at the Symposium Series on Computational Intelligence, IEEE SSCI 2011 - 2011 IEEE Symposium on Computational Intelligence for Financial Engineering and Economics, CIFEr 2011. doi:10.1109/CIFER.2011.5953571