A fuzzy model of a European index based on automatically extracted content information
In this paper we build on previous work on predicting the MSCI EURO index from content analysis of European Central Bank (ECB) statements. Our focus is on reducing the number of features used for prediction through feature selection. For this purpose we rely on two methodologies: (stepwise) linear regression and greedy forward feature subset selection. The original dataset consists of 13 features (General Inquirer content categories). Both methodologies improve the overall accuracy of the model while reducing the number of features employed. With linear regression we achieve an accuracy of 67.58% on the test set using six features, while greedy forward selection yields a test-set accuracy of 69.50% using eight features.
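Greedy forward feature subset selection can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The function name `greedy_forward_selection` and the `score` callback are hypothetical. `score` stands in for whatever accuracy estimate is computed for a model trained on a given subset of the 13 General Inquirer categories; the abstract does not specify the evaluation protocol or the stopping rule, so stopping when no single added feature improves the score is an assumption.

```python
from typing import Callable, FrozenSet, List, Sequence, Tuple


def greedy_forward_selection(
    features: Sequence[str],
    score: Callable[[FrozenSet[str]], float],
) -> Tuple[List[str], float]:
    """Hypothetical sketch: grow a feature subset one feature at a time.

    `score` is an assumed callback returning the accuracy of a model
    trained on the given subset (higher is better).
    """
    selected: List[str] = []
    remaining: List[str] = list(features)
    # Start below any achievable score so the first feature is always added.
    best_score = float("-inf")

    while remaining:
        # Score every one-feature extension of the current subset.
        candidate_scores = {
            f: score(frozenset(selected) | {f}) for f in remaining
        }
        # max() returns the first maximal element, so ties keep input order.
        best_candidate = max(remaining, key=candidate_scores.__getitem__)
        if candidate_scores[best_candidate] <= best_score:
            break  # Assumed stopping rule: no addition improves the score.
        selected.append(best_candidate)
        remaining.remove(best_candidate)
        best_score = candidate_scores[best_candidate]

    return selected, best_score
```

With a toy additive score (the weights are invented for illustration only), the procedure adds the helpful features and skips the one that lowers the score:

```python
weights = {"a": 30, "b": 20, "c": -10, "d": 5}
subset, best = greedy_forward_selection(
    ["a", "b", "c", "d"], lambda s: 50 + sum(weights[f] for f in s)
)
# subset == ["a", "b", "d"], best == 105
```

Each round retrains and evaluates one model per remaining feature, so the cost grows roughly quadratically in the number of features. That is cheap for 13 candidates and avoids the 2^13 subsets an exhaustive search would need.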