Traditionally, equity investors have relied upon the information reported in firms’ financial accounts to make their investment decisions. Due to the conservative nature of accounting standards, firms cannot value their intangible assets such as corporate culture, brand value and reputation. Investors’ efforts to collect such information have been hampered by the voluntary nature of Corporate Social Responsibility (CSR) reporting standards, which have resulted in the publication of inconsistent, stale and incomplete information across firms. In short, information on intangible assets is less salient to investors compared to accounting information because it is more costly to collect, process and analyse.

In this thesis we design an automated approach to collect and quantify information on firms’ intangible assets by drawing upon techniques commonly adopted in the fields of Natural Language Processing (NLP) and Information Retrieval. The exploitation of unstructured data available on the Web holds promise for investors seeking to integrate a wider variety of information into their investment processes. The objectives of this research are: 1) to draw upon textual analysis methodologies to measure intangible information from a range of unstructured data sources, 2) to integrate intangible information and accounting information into an investment analysis framework, 3) evaluate the merits of unstructured data for the prediction of firms’ future earnings.