Building domain taxonomies is a crucial task in the domain of ontology construction. Domain taxonomy learning keeps getting more important as a form of automatically obtaining a knowledge representation of a certain domain. The alternative of manually developing domain taxonomies is not trivial. The main issues encountered when manually developing a taxonomy are the non-availability of a domain knowledge expert and the considerable amount of effort needed for this task. This paper proposes Taxo Learn, an approach to automatic construction of domain taxonomies. Taxo Learn is a new methodology that combines aspects from existing approaches, but also contains new steps in order to improve the quality of the resulted domain taxonomy. The contribution of this paper is threefold. First, we employ a word sense disambiguation step when detecting concepts in the text. Second, we show the use of semantics-based hierarchical clustering for the purpose of taxonomy learning. Third, we propose a novel dynamic labeling procedure for the concept clusters. We evaluate our approach by comparing the machine generated taxonomy with a manually constructed golden taxonomy. Based on a corpus of documents in the field of financial economics, Taxo Learn shows a high precision for the learned taxonomic concept relationships.

, ,
doi.org/10.1109/WI-IAT.2012.129, hdl.handle.net/1765/40597
Erasmus School of Economics

Dietz, E.-A., Vandic, D., & Frasincar, F. (2012). TaxoLearn: A semantic approach to domain taxonomy learning. doi:10.1109/WI-IAT.2012.129