How to normalize cooccurrence data? An analysis of some well-known similarity measures

van Eck, Nees Jan; Waltman, Ludo

doi:10.1002/asi.21075

N.J.P. van Eck (Nees Jan) and L. Waltman (Ludo)

2009-08-01

American Society for Information Science and Technology. Journal, Volume 60, Issue 8, p. 1635-1651

In scientometric research, the use of cooccurrence data is very common. In many cases, a similarity measure is employed to normalize the data. However, there is no consensus among researchers on which similarity measure is most appropriate for normalization purposes. In this article, we theoretically analyze the properties of similarity measures for cooccurrence data, focusing in particular on four well-known measures: the association strength, the cosine, the inclusion index, and the Jaccard index. We also study the behavior of these measures empirically. Our analysis reveals that there exist two fundamentally different types of similarity measures, namely, set-theoretic measures and probabilistic measures. The association strength is a probabilistic measure, while the cosine, the inclusion index, and the Jaccard index are set-theoretic measures. Both our theoretical and our empirical results indicate that cooccurrence data can best be normalized using a probabilistic measure. This provides strong support for the use of the association strength in scientometric research.
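As a rough illustration of the four measures named in the abstract, the sketch below uses their commonly cited definitions. The symbols are assumptions that do not appear on this page: `c_ij` is the number of cooccurrences of objects i and j, and `s_i` and `s_j` are the total occurrence counts of i and j. The counts in the example are hypothetical.

```python
import math


def association_strength(c_ij, s_i, s_j):
    # Probabilistic measure: observed cooccurrences relative to the
    # product of the two occurrence totals.
    return c_ij / (s_i * s_j)


def cosine(c_ij, s_i, s_j):
    # Set-theoretic measure: cooccurrences over the geometric mean
    # of the two occurrence totals.
    return c_ij / math.sqrt(s_i * s_j)


def inclusion_index(c_ij, s_i, s_j):
    # Set-theoretic measure: cooccurrences over the smaller total.
    return c_ij / min(s_i, s_j)


def jaccard_index(c_ij, s_i, s_j):
    # Set-theoretic measure: cooccurrences over the size of the union.
    return c_ij / (s_i + s_j - c_ij)


# Hypothetical counts, chosen only for illustration.
c_ij, s_i, s_j = 10, 20, 50

for measure in (association_strength, cosine, inclusion_index, jaccard_index):
    print(f"{measure.__name__}: {measure(c_ij, s_i, s_j):.4f}")
```

All four functions divide the same cooccurrence count by a different function of the occurrence totals, so the choice of measure is the choice of normalization that the abstract discusses.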

Additional Metadata
Keywords	Jaccard index, association strength, cosine, inclusion index, similarity measure
Persistent URL	doi.org/10.1002/asi.21075, hdl.handle.net/1765/18647
Series	Econometric Institute Reprint Series
Journal	American Society for Information Science and Technology. Journal
Organisation	Erasmus Research Institute of Management
Citation	van Eck, N. J., & Waltman, L. (2009). How to normalize cooccurrence data? An analysis of some well-known similarity measures. American Society for Information Science and Technology. Journal, 60(8), 1635–1651. doi:10.1002/asi.21075
