2020
Scalable entity resolution for Web product descriptions
Publication
Information Fusion, Volume 53, p. 103-111
Consumers are increasingly using the Web to find product information and make online purchases. This is reflected by the ongoing growth of worldwide e-commerce sales figures. Entity resolution is an important task that supports many services that have arisen from this growth, such as Web shop aggregators. In this paper, we propose a scalable framework for multi-source entity resolution. Our blocking approach employs model words to produce blocks that make our solution highly effective and efficient for the considered domains. An in-depth evaluation, performed using millions of experiments and three large datasets (on consumer electronics and software products), shows that our model words-based approach outperforms other approaches in most cases. Furthermore, we also evaluate our approach with an imperfect similarity function and find that model words-based blocking schemes provide the best blocks with respect to the F1-measure.
Additional Metadata | |
---|---|
doi.org/10.1016/j.inffus.2019.06.002, hdl.handle.net/1765/117264 | |
Information Fusion | |
Organisation | Erasmus University Rotterdam |
Vandic, D., Frasincar, F., Kaymak, U., & Riezebos, M. (2020). Scalable entity resolution for Web product descriptions. Information Fusion, 53, 103–111. doi:10.1016/j.inffus.2019.06.002 |