Duplicate detection in web shops using LSH to reduce the number of computations

Van Dam, Iris; Nijenhuis, Nikki; Van Ginkel, Gerhard; Vandic, Damir; Kuipers, Wim; Frasincar, Flavius

doi:10.1145/2851613.2851861

2016-04-04

Presented at the 31st Annual ACM Symposium on Applied Computing, SAC 2016 (April 2016), Pisa

The number of online shops is growing daily, and many Web shops focus on the same product types, such as consumer electronics. Because Web shops use different product representations, it is hard to compare products across different Web shops. Duplicate detection methods aim to solve this problem by identifying the same products in different Web shops. In this paper, we focus on reducing the computation time of a state-of-the-art duplicate detection algorithm. First, we construct uniform vector representations for the products. We use these vectors as input for a Locality Sensitive Hashing (LSH) algorithm, which pre-selects potential duplicates. Finally, duplicate products are found by applying the Multi-component Similarity Method (MSM) to the pre-selected candidates. Compared to the original MSM, the number of required computations can be reduced by 95% with only a minor decrease of 9% in the F1-measure.
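The pre-selection idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the vector representation, the hash functions, or the LSH parameters. So the sketch assumes word tokens from product titles, MinHash signatures, and a banding scheme, and every name in it (`shingles`, `minhash_signature`, `candidate_pairs`, `total_pairs`, `bands`, `rows`) is hypothetical. The MSM comparison itself is not shown; only pairs returned by `candidate_pairs` would be passed on to it.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def shingles(title):
    """Represent a product title as a set of lowercase word tokens (assumed representation)."""
    return set(title.lower().split())


def _hash(seed, token):
    """Deterministic 64-bit hash of a token, varied by seed."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.md5(data).digest()[:8], "big")


def minhash_signature(tokens, num_hashes):
    """MinHash signature: for each seed, the smallest hash over all tokens.

    Assumes `tokens` is non-empty.
    """
    return [min(_hash(seed, t) for t in tokens) for seed in range(num_hashes)]


def candidate_pairs(products, bands=8, rows=2):
    """Return pairs of product ids that share at least one LSH bucket.

    `products` maps a product id to its title. The signature is cut into
    `bands` bands of `rows` values; two products become candidates when
    any band matches exactly.
    """
    buckets = defaultdict(set)
    for pid, title in products.items():
        sig = minhash_signature(shingles(title), bands * rows)
        for b in range(bands):
            key = (b, tuple(sig[b * rows:(b + 1) * rows]))
            buckets[key].add(pid)

    pairs = set()
    for ids in buckets.values():
        for a, b in combinations(sorted(ids), 2):
            pairs.add((a, b))
    return pairs


def total_pairs(n):
    """Number of comparisons an exhaustive pairwise method needs for n products."""
    return n * (n - 1) // 2
```

Comparing `len(candidate_pairs(products))` with `total_pairs(len(products))` gives the fraction of comparisons that remain after pre-selection. Under this scheme, fewer rows per band or more bands admit more candidates (fewer missed duplicates, more computations), and the reverse makes the filter stricter.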

Additional Metadata
Keywords	Duplicate detection, Locality-sensitive hashing, Web shop products
Persistent URL	doi.org/10.1145/2851613.2851861, hdl.handle.net/1765/97341
Conference	31st Annual ACM Symposium on Applied Computing, SAC 2016
Organisation	Erasmus University Rotterdam
Citation	Van Dam, I., Nijenhuis, N., Van Ginkel, G., Vandic, D., Kuipers, W., & Frasincar, F. (2016). Duplicate detection in web shops using LSH to reduce the number of computations. In Proceedings of the ACM Symposium on Applied Computing (pp. 772–779). doi:10.1145/2851613.2851861
