We investigate how to determine the most representative image on a Web page. This problem has not been thoroughly investigated and, up to today, only expert-based algorithms have been proposed in the literature. We attempt to improve the performance of known algorithms with the use of Support Vector Machines (SVM). Besides, our algorithm distinguishes itself from existing literature with the introduction of novel image features, including previously unused meta-data protocols. Also, we design and attempt a less-restrictive ranking methodology in the image preprocessing stage of our algorithm. We find that the application of the SVM framework with our improved classification methodology increases the F1 score from 27.2% to 38.5%, as compared to a state-of-the-art method. Introducing novel image features and applying backward feature selection, we find that the F1 score rises to 40.0%. Lastly, we use a class-weighted SVM in order to resolve the imbalance in number of representative images. This final modification improves the classification performance of our algorithm even further to 43.9%, outperforming our benchmark algorithms, including those of Facebook and Google. Suggested beneficiaries are the search engine community, image retrieval community, including the commercial sector due to superior performance.

Additional Metadata
Keywords Feature selection, Image search, Representative image, Support vector machines
Persistent URL dx.doi.org/10.1016/j.ins.2019.10.045, hdl.handle.net/1765/120988
Journal Information Sciences
Citation
Vyas, K. (Krishna), & Frasincar, F. (2019). Determining the most representative image on a Web page. Information Sciences. doi:10.1016/j.ins.2019.10.045