A support system for predicting eBay end prices

https://doi.org/10.1016/j.dss.2007.11.004

Abstract

We create a support system for predicting end prices on eBay. The end price predictions are based on the item descriptions found in the item listings of eBay, and on some numerical item features. The system uses text mining and boosting algorithms from the field of machine learning. Our system substantially outperforms the naive method of predicting the category mean price. Moreover, interpretation of the model enables us to identify influential terms in the item descriptions and shows that the item description is more influential than the seller feedback rating, which was shown to be influential in earlier studies.

Introduction

Online auctions are hot. In its financial report for the first quarter of 2006, eBay, the world's largest online auction site, reports a net revenue of $1.39 billion, realizing a growth rate of 35% in consecutive years [10]. For researchers with an interest in data mining, online auctions offer the opportunity to collect and mine large data sets at low cost.

The market price of a product on eBay is generally non-stationary: it fluctuates over time. Identical items may even receive different bids at the same point in time. Merchants might buy items on eBay and try to re-sell them at a profit. The success of these merchants depends on their ability to find bargains and, of course, on their bidding strategy. A support system can make finding these bargains easier.

A recent paper [15] introduces the ‘Auction Advisor’ system, which simplifies the search for bargains by presenting its user with relevant information like the current bid and a recommended price based on recently closed auctions. Using this standardized presentation, the user is able to make bidding decisions within a short amount of time. In this paper we improve on this recommended price by making a price prediction based on several relevant characteristics of the auction: the number of pictures, the feedback rating, and the description of the item.

A substantial amount of research has been carried out on the analysis of historical auctions using data mining and statistical techniques. (An interesting review of this work from an economics perspective is given in [5].) Most of this work focuses on finding the factors that determine the auction end price. Reference [9], for example, looks for such factors in a data set of ancient coin sales on eBay and finds that the number of participants, the use of reserve prices, and seller reputations are determinants of the end price. Others focus on characteristic behavior such as last-moment bidding [26]. Only a few studies treat the prediction of auction prices as the central problem. Several data mining methods are compared in [14] in order to find the most suitable method for price prediction, while [30] constructs a dynamic forecasting model that can update the predicted price of an ongoing auction based on newly arrived information.

Unlike previous studies, we incorporate the textual information contained in the item description in our system when predicting the auction end price. To this end, our system downloads data on a large number of closed auctions from the eBay site. These data are then pre-processed and fed to a price-prediction model. Section 2 below discusses the data collection system and the pre-processing steps. The price-prediction model makes use of a vector space representation of the descriptions of the items [27]. Each position in the vector represents the occurrence count of a specific word in an item description. This representation, known as the bag-of-words representation, is often used in information retrieval systems. In these systems, the distance between the bag-of-words vectors of strings is used to find similar strings or documents.
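As an illustration of the bag-of-words representation described above, the following minimal sketch builds count vectors and compares them with cosine similarity. It is not the authors' implementation: the tokenizer, the function names, and the two example descriptions are assumptions made for this sketch, and cosine similarity is only one of several measures used in information retrieval.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_vocabulary(descriptions):
    """Assign each distinct word a fixed vector position (sorted for determinism)."""
    words = sorted({word for d in descriptions for word in tokenize(d)})
    return {word: position for position, word in enumerate(words)}


def bag_of_words(text, vocabulary):
    """Return the vector of word counts for one description."""
    counts = Counter(tokenize(text))
    vector = [0] * len(vocabulary)
    for word, count in counts.items():
        if word in vocabulary:  # words outside the vocabulary are ignored
            vector[vocabulary[word]] = count
    return vector


def cosine_similarity(u, v):
    """Cosine of the angle between two count vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical item descriptions, used only to exercise the functions.
descriptions = ["new camera with new lens", "used camera"]
vocabulary = build_vocabulary(descriptions)
vectors = [bag_of_words(d, vocabulary) for d in descriptions]
print(vocabulary)
print(vectors)
print(cosine_similarity(vectors[0], vectors[1]))
```

In the price-prediction setting these count vectors serve as model inputs rather than as a basis for similarity search.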

Instead of using similarity calculations, our price-prediction model is based on boosting [13]. Boosting creates an ensemble of models that collectively make a prediction, in our case for the end price of the auction. We use decision trees as the individual models that form the ensemble, as is often done. Decision trees select important input dimensions in the course of their calibration process. This is a desirable property in text mining, as the number of input dimensions is usually very high. Section 3 discusses the models that we use in our system in more detail.
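The idea of an ensemble of trees that implicitly selects input dimensions can be sketched as follows. This is a simplified least-squares boosting loop with one-split trees (stumps), written under stated assumptions: the excerpt does not specify the boosting variant, the tree depth, or the learning rate used in the paper, so all of these choices, the function names, and the toy data are hypothetical.

```python
def fit_stump(X, residuals):
    """Find the single split (feature, threshold) that minimizes squared error."""
    best = None
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [r for row, r in zip(X, residuals) if row[j] <= threshold]
            right = [r for row, r in zip(X, residuals) if row[j] > threshold]
            left_mean = sum(left) / len(left)
            right_mean = sum(right) / len(right)
            sse = (sum((r - left_mean) ** 2 for r in left)
                   + sum((r - right_mean) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, j, threshold, left_mean, right_mean)
    if best is None:  # every feature is constant: predict the mean everywhere
        mean = sum(residuals) / len(residuals)
        return (0, float("inf"), mean, mean)
    return best[1:]


def stump_predict(stump, row):
    """Evaluate one stump on one input row."""
    j, threshold, left_mean, right_mean = stump
    return left_mean if row[j] <= threshold else right_mean


def fit_boosting(X, y, n_rounds=50, learning_rate=0.1):
    """Fit stumps sequentially, each one to the residuals of the current ensemble."""
    base = sum(y) / len(y)  # start from the mean target value
    predictions = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [target - p for target, p in zip(y, predictions)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * stump_predict(stump, row)
                       for p, row in zip(predictions, X)]
    return base, learning_rate, stumps


def predict(model, row):
    """Ensemble prediction: base value plus the scaled sum of all stump outputs."""
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump_predict(s, row) for s in stumps)


def split_counts(model, n_features):
    """Count how often each input dimension was chosen as a split variable."""
    counts = [0] * n_features
    for j, _, _, _ in model[2]:
        counts[j] += 1
    return counts


# Toy data (hypothetical): the target depends only on the first feature.
X = [[0, 5], [1, 3], [0, 3], [1, 5]]
y = [10.0, 20.0, 10.0, 20.0]
model = fit_boosting(X, y, n_rounds=20, learning_rate=0.5)
print(predict(model, [0, 5]), predict(model, [1, 3]))
print(split_counts(model, n_features=2))
```

The split counts give a crude view of which input dimensions the ensemble relies on; with thousands of term-count columns, most of them would never be selected.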

We test our system in two experiments described in Section 4. The paper ends with conclusions and a discussion in Section 5.

Section snippets

Data

There are several ways to collect auction data from eBay. Examples include using eBay's API (Application Programming Interface), using a web crawler, and buying a data set. Our system uses a web crawler that downloads the HTML source code of an auction page given its auction ID. (The crawler was written in Java.) The downloaded auction pages are the main HTML pages from eBay; they include neither the seller's feedback pages nor the bidding history. Fig. 1 shows an example of (part of)

Methodology used for price prediction

This section discusses the machine learning techniques used in our system. We denote the available data by $D = \{(x_i; y_i)\}_{i=1}^{N}$. An instance (observation, row) $(x; y)$ consists of a vector of $J$ attribute values $x = (x_1, \ldots, x_J)$ and a target value $y$. The $J$ attributes are the explanatory or independent variables, in our case the term counts and other input features discussed in Section 2. The target is the explained or dependent variable, in our case the auction end price.

Experiments and results

We experimented with our price prediction system using the data sets mentioned in Section 2. Each data set was randomly partitioned into a training set (80%) and a test set (20%). We repeated this split three times for each data set and built a separate model on each training set.
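The evaluation protocol can be sketched as below: three random 80/20 splits, with the naive mean-price predictor mentioned in the abstract as the reference point. The error measure (RMSE), the seeds, the function names, and the example prices are assumptions made for this sketch; the excerpt does not state which error measure the paper reports.

```python
import math
import random


def split_80_20(rows, seed):
    """Randomly partition rows into a training set (80%) and a test set (20%)."""
    rng = random.Random(seed)  # a fixed seed makes the split reproducible
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = len(shuffled) * 8 // 10
    return shuffled[:cut], shuffled[cut:]


def rmse(predictions, targets):
    """Root mean squared error between predicted and observed prices."""
    squared = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    return math.sqrt(sum(squared) / len(squared))


def evaluate_mean_baseline(prices, n_repeats=3):
    """Score the naive predictor (training-set mean price) on each random split."""
    scores = []
    for seed in range(n_repeats):
        train, test = split_80_20(prices, seed)
        mean_price = sum(train) / len(train)
        scores.append(rmse([mean_price] * len(test), test))
    return scores


# Hypothetical end prices for one product category.
prices = [12.0, 15.5, 9.99, 20.0, 18.25, 11.0, 14.0, 16.75, 13.5, 17.0]
print(evaluate_mean_baseline(prices))
```

A learned model would be scored on the same test sets, so that its error can be compared split by split with the baseline.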

The low number of repetitions, three, is due to the computational requirements of each experiment: each run requires several hours of CPU time. The exact amount of time required varies with the number of auctions in the

Summary, conclusions & discussion

In this article we present a decision support system for predicting prices for online auctions. The predictions are based on a boosting model, which uses closed auctions of some product to predict prices for current auctions of the same product. The system uses the seller's feedback rating, the number of pictures on the web page and the seller's description of the item. The contribution of this study is twofold: it is the first study that uses the item description and number of pictures in the

Acknowledgments

We thank the anonymous referees and Nees Jan van Eck for their helpful suggestions.

Dennis van Heijst was a master's student of Informatics and Economics at Erasmus University. He currently works as an IT-auditor at Ernst and Young.

References (31)

  • L. Breiman et al., Classification and Regression Trees (1996)
  • D. Bryan et al., Pennies from eBay: The Determinants of Price in Online Auctions. Working Papers 0003, Department of Economics (November 1999)
  • eBay Investor Relations. Web site, 2006. http://investor.ebay.com/. Accessed on...
  • Y. Freund et al., Experiments with a new Boosting Algorithm
  • J.H. Friedman, Greedy function approximation: a gradient boosting machine, Annals of Statistics (2001)


    Rob Potharst received his MSc degree in Statistics and Operations Research from the University of Amsterdam. He earned a PhD at Erasmus University Rotterdam with a thesis on decision trees and neural networks. While teaching at the Econometric Institute of the latter university, he specializes in Computational Intelligence techniques in Marketing. Recently, he has published papers in the Intelligent Decision Analysis journal and in Decision Support Systems.


    Michiel van Wezel works as an assistant professor at the Econometric Institute of the Erasmus School of Economics. His areas of interest are data mining and e-commerce. Michiel received a PhD in computer science from Leiden University and an MSc in computer science from Utrecht University.
