2013-03-01
Sparse least trimmed squares regression for analyzing high-dimensional large data sets
Publication
Annals of Applied Statistics, Volume 7, Issue 1, p. 226-248
Sparse model estimation is a topic of high importance in modern data analysis due to the increasing availability of data sets with a large number of variables. Another common problem in applied statistics is the presence of outliers in the data. This paper combines robust regression and sparse model estimation. A robust and sparse estimator is introduced by adding an L1 penalty on the coefficient estimates to the well-known least trimmed squares (LTS) estimator. The breakdown point of this sparse LTS estimator is derived, and a fast algorithm for its computation is proposed. In addition, the sparse LTS is applied to protein and gene expression data of the NCI-60 cancer cell panel. Both a simulation study and the real data application show that the sparse LTS has better prediction performance than its competitors in the presence of leverage points.
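To make the idea of "adding an L1 penalty to the LTS estimator" concrete, the following is a minimal sketch of such an objective function in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the function name `sparse_lts_objective`, the subset size `h`, the penalty weight `lam`, and the `h * lam` scaling of the penalty are assumptions made here, and the fast algorithm proposed in the paper is not reproduced.

```python
def sparse_lts_objective(X, y, beta, lam, h):
    """Trimmed sum of squared residuals plus an L1 penalty (illustrative sketch)."""
    # Squared residual of every observation under the candidate coefficients.
    sq_res = [
        (yi - sum(xij * bj for xij, bj in zip(xi, beta))) ** 2
        for xi, yi in zip(X, y)
    ]
    # Keep only the h smallest squared residuals, so the largest ones
    # (potential outliers) do not contribute to the fit criterion.
    trimmed = sum(sorted(sq_res)[:h])
    # L1 penalty on the coefficients; the h * lam scaling is an assumption.
    penalty = h * lam * sum(abs(bj) for bj in beta)
    return trimmed + penalty


# Toy data: the last observation is a gross outlier.
X = [[1.0], [2.0], [3.0], [4.0]]
y = [1.0, 2.0, 3.0, 100.0]
beta = [1.0]

trimmed_value = sparse_lts_objective(X, y, beta, lam=0.5, h=3)
untrimmed_value = sparse_lts_objective(X, y, beta, lam=0.5, h=4)
```

With `h=3` the outlier is trimmed away and only the penalty term remains, whereas with `h=4` (no trimming) the single outlier dominates the objective. This is the intuition behind the robustness of a trimmed criterion, while the L1 term encourages sparse coefficient vectors.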
Additional Metadata | |
---|---|
doi.org/10.1214/12-AOAS575, hdl.handle.net/1765/39984 | |
ERIM Top-Core Articles | |
Annals of Applied Statistics | |
Organisation | Erasmus Research Institute of Management |
Alfons, A., Croux, C., & Gelper, S. (2013). Sparse least trimmed squares regression for analyzing high-dimensional large data sets. Annals of Applied Statistics, 7(1), 226–248. doi:10.1214/12-AOAS575 |