Finding models that link tomato taste attributes to metabolic profiles is a main challenge for breeding programs that aim to enhance tomato flavor. In this paper, we compared such models obtained with a traditional statistical approach, stepwise regression, against models obtained with a newer generation of regression techniques known as penalized regression or regularization methods. For penalized regression, we discussed different scenarios and several model selection criteria, and concluded that classical cross-validation selects models with many superfluous variables, whereas criteria such as the Bayesian information criterion seem more suitable when the goal is to find parsimonious models that explain tomato taste attributes from metabolic information. An exhaustive comparison of the methods was carried out for six sensory traits. It showed that the most important covariates were identified both by stepwise regression and by some of the penalized regression methods, despite general disagreement between them on the size of the regression coefficients. In particular, the stepwise regression coefficients are inflated because of their high variance, which is not the case for penalized regression, suggesting that this newer methodology can be an alternative for obtaining more accurate models.
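
To make the idea of choosing a penalized model by the Bayesian information criterion concrete, the following is a minimal sketch, not the authors' code or data. It assumes the lasso as one representative penalized method (the abstract does not say which penalties were used), fits it by coordinate descent on synthetic data with two truly active covariates out of ten, and scores each penalty value with BIC = n log(RSS/n) + k log(n), where taking k as the number of nonzero coefficients is itself an assumption. All names (`lasso_cd`, `bic`, the penalty grid) are hypothetical.

```python
import math
import random


def soft_threshold(z, lam):
    """Soft-thresholding operator used in the lasso coordinate update."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def standardize_columns(X):
    """Return the columns of X, centered and scaled to unit variance."""
    n, p = len(X), len(X[0])
    cols = []
    for j in range(p):
        col = [X[i][j] for i in range(n)]
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
        cols.append([(v - mean) / sd for v in col])
    return cols


def lasso_cd(cols, y, lam, n_iter=100):
    """Minimize (1/2n)*||y - Xb||^2 + lam*||b||_1 by cyclic coordinate descent.

    Assumes standardized columns and a centered response.
    """
    n, p = len(y), len(cols)
    beta = [0.0] * p
    resid = list(y)
    for _ in range(n_iter):
        for j in range(p):
            xj = cols[j]
            # Correlation of column j with the partial residual (j left out).
            rho = sum(xj[i] * resid[i] for i in range(n)) / n + beta[j]
            new = soft_threshold(rho, lam)
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * xj[i]
                beta[j] = new
    return beta, resid


def bic(resid, beta):
    """BIC with the number of nonzero coefficients as degrees of freedom."""
    n = len(resid)
    rss = sum(r * r for r in resid)
    k = sum(1 for b in beta if b != 0.0)
    return n * math.log(rss / n) + k * math.log(n)


# Synthetic stand-in for metabolite data: only covariates 0 and 1 matter.
random.seed(0)
n, p = 60, 10
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [3.0 * row[0] - 2.0 * row[1] + random.gauss(0, 0.5) for row in X]
y_mean = sum(y) / n
y = [v - y_mean for v in y]
cols = standardize_columns(X)

# Penalty grid from the smallest value giving an empty model downwards.
lam_max = max(abs(sum(c[i] * y[i] for i in range(n))) / n for c in cols)
grid = [lam_max * 0.8 ** k for k in range(25)]

best_score, best_lam, best_beta = None, None, None
for lam in grid:
    beta, resid = lasso_cd(cols, y, lam)
    score = bic(resid, beta)
    if best_score is None or score < best_score:
        best_score, best_lam, best_beta = score, lam, beta

selected = [j for j, b in enumerate(best_beta) if b != 0.0]
print("penalty chosen by BIC:", best_lam)
print("selected covariates:", selected)
```

Replacing the `bic` score with a cross-validated prediction error over the same grid would give the cross-validation counterpart that the abstract contrasts with BIC. This sketch does not attempt to reproduce that comparison.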

Additional Metadata
Keywords: Metabolites, Penalized regression, Phenotype prediction, Stepwise regression, Tomato taste attributes, Variable selection
Persistent URL: dx.doi.org/10.1007/s10681-011-0374-5, hdl.handle.net/1765/31540
Citation
Menéndez, P., Eilers, P.H.C., Tikunov, Y., Bovy, A., & van Eeuwijk, F. (2011). Penalized regression techniques for modeling relationships between metabolites and tomato taste attributes. Euphytica, 183(3), 379–387. doi:10.1007/s10681-011-0374-5