Background: Many techniques are proposed for the quantification of tumor heterogeneity as an imaging biomarker for differentiation between tumor types, tumor grading, response monitoring and outcome prediction. However, in clinical practice these methods are barely used. This study evaluates the reported performance of the described methods and identifies barriers to their implementation in clinical practice. Copyright:Methodology: The Ovid, Embase, and Cochrane Central databases were searched up to 20 September 2013. Heterogeneity analysis methods were classified into four categories, i.e., non-spatial methods (NSM), spatial grey level methods (SGLM), fractal analysis (FA) methods, and filters and transforms (F&T). The performance of the different methods was compared.Principal Findings: Of the 7351 potentially relevant publications, 209 were included. Of these studies, 58% reported the use of NSM, 49% SGLM, 10% FA, and 28% F&T. Differentiation between tumor types, tumor grading and/or outcome prediction was the goal in 87% of the studies. Overall, the reported area under the curve (AUC) ranged from 0.5 to 1 (median 0.87). No relation was found between the performance and the quantification methods used, or between the performance and the imaging modality. A negative correlation was found between the tumor-feature ratio and the AUC, which is presumably caused by overfitting in small datasets. Cross-validation was reported in 63% of the classification studies. Retrospective analyses were conducted in 57% of the studies without a clear description.Conclusions: In a research setting, heterogeneity quantification methods can differentiate between tumor types, grade tumors, and predict outcome and monitor treatment effects. To translate these methods to clinical practice, more prospective studies are required that use external datasets for validation: these datasets should be made available to the community to facilitate the development of new and improved methods.