Background: Modeling studies that use hypothetical polygenic risk data can be an efficient tool for investigating the effectiveness of downstream applications, such as targeting interventions to risk groups, and so help to decide whether empirical investigation is warranted. We investigated the assumptions underlying a method that simulates risk data for specific values of the area under the receiver operating characteristic curve (AUC). Methods: The simulation method constructs risk data for a hypothetical population based on the population disease risk and on the odds ratios and frequencies of genetic variants. By systematically varying these parameters, we investigated under what conditions AUC values represent unique ROC curves with unique risk distributions for patients and nonpatients, and to what extent risk data can be simulated for precise values of the AUC. Results: Using a larger number of genetic variants, each with a modest effect, we observed that the distributions of estimated risks of patients and nonpatients were similar for various combinations of the odds ratios and frequencies of the risk alleles. Simulated ROC curves overlapped empirical curves with the same AUC. Conclusions: Polygenic risk data can be effectively and efficiently created using a simulation method. This allows further investigation of the potential applications of stratifying interventions on the basis of polygenic risk.
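
The abstract names the inputs of the simulation (population disease risk, odds ratios, and risk allele frequencies) but not its exact steps. The following is therefore only a minimal sketch of one way such risk data could be generated, not the authors' procedure. It assumes Hardy-Weinberg genotype frequencies, independent variants, per-allele odds ratios that combine multiplicatively, and a logistic model whose intercept is calibrated so that the mean risk equals the population risk. The function names `simulate_risks` and `auc` and all parameter values are hypothetical.

```python
import math
import random


def simulate_risks(pop_risk, odds_ratios, allele_freqs, n=5000, seed=0):
    """Return (risks, statuses) for n simulated individuals (illustrative only)."""
    rng = random.Random(seed)
    log_ors = [math.log(o) for o in odds_ratios]

    # Assumption: each genotype is the count of risk alleles (0, 1, or 2),
    # drawn independently per variant under Hardy-Weinberg equilibrium.
    scores = []
    for _ in range(n):
        score = 0.0
        for log_or, freq in zip(log_ors, allele_freqs):
            alleles = (rng.random() < freq) + (rng.random() < freq)
            score += alleles * log_or
        scores.append(score)

    def mean_risk(intercept):
        return sum(1 / (1 + math.exp(-(intercept + s))) for s in scores) / n

    # Assumption: calibrate the intercept by bisection so that the average
    # predicted risk matches the population disease risk.
    lo, hi = -30.0, 30.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if mean_risk(mid) < pop_risk:
            lo = mid
        else:
            hi = mid
    intercept = (lo + hi) / 2

    risks = [1 / (1 + math.exp(-(intercept + s))) for s in scores]
    # Disease status is drawn from each individual's own risk.
    statuses = [rng.random() < r for r in risks]
    return risks, statuses


def auc(risks, statuses):
    """AUC from the Mann-Whitney rank-sum statistic, using average ranks for ties."""
    order = sorted(range(len(risks)), key=lambda i: risks[i])
    ranks = [0.0] * len(risks)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and risks[order[j + 1]] == risks[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    n_pos = sum(statuses)
    n_neg = len(statuses) - n_pos
    rank_sum = sum(r for r, s in zip(ranks, statuses) if s)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


# Hypothetical scenario: 20 variants, each with odds ratio 1.2 and
# risk allele frequency 0.3, in a population with 10% disease risk.
risks, statuses = simulate_risks(0.10, [1.2] * 20, [0.3] * 20, n=2000)
print(f"mean risk = {sum(risks) / len(risks):.3f}, AUC = {auc(risks, statuses):.3f}")
```

Rerunning the sketch with different combinations of odds ratios and allele frequencies, and comparing the resulting risk distributions of patients and nonpatients at similar AUC values, mirrors the kind of systematic parameter variation described in the Methods.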

doi.org/10.1371/journal.pone.0152359, hdl.handle.net/1765/87310
PLoS ONE
Department of Epidemiology

Kundu, S., Kers, J. G., & Janssens, C. (2016). Constructing hypothetical risk data from the area under the ROC curve: Modelling distributions of polygenic risk. PLoS ONE, 11(3). doi:10.1371/journal.pone.0152359