In recent years, genome-wide association studies have been very successful in identifying loci for complex traits. However, these findings typically involve noncoding and/or intergenic SNPs that have no clear functional effect and do not point directly to a gene. Hence, the challenge is to identify the causal variant responsible for the association signal. The first step is usually to identify all genetic variation in the locus region, typically by resequencing a large number of case chromosomes. The causal variant must then be identified among all these variants in further functional studies. Because the experimental follow-up can be very laborious, restricting the number of variants to be scrutinized is a great advantage, and an objective method for choosing the size of the region to be followed up would be highly valuable. Here, we propose a simple method to call the minimal region around a significant association peak that is very likely to contain the causal variant. We model linkage disequilibrium (LD) in cases from the observed single-SNP association signals, and predict the location of the causal variant by quantifying how well this relationship fits the data. Simulations showed that our approach identifies genomic regions of, on average, ∼50 kb that contain the causal variant with up to 90% probability. We apply our method to two genome-wide association data sets and localize both the functional variant REP1 in the synuclein gene, which conveys susceptibility to Parkinson's disease, and the APOE gene, which is responsible for the association signal in the Alzheimer's disease data set.
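The abstract does not specify the statistical model, so the following is only a toy sketch of the general idea of scoring candidate causal positions by how well a decay-with-distance model fits the single-SNP signals. Everything in it is an assumption made for illustration: the exponential decay shape, the least-squares fit, the `tolerance` cutoff, and the function names `fit_rss` and `call_region` are hypothetical and are not the authors' method.

```python
import math

def fit_rss(positions, signals, center, decay):
    """Residual sum of squares for an ASSUMED model in which the signal
    decays exponentially with distance from a candidate causal position."""
    shape = [math.exp(-decay * abs(p - center)) for p in positions]
    # Closed-form least-squares amplitude for a one-parameter scaling.
    amp = sum(s * e for s, e in zip(signals, shape)) / sum(e * e for e in shape)
    return sum((s - amp * e) ** 2 for s, e in zip(signals, shape))

def call_region(positions, signals, decay_grid, tolerance=0.01):
    """Return the best-fitting candidate position and the span of all
    candidates whose fit is within `tolerance` (a hypothetical cutoff,
    expressed as a fraction of the total sum of squares) of the best fit."""
    total_ss = sum(s * s for s in signals)
    # For each candidate position, keep the best fit over the decay grid.
    best_rss = {
        c: min(fit_rss(positions, signals, c, d) for d in decay_grid)
        for c in positions
    }
    best_center = min(best_rss, key=best_rss.get)
    cutoff = best_rss[best_center] + tolerance * total_ss
    kept = [c for c in positions if best_rss[c] <= cutoff]
    return best_center, (min(kept), max(kept))

# Synthetic, noise-free example: SNPs every 5 kb, simulated peak at 50 kb.
positions = [5.0 * k for k in range(21)]  # positions in kb, 0 to 100
signals = [10.0 * math.exp(-0.05 * abs(p - 50.0)) for p in positions]
best, region = call_region(positions, signals, [0.01, 0.02, 0.05, 0.1, 0.2])
print(best, region)
```

In this sketch the called region is simply the span of candidate positions that fit nearly as well as the best one. A real analysis would need a noise model and a calibrated cutoff to support a probability statement such as the one reported from the simulations.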

European Journal of Human Genetics
Erasmus MC: University Medical Center Rotterdam

Bochdanovits, Z., Simón-Sánchez, J., Jonker, M.A., Hoogendijk, W.J.G., van der Vaart, A., & Heutink, P. (2014). Accurate prediction of a minimal region around a genetic association signal that contains the causal variant. European Journal of Human Genetics, 22(2), 238–242. doi:10.1038/ejhg.2013.115