Background: Genes that play an important role in tumorigenesis are expected to show association between DNA copy number and RNA expression. Optimal power to find such associations can only be achieved if copy number and gene expression are analysed jointly. Furthermore, some copy number changes extend over larger chromosomal regions, affecting the expression levels of multiple resident genes.

Results: We propose to analyse copy number and expression array data using gene sets, rather than individual genes. The proposed model is robust and sensitive. As an illustration, we re-analysed two publicly available datasets. These two independent breast cancer datasets yielded similar patterns of association between gene dosage and gene expression levels, even though different platforms were used. Our comparisons show a clear advantage in using the expression of sets of genes to detect associations with long-spanning, low-amplitude copy number aberrations. In addition, our model allows additional explanatory variables to be included and does not require mapping between copy number and expression probes.

Conclusion: We developed a general and flexible tool for integration of multiple microarray data sets, and showed how the identification of genes whose expression is affected by copy number aberrations provides a powerful approach to prioritize putative targets for functional validation.
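
To make the gene-set idea concrete, the following is a minimal sketch of one way to test whether the copy number at a single locus is associated with the expression of a whole set of genes at once. It is not the model described in the paper: the statistic (mean squared covariance across the set), the permutation scheme, and the names `set_statistic` and `permutation_p_value` are all assumptions chosen for illustration.

```python
import random


def center(values):
    """Subtract the mean so that products of centred vectors act as covariances."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def set_statistic(copy_number, expression_set):
    """Mean squared covariance between one copy number profile and each gene.

    copy_number: one value per sample.
    expression_set: list of genes, each a list with one value per sample.
    Squaring lets many weak effects of either sign accumulate across the set.
    """
    y = center(copy_number)
    total = 0.0
    for gene in expression_set:
        x = center(gene)
        cov = sum(a * b for a, b in zip(y, x)) / len(y)
        total += cov * cov
    return total / len(expression_set)


def permutation_p_value(copy_number, expression_set, n_perm=999, seed=0):
    """Permutation p-value: shuffle sample labels of the copy number profile."""
    rng = random.Random(seed)
    observed = set_statistic(copy_number, expression_set)
    shuffled = list(copy_number)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if set_statistic(shuffled, expression_set) >= observed:
            hits += 1
    # Add one to numerator and denominator so the p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)
```

Calling `permutation_p_value(copy_number, expression_set)` with one copy number value per sample and a list of per-gene expression vectors returns a single p-value for the whole set. Because the test is performed on the set rather than gene by gene, no one-to-one mapping between copy number probes and expression probes is needed in this sketch.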

Additional Metadata
Persistent URL dx.doi.org/10.1186/1471-2105-10-203, hdl.handle.net/1765/24937
Citation
de Menezes, R.X., Boetzer, M., Sieswerda, M., van Ommen, G.J.B., & Boer, J.M. (2009). Integrated analysis of DNA copy number and gene expression microarray data using gene sets. BMC Bioinformatics, 10, 203–217. doi:10.1186/1471-2105-10-203