Talk
Selecting explanatory variables with the modified version of the Bayesian Information Criterion
Tuesday, 01.07.2008, 17:15 - 18:30
(Talk given as part of the Entrepreneurial Award within WOMEN FOR MATH SCIENCE)
Large genome scans, which are used to locate genes influencing quantitative traits (Quantitative Trait Loci, QTLs), can be considered specific examples of data mining. The database consists of the genotypes of many genetic markers, and locating QTLs relies on identifying associations between these genotypes and the values of the trait in question. Since quantitative traits are usually influenced by several important genes, the relationship between trait values and marker genotypes is often modeled using multiple regression. The most difficult part of fitting a multiple regression model is identifying the important predictors. This could, in principle, be addressed by employing one of the many criteria for model selection. However, there is considerable evidence that in this setting popular model selection criteria, such as the Akaike Information Criterion (AIC) or the Bayesian Information Criterion (BIC), have a strong tendency to overestimate the number of QTLs. To address this problem we developed the modified version of BIC (mBIC) ([1]-[6]), which allows prior knowledge on the number of regressors to be incorporated and prevents overestimation. In this talk we will present mBIC and explain the reasoning behind this criterion. We will also discuss the relation of mBIC to the Bonferroni correction for multiple testing and, if time permits, to the Bayes oracle, which minimizes the expected costs of inference. We will illustrate the performance of mBIC with computer simulations and a real data analysis.
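As a rough illustration of how such a criterion differs from standard BIC, the sketch below assumes the main-effects form of mBIC reported in [1], mBIC = n log RSS + k log n + 2k log(m/E - 1). Here n is the sample size, RSS the residual sum of squares, k the number of selected regressors, m the number of candidate regressors, and E the expected number of true regressors supplied as prior knowledge. The function and parameter names are hypothetical and chosen only for this example; the talk itself may use a different parameterization.

```python
import math


def bic(rss: float, n: int, k: int) -> float:
    """Schwarz BIC in the 'n log RSS' form for linear regression (smaller is better)."""
    return n * math.log(rss) + k * math.log(n)


def mbic(rss: float, n: int, k: int, m: int, expected_true: float) -> float:
    """Sketch of mBIC: BIC plus an extra penalty 2 k log(m / E - 1).

    Assumption: this follows the main-effects form described in [1];
    `expected_true` (E) is the prior expected number of true regressors.
    """
    ratio = m / expected_true - 1.0
    if ratio <= 0:
        raise ValueError("m / expected_true must exceed 1")
    return bic(rss, n, k) + 2 * k * math.log(ratio)


# Hypothetical numbers: 200 observations, 100 candidate markers, E = 4.
n, m, expected_true = 200, 100, 4.0
extra_penalty_per_regressor = mbic(50.0, n, 1, m, expected_true) - bic(50.0, n, 1)
print(extra_penalty_per_regressor)  # additional cost mBIC charges per selected regressor
```

Under this assumed form, the extra penalty grows with the number of candidate regressors m, which is what discourages the selection of spurious markers when many are screened.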
[1] Bogdan M, Ghosh JK, Doerge RW. Modifying the Schwarz Bayesian Information Criterion to Locate Multiple Interacting Quantitative Trait Loci. Genetics, 2004; 167: 989-999.
[2] Baierl A, Bogdan M, Frommlet F, Futschik A. On Locating Multiple Interacting Quantitative Trait Loci in Intercross Designs. Genetics, 2006; 173: 1693-1703.
[3] Baierl A, Futschik A, Bogdan M, Biecek P. Locating Multiple Interacting Quantitative Trait Loci Using Robust Model Selection. Computational Statistics and Data Analysis, 2007; 51: 6423-6434.
[4] Zak M, Baierl A, Bogdan M, Futschik A. Locating Multiple Interacting Quantitative Trait Loci Using Rank-Based Model Selection. Genetics, 2007; 176: 1845-1854.
[5] Bogdan M, Frommlet F, Biecek P, Cheng R, Ghosh JK, Doerge RW. Extending the Modified Bayesian Information Criterion (mBIC) to Dense Markers and Multiple Interval Mapping. Biometrics, 2008; doi: 10.1111/j.1541-0420.2008.00989.x.
[6] Bogdan M, Ghosh JK, Żak-Szatkowska M. Selecting Explanatory Variables with the Modified Version of the Bayesian Information Criterion. Quality and Reliability Engineering International, 2008; to appear.