Unified sparse regression models for sequence variants association analysis
Description
Joint adjustment for cryptic relatedness and population structure is necessary to reduce bias in DNA sequence analysis; however, existing sparse regression methods model these two confounders separately. Incorporating prior biological information has great potential to enhance statistical power, but such information is often overlooked in existing sparse regression models. We developed a unified sparse regression (USR) method that incorporates prior information and jointly adjusts for cryptic relatedness, population structure, and other environmental covariates. USR models cryptic relatedness as a random effect and population structure as a fixed effect, and uses weighted penalties to incorporate prior knowledge. As demonstrated by extensive simulations, the USR algorithm can discover more true causal variants while maintaining a lower false discovery rate than several commonly used feature selection methods. It can detect rare and common variants with almost equal efficiency. After further investigating and assessing the oracle property of the USR method, we propose a unified test (uFineMap) for accurately localizing causal loci and a unified test (uHDSet) for identifying high-dimensional sparse associations in deep sequencing genomic data of multi-ethnic individuals. These novel tests are based on scaled sparse linear mixed regressions with Lp (0