Integrated Analysis of Genomic and Longitudinal Clinical Data
Clinico-genomic modeling refers to statistical analysis that incorporates both clinical data, such as medical test results and demographic information, and genomic data, such as gene expression profiles. It is an emerging research area in biomedical science and has been shown to extend our understanding of complex diseases. We describe a general statistical modeling strategy for the integrated analysis of clinical and genomic data in which the clinical data are longitudinal observations. The strategy is aimed at identifying disease-associated genes and consists of two stages. In the first stage, we propose a hierarchical B-spline model to estimate the disease severity trajectory from the clinical variables. This trajectory is a functional summary of disease progression, from which any characteristic of interest can be extracted. In the second stage, combinations of the extracted characteristics are included in a gene-wise linear model to detect the genes that are responsible for variation in disease progression. We illustrate the approach in the context of two biomedical studies of complex diseases: tuberculosis (Tb) and colitis-associated carcinoma. In both studies, clinical information was measured longitudinally on animal subjects, and biological samples were collected at each subject's final time point to determine the gene expression profiles. Our results demonstrate that incorporating the longitudinal clinical data increases the value of the information extracted from the expression profiles and contributes to the identification of predictive biomarkers.
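To make the two-stage idea concrete, the following is a minimal sketch on simulated data, not the model used in the studies. It makes several simplifying assumptions: each subject's trajectory is fitted separately by ridge-stabilized least squares on a cubic B-spline basis (rather than with a hierarchical model), the only extracted characteristic used is the fitted severity at the final time point, and the gene-wise model is an ordinary simple linear regression. All function names, the knot placement, and the simulated effect sizes are hypothetical.

```python
import numpy as np


def bspline_basis(x, knots, degree):
    """Evaluate all B-spline basis functions at x (Cox-de Boor recursion)."""
    x = np.asarray(x, dtype=float)
    t = np.asarray(knots, dtype=float)
    # Degree 0: indicator of each half-open knot interval [t_i, t_{i+1}).
    B = np.array([(x >= t[i]) & (x < t[i + 1]) for i in range(len(t) - 1)],
                 dtype=float).T
    # Close the last non-empty interval on the right so x == t[-1] is covered.
    last = np.max(np.nonzero(t[:-1] < t[1:])[0])
    B[x == t[-1], last] = 1.0
    for d in range(1, degree + 1):
        new = np.zeros((len(x), len(t) - d - 1))
        for i in range(len(t) - d - 1):
            left_den = t[i + d] - t[i]
            right_den = t[i + d + 1] - t[i + 1]
            if left_den > 0:
                new[:, i] += (x - t[i]) / left_den * B[:, i]
            if right_den > 0:
                new[:, i] += (t[i + d + 1] - x) / right_den * B[:, i + 1]
        B = new
    return B


def fit_trajectory(times, severity, knots, degree=3, ridge=1e-8):
    """Stage 1 (simplified): per-subject least-squares spline coefficients."""
    B = bspline_basis(times, knots, degree)
    gram = B.T @ B + ridge * np.eye(B.shape[1])
    return np.linalg.solve(gram, B.T @ np.asarray(severity, dtype=float))


def trajectory_features(coef, knots, degree=3, n_grid=101):
    """Summarize a fitted trajectory by a few scalar characteristics."""
    grid = np.linspace(knots[0], knots[-1], n_grid)
    curve = bspline_basis(grid, knots, degree) @ coef
    return {"final": curve[-1], "peak": curve.max(), "mean": curve.mean()}


def genewise_t(expr, covariate):
    """Stage 2: t-statistic of the covariate slope, one regression per gene.

    expr has shape (genes, subjects); covariate has shape (subjects,).
    """
    n = expr.shape[1]
    X = np.column_stack([np.ones(n), covariate])
    beta, *_ = np.linalg.lstsq(X, expr.T, rcond=None)  # shape (2, genes)
    resid = expr.T - X @ beta
    sigma2 = (resid ** 2).sum(axis=0) / (n - 2)
    se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    return beta[1] / se


def simulate_and_rank(seed=0, n_subjects=12, n_genes=50):
    """Simulated example: gene 0 tracks severity, the other genes are noise."""
    rng = np.random.default_rng(seed)
    knots = [0, 0, 0, 0, 0.5, 1, 1, 1, 1]  # clamped cubic knots on [0, 1]
    times = np.linspace(0, 1, 8)
    slopes = rng.uniform(0.5, 3.0, n_subjects)  # hypothetical progression rates
    finals = []
    for s in slopes:
        y = s * times + rng.normal(0, 0.05, times.size)
        coef = fit_trajectory(times, y, knots)
        finals.append(trajectory_features(coef, knots)["final"])
    expr = rng.normal(0, 1, (n_genes, n_subjects))
    expr[0] = 2.0 * slopes + rng.normal(0, 0.1, n_subjects)
    return genewise_t(expr, np.array(finals))


if __name__ == "__main__":
    t_stats = simulate_and_rank()
    print("gene with largest |t|:", int(np.argmax(np.abs(t_stats))))
```

In a real analysis, the per-subject fits would be replaced by the hierarchical model, so that subjects with few observations borrow strength from the population, and the gene-wise test statistics would need a multiple-testing adjustment.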