Distance-weighted neighboring sites models for methylation pattern inheritance
Cytosine methylation at CpG dinucleotides is a semistable epigenetic marker critical to the normal development of vertebrates. Abnormal levels of methylation are associated with a host of human diseases and disorders, and many diagnostic tools have been developed based on analysis of methylation in tissue samples. Methylation is governed by a complex set of dynamic processes and has been observed to exhibit cyclical gains and losses, leading to the development of stochastic models of its inheritance. Many such models have assumed independence between sites and have largely focused on the proportion of methylation present in a sample, ignoring the diversity that exists in individual patterns. When analyzed at single-base resolution, methylation patterns exhibit strong evidence of spatial dependence, and a recently proposed neighboring sites model that incorporates dependence between pairs of adjacent CpG sites has offered significant improvements over independent models. CpG sites are non-uniformly distributed throughout the genome, and the number of bases separating "adjacent" sites can vary greatly. In this paper, we develop and test an extension of this neighboring sites model that places a distance-dependent weight on the association between each pair of neighboring sites. Models are compared with regard to their ability to produce simulations that are statistically similar to biological data. We find that the distance-weighted model offers substantive improvements over distance-blind approaches to modeling the dependence structure, particularly in cases where firm boundaries between methylated and unmethylated regions exist in the data.
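As a purely illustrative sketch of what a distance-dependent weight on adjacent CpG pairs could look like, the Python snippet below assigns each neighboring pair a weight that decays with the number of bases separating the two sites. The exponential form, the function name `adjacent_distance_weights`, the `scale` parameter, and the example positions are all assumptions made for illustration; they are not the weighting function or data used in this paper.

```python
import math


def adjacent_distance_weights(positions, scale=50.0):
    """Return one weight in (0, 1] per pair of adjacent CpG sites.

    Assumption: weights decay as exp(-d / scale), where d is the distance
    in bases between neighboring sites. Both the functional form and the
    default `scale` are hypothetical choices, not taken from the paper.
    """
    pairs = list(zip(positions, positions[1:]))
    if any(right <= left for left, right in pairs):
        raise ValueError("positions must be strictly increasing")
    return [math.exp(-(right - left) / scale) for left, right in pairs]


# Hypothetical CpG coordinates: three closely spaced sites and one distant site.
positions = [100, 102, 110, 400]
weights = adjacent_distance_weights(positions)
```

Under this assumed form, closely spaced pairs receive weights near 1 and therefore behave much as they would in a distance-blind neighboring sites model, while widely separated pairs receive weights near 0 and approach the independent-sites case.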