Genetic variants in regulatory regions have recently been identified as a dominant factor in the formation of many phenotypic traits, including inherited predisposition to various diseases. Among other mechanisms, a nucleotide substitution in a promoter or enhancer can change the affinity of the DNA for the sequence-specific transcription factors that bind in that regulatory segment. This in turn may affect gene expression in the cell types in which the regulatory segment is active, and therefore the phenotype.
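One common way to make the effect of a substitution on binding affinity concrete is to score the reference and alternative alleles with a position weight matrix and compare the results. The sketch below illustrates that idea only; it is not necessarily the scoring used in this work. The 4-bp motif, its frequencies, the uniform background, and the function names are all hypothetical, and only the forward strand is scanned.

```python
import math

# Hypothetical position frequency matrix for a 4-bp motif (made-up values,
# one dict per motif position). A real matrix would come from a motif
# collection or be estimated from binding data.
PFM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]
BACKGROUND = 0.25  # assumed uniform background nucleotide frequency


def window_score(window):
    """Log-odds score (bits) of one motif-width window against the background."""
    return sum(math.log2(PFM[i][base] / BACKGROUND) for i, base in enumerate(window))


def best_score(seq):
    """Best log-odds score over all forward-strand windows of the sequence."""
    width = len(PFM)
    return max(window_score(seq[i:i + width]) for i in range(len(seq) - width + 1))


def substitution_effect(seq, pos, alt):
    """Change in the best motif score when seq[pos] is replaced by alt.

    Negative values mean the substitution weakens the predicted site,
    positive values mean it strengthens or creates one.
    """
    alt_seq = seq[:pos] + alt + seq[pos + 1:]
    return best_score(alt_seq) - best_score(seq)


# Toy usage: the reference sequence contains the consensus ACGT at positions 2-5;
# replacing the C at position 3 with T lowers the best score.
effect = substitution_effect("TTACGTAA", 3, "T")
```

A practical analysis would also scan the reverse complement and calibrate score thresholds, for example through P-values of the motif scores; both are omitted here for brevity.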
DNA regulatory segments, as well as transcription factor binding sites, can be assessed computationally. Information on the specificity of transcription factor binding and on the activity of particular DNA segments in particular cell types can be obtained from molecular genomic studies such as the ENCODE project. In our lab we analyzed a number of quantities that affect cell type specific binding of transcription factors, including DNase accessibility, transcriptional activity of neighboring genes, sequence motifs of transcription factor binding sites, and motifs of cofactor binding. Our objective was to separate the variables that characterize the cell type from those that characterize the specificity of transcription factor binding. We found that the contribution of different chromatin activity parameters varies dramatically from one cell type to another. This makes it difficult to predict the transcription factor binding profile in a target cell type; a reliable prediction can be obtained only with a careful selection of the training cell type.
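The train-in-one-cell-type, predict-in-another setup can be sketched as follows. This is a minimal illustration, not the model used in this study: logistic regression is swapped in as a generic classifier, only two features are used (a cell type dependent accessibility value and a cell type independent motif score), and every number is invented toy data.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(rows, labels, epochs=20000, lr=0.5):
    """Fit logistic regression by plain batch gradient descent."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            err = sigmoid(bias + sum(w * v for w, v in zip(weights, x))) - y
            for j, v in enumerate(x):
                grad_w[j] += err * v
            grad_b += err
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias


def predict(weights, bias, x):
    """Predicted probability that the segment is bound."""
    return sigmoid(bias + sum(w * v for w, v in zip(weights, x)))


# Toy training cell type: (accessibility, motif score) per segment and a
# bound/unbound label. All values are invented for illustration.
train_rows = [(0.9, 0.8), (0.8, 0.9), (0.9, 0.2), (0.1, 0.9), (0.2, 0.1), (0.1, 0.2)]
train_labels = [1, 1, 0, 0, 0, 0]
weights, bias = train_logistic(train_rows, train_labels)

# Toy target cell type: the motif score is a property of the sequence, while
# the accessibility is measured in the target cell type.
target_rows = [(0.85, 0.85), (0.15, 0.15), (0.5, 0.55)]
target_probs = [predict(weights, bias, x) for x in target_rows]
```

The fitted weights are transferred unchanged to the target cell type, so the prediction is only as good as the assumption that the features contribute similarly in both cell types. That is the assumption the choice of training cell type has to satisfy.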
Tests on the human ENCODE datasets showed very high accuracy in predicting cell type specific binding for some transcription factors, including CTCF and ZNF143, whereas for other factors the prediction accuracy was rather modest. Nevertheless, at least technically, a cell type oriented computational annotation of regulatory regions is quite achievable.
Information on how particular substitutions affect transcription factor binding can be used to prioritize among variants that are linked together in genetic studies. In addition, because the activity of regulatory regions is mostly cell type specific, knowing the cell type in which a DNA segment containing a regulatory substitution is active may be important for identifying the organs affected by a genetic variant.