5.3 Discussion

We have developed 7C to reuse ChIP-seq data, which profile the binding of proteins to the genome, for the prediction of chromatin looping interactions. We present this method as an alternative to dedicated techniques such as Hi-C that directly measure genomic contacts. Since ChIP-seq data are increasingly available for a large number of proteins, species, tissues, cell types, and conditions, our method is a valid option when Hi-C data are not available or cannot be produced due to cost or material limitations. Another major advantage of our method over Hi-C is that its predictions are made at base-pair resolution, whereas Hi-C reaches at best kilobase resolution, and only at high cost.

Other computational approaches have been developed to predict genomic contacts or to assign regulatory regions to target genes. A commonly used approach is to compare activity signals at enhancers and promoters across many different conditions or tissues (Sheffield et al. 2013; Fishilevich et al. 2017; Andersson et al. 2014; O’Connor and Bailey 2014): high correlation indicates an association and a potential physical interaction between an enhancer and a gene. However, these approaches lose the tissue specificity of the interactions. Other approaches integrate many diverse chromatin signals, such as post-translational histone modifications, chromatin accessibility, or transcriptional activity (Roy et al. 2015; Whalen et al. 2015; Zhu et al. 2016; Schreiber et al. 2017; Dzida et al. 2017), and combine them with sequence features (Zhao et al. 2016) or evolutionary constraints (Naville et al. 2015). While these methods predict enhancer-gene associations with good performance, they require multiple input datasets for each condition of interest, and such datasets are often not available.

Further computational approaches try to predict chromatin interactions directly, using diverse sequence features (Nikumbh and Pfeifer 2017) or multiple chromatin features such as histone modifications (Brackley et al. 2016; Chen et al. 2016) or transcription (Rowley et al. 2017). One study uses the recently discovered directionality of CTCF motifs to predict loop interactions from CTCF ChIP-seq peak locations (Oti et al. 2016). Another study combines CTCF binding locations and motif orientation with polymer modeling to predict Hi-C interaction maps (Sanborn et al. 2015). However, none of these studies predicts chromatin loops from ChIP-seq signals of TFs other than CTCF while taking CTCF motif orientation into account. Moreover, CTCF binding sites are often considered only when the signal is strong enough for peak-calling algorithms to identify them. In contrast, 7C takes the distribution of ChIP-seq signal from any TF into account without a peak-calling step. In addition, the other studies do not provide a tool for the direct prediction of pairwise interactions from single ChIP-seq experiments. Interestingly, shadow peaks in ChIP-seq data of insulator proteins in Drosophila were previously associated with long-range interactions (Liang et al. 2014) and used to study the contribution of sequence motifs and co-factors to loop formation (Mourad et al. 2017), but not to directly predict chromatin loop interactions.

Compared to the predictive methods mentioned above, our approach has the clear advantage of directly predicting chromatin looping interactions, rather than enhancer-promoter associations, from the ChIP-seq signal of a single experiment evaluated with respect to CTCF motifs. Because the prediction relies on the alignment of a pair of CTCF motifs, it achieves base-pair resolution. In fact, given several CTCF motifs within a 1 kb genomic bin, our looping prediction approach can be used to decide which of the CTCF sites is actually involved in the measured interaction, and thus to increase resolution even when Hi-C data are available. We showed that 7C works with just a single ChIP-seq experiment for many different TFs, making it applicable to many diverse conditions of interest. Therefore, 7C can complement existing enhancer-promoter association tools or be integrated into such predictive models to improve them.
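The idea of resolving which CTCF sites within two interacting 1 kb bins form a loop can be illustrated with a minimal sketch. It assumes, hypothetically, that motif hits are given as (position, strand) tuples for each anchor bin, that loops favor the convergent orientation (forward-strand motif upstream, reverse-strand motif downstream, as reported by Rao et al. 2014), and that some per-pair score is available. The names `Motif`, `convergent_pairs`, `best_pair`, and the toy signal values are illustrative placeholders, not 7C's actual model or interface.

```python
from itertools import product
from typing import Callable, List, NamedTuple, Optional, Tuple


class Motif(NamedTuple):
    """Hypothetical CTCF motif hit: genomic position (bp) and strand ('+' or '-')."""
    pos: int
    strand: str


def convergent_pairs(left_bin: List[Motif],
                     right_bin: List[Motif]) -> List[Tuple[Motif, Motif]]:
    """All motif pairs in convergent orientation: '+' upstream, '-' downstream."""
    return [
        (a, b)
        for a, b in product(left_bin, right_bin)
        if a.pos < b.pos and a.strand == "+" and b.strand == "-"
    ]


def best_pair(left_bin: List[Motif],
              right_bin: List[Motif],
              score: Callable[[Motif, Motif], float]) -> Optional[Tuple[Motif, Motif]]:
    """Return the highest-scoring convergent pair, or None if no pair qualifies."""
    candidates = convergent_pairs(left_bin, right_bin)
    if not candidates:
        return None
    return max(candidates, key=lambda pair: score(*pair))


# Toy example: two anchor bins, each with two motif hits.
left = [Motif(10_100, "+"), Motif(10_700, "-")]
right = [Motif(250_200, "-"), Motif(250_900, "-")]

# Placeholder per-motif signal values (invented for illustration only).
toy_signal = {10_100: 5.0, 10_700: 9.0, 250_200: 2.0, 250_900: 7.0}


def score(a: Motif, b: Motif) -> float:
    """Placeholder score: sum of the toy signal at both motifs."""
    return toy_signal[a.pos] + toy_signal[b.pos]


print(best_pair(left, right, score))
```

The orientation filter alone already discards the reverse-strand motif in the left bin; the score then ranks the remaining candidates. In a real model, the score would come from the ChIP-seq signal around the motif pair rather than from a lookup table.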

Assigning distal cis-regulatory elements, such as TF binding sites or enhancers, to the target genes they regulate is a common problem in genomic studies (Mora et al. 2015), and one that methods mapping contacts at base-pair resolution can address. Hi-C measures pairwise contacts genome-wide in an unbiased manner, and experimentally measuring genomic interactions is becoming increasingly feasible thanks to recent advances in 3C-based technologies (Sati and Cavalli 2016); however, a main drawback of Hi-C is its limited resolution. The first Hi-C study analyzed chromatin interactions at a bin size of 1 Mb (Lieberman-Aiden et al. 2009), structural features such as topologically associating domains (TADs) were later called at 40 kb resolution (Dixon et al. 2012), and the highest resolution for the human genome, reached only recently, is 1 kb (Rao et al. 2014). Such high resolution requires substantially greater sequencing depth (Rao et al. 2014; Bonev et al. 2017). Capture Hi-C identifies only the interactions of pre-defined target regions such as promoters (Dryden et al. 2014; Mifsud et al. 2015), and ChIA-PET restricts the interactions to those involving a specific protein of interest (Heidari et al. 2014; Fullwood et al. 2009; Tang et al. 2015). These experiments are therefore not always applicable, and they require large amounts of sample material. Finally, even with high-resolution Hi-C it is not trivial to connect enhancers uniquely to the genes they interact with, since enhancers often occur in clusters within short genomic distances (Whyte et al. 2013; Parker et al. 2013).
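Why finer bins demand so much more sequencing can be seen from a back-of-the-envelope calculation. The sketch below assumes a haploid human genome of roughly 3.1 Gb (an approximation introduced here) and counts all unordered bin pairs; it deliberately ignores the strong decay of contact frequency with genomic distance, so it only illustrates the quadratic scaling, not actual read requirements. The function names are hypothetical.

```python
GENOME_SIZE_BP = 3_100_000_000  # approximate haploid human genome size (assumption)


def n_bins(bin_size_bp: int) -> int:
    """Number of bins covering the genome (ceiling division)."""
    return -(-GENOME_SIZE_BP // bin_size_bp)


def n_bin_pairs(bin_size_bp: int) -> int:
    """Number of unordered bin pairs, including each bin paired with itself."""
    n = n_bins(bin_size_bp)
    return n * (n + 1) // 2


for size in (1_000_000, 40_000, 1_000):
    print(f"{size:>9} bp bins: {n_bins(size):>9} bins, {n_bin_pairs(size):>16} bin pairs")

# At a fixed number of reads, the average coverage per bin pair drops by
# roughly the square of the bin-size ratio (here about 1000**2).
ratio = n_bin_pairs(1_000) / n_bin_pairs(1_000_000)
print(f"1 kb vs 1 Mb: about {ratio:.2e} times more bin pairs")
```

Going from 1 Mb to 1 kb bins multiplies the number of bins by 1000 but the number of bin pairs by about a million, which is why kilobase-resolution maps require far deeper sequencing.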

Currently, because it relies on CTCF motifs, our method focuses on CTCF-mediated chromatin loops. It is very likely that other DNA-binding proteins also mediate loops: for example, recent studies suggest that other TFs are involved in enhancer-promoter interactions during differentiation (Bonev et al. 2017), and knockout of the transcriptional repressor YY1 and other candidate factors results in the loss of chromatin loops (Weintraub et al. 2018). Using predicted motifs for these transcription factors, alone or in combination, is an open avenue for future extensions of our method.