Genome folding in evolution and disease

6.6 Towards studying genome folding in specific conditions

Understanding the variability of genome folding across cell types and conditions is a key challenge for ongoing and future genomic research (Yu and Ren 2017). It is becoming increasingly clear that condition-specific long-range interaction data are needed to (i) associate non-coding variants with affected target genes, (ii) associate transcription factor (TF) binding sites and enhancers with genes, (iii) understand the variability of long-range contacts across individuals, and (iv) reveal the contribution of single cells to the folding structures observed in cell populations. How computational modeling and targeted experimental approaches can address these questions is discussed below.

6.6.1 Interpretation of non-coding variants by interacting genes

The majority of disease-associated single nucleotide variants uncovered by genome-wide association studies (GWAS) reside in non-coding sequences. Many variants are located near cis-regulatory sequences and enhancers and could, therefore, contribute to pathogenesis by affecting transcription of specific genes (Hindorff et al. 2009). The ability to measure long-range chromatin interactions makes it possible to understand the role of non-coding variants by predicting their interacting target genes (Smemo et al. 2014; Visser et al. 2012). This has been demonstrated by measuring the interactions of promoters in 17 human primary hematopoietic cell types, revealing more than 2,400 potential disease-associated genes linked to thousands of GWAS variants (Javierre et al. 2016). In another study, non-coding variants associated with schizophrenia could be annotated using Hi-C contact maps from human cerebral cortex (Won et al. 2016). These studies demonstrate how tissue-specific genome folding information can help to interpret not only structural variants but also single nucleotide variants. Furthermore, these examples highlight the need for tissue-specific interaction maps. While proximity-ligation experiments are still costly and time-intensive to produce for each tissue or individual of interest, computational predictions of such interactions from ChIP-seq and motif data will likely become a fast and accurate alternative in such studies.
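The idea of annotating a non-coding variant by its interacting gene can be illustrated with a minimal sketch. It assumes that loops are given as pairs of anchor intervals and that each gene is represented by a single promoter position; all names, coordinates, and the helper function are hypothetical and are not taken from the cited studies or from 7C.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """Half-open genomic interval [start, end) on one chromosome."""
    chrom: str
    start: int
    end: int

    def contains(self, chrom: str, pos: int) -> bool:
        return self.chrom == chrom and self.start <= pos < self.end


def genes_linked_to_variant(variant, loops, promoters):
    """Return genes whose promoter lies in the loop anchor opposite the variant.

    variant   -- (chrom, pos) tuple
    loops     -- list of (Interval, Interval) anchor pairs
    promoters -- dict mapping gene name to (chrom, pos)
    """
    chrom, pos = variant
    linked = set()
    for anchor_a, anchor_b in loops:
        # A loop is undirected, so test the variant against both anchors.
        for near, far in ((anchor_a, anchor_b), (anchor_b, anchor_a)):
            if near.contains(chrom, pos):
                for gene, (p_chrom, p_pos) in promoters.items():
                    if far.contains(p_chrom, p_pos):
                        linked.add(gene)
    return sorted(linked)


# Toy data (hypothetical): GENE_B is the nearest gene on the linear genome,
# but only GENE_A sits in the anchor that loops to the variant.
loops = [(Interval("chr1", 1000, 2000), Interval("chr1", 500000, 501000))]
promoters = {"GENE_A": ("chr1", 500500), "GENE_B": ("chr1", 2500)}

print(genes_linked_to_variant(("chr1", 1500), loops, promoters))
```

In this toy example the variant is assigned to the distal gene rather than to the nearest one, which is the kind of assignment that tissue-specific interaction maps make possible. A real analysis would additionally need interval indexing for speed and a way to handle variants that fall close to, but not inside, an anchor.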

6.6.2 Cell type-specific regulatory interactions between enhancers and genes

Studying the function of transcription factors (TFs), and the genes they regulate, is challenging because TFs bind not only to promoters but also to distal sites in the genome. Linking these binding sites to regulated genes requires chromatin looping information for the specific tissue of interest. While TADs seem to be mostly stable across different cell types (Dixon et al. 2012) and during differentiation (Dixon et al. 2015), individual interactions of distal TF binding sites or enhancers with their regulated target genes occur in a cell type- or condition-specific manner (Bonev and Cavalli 2016; Andrey and Mundlos 2017). Enhancers are often located several hundred kb away from their target gene promoter, and most enhancers do not regulate the nearest gene (Sanyal et al. 2012). Therefore, tissue- and condition-specific interactions of regulatory elements with genes are essential to predict gene-regulatory events correctly. Since many enhancer-gene interactions appear to be enclosed in CTCF loops (Hnisz et al. 2016a), our prediction tool 7C can be used to predict such interactions by taking condition-specific signals such as ChIP-seq as input. In this way, 7C can improve the condition-specific association of TF binding events with regulated target genes.

6.6.3 Variability of long-range interactions across individuals

Interestingly, initial studies have started to analyze the variability of epigenetic marks across individuals by ChIP-seq of histone modifications (Grubert et al. 2015; Waszak et al. 2015). The variability of histone marks correlates with single nucleotide variants, and these histone quantitative trait loci (hQTLs) cluster in TADs. However, it is not yet clear how TADs or individual loops vary across individuals. Predictive modeling approaches like 7C could fill this gap and be used to analyze the variability of chromatin looping contacts across individuals. When combined with SNP data from the same individuals, one could potentially correlate genetic variants with loop variation across individuals to find loop quantitative trait loci (lQTLs).
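A minimal sketch of such an lQTL scan is given below. It assumes that genotypes are encoded as allele dosages (0, 1, or 2) and that a loop score per individual is available, for example a predicted interaction probability. The function names, the toy data, and the correlation threshold are hypothetical, and a plain Pearson correlation is used here as a simple stand-in for a proper QTL test with covariates and multiple-testing correction.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long sequences (0.0 if one is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def candidate_lqtls(genotypes, loop_scores, min_abs_r=0.8):
    """Return (snp, loop, r) for SNP-loop pairs with |r| >= min_abs_r.

    genotypes   -- dict: SNP id -> allele dosage per individual
    loop_scores -- dict: loop id -> loop score per individual (same order)
    """
    hits = []
    for snp, dosage in genotypes.items():
        for loop, scores in loop_scores.items():
            r = pearson(dosage, scores)
            if abs(r) >= min_abs_r:
                hits.append((snp, loop, r))
    return hits


# Toy data for six individuals (hypothetical values).
genotypes = {
    "snp1": [0, 1, 2, 0, 1, 2],
    "snp2": [0, 0, 1, 1, 2, 2],
}
loop_scores = {"loopA": [0.1, 0.5, 0.9, 0.2, 0.5, 0.8]}

for snp, loop, r in candidate_lqtls(genotypes, loop_scores):
    print(snp, loop, round(r, 3))
```

In practice, the scan would be restricted to SNPs in or near the loop anchors to limit the number of tests, and significance would be assessed by permutation or a regression model rather than by a fixed correlation cutoff.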

6.6.4 Genome folding in single cells

In contrast to imaging-based methods, genome-wide proximity-ligation experiments such as Hi-C and ChIA-PET measure chromatin contacts in cell populations. This limitation raises the question of whether observed structures like A/B compartments, TADs, or loops are stable but present only in a subset of cells, or whether they emerge from variable contacts in every single cell. Recently, the Hi-C protocol has been adapted to measure contacts in single cells (Sekelja et al. 2016; Ulianov et al. 2017). The A/B compartment structure has been confirmed by imaging methods on single chromosomes (Wang et al. 2016). TADs were also observed in single-cell Hi-C data (Nagano et al. 2013). However, imaging and single-cell Hi-C studies have revealed significant cell-to-cell variability in 3D genome organization (Nagano et al. 2013; Ramani et al. 2016; Flyamer et al. 2017; Stevens et al. 2017). Accordingly, TADs seem to emerge from cell population averages and do not exist as static structures in individual cells. However, it is unclear how this variability relates to gene regulation in single cells. It will be an exciting future challenge to improve experimental protocols so that RNA and chromatin can be separated in individual cells, allowing both gene expression and genome folding to be measured in the same cell by single-cell RNA-seq and single-cell Hi-C, respectively.

Given the sparse coverage of current single-cell protocols (Nagano et al. 2013; Flyamer et al. 2017; Stevens et al. 2017), it would be interesting to computationally predict single-cell structures from other genomic data that are measured along the linear genome in individual cells, such as transcription. Although single-cell ChIP-seq has been published (Rotem et al. 2015), ChIP-seq requires a large amount of input material. Therefore, it will be challenging to produce TF binding profiles with high coverage in single cells. However, thanks to single-cell ATAC-seq (Buenrostro et al. 2015), prediction approaches like 7C could use chromatin accessibility data from single cells to predict the variability of looping interactions across single cells.

6.6.5 Constantly improving targeted experimental methods

While 7C achieves high prediction performance (Chapter 5), its parameters have to be trained on experimental data, so it can only be as good as the experimental methods that provide these data. Currently, high-resolution Hi-C and ChIA-PET can be considered the gold standard for CTCF-mediated long-range interactions (Rao et al. 2014; Tang et al. 2015). However, the recent literature makes clear that experimental methods to probe interactions in a more efficient and targeted way are being developed at an increasing pace (Denker and Laat 2016; Schmitt et al. 2016; Davies et al. 2017). Besides the first generation of genome-wide 3C methods, like Hi-C and ChIA-PET (see section 1.3.2), more recent experimental developments improve resolution and efficiency, often by capturing only a specific subset of interest among all interactions. These methods include, for example, TCC (Kalhor et al. 2011), Capture-Hi-C (Dryden et al. 2014), Capture-C (Hughes et al. 2014), HiCap (Sahlén et al. 2015), NG Capture-C (Davies et al. 2016), HiChIP (Mumbach et al. 2016), and T2C (Kolovos et al. 2018). These methodological advances, together with appropriate computational analysis, will enable the study of the plasticity and dynamics of chromatin interactions across diverse conditions.