Genome folding in evolution and disease

3.1 Introduction

The three-dimensional structure of eukaryotic genomes is organized at multiple hierarchical levels (Bonev and Cavalli 2016). The development of high-throughput experiments to measure pairwise chromatin-chromatin interactions, such as Hi-C (Lieberman-Aiden et al. 2009), enabled the identification of genomic domains of several hundred kilobases with increased self-interaction frequencies, described as topologically associating domains (TADs) (Dixon et al. 2012; Nora et al. 2012). Loci within a TAD contact each other more frequently than loci in different TADs, and TAD boundaries insulate interactions between loci in different TADs. TADs have also been shown to be important for gene regulation by restricting the interaction of cell-type-specific enhancers with their target genes (Nora et al. 2012; Symmons et al. 2014; Zhan et al. 2017). Several studies associated the disruption of TADs with ectopic regulation of important developmental genes, leading to genetic diseases (Ibn-Salem et al. 2014; Lupiáñez et al. 2015). These properties suggest that TADs are functional genomic units of gene regulation.

Interestingly, TADs are largely stable across cell types (Dixon et al. 2012; Rao et al. 2014) and during differentiation (Dixon et al. 2015). Moreover, while TADs were initially described for mammalian genomes, a similar domain organization was found in the genomes of non-mammalian species such as Drosophila (Sexton et al. 2012), zebrafish (Gómez-Marín et al. 2015), Caenorhabditis elegans (Crane et al. 2015) and yeast (Hsieh et al. 2015; Mizuguchi et al. 2014). The evolutionary conservation of TADs, together with their spatio-temporal stability within organisms, implies that TADs are robust structures.

This motivated the first studies comparing TAD structures across different species, which indeed suggested that individual TAD boundaries are largely conserved during evolution. More than 54% of TAD boundaries in human cells occur at homologous positions in mouse genomes (Dixon et al. 2012). Similarly, 45% of contact domains called in mouse B-lymphoblasts were also identified at homologous regions in human lymphoblastoid cells (Rao et al. 2014). A single TAD boundary at the Six gene loci could be traced back in evolution to the origin of deuterostomes (Gómez-Marín et al. 2015). However, these analyses focused only on the subset of syntenic regions that can be mapped uniquely between genomes and did not systematically investigate whether TAD regions as a whole are stable or disrupted by rearrangements during evolution.

A more recent study provided Hi-C interaction maps of liver cells for four mammalian genomes (Vietri Rudan et al. 2015). Interestingly, the authors described three examples of rearrangements between mouse and dog, all of which occurred at TAD boundaries. However, the rearrangements were identified from adjacencies of orthologous genes, an approach that might be biased by gene density. Furthermore, the authors did not report the total number of rearrangements identified, leaving open the question of how many TADs are actually conserved between organisms. It remains unclear to what extent TADs are selected against disruption during evolution (Nora et al. 2013). Together, these studies underline the need for a systematic analysis of whether and how TAD regions as a whole are kept stable or disrupted by rearrangements during evolution.

To address this issue, we used whole-genome alignment data to analyze systematically whether TADs represent conserved genomic structures that are reshuffled as a whole rather than disrupted by rearrangements during evolution. Furthermore, we used gene expression data from many tissues in human and mouse to associate disruptions of TADs by evolutionary rearrangements with changes in gene expression.