Genome folding in evolution and disease

2.1 Introduction

Paralog genes arise from gene duplication events during evolution. The resulting sequence similarity between paralog pairs might lead to similar structure and function of encoded proteins (Koonin 2005). Since paralogs often form part of the same protein complexes and pathways, it is advantageous for the cell to coordinate their expression (Makova and Li 2003).

In eukaryotes, genes are regulated in part by the binding of transcription factors to promoter sequences and to distal regulatory regions such as enhancers. Through chromatin looping, enhancer-bound proteins can physically interact with the transcription machinery at gene promoters (Ptashne 1986; Deng et al. 2012; Carter et al. 2002; Tolhuis et al. 2002; Spitz and Furlong 2012). These looping events can be measured by chromatin conformation capture (3C) experiments (Dekker et al. 2002), which use proximity ligation to detect DNA-DNA contacts, and more recently by Hi-C, which combines proximity ligation with high-throughput sequencing to measure contact frequencies genome-wide (Lieberman-Aiden et al. 2009).

These interaction maps revealed tissue-invariant chromatin regions, named topologically associating domains (TADs), whose loci interact more frequently with each other than with loci in other regions (Dixon et al. 2012; Nora et al. 2012; Sexton et al. 2012). TADs seem to be stable across cell types and conserved between mammals (Dixon et al. 2012; Rao et al. 2014; Vietri Rudan et al. 2015). Regions within TADs show concerted histone chromatin signatures (Dixon et al. 2012; Sexton et al. 2012), gene expression (Le Dily et al. 2014; Nora et al. 2012), and DNA replication timing (Pope et al. 2014). Furthermore, disruption of TAD boundaries is associated with genetic diseases (Ibn-Salem et al. 2014; Lupiáñez et al. 2015).

We asked whether Hi-C data could reveal evolutionary pressure on paralog expansion that favours the clustering of paralogs in the three-dimensional chromatin architecture and their regulation by common enhancer elements, allowing the cell to fine-tune and coordinate their expression. To this end, we collected Hi-C data from several studies profiling contacts in multiple cell types from human (Dixon et al. 2012; Rao et al. 2014), mouse and dog (Vietri Rudan et al. 2015), and analysed these data with respect to paralog genes. Our results show that pairs of paralog genes tend to be co-regulated and co-occur within TADs more often than equivalent control gene pairs. When located in different TADs, paralogs still tend to lie on the same chromosome and have more contacts than control gene pairs. In contrast, close paralogs in the same TAD have significantly fewer contacts with each other than comparable gene pairs, which could indicate that these pairs of paralogs encode proteins that functionally replace each other.
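The co-occurrence comparison described above can be illustrated with a minimal sketch. It is not the pipeline used in this work: the function names (tad_index, same_tad_fraction), the half-open interval convention, and the toy coordinates are all assumptions made for illustration, and a real analysis would additionally handle multiple chromosomes, control sampling, and statistical testing.

```python
from bisect import bisect_right

def tad_index(tads, pos):
    """Return the index of the TAD containing pos, or None.

    Assumes `tads` is a sorted list of non-overlapping, half-open
    (start, end) intervals on a single chromosome (hypothetical format).
    """
    starts = [start for start, _ in tads]
    i = bisect_right(starts, pos) - 1
    if i >= 0 and tads[i][0] <= pos < tads[i][1]:
        return i
    return None  # position lies in a gap between TADs

def same_tad_fraction(pairs, tads):
    """Fraction of gene pairs whose two positions fall in the same TAD."""
    if not pairs:
        return 0.0
    hits = 0
    for pos_a, pos_b in pairs:
        idx_a = tad_index(tads, pos_a)
        idx_b = tad_index(tads, pos_b)
        if idx_a is not None and idx_a == idx_b:
            hits += 1
    return hits / len(pairs)

# Toy data (illustrative only): TAD intervals and gene-pair positions.
tads = [(0, 1000), (1000, 2500), (3000, 4000)]
paralog_pairs = [(100, 900), (1200, 2400), (950, 1050)]
control_pairs = [(100, 1100), (2600, 3100), (3200, 3900)]

paralog_fraction = same_tad_fraction(paralog_pairs, tads)
control_fraction = same_tad_fraction(control_pairs, tads)
```

Comparing paralog_fraction with control_fraction gives the kind of enrichment statement made above; whether the difference is meaningful depends on how the control pairs are matched, for example by genomic distance.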

These observations are relevant to the study of the evolution of chromatin structure and suggest that tandem duplications generating paralogs are under selection according to whether they contribute to the fine structure of the genome as reflected by TADs. Thus, TADs provide a favourable environment for the co-regulation of duplicated genes. This is likely followed by the evolution of additional regulatory mechanisms that allow paralogs to separate into different, but still interacting, TADs on the same chromosome, and eventually to migrate to different chromosomes.