Chapter 5 Prediction of chromatin looping interactions

Preamble

This chapter has been submitted for publication. A preprint is available on bioRxiv:

Ibn-Salem J#, Andrade-Navarro MA. Computational Chromosome Conformation Capture by Correlation of ChIP-seq at CTCF motifs. bioRxiv. 2018. doi:10.1101/257584.

The preprint is available online at https://www.biorxiv.org/content/early/2018/02/01/257584. My contributions to this publication are indicated in Table E.1. The source code of the complete analysis is available on GitHub at https://github.com/Juppen/sevenC and https://github.com/Juppen/sevenC_analysis. Supplementary figures and links to supplementary tables are provided in Appendix D.

#corresponding author

Abstract

Background: Knowledge of the three-dimensional structure of the genome is necessary to understand how gene expression is regulated. Recent experimental techniques such as Hi-C and ChIA-PET measure long-range interactions genome-wide, but they are experimentally elaborate and have limited resolution. Here, we present Computational Chromosome Conformation Capture by Correlation of ChIP-seq at CTCF motifs (7C).

Results: Although ChIP-seq was not designed to detect contacts, the formaldehyde treatment in the ChIP-seq protocol cross-links proteins with each other and with DNA. Consequently, regions that are not directly bound by the targeted TF but interact with its binding site via chromatin looping are also co-immunoprecipitated and sequenced. This produces minor ChIP-seq signals at loop anchor regions close to the directly bound site. We use the position and shape of ChIP-seq signals around CTCF motif pairs to predict whether the two motifs interact.
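The idea of comparing signal shapes at two motifs can be illustrated with a minimal sketch. It assumes a per-base coverage vector for one chromosome and computes the Pearson correlation of the coverage in fixed windows around two motif centers. The function names, the window size (`flank`), and the toy data are illustrative assumptions and do not reproduce the implementation in the sevenC package.

```python
import math


def window_profile(coverage, center, flank):
    """Return the coverage values in the window [center - flank, center + flank]."""
    start = center - flank
    end = center + flank + 1
    if start < 0 or end > len(coverage):
        raise ValueError("window extends beyond the coverage vector")
    return coverage[start:end]


def pearson(x, y):
    """Pearson correlation of two equally long sequences (0.0 if either is constant)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat profile carries no shape information
    return sxy / math.sqrt(sxx * syy)


def motif_pair_correlation(coverage, center_a, center_b, flank):
    """Correlate the ChIP-seq coverage profiles around two motif centers."""
    profile_a = window_profile(coverage, center_a, flank)
    profile_b = window_profile(coverage, center_b, flank)
    return pearson(profile_a, profile_b)


# Toy data (hypothetical): a strong peak at position 10 (direct binding)
# and a ten-fold weaker peak of the same shape at position 30.
coverage = [0] * 50
for offset, height in zip(range(-2, 3), [1, 3, 8, 3, 1]):
    coverage[10 + offset] = 10 * height
    coverage[30 + offset] = height

r_peaks = motif_pair_correlation(coverage, 10, 30, flank=3)
r_flat = motif_pair_correlation(coverage, 10, 20, flank=3)
```

In this toy case the weak peak has the same shape as the strong one, so the correlation is high despite the large difference in signal height, whereas a position without any signal yields no correlation. This is why a shape-based measure can pick up the minor signals described above.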

We applied 7C to all CTCF motif pairs within 1 Mb in the human genome and validated the predicted interactions with high-resolution Hi-C and ChIA-PET data. A single ChIP-seq experiment for a known architectural protein (CTCF, Rad21, Znf143), but also for other TFs (such as TRIM22 or RUNX3), predicts loops accurately. Importantly, 7C predicts loops in cell types and for TF ChIP-seq datasets that were not used in training.
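Restricting the analysis to motif pairs within 1 Mb (1,000,000 bp) can be sketched as follows. The example assumes motifs are given as hypothetical `(chromosome, position)` tuples and treats the distance limit as inclusive. Both choices are assumptions made for illustration, not details taken from the analysis code.

```python
from collections import defaultdict

MAX_DISTANCE = 1_000_000  # 1 Mb; treated as an inclusive limit (assumption)


def candidate_pairs(motifs, max_distance=MAX_DISTANCE):
    """List all same-chromosome motif pairs no further apart than max_distance.

    motifs: iterable of (chromosome, position) tuples.
    Returns a list of (chromosome, upstream_position, downstream_position).
    """
    by_chrom = defaultdict(list)
    for chrom, pos in motifs:
        by_chrom[chrom].append(pos)

    pairs = []
    for chrom in sorted(by_chrom):
        positions = sorted(by_chrom[chrom])
        for i, upstream in enumerate(positions):
            for downstream in positions[i + 1:]:
                if downstream - upstream > max_distance:
                    break  # positions are sorted, so later ones are even further away
                pairs.append((chrom, upstream, downstream))
    return pairs


# Hypothetical motif positions for illustration only.
motifs = [("chr1", 100), ("chr1", 500_100), ("chr1", 1_500_000), ("chr2", 200)]
pairs = candidate_pairs(motifs)
```

Sorting the positions per chromosome allows the inner loop to stop at the first motif beyond the distance limit, which keeps the enumeration tractable for genome-wide motif sets.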

Conclusion: 7C predicts chromatin loops with base-pair resolution and can be used to associate TF binding sites with regulated genes in a condition-specific manner. Furthermore, profiling hundreds of ChIP-seq datasets yields novel candidate factors that may be functionally involved in chromatin looping. Our method is available as an R package: https://ibn-salem.github.io/sevenC/