C.3 Supplemental Table Legends
Supplemental Tables can be found with this article online at http://dx.doi.org/10.1016/j.ajhg.2017.06.011.
Table S1. Table describing the 17 cases with both breakpoints in non-coding regions. Case identifiers are provided per studied subject (Subject ID), together with their karyotypes in International System for Human Cytogenetic Nomenclature (ISCN2016) notation and array information, reported in hg19 unless stated to be in hg18. Each case has two reported breakpoints (A and B), and for each we provide cytogenetic band and nucleotide locations in hg19 coordinates for the derivative chromosomes involved in their generation (der(A) and der(B)). We also report the sequencing reads by which the breakpoints were identified, and the overlap with known annotated genes (Disrupted gene 1 and Disrupted Gene 2), as well as the two nearest genes (Closest Gene 1 and Closest Gene 2) and their distance in base pairs (bp) to the breakpoint locations (Distance to gene 1 and Distance to Gene 2) in the derivative chromosomes. Negative distances indicate genes upstream of the breakpoint position, while positive distances indicate genes downstream of the breakpoint.
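The sign convention for the distance columns can be illustrated with a minimal Python sketch. It is not the procedure used to build Table S1: the function names are hypothetical, "upstream" is assumed to mean lower reference coordinates, and the distance is assumed to be measured from the breakpoint to the nearest gene edge.

```python
def signed_distance_to_gene(breakpoint, gene_start, gene_end):
    """Signed distance in bp from a breakpoint to the nearest edge of a gene.

    Assumed convention (illustrative only):
      negative -> gene lies upstream (lower coordinates) of the breakpoint
      positive -> gene lies downstream (higher coordinates)
      zero     -> breakpoint falls inside the gene (gene is disrupted)
    """
    if gene_end < breakpoint:
        return gene_end - breakpoint
    if gene_start > breakpoint:
        return gene_start - breakpoint
    return 0


def closest_gene(breakpoint, genes):
    """Return (name, signed_distance) for the gene nearest to the breakpoint.

    `genes` is an iterable of (name, start, end) tuples on the same chromosome.
    """
    candidates = (
        (name, signed_distance_to_gene(breakpoint, start, end))
        for name, start, end in genes
    )
    return min(candidates, key=lambda item: abs(item[1]))
```

For example, a hypothetical gene spanning positions 200 to 400 would be reported at a distance of -600 bp from a breakpoint at position 1000.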
Table S2. Overlap of non-coding DGAP breakpoint positions with gene promoters. Table reporting the number of annotated Ensembl GRCh37 gene promoters (Ensembl_GRCh37_promoters) that overlap non-coding DGAP breakpoints (DGAP id, chr, start, end).
Table S3. Overlap of non-coding DGAP breakpoint positions with transcription factor binding sites. Table reporting the number of annotated Ensembl GRCh37 transcription factor binding sites (Ensembl_GRCh37_tfbindingsites) that overlap non-coding DGAP breakpoints (DGAP id, chr, start, end).
Table S4. Overlap of non-coding DGAP breakpoint positions with enhancers. Table reporting the number of primary cell (Primary_cell_enhancers), tissue (Tissue_enhancers), H1-ESC (ChromHMM_H1_ESC_enhancers), GM12878 (ChromHMM_GM12878_enhancers), and VISTA (VISTA_db_hg19) enhancers that overlap non-coding DGAP breakpoints (DGAP id, chr, start, end). Enhancer positions were obtained from Andersson et al., 2014, ENCODE, and the VISTA enhancer database human version hg19. Highlighted green rows indicate breakpoints which overlapped one or more of the enhancer categories analyzed.
Table S5. Overlap of non-coding DGAP breakpoint positions with DNaseI hypersensitive sites. Table reporting the number of DNaseI hypersensitive sites from H1-hESC, GM06990, GM12878, and the master table (a compilation of 125 cell lines DNaseI clusters) from ENCODE that overlap non-coding DGAP breakpoints (DGAP id, chr, start, end). Highlighted green rows indicate breakpoints which overlapped one or more of the DNaseI hypersensitive sites in the different cell lines analyzed.
Table S6. Overlap of non-coding DGAP breakpoint positions with CTCF binding sites. Table reporting the number of ENCODE CTCF binding sites from H1-hESC and GM12878 that overlap non-coding DGAP breakpoints (DGAP id, chr, start, end). Highlighted green rows indicate breakpoints which overlapped one or more of the CTCF binding sites in the two cell lines analyzed.
Table S7. Overlap of non-coding DGAP breakpoint positions with ENCODE chromatin state segments. Table reporting the ENCODE chromatin state segment classifications per non-coding DGAP breakpoint (DGAP id, chr, start, end) for H1-hESC and GM12878 cell lines. Chromatin state segment coordinates and other BED file information are displayed from column #bin through column itemRGB. For a description of the ENCODE BED fields, see http://rohsdb.cmb.usc.edu/GBshape/cgi-bin/hgTables. Chromatin state names: CTCF = CTCF binding site, E = enhancer, WE = weak enhancer, T = transcriptionally active, R = transcriptionally repressed.
Table S8. Overlap of non-coding DGAP breakpoint positions with repetitive elements. Table reporting the number of repetitive elements as assessed by RepeatMasker that overlap non-coding DGAP breakpoints (DGAP id, chr, start, end). Repetitive element information such as coordinates (Rep_chr, Rep_start, Rep_end), name, class and family is provided for each overlap.
Table S9. Overlap of non-coding DGAP breakpoint positions with topologically associating domains (TADs). Table reporting the number of TADs in H1-hESC and IMR90 (Dixon et al., 2012) that overlap non-coding DGAP breakpoints (DGAP id, chr, start, end). TAD information such as coordinates (TAD_chr, TAD_start, TAD_end) is provided for each overlap.
Table S10. Overlap of non-coding DGAP breakpoint positions with high-resolution chromatin subcompartments and arrowhead domains. Table reporting the number of high-resolution chromatin subcompartments and arrowhead domains in IMR90 and GM12878 (Rao et al., 2014) that overlap non-coding DGAP breakpoints (DGAP id, chr, start, end). Chromatin subcompartment and arrowhead domain information such as coordinates and class is provided for each overlap.
Table S11. Disruption of chromatin contacts by non-coding DGAP breakpoint positions. Table reporting the number of chromatin contacts disrupted by non-coding DGAP breakpoint positions (DGAP id, chr, start, end) within their analysis windows (window_start, window_end) in Hi-C datasets of 20 and 40 Kb resolution of H1-hESC (Dixon et al., 2012) (Esc_20kb_HindIII_rep1, Esc_20kb_HindIII_rep2, Esc_40kb_hindIII_combined, Esc_40kb_hindIII_rep1, Esc_40kb_hindIII_rep2), 20 and 40 Kb resolution of IMR90 (Dixon et al., 2012) (IMR90_20kb_hindIII_rep1, IMR90_20kb_hindIII_rep2, IMR90_40kb_hindIII_combined, IMR90_40kb_hindIII_rep1, IMR90_40kb_hindIII_rep2), 100Kb and 1Mb resolution of GM06990 (http://epigenomegateway.wustl.edu/) (GM06990_obsexp_100kb, GM06990_obsexp_1mb) and looplists from Rao et al., 2014 for GM12878 and IMR90 (GSE63525_GM12878_primary+replicate_HiCCUPS_looplist, GSE63525_IMR90_HiCCUPS_looplist).
Table S12. Disruption of GM12878 chromatin contacts at various resolution levels by non-coding DGAP breakpoint positions. Table reporting the number of chromatin contacts disrupted by non-coding DGAP breakpoint positions (DGAP id, chr, start, end) within their analysis windows (window_start, window_end) in the 50Kb, 100Kb, 250Kb, 500Kb and 1Mb resolution Hi-C datasets from Rao et al., 2014 for GM12878.
Table S13. Disruption of predicted ENCODE distal DHS/enhancer–promoter connections by non-coding DGAP breakpoint positions. Table reporting the number of predicted ENCODE distal DHS/enhancer–promoter connections (Thurman et al., 2012) (promoter_DHS_chr, promoter_DHS_start, promoter_DHS_end, promoter_DHS_gene, distal_DHS_chr, distal_DHS_start, distal_DHS_end, promoter_distal_DHS_correlation) disrupted by non-coding DGAP breakpoint positions (DGAP id, chr, start, end) within their ±500 Kb analysis windows (window_start, window_end).
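As an illustration of how a ±500 Kb analysis window and a disrupted connection could be computed, the following Python sketch may help. It rests on assumptions not stated in the legend: a connection is treated as disrupted when the breakpoint lies between the promoter DHS and the distal DHS on the same chromosome and both elements fall inside the window. The function names are hypothetical.

```python
WINDOW_FLANK = 500_000  # ±500 Kb around the breakpoint, as stated in the legend


def analysis_window(bp_start, bp_end, flank=WINDOW_FLANK):
    """Return (window_start, window_end), clipped at the chromosome start."""
    return max(0, bp_start - flank), bp_end + flank


def is_disrupted(bp_chr, bp_start, bp_end, promoter_dhs, distal_dhs):
    """Check whether a breakpoint separates a promoter DHS from a distal DHS.

    `promoter_dhs` and `distal_dhs` are (chr, start, end) tuples.
    Assumption (illustrative only): both elements must lie on the breakpoint
    chromosome, inside the analysis window, and on opposite sides of the
    breakpoint.
    """
    if promoter_dhs[0] != bp_chr or distal_dhs[0] != bp_chr:
        return False
    window_start, window_end = analysis_window(bp_start, bp_end)
    # Order the two elements by start coordinate.
    left, right = sorted([promoter_dhs, distal_dhs], key=lambda e: e[1])
    in_window = left[1] >= window_start and right[2] <= window_end
    separated = left[2] <= bp_start and bp_end <= right[1]
    return in_window and separated
```

Counting the connections for which `is_disrupted` returns `True` would give a per-breakpoint tally comparable in spirit to the one reported in the table, under the assumptions above.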
Table S14. Genes with predicted disrupted ENCODE distal DHS/enhancer–promoter connections by the non-coding DGAP breakpoint positions. Table reporting the names of genes (Genes) separated from their predicted enhancers (Disrupted_enh_prom_interactions) (Thurman et al., 2012).
Table S15. Identification of genes with potential position effects. Table reporting the candidate genes (ensembl_gene_ID, Gene_chr, Gene_start, Gene_end, Gene_name) and their various lines of selection evidence for each non-coding DGAP breakpoint position (DGAP id, chr, start, end) within their analysis windows (window_start, window_end). Evidence lines include Hi-C domain inclusion (Hi_domain, HiC_chr, HiC_start, HiC_end), haploinsufficiency (HI_chr, Gene-start, gene_end, HI_prob, Haploinsufficiency_score), triplosensitivity (Triplosensitivity_score), and phenomatch score (PhenoScore, MaxPhenoScore, Phone_percentile, count_Pheno_percentile, MaxPheno_percentile, count_MaxPheno_percentile, Percentile_final_count). All of the evidence information is summarized (6Mb, 2Mb, TAD, DHS, Count_haplo, count_triplo) and the gene rankings are presented in the PERC+DHS+TAD+HAPLO+TRIPLO and PERC+DHS+2Mb+HAPLO+TRIPLO columns, which take different evidence lines into consideration. Green row highlighting indicates the highest-ranking gene, and yellow row highlighting indicates the second-highest-ranking genes.
Table S16. Translation of DGAP clinical features to HPO terms. Table reporting the HPO identifiers per DGAP case.
Table S17. Identification of genes with potential position effects for known pathogenic positive controls. Table reporting the candidate genes (ensembl_gene_ID, Gene_chr, Gene_start, Gene_end, Gene_name) and their various lines of selection evidence for the set of known pathogenic rearrangement positive controls (DGAP id, chr, start, end) within their analysis windows (window_start, window_end) from Redin et al., 2017. Evidence lines include Hi-C domain inclusion (Hi_domain, HiC_chr, HiC_start, HiC_end), haploinsufficiency (HI_chr, Gene-start, gene_end, HI_prob, Haploinsufficiency_score), triplosensitivity (Triplosensitivity_score), and phenomatch score (PhenoScore, MaxPhenoScore, Phone_percentile, count_Pheno_percentile, MaxPheno_percentile, count_MaxPheno_percentile, Percentile_final_count). All of the evidence information is summarized (6Mb, 2Mb, TAD, DHS, Count_haplo, count_triplo) and the gene rankings are presented in the PERC+DHS+TAD+HAPLO+TRIPLO and PERC+DHS+2Mb+HAPLO+TRIPLO columns, which take different evidence lines into consideration. Yellow row highlighting indicates pathogenic genes reported by Redin et al., 2017.
Table S18. Identification of disrupted chromatin contacts between disrupted DHS and enhancers by the non-coding DGAP breakpoint positions. An agnostic search revealed the existence of chromatin contacts between breakpoint-disrupted sequences of DHS sites and gene enhancers in Hi-C data of H1-hESC cells at 40 Kb resolution (Dixon et al., 2012). The reported genes are our top position effect candidate genes in the region. Table columns report the candidate gene information (Gene_chr, Gene_start, Gene_end, Gene_name), the associated DGAP case information (DGAP_ID, DGAP_chr, DGAP_start, DGAP_end) and the disrupted Hi-C chromatin interaction (HiC_1_chr, HiC_1_start, HiC_1_end, HiC_2_chr, HiC_2_start, HiC_2_end, HiC_1_interaction).
Table S19. Overlap of non-coding DGAP breakpoint positions with DECIPHER cases. Table reporting the number of DECIPHER cases that overlap non-coding DGAP breakpoints (DGAP_ID, DGAP_chr, DGAP_start, DGAP_end). DECIPHER case information such as ID_patient, chr_start, chr_end, chr, mean_ratio, classification_type and phenotype is provided for each overlap.
Table S20. Genes contained within DECIPHER cases overlapped by non-coding DGAP breakpoint positions. Table reporting the number of genes contained within DECIPHER cases overlapped by the non-coding DGAP breakpoints (DGAP_ID, DGAP_chr, DGAP_start, DGAP_end). DECIPHER case and gene information such as gene_count, DECIPHER_ID, DECIPHER_chr, DECIPHER_start, DECIPHER_end, DECIPHER_value, DECIPHER_type_rearr, DECIPHER_phenotype and HG_symbol is provided for each overlapped DECIPHER case.
Table S21. DECIPHER cases overlapped by non-coding DGAP breakpoint positions that fulfilled non-coding selection criteria. Table reporting the number of DECIPHER cases that have non-coding breakpoints. DGAP comparison case information (DGAP_ID, DGAP_chr, DGAP_start, DGAP_end) is provided, as well as overlapped DECIPHER case information containing id_patient, chr_start, chr_end, chr, mean_ratio, classification_type and phenotype.
Table S22. Overlap of non-coding DGAP breakpoint positions with dbVar cases. Table reporting the number of dbVar cases that overlap non-coding DGAP breakpoints (DGAP_chr, DGAP_start, DGAP_end, DGAP_ID). dbVar case information such as dbVar ID, Start, End, Variant type, Gene, Molecular consequences, Most severe clinical significance, 1000G minor allele, 1000G MAF, GO-ESP minor allele, GO-ESP MAF, ExAC minor allele, ExAC MAF, Publications (PMIDs), Variant allele, Transcript change, RefSeq, Protein change, Molecular consequence, HGVS_c, HGVS_g, HGVS_ng, HGVS_p, Condition, Most severe clinical significance, Submitters, Highest review status and Last evaluated is provided for each overlap.