Mouse ESC and cortical neuron Hi-C data published in [25] were downloaded from Gene Expression Omnibus (GEO) (Accession Number: GSE96107). These datasets represent two of the highest-resolution mammalian Hi-C datasets published to date. Hi-C data were analysed using the Juicer analysis pipeline, aligning to the mm10 genome build [26]. Parameters within Juicer were selected so that contacts with a mapping quality (MAPQ) below 30 were filtered out. For each cell type, all replicates were run through the Juicer pipeline separately and then combined using the “mega” option in Juicer. The map resolution (the minimum bin size at which 80% of bins have over 1000 contacts), which is commonly used to specify the minimum bin size at which a Hi-C dataset should be analysed, was 950 bp for ESC and 750 bp for cortical neurons. Hi-C data were binned at 10 kb and vanilla coverage (VC) normalisation was employed, as it was compatible with both Arrowhead and TopDom. The sex chromosomes were excluded from the analysis throughout.
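The map resolution criterion stated above can be illustrated with a minimal sketch. It assumes that per-bin contact counts have already been computed at several candidate bin sizes; the function name, the input layout and the toy numbers are hypothetical and do not reproduce Juicer's own implementation.

```python
def map_resolution(coverage_by_bin_size, min_contacts=1000, min_fraction=0.8):
    """Return the smallest bin size (bp) at which at least `min_fraction`
    of bins have more than `min_contacts` contacts, or None if none qualifies.

    `coverage_by_bin_size` maps a bin size in bp to a list of per-bin
    contact counts (hypothetical input layout, assumed for illustration).
    """
    for bin_size in sorted(coverage_by_bin_size):
        counts = coverage_by_bin_size[bin_size]
        if not counts:
            continue  # no bins at this size, nothing to evaluate
        passing = sum(1 for c in counts if c > min_contacts)
        if passing / len(counts) >= min_fraction:
            return bin_size
    return None


# Toy per-bin counts, invented purely to exercise the function.
toy_coverage = {
    500: [1200, 800, 900, 1500, 700, 600, 1100, 950, 400, 300],
    1000: [2400, 1700, 2200, 1100, 1900, 1500, 2100, 1800, 1300, 900],
    5000: [9000, 8000, 9500, 7000, 8800, 9100, 7600, 8300, 9900, 7200],
}
resolution = map_resolution(toy_coverage)
```

With these toy counts, only 3 of 10 bins pass at 500 bp while 9 of 10 pass at 1000 bp, so the sketch reports 1000 bp.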
It has been shown that algorithmically determined TADs can vary widely depending on the TAD caller used [41, 42, 44]. To ensure that the results are robust to the choice of TAD caller, TADs were called using both Arrowhead and TopDom at 10 kb. Arrowhead TADs were called with default settings (m = 2000). TopDom TADs were called using the parameter w = 20, as this was deemed to be an appropriate setting to identify TADs at 10 kb (Additional file 1: Fig. S12). TADs detected by Arrowhead ranged in size from 120 kb to 6.38 Mb, and TopDom TADs ranged in size from 10 kb to 3.47 Mb. As TADs spanning only a few genomic bins are unlikely to be real, we filtered out TopDom TADs < 90 kb. Arrowhead calls overlapping TADs, whereas TopDom calls non-overlapping regions annotated “domain”, “boundary” or “gap”; similarly to Dali et al., only “domain” annotations were considered in this work [42]. TADs were called on Hi-C maps made from merged replicates. Between 21.38% and 45.5% of TADs called by one TAD caller had an equivalent TAD (within ± 2 bins at both boundaries) called by the other TAD caller (Additional file 2: Table S6).
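The size filter and the between-caller comparison described above can be sketched as follows. This is an illustrative sketch only: TADs are assumed to be (chromosome, start, end) tuples in bp, the function names are hypothetical, and the coordinates are invented rather than taken from the actual Arrowhead or TopDom output.

```python
BIN_SIZE = 10_000  # Hi-C maps were binned at 10 kb


def filter_short(tads, min_length=90_000):
    """Drop TADs shorter than `min_length` bp (TopDom TADs < 90 kb were removed)."""
    return [t for t in tads if t[2] - t[1] >= min_length]


def has_equivalent(tad, others, tolerance_bins=2, bin_size=BIN_SIZE):
    """True if some TAD in `others` is on the same chromosome and has both
    boundaries within +/- `tolerance_bins` bins of this TAD's boundaries."""
    chrom, start, end = tad
    tol = tolerance_bins * bin_size
    return any(
        c == chrom and abs(s - start) <= tol and abs(e - end) <= tol
        for c, s, e in others
    )


def fraction_shared(tads_a, tads_b):
    """Fraction of TADs in `tads_a` that have an equivalent TAD in `tads_b`."""
    if not tads_a:
        return 0.0
    return sum(has_equivalent(t, tads_b) for t in tads_a) / len(tads_a)


# Invented coordinates for illustration only.
arrowhead = [("chr1", 100_000, 500_000), ("chr1", 600_000, 900_000)]
topdom = filter_short([
    ("chr1", 110_000, 520_000),  # within 2 bins of the first Arrowhead TAD
    ("chr1", 600_000, 650_000),  # 50 kb, removed by the size filter
    ("chr2", 600_000, 900_000),  # same coordinates, different chromosome
])
shared = fraction_shared(arrowhead, topdom)
```

The comparison is directional: the fraction of Arrowhead TADs with a TopDom equivalent generally differs from the fraction of TopDom TADs with an Arrowhead equivalent, so both directions would need to be computed to reproduce a range.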
It is widely suggested that TADs are formed by a loop extrusion process involving cohesin and convergent CTCF bound at TAD boundaries [11]. Where indicated, TADs have been split into CTCF TADs and nonCTCF TADs. To do this, ESC and cortical neuron CTCF ChIP-seq peaks (generated alongside the Hi-C data [25]) were downloaded from GEO (Accession Number: GSE96107). TADs where both boundaries were within ± 1 bin (10 kb) of a CTCF peak were considered to be “CTCF TADs”; the equivalent TADs in random TADs or random genome TADs were used for comparison. TADs with only one boundary or neither boundary within ± 1 bin (10 kb) of a CTCF peak were considered to be “nonCTCF TADs”. Between 23.74% and 45.99% of CTCF TADs and between 15.13% and 28.72% of nonCTCF TADs called by one TAD caller had an equivalent CTCF/nonCTCF TAD (within ± 2 bins at both boundaries) called by the other TAD caller (Additional file 2: Table S7).
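The CTCF/nonCTCF split can be sketched as below. The text does not state how the distance between a boundary and a peak was measured, so this sketch assumes that a boundary counts as near a peak when the peak interval overlaps the window of ± 10 kb around the boundary; the function names and coordinates are hypothetical.

```python
def near_peak(chrom, position, peaks, window=10_000):
    """True if any peak (chrom, start, end) on the same chromosome overlaps
    [position - window, position + window]. The overlap rule is an assumption."""
    return any(
        c == chrom and s <= position + window and e >= position - window
        for c, s, e in peaks
    )


def classify_tad(tad, peaks, window=10_000):
    """Label a TAD 'CTCF' if both boundaries are near a CTCF peak,
    otherwise 'nonCTCF' (one boundary or neither boundary near a peak)."""
    chrom, start, end = tad
    both = near_peak(chrom, start, peaks, window) and near_peak(chrom, end, peaks, window)
    return "CTCF" if both else "nonCTCF"


# Invented peaks and TADs for illustration only.
ctcf_peaks = [("chr1", 95_000, 95_400), ("chr1", 505_000, 505_300)]
labels = [
    classify_tad(("chr1", 100_000, 500_000), ctcf_peaks),  # both boundaries near a peak
    classify_tad(("chr1", 100_000, 800_000), ctcf_peaks),  # only the start is near a peak
    classify_tad(("chr2", 100_000, 500_000), ctcf_peaks),  # no peaks on this chromosome
]
```

For genome-wide peak sets, a sorted peak list with binary search or an interval tree would be a more efficient choice than the linear scan used here.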