To ensure that the ChIP-seq signal was saturated for all libraries, we performed signal saturation tests (Additional file 1: Figure S2A). Using SAMtools view version 1.3, we subsampled the quality-filtered, duplicate-removed reads from each biological replicate, starting at 5 million reads and increasing in steps of 5 million up to the maximum library depth or a maximum of 60 million reads. For each subsampled set, we called enriched ChIP-seq regions with MACS2 version 2.1.1 [65] in broad peak mode (options: -q 0.05 --broad --broad-cutoff 0.1). An input library from the same individual and tissue (Additional file 3: Table S3), subsampled to the same sequencing depth, was also provided to MACS2. To identify biologically reproducible peaks, we looked for peaks in one replicate for which at least 50% of their length overlapped at least 50% of a peak in another replicate. Peaks that were reproducible in at least two biological replicates were merged to produce the biologically reproducible set of histone enrichment peaks; peaks that did not overlap a peak in another replicate were excluded from further analyses. The number of peaks per replicate and the number of peaks reproducible in at least two replicates are shown in Additional file 1: Figure S2B. Biologically reproducible H3K4me3 and H3K27ac peaks reached saturation at 20 million reads, whereas H3K4me1 peaks reached saturation at 40 million reads (Additional file 1: Figure S2A).
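The subsampling grid and the overlap criterion described above can be illustrated with a minimal Python sketch. This is not the code used in the study: the function names, the half-open (start, end) interval representation, and the choice to include only whole multiples of the step are assumptions made for illustration, and the overlap test reflects one reading of the 50% criterion (the shared region must cover at least half of each peak).

```python
def subsample_depths(library_depth, step=5_000_000, cap=60_000_000):
    """Depths to subsample to: step, 2*step, ... up to min(library_depth, cap).

    Assumption: only whole multiples of `step` are used.
    """
    limit = min(library_depth, cap)
    return list(range(step, limit + 1, step))


def overlap_length(a, b):
    """Number of bases shared by two half-open intervals (start, end)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def is_reciprocal_overlap(a, b, min_fraction=0.5):
    """True if the shared region covers at least `min_fraction` of both peaks.

    Assumption: the 50% rule is applied to each peak's own length.
    """
    shared = overlap_length(a, b)
    return (shared >= min_fraction * (a[1] - a[0])
            and shared >= min_fraction * (b[1] - b[0]))


if __name__ == "__main__":
    # A hypothetical library with 23 million usable reads.
    print(subsample_depths(23_000_000))
    # Two hypothetical peaks on the same chromosome.
    print(is_reciprocal_overlap((100, 200), (150, 250)))
```

Under this reading, a short peak that lies entirely inside a much longer peak from another replicate would not count as reproducible, because the shared region covers less than half of the longer peak.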
We used the ChIP-seq libraries for H3K27ac and H3K4me3 subsampled to 20 million reads for all further analyses. Twelve of the somatic H3K4me3 libraries and one testis H3K4me3 library had fewer than 20 million reads after quality control and duplicate removal (Additional file 3: Table S3), so we used all the reads from these libraries instead of subsamples. This did not reduce the total H3K4me3 peak numbers, because H3K4me3 saturates at a sequencing depth well below 20 million reads, especially in the somatic tissues (Additional file 1: Figure S2A). We subsampled all the H3K4me1 and matched input libraries to 40 million reads. The matched input sample for the macaque muscle library (unique identifier do17779) had only around 21 million reads, all of which were used in MACS2 with the H3K4me1 library do17771.
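The depth rule in this paragraph (subsample to the target for the histone mark, or keep every read when the library is shallower than the target) can be written as a short Python sketch. The dictionary and function names are hypothetical and the read counts in the example are illustrative; only the 20 million and 40 million targets come from the text.

```python
# Target depths per histone mark, as stated in the text.
TARGET_DEPTH = {
    "H3K4me3": 20_000_000,
    "H3K27ac": 20_000_000,
    "H3K4me1": 40_000_000,
}


def reads_to_use(mark, library_depth):
    """Reads kept for a library: the target depth, or all reads if shallower."""
    return min(library_depth, TARGET_DEPTH[mark])


def sampling_fraction(mark, library_depth):
    """Fraction of reads to keep; 1.0 means the whole library is used."""
    if library_depth <= 0:
        raise ValueError("library_depth must be positive")
    return reads_to_use(mark, library_depth) / library_depth


if __name__ == "__main__":
    # Hypothetical depths: one shallow and one deep H3K4me3 library.
    for depth in (15_000_000, 50_000_000):
        print(depth, reads_to_use("H3K4me3", depth), sampling_fraction("H3K4me3", depth))
```

The same rule covers the input library with around 21 million reads that was paired with an H3K4me1 library: it falls below the 40 million target, so all of its reads are kept.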