H3K27me3 ChIP-seq data analysis

TM Trisha A. Macrae
MR Miguel Ramalho-Santos
request Request a Protocol
ask Ask a question
Favorite

Reads that passed quality control were trimmed of adaptors using Trim Galore! v0.4.0 and aligned to hg19 and mm10 using bowtie2 v2.2.5131 with no multimapping. SAM files were converted to BAM files, sorted and indexed with samtools v1.9. Species-specific library sizes were determined from total mapped reads in hg19 bam files (using bowtie2 mapping statistics) and mm10 bam files (using samtools view -c). Normalization factors (NFs) are the ratio of mouse/human enrichment (mm10/hg19 reads) for each sample taken as a fraction of input (see Source Data), as in van Mierlo et al.11. ChIP-seq NFs correlate well with ChIP recovery as a fraction of input (Fig. 3b). Bam files were deduplicated for all downstream analyses, performed using picard v2.18.14 MarkDuplicates (http://broadinstitute.github.io/picard).

H3K27me3 ChIP-seq data from embryos were downloaded as fastq files from NCBI GEO. Preimplantation sequences are from Liu et al.62 (GSE73952), postimplantation sequences from Wang et al.59 (GSE97778), and wild-type ES cell data from van Mierlo et al.11 (GSE101675). For paired-end samples, only one read was kept per fragment and all samples were trimmed, aligned to mm10, sorted and deduplicated as above. Deduplicated bam files were analyzed using deepTools v3.3.1 on the command line132.

Deduplicated bam files were converted to scaled bedgraphs using deepTools v3.3.1 bamCoverage (options --scaleFactor <NF> --binSize 10 --blackListFileName ENCODE_mm10_blacklist.bed) and then to bed files: awk ‘{print $1”\t”$2”\t”$3”\tid-“NR”\t”$4”\t.”}’. These scaled bed files were used to call broad peaks compared to input using epic2 on the command line (options -gn mm10, -d chrM). Bedtools (v2.28.0) merge was used to merge peaks within 3 kb, and bedtools intersect was used to determine a set of common peaks between replicates. Bam files were converted to scaled bigWigs using deepTools bamCoverage (options --binSize 100 --scaleFactor <NF>). Correlation between replicates was checked by multiBigWigSummary bins and plotCorrelation, and then scaled bw files were merged (bigwigCompare add) for heatmaps. computeMatrix was used to generate coverage of scaled bigwig files over no-auxin peaks (options scale-regions -m 500 --upstream 10000 --downstream 10000 --binSize 100 --missingDataAsZero --skipZeros --sortRegions descend --sortUsing mean --sortUsingSamples 1 -p max). Heatmaps were produced using deepTools plotHeatmap.

TSS profile plots were generated from the output of deepTools plotProfile (--outFileNameData), which was imported into R, processed to average replicates and plotted with ggplot2. For H3K27me3 coverage over Nodal, bigwig files were downloaded from Wang et al.59 (GEO GSE97778). Sample tracks were visualized in Integrated Genome Viewer using bigwig files (IGV v2.3.92).

For PRC1 comparisons, BED files of Rnf2 and Cbx7 peak locations in E14 ES cells are from133 (GEO GSE42466). BED files were converted to mm10 using UCSC liftOver. deeptools (v3.3.1) computeMatrix (options reference-point -a 5000 -b 5000 --referencePoint center --missingDataAsZero --skipZeros --smartLabels -p max) was used to compute normalized bigwig coverage of H3K27me3 over these peak regions. TSS profile plots were produced with plotProfile.

multiBamSummary was used to count H3K27me3 reads falling into non-overlapping 10 kb genes across the genome, and read counts were then imported into R. Embryo H3K27me3 ChIP-seq counts were normalized by library sizes (number of mapped reads in deduplicated bam files), and ES cell data were normalized by spike-in factors. For cumulative distribution plots, reads were counted in non-overlapping 10 kb genomic bins using deeptools multiBamSummary (options --smartLabels --blackListFileName --outRawCounts --minMappingQuality 10 -p max). The resulting counts table was imported into R, filtered to remove regions without coverage, scaled with the NFs calculated above and plotted using ggplot2 (stat_ecdf). P-values represent Kolmogorov–Smirnov test results using the averages of biological replicates. Counts per bin were adjusted for biological batch (embryo vs. ES cell origin) using ComBat/sva in R134 and analyzed by PCA.

H3K27me3 was counted over repetitive elements annotated in the mouse genome (obtained from UCSC RepeatMasker) using featureCounts (options -f -O -s 0 -T 8). In R, we filtered out elements with low coverage, scaled using the NFs calculated above, and calculated the average of replicates. Plot shows regions with >5 normalized counts for hyper-H3K27me3 and <3 for hypo-H3K27me3 elements, with |log2(Usp9x-high/no-auxin)| >0.7 as the threshold for enrichment.

For boxplot quantification of in vivo H3K27me3 signal over genomic regions, bedtools slop was used to define generous windows of promoter regions: TSS with 10 kb upstream and 1 kb downstream extensions (options -l 10000 -r 1000). deepTools multiBamSummary was used to calculate H3K27me3 from bam files over BED files of these promoter regions (options --minMappingQuality --outRawCounts). Raw counts for replicates were imported into R, normalized to input and library size, averaged, and plotted as boxplots.

For analysis of H3K27me3 enrichment at bivalent genes, the H3K27me3 peaks that overlap known bivalent gene promoters in wild-type ES cells in serum were taken to be bivalent peaks. Specifically, bivalent promoters were defined from a list of bivalent genes135 by bedtools slop (v2.28.0, options -l 1000 -r 100 -g < mm10_chrom_sizes> to extend TSS + 1 kb upstream and −100 bp downstream) and then intersected with H3K27me3 peaks defined in the no-auxin condition, as above (bedtools intersect with option -wb). deepTools multiBamSummary was used to count H3K27me3 reads falling in these 2566 bivalent peaks (options BED-file --smartLabels --outRawCounts -p 8) and over the list of all 17899 H3K27me3 peaks. Counts were downloaded for normalization and plotting in R.

Do you have any questions about this protocol?

Post your question to gather feedback from the community. We will also invite the authors of this article to respond.

post Post a Question
0 Q&A