Replicate reproducibility and genome-wide binding correlation

Miaomiao Li, Tao Yao, Wanru Lin, Will E. Hinckley, Mary Galli, Wellington Muchero, Andrea Gallavotti, Jin-Gui Chen, Shao-shan Carol Huang

To calculate the correlation between replicates, we took the peaks called for each individual replicate and used the dba.count method from the R/Bioconductor ChIPQC package (ref. 125; version 1.26.0) to count the sequencing reads in peaks, with the following arguments: a minimum mapping quality score of 30 (mapQCth=30), a fragment size of 200 bp (fragmentSize=200), a requirement that each peak be present in both replicates (minOverlap=2), and raw read counts reported for the peaks (score=DBA_SCORE_READS). Pearson correlations were calculated and scatter plots were made from the log10(raw read count + 1) values of the two replicates.
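The replicate comparison can be illustrated with a minimal Python sketch. It is not the authors' R code: it assumes the per-peak raw read counts have already been exported (for example from the count step above) as two equal-length lists, and it assumes the Pearson correlation is computed on the same log10(raw read count + 1) values used for the scatter plots. The function names and the example counts are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    cov = sum(a * b for a, b in zip(dx, dy))
    var_x = sum(a * a for a in dx)
    var_y = sum(b * b for b in dy)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / math.sqrt(var_x * var_y)


def replicate_correlation(counts_rep1, counts_rep2):
    """Pearson r on log10(raw read count + 1) for peaks shared by two replicates.

    Assumption: the correlation is taken on the log-transformed values.
    """
    log_rep1 = [math.log10(c + 1) for c in counts_rep1]
    log_rep2 = [math.log10(c + 1) for c in counts_rep2]
    return pearson(log_rep1, log_rep2)


# Hypothetical raw read counts for five shared peaks (illustration only).
rep1 = [9, 99, 999, 49, 499]
rep2 = [12, 110, 870, 40, 530]
r = replicate_correlation(rep1, rep2)
print(f"Pearson r between replicates: {r:.3f}")
```

The +1 pseudocount keeps peaks with zero reads in one replicate from producing an undefined logarithm.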

To calculate the pairwise correlation among all DAP-seq and dDAP-seq samples, we first used the dba.count method to combine the merged-replicate GEM peaks reported for all samples into a consensus peak set, on which the sequencing reads were counted for each replicate, with the following arguments: a minimum mapping quality score of 30 (mapQCth=30), a fragment size of 200 bp (fragmentSize=200), a requirement that each peak be present in at least two samples (minOverlap=2), peaks centered on the summit and extended 100 bp up- and downstream (summits=100), and read counts normalized to full library size (score=DBA_SCORE_NORMALIZED). From the consensus peak set, regions that overlapped blacklist regions were removed, and regions that overlapped the 3000 most enriched peaks from each replicate were kept, yielding a filtered consensus peak set. The normalized read counts at this filtered consensus peak set were extracted for each replicate, log2 transformed, and averaged between replicates, giving one log2 normalized read count vector per sample. Pearson correlation was calculated between all pairs of samples to create the pairwise Pearson correlation matrix. With the ComplexHeatmap package (ref. 126; version 2.9.4), the correlation matrix was drawn as a heatmap, with a hierarchical clustering dendrogram calculated using (1 - Pearson correlation) as the distance between rows and columns and the average linkage method.
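The steps from normalized counts to clustering can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the authors' R/ComplexHeatmap code: the filtered consensus peak counts are assumed to be already available per replicate, a pseudocount of 1 is added before the log2 transform (the text does not say how zero counts were handled), and the small average-linkage routine stands in for the clustering that the heatmap package performs. All names and example values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    cov = sum(a * b for a, b in zip(dx, dy))
    return cov / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


def sample_vector(replicate_counts, pseudocount=1.0):
    """Log2-transform each replicate's normalized counts, then average per peak.

    Assumption: the pseudocount is ours, added only to avoid log2(0).
    """
    logged = [[math.log2(c + pseudocount) for c in rep] for rep in replicate_counts]
    return [sum(vals) / len(logged) for vals in zip(*logged)]


def correlation_matrix(vectors):
    """Pairwise Pearson correlation between all samples (dict: name -> vector)."""
    names = list(vectors)
    return names, [[pearson(vectors[a], vectors[b]) for b in names] for a in names]


def average_linkage(names, dist):
    """Agglomerative clustering with average linkage.

    Returns the merges in order as (members_a, members_b, distance), where the
    distance between two clusters is the mean of all pairwise distances.
    """
    clusters = {i: [i] for i in range(len(names))}
    merges = []
    while len(clusters) > 1:
        best = None
        keys = sorted(clusters)
        for pos, a in enumerate(keys):
            for b in keys[pos + 1:]:
                pairs = [dist[i][j] for i in clusters[a] for j in clusters[b]]
                d = sum(pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((sorted(names[i] for i in clusters[a]),
                       sorted(names[i] for i in clusters[b]), d))
        clusters[a] = clusters[a] + clusters.pop(b)
    return merges


# Hypothetical normalized counts: two replicates per sample, four consensus peaks.
samples = {
    "TF_A": [[100, 20, 5, 60], [110, 18, 6, 55]],
    "TF_B": [[90, 25, 4, 70], [95, 22, 5, 66]],
    "TF_C": [[5, 80, 120, 10], [6, 75, 130, 12]],
}
vectors = {name: sample_vector(reps) for name, reps in samples.items()}
names, corr = correlation_matrix(vectors)
dist = [[1 - r for r in row] for row in corr]  # (1 - Pearson correlation) distance
merges = average_linkage(names, dist)
for left, right, d in merges:
    print(left, "+", right, f"at distance {d:.3f}")
```

With 1 - r as the distance, perfectly correlated samples sit at distance 0 and anticorrelated samples approach 2, so samples with similar genome-wide binding profiles are merged first in the dendrogram.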
