Replicate reproducibility and genome-wide binding correlation

Miaomiao Li; Tao Yao; Wanru Lin; Will E. Hinckley; Mary Galli; Wellington Muchero; Andrea Gallavotti; Jin-Gui Chen; Shao-shan Carol Huang



This method is extracted from research article: Nat Commun, May 2023

Double DAP-seq uncovered synergistic DNA binding of interacting bZIP transcription factors

DOI: 10.1038/s41467-023-38096-2


To calculate the correlation between replicates, we took the peaks called for each individual replicate and used the dba.count method (DiffBind), loaded with the R/BioConductor ChIPQC package¹²⁵ (version 1.26.0), to count the number of sequencing reads in each peak with the following arguments: minimum mapping quality score of 30 (mapQCth=30); fragment size of 200 bp (fragmentSize=200); each peak must be present in both replicates (minOverlap=2); and raw read counts reported for each peak (score=DBA_SCORE_READS). Pearson correlations were calculated, and scatter plots were drawn, from the log10(raw read count + 1) values of the two replicates.
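The replicate comparison reduces to a log transform followed by a Pearson correlation. Below is a minimal, dependency-free Python sketch (the original analysis was done in R), assuming the per-peak raw read counts of the two replicates have already been exported, for example from the dba.count result, into plain dictionaries keyed by a peak identifier. The input format and the function names are hypothetical. Restricting the comparison to peaks present in both dictionaries mirrors minOverlap=2 for a pair of replicates.

```python
import math
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (n >= 2)")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def replicate_correlation(
    rep1: Dict[str, int], rep2: Dict[str, int]
) -> Tuple[float, List[Tuple[float, float]]]:
    """Correlate two replicates on log10(raw read count + 1).

    rep1 and rep2 map a (hypothetical) peak ID to its raw read count.
    Only peaks present in both replicates are used, mirroring minOverlap=2.
    Returns Pearson r and the (rep1, rep2) points for a scatter plot.
    """
    shared = sorted(set(rep1) & set(rep2))
    points = [(math.log10(rep1[p] + 1), math.log10(rep2[p] + 1)) for p in shared]
    r = pearson([a for a, _ in points], [b for _, b in points])
    return r, points


if __name__ == "__main__":
    rep1 = {"peak1": 9, "peak2": 99, "peak3": 999, "peak4": 5}
    rep2 = {"peak1": 9, "peak2": 99, "peak3": 999}
    r, points = replicate_correlation(rep1, rep2)
    print(f"Pearson r = {r:.3f} over {len(points)} shared peaks")
    # prints: Pearson r = 1.000 over 3 shared peaks
```

The returned points are the coordinates of the replicate scatter plot; plotting is left out to keep the sketch free of third-party dependencies.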

To calculate pairwise correlations among all DAP-seq and dDAP-seq samples, we first used the dba.count method to combine the merged-replicate GEM peaks reported for all samples into a consensus peak set, on which the sequencing reads were counted for each replicate with the following arguments: minimum mapping quality score of 30 (mapQCth=30); fragment size of 200 bp (fragmentSize=200); each peak must be present in at least two samples (minOverlap=2); peaks re-centered on their summits and extended 100 bp up- and downstream (summits=100); and read counts normalized to full library size (score=DBA_SCORE_NORMALIZED). From this consensus peak set, regions overlapping blacklist regions were removed, and only regions overlapping the top 3000 most enriched peaks from each replicate were kept, giving a filtered consensus peak set. The normalized read counts in this filtered set were extracted for each replicate, log2-transformed, and averaged between replicates, yielding a vector of log2 normalized read counts for each sample. Pearson correlation was calculated between all pairs of samples to create a pairwise Pearson correlation matrix. Using the ComplexHeatmap package¹²⁶ (version 2.9.4), this matrix was drawn as a heatmap with hierarchical-clustering dendrograms computed by average linkage, using (1 - Pearson correlation) as the distance between rows and between columns.
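The two interval filters applied to the consensus peak set can be sketched as follows. This is an illustrative Python version, not the authors' code: it assumes BED-style half-open coordinates, an enrichment score attached to each replicate peak, and it reads "top 3000 most enriched peaks from each replicate" as the union of every replicate's top-3000 list. All names are hypothetical. The pairwise overlap check is quadratic; for genome-scale inputs an indexed overlap tool such as bedtools intersect or GenomicRanges findOverlaps is the practical choice.

```python
from typing import List, Sequence, Tuple

Region = Tuple[str, int, int]          # (chrom, start, end), BED-style half-open
Peak = Tuple[str, int, int, float]     # (chrom, start, end, enrichment score)


def overlaps(a: Sequence, b: Sequence) -> bool:
    """True if two half-open intervals on the same chromosome share at least 1 bp."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def top_peaks(peaks: List[Peak], n: int = 3000) -> List[Peak]:
    """The n peaks with the highest enrichment score."""
    return sorted(peaks, key=lambda p: p[3], reverse=True)[:n]


def filter_consensus(
    consensus: List[Region],
    blacklist: List[Region],
    replicate_peaks: List[List[Peak]],
    top_n: int = 3000,
) -> List[Region]:
    """Drop blacklisted regions, then keep regions hitting any replicate's top-n peaks."""
    # Assumption: pool the top-n peaks of every replicate (union across replicates).
    top = [p for peaks in replicate_peaks for p in top_peaks(peaks, top_n)]
    kept = []
    for region in consensus:
        if any(overlaps(region, bl) for bl in blacklist):
            continue  # region touches a blacklist interval
        if any(overlaps(region, p) for p in top):
            kept.append(region)
    return kept


if __name__ == "__main__":
    consensus = [("chr1", 0, 200), ("chr1", 1000, 1200),
                 ("chr1", 5000, 5200), ("chr2", 0, 200)]
    blacklist = [("chr1", 1100, 1300)]
    reps = [[("chr1", 50, 150, 10.0), ("chr1", 5100, 5150, 1.0)],
            [("chr2", 10, 20, 5.0)]]
    print(filter_consensus(consensus, blacklist, reps, top_n=1))
    # prints: [('chr1', 0, 200), ('chr2', 0, 200)]
```

In the toy run, the chr1:1000-1200 region is dropped for touching the blacklist, and chr1:5000-5200 is dropped because the only peak it overlaps falls outside its replicate's top-1 list.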
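The remaining steps (log2 transform, averaging between replicates, the pairwise Pearson matrix, and average-linkage clustering on 1 - r distances) are sketched below in dependency-free Python. The method does not state whether a pseudocount was added before the log2 transform; the pseudocount of 1 used here is an assumption that avoids log2(0). The function names are hypothetical. In R, the equivalent clustering would be hclust(as.dist(1 - m), method = "average"), whose result can be passed to ComplexHeatmap's Heatmap() through cluster_rows and cluster_columns.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def sample_vector(replicates: List[List[float]], pseudocount: float = 1.0) -> List[float]:
    """log2-transform each replicate's normalized counts, then average per region.

    The pseudocount is an assumption (not stated in the method) to avoid log2(0).
    """
    logged = [[math.log2(v + pseudocount) for v in rep] for rep in replicates]
    return [sum(col) / len(col) for col in zip(*logged)]


def correlation_matrix(vectors: Dict[str, List[float]]) -> Dict[Tuple[str, str], float]:
    """Symmetric pairwise Pearson correlation matrix, keyed by (sample, sample)."""
    r = {(s, s): 1.0 for s in vectors}
    for a, b in combinations(vectors, 2):
        r[(a, b)] = r[(b, a)] = pearson(vectors[a], vectors[b])
    return r


def average_linkage(names: List[str], r: Dict[Tuple[str, str], float]):
    """Agglomerative clustering with distance 1 - r and average (UPGMA) linkage.

    Returns the merge history as (cluster_a, cluster_b, distance) tuples.
    """
    def dist(c1, c2):
        # Average linkage: mean of all member-to-member distances.
        return sum(1.0 - r[(a, b)] for a in c1 for b in c2) / (len(c1) * len(c2))

    clusters = [(n,) for n in names]
    merges = []
    while len(clusters) > 1:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]))
        merges.append((clusters[i], clusters[j], dist(clusters[i], clusters[j])))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


if __name__ == "__main__":
    vectors = {"A": [1.0, 2.0, 3.0], "B": [2.0, 4.0, 6.0], "C": [3.0, 2.0, 1.0]}
    r = correlation_matrix(vectors)
    for step in average_linkage(list(vectors), r):
        print(step)
    # prints:
    # (('A',), ('B',), 0.0)
    # (('C',), ('A', 'B'), 2.0)
```

Perfectly correlated samples sit at distance 0 and merge first, while anti-correlated samples sit at the maximum distance of 2, which is why 1 - r works as a dissimilarity for clustering a correlation matrix.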

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
