Summary
This protocol outlines the bioinformatics workflow for hierarchical clustering of H3K27ac peaks.
Software and datasets
bwa (https://bio-bwa.sourceforge.net/)
samtools (https://www.htslib.org/)
bedtools (https://bedtools.readthedocs.io/en/latest/)
Picard (https://broadinstitute.github.io/picard/)
MACS (https://github.com/macs3-project/MACS)
fastp (https://github.com/OpenGene/fastp)
fastqc (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/)
UCSC tools, bedgraphTobigWig (https://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64.v369/)
HOMER (http://homer.ucsd.edu/homer/)
phantompeakqualtools (https://github.com/kundajelab/phantompeakqualtools)
Procedure
These steps (A-C) were performed for each H3K27ac sample.
A. Sequence Quality Assessment and Filtering
A1. quality check
fastqc sample_H3K27ac.fastq.gz
A2. sequence clean up
fastp -i sample_H3K27ac.fastq.gz -o sample_H3K27ac.clean.fastq.gz \
--html fastp_report.html --json fastp_report.json --thread 4
B. Alignment to human genome
bwa mem -M -t 4 hg38.fasta sample_H3K27ac.fastq.gz | samtools view -Su - > temporary.bam
samtools sort -@ 4 -o sorted.bam temporary.bam
samtools view -F 1804 -q 30 -b sorted.bam > filtered.bam
C. File preparation for peak calling
C1. remove duplicates
java -jar picard.jar MarkDuplicates --INPUT filtered.bam --OUTPUT filtered-md.bam --METRICS_FILE filtered.metric.txt --VALIDATION_STRINGENCY LENIENT --ASSUME_SORTED true --REMOVE_DUPLICATES false
samtools view -F 1804 -b filtered-md.bam > final.bam
C1. estimate fragment length
bedtools bamtobed -i final.bam | awk 'BEGIN{OFS="\t"}{$4="N";$5="1000";print $0}' | gzip -nc > final.tagAlign.gz
Rscript phantompeakqualtools/run_spp.R -c=final.tagAlign.gz -savp -out=final.scc.txt
D. Peak calling
macs2 callpeak -t final.tagAlign.gz -c input.tagAlign.gz -n test -q 0.01 --nomodel --extsize fragment_length_estimated --keep-dup all -g hs -B --SPMR
E. Visualization files generation
macs2 bdgcmp -t treatment_pileup.bdg -c control_lambda.bdg -o chip_signal_FE.bdg -m FE
bedGraphToBigWig chip_signal_FE.bdg chrom.sizes chip_signal_FE.bw
F. Merging of H3K27ac Peaks Across Samples
cat *.narrowPeaks | sort -k1,1 -k2,2n | bedtools merge -i - > merged_H3K27ac_peaks.bed
G. Computing coverage and unsupervised hierarchical clustering
G1. Use homer to create tag directory for each sample
makeTagDirectory final/ final.tagAlign.gz
G2. Use homer to compute coverage for each sample, given the peaks.
annotatePeaks.pl merged_H3K27ac_peaks.bed hg38 -dfile All_H3K27acSample_TagDirectory.txt -norm > All_H3K27ac.merged_peaks.Coverage.txt
G3. Convert the coverage to log2 scale,
G4. Compute median absolute deviation,
G5. Filter out low varying peaks and perform unsupervised hierarchical clustering using tools such as R (https://www.r-project.org/).
Competing interests
The authors declare no competing interest.