FASTQ files, comprising 222,242,982 and 154,469,827 total sequencing reads for the parental and acquired resistance cell ATAC-seq experiments, respectively, were analyzed using the MARIO pipeline (69). Briefly, the pipeline first runs quality control on the FASTQ files containing the sequence reads using FastQC (v0.11.2) (www.bioinformatics.babraham.ac.uk/projects/fastqc/). If FastQC detects adapter sequences, the pipeline runs the FASTQ files through Trim Galore (v0.4.2) (www.bioinformatics.babraham.ac.uk/projects/trim_galore/), a wrapper script that runs cutadapt (v1.8.1) (70) to remove the detected adapter sequences from the reads. The quality-controlled reads were then aligned to the human reference genome (hg19/GRCh37) using bowtie2 (v2.3.4.1) (71). The aligned reads (in BAM format) were then sorted using samtools (v1.8.0) (72), and duplicate reads were removed using picard (v1.89) (https://broadinstitute.github.io/picard/). Last, peaks were called using MACS2 (v2.1.0) (https://github.com/taoliu/MACS), resulting in 54,415 and 60,339 ATAC-seq peaks for the parental and acquired resistance datasets, respectively.
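The order of the steps above can be sketched as a list of command lines. This is a minimal illustration and not the MARIO pipeline itself: the function name, file names, directory layout, single-end input, and every option shown are assumptions, since the text does not report the parameters passed to each tool. Nothing is executed; the function only assembles the commands.

```python
from typing import List


def build_pipeline_commands(fastq: str, sample: str, bowtie2_index: str,
                            adapters_detected: bool) -> List[List[str]]:
    """Assemble, in order, the commands for one sample (hypothetical helper).

    Assumes single-end reads in <sample>.fastq.gz and that the output
    directories qc/ and trimmed/ already exist.
    """
    commands = [["fastqc", fastq, "--outdir", "qc"]]
    reads = fastq
    if adapters_detected:
        # Trim Galore wraps cutadapt; adapter trimming runs only when needed.
        commands.append(["trim_galore", fastq, "--output_dir", "trimmed"])
        # Assumed output name for single-end input.
        reads = f"trimmed/{sample}_trimmed.fq.gz"
    sam = f"{sample}.sam"
    sorted_bam = f"{sample}.sorted.bam"
    dedup_bam = f"{sample}.dedup.bam"
    # Align to the reference index (an hg19 index is assumed to be prebuilt).
    commands.append(["bowtie2", "-x", bowtie2_index, "-U", reads, "-S", sam])
    # Coordinate-sort the alignments and write them as BAM.
    commands.append(["samtools", "sort", "-o", sorted_bam, sam])
    # Invocation style differs between Picard releases; this form is an assumption.
    commands.append(["java", "-jar", "picard.jar", "MarkDuplicates",
                     f"I={sorted_bam}", f"O={dedup_bam}",
                     f"M={sample}.dup_metrics.txt", "REMOVE_DUPLICATES=true"])
    # Call peaks on the deduplicated alignments with the human genome size preset.
    commands.append(["macs2", "callpeak", "-t", dedup_bam, "-f", "BAM",
                     "-g", "hs", "-n", sample, "--outdir", "peaks"])
    return commands


if __name__ == "__main__":
    for cmd in build_pipeline_commands("parental.fastq.gz", "parental", "hg19", True):
        print(" ".join(cmd))
```

Each command list could be passed to `subprocess.run(cmd, check=True)` to execute the step, provided the tools are installed and on the path.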
The ATAC-seq experimental design consisted of replicate experiments of parental cells and acquired resistance cells. After independently analyzing the four datasets using the MARIO pipeline, we concluded that the replicates were highly similar (based on peak overlap). The FASTQ files for the replicates were thus concatenated into a single set of reads for each of the parental and acquired resistance experiments, and alignment and peak calling were performed as described above.
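One way to quantify replicate similarity by peak overlap is the fraction of peaks in one replicate that overlap at least one peak in the other. The sketch below is only an illustration: the text does not state which overlap measure or threshold was used, so the 1-bp overlap criterion, the function name, and the toy coordinates are all assumptions.

```python
from typing import Dict, List, Tuple

Peak = Tuple[str, int, int]  # (chromosome, start, end), half-open as in BED


def overlap_fraction(peaks_a: List[Peak], peaks_b: List[Peak]) -> float:
    """Fraction of peaks in A that overlap any peak in B by at least 1 bp."""
    if not peaks_a:
        return 0.0
    # Group the B peaks by chromosome so each A peak is compared locally.
    by_chrom: Dict[str, List[Tuple[int, int]]] = {}
    for chrom, start, end in peaks_b:
        by_chrom.setdefault(chrom, []).append((start, end))
    hits = 0
    for chrom, start, end in peaks_a:
        # Two half-open intervals overlap when each starts before the other ends.
        if any(start < b_end and b_start < end
               for b_start, b_end in by_chrom.get(chrom, [])):
            hits += 1
    return hits / len(peaks_a)


if __name__ == "__main__":
    # Toy coordinates, not data from the study.
    rep1 = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 150)]
    rep2 = [("chr1", 150, 250), ("chr2", 400, 500)]
    print(overlap_fraction(rep1, rep2), overlap_fraction(rep2, rep1))
```

The measure is asymmetric, so it is worth reporting in both directions. The pairwise scan is quadratic per chromosome, which is acceptable for a sketch; for tens of thousands of peaks a sorted sweep or an interval tree would be the usual choice.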
To identify regions of differential chromatin accessibility between the parental and acquired resistance ATAC-seq datasets, we used MAnorm (61) with default parameter settings for read shift size (100), peak width (1000), and distance cutoff (500). To identify peaks unique to each cell type, we used a P value cutoff of 0.01 and a fold change cutoff of 1. These settings resulted in 9,202 parental-specific peaks, 16,262 acquired resistance-specific peaks, and 41,727 common peaks. Each peak set was then examined for enriched TF binding site motif instances using the HOMER suite of tools (73), modified to include the set of motifs contained in the Cis-BP database (74).
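The two cutoffs can be read as a simple rule applied to each peak. The sketch below assumes that the fold change cutoff is applied to the absolute log2 fold change (the M value), that a positive M value means higher accessibility in parental cells, and that every peak failing either cutoff is treated as common. None of these details is stated in the text, and the function name and labels are hypothetical.

```python
def classify_peak(m_value: float, p_value: float,
                  m_cutoff: float = 1.0, p_cutoff: float = 0.01) -> str:
    """Label a peak from its log2 fold change (M) and P value.

    Assumes M = log2(parental / acquired resistance) after normalization.
    """
    significant = p_value < p_cutoff
    if significant and m_value >= m_cutoff:
        return "parental-specific"
    if significant and m_value <= -m_cutoff:
        return "acquired-resistance-specific"
    # Peaks that are not significant, or whose change is small, stay common.
    return "common"


if __name__ == "__main__":
    # Toy (M, P) pairs, not values from the study.
    for m, p in [(1.5, 0.001), (-2.0, 0.005), (1.5, 0.05), (0.3, 1e-6)]:
        print(m, p, classify_peak(m, p))
```

Requiring both cutoffs keeps peaks with a large but statistically unsupported change, and peaks with a significant but small change, out of the cell type-specific sets.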