This method is the most precise procedure for mapping replication origins, although differences in SNS-seq and bioinformatic analysis methodologies, often with no or unsuitable controls, have affected the false-positive rate (FPR) of origin identification, resulting in varying properties being attributed to metazoan origins4,10,13,15–17,44. Here, we provide our SNS-seq protocol and analysis pipeline. Briefly, cells were lysed with DNAzol, and nascent strands were then separated from genomic DNA by sucrose gradient size fractionation2. Fractions corresponding to 0.5–2 kb were pooled, incubated with T4 polynucleotide kinase (NEB) for 5′-end phosphorylation, and digested overnight with 140 units of λ-exonuclease (λexn). A second overnight digestion with 100 units of λexn was then performed. λexn digests contaminating broken genomic DNA, but not RNA-primed nascent strands22. As an experimental background control, high-molecular-weight genomic DNA from each cell type was heat-fragmented to the same size as the nascent strands, incubated with RNase A/XRN-1 to remove the RNA primer from any contaminating nascent strands, and then treated with the same amounts of λexn as the samples.
We should stress that the SNS-seq conditions used in our laboratory and most others differ markedly from those in the report claiming a possible bias of lambda exonuclease digestion44. First, in classical SNS-seq protocols, nascent strands RNA-primed at replication origins are purified by melting the DNA, and the nascent strands are then separated from the bulk parental DNA by sucrose gradient centrifugation. Only then are the purified nascent strands subjected to exhaustive lambda exonuclease digestion (more than 2000 U/μg DNA). This is not the case in Foulk et al.44, in which bulk DNA is simply enriched in replication intermediates using BND cellulose, which fractionates whole DNA that is partly single-stranded. Lambda exonuclease is then applied at an enzyme-to-DNA ratio 1000–3000-fold lower than the ratio our laboratory employs. We have also repeatedly reported that all our control samples (nascent strands from mitotic DNA or G0 DNA, and high-molecular-weight DNA) give very low enrichment values2,4,22,48,62.
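The enzyme-to-DNA ratio is simply the total λexn units divided by the mass of DNA digested. The Python sketch below works through that arithmetic for the two digestions in this protocol (140 U, then 100 U, so 240 U in total). The nascent-strand DNA mass is not stated in the text, so `HYPOTHETICAL_DNA_UG` is an illustrative assumption, and the helper names are hypothetical.

```python
# Illustrative arithmetic for lambda exonuclease (λexn) enzyme-to-DNA ratios.
# Function names and the DNA mass below are hypothetical, not from the protocol.

def units_per_ug(total_units: float, dna_ug: float) -> float:
    """Return the enzyme-to-DNA ratio in units (U) per microgram of DNA."""
    if dna_ug <= 0:
        raise ValueError("DNA mass must be positive")
    return total_units / dna_ug


def max_dna_ug(total_units: float, min_ratio: float) -> float:
    """Largest DNA mass (ug) that keeps the ratio at or above min_ratio."""
    return total_units / min_ratio


# Two overnight digestions described in the protocol: 140 U, then 100 U.
TOTAL_UNITS = 140 + 100

# Assumed nascent-strand input; the protocol text does not give this value.
HYPOTHETICAL_DNA_UG = 0.1

our_ratio = units_per_ug(TOTAL_UNITS, HYPOTHETICAL_DNA_UG)

# With 240 U in total, this mass is the ceiling for staying at >= 2000 U/ug.
ceiling_ug = max_dna_ug(TOTAL_UNITS, 2000)

# A ratio 1000- to 3000-fold lower than ours, as stated for the BND-cellulose protocol.
lower_ratio_range = (our_ratio / 3000, our_ratio / 1000)

print(f"ratio: {our_ratio:.0f} U/ug, max DNA for >=2000 U/ug: {ceiling_ug:.2f} ug")
print(f"1000-3000-fold lower: {lower_ratio_range[0]:.1f}-{lower_ratio_range[1]:.1f} U/ug")
```

Under these assumptions, keeping the ratio above 2000 U/μg with 240 U of enzyme limits the input to roughly 0.12 μg of nascent-strand DNA, while a 1000–3000-fold lower ratio corresponds to only a few units per microgram.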
The quality of origin enrichment in each sample was first tested by qPCR using primers against known human replication origins; the primers used to detect activity at the various origins are given in Supplementary Data 4. For sequencing, single-stranded nascent strands were purified using the CyScribe GFX Purification Kit (Illustra, 279606-02) and then converted into double-stranded DNA by random priming using DNA polymerase I (Klenow fragment) and the ArrayCGH Kit (Bioprime, 45–0048). Sequencing libraries were prepared using the TruSeq ChIP Library Preparation Kit (Illumina). In parallel, heat-denatured genomic DNA input controls were purified, random-primed and prepared as libraries in the same manner. All samples were sequenced at the Montpellier GenomiX (MGX) facility on an Illumina HiSeq 2500, and FASTQ files were generated with Illumina bcl2fastq (v2.17).
Illumina reads (50 bp, single-end) from each SNS-seq replicate were trimmed and aligned to hg38 using Bowtie2 (v2.2.6). Peaks were called with two peak callers: MACS2 (ref. 63; v2.2.1) and SICER (ref. 64; v1.1, modified to include hg38 and mm10). Peaks were first called using MACS2 (default parameters plus --bw 500 -p 1e-5 -s 60 -m 10 30 --gsize 2.7e9), followed by SICER [parameters: redundancy threshold = 1, window size (bp) = 200, fragment size = 150, effective genome fraction = 0.85, gap size (bp) = 600 and FDR = 1e-3]. MACS2 peaks intersecting SICER peaks in each sample were merged using bedtools intersect to generate a comprehensive list of all human DNA initiation sites (IS; Table 1). Blacklisted regions as defined by the ENCODE project (hg38, ENCSR636HFF) were subtracted from the final human DNA replication origin list.
Mouse SNS-seq samples were processed in the same way as the human samples and were also divided into quantiles (mQ1–mQ10), each containing 25,168 regions. Principal component analysis and sample distances suggest that, for cell types obtained from a single donor (i.e., HMEC), origins overlap more strongly among replicates than with other cell types. For donor-derived cell types (hematopoietic cells), SNS-seq samples were more similar within the same donor than between samples sharing the same treatment status (i.e., treatment with EPO). This contrasts with the RNA-seq data, in which samples cluster by treatment (EPO) rather than by origin (donor).
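One common way to read out the qPCR quality check is the ΔCt method: in the same nascent-strand sample, an amplicon at a known origin should cross threshold in fewer cycles than an amplicon in a non-origin region. The Python sketch below assumes that design (an origin primer pair plus a background primer pair, with amplification efficiency near 100%). The protocol does not say how enrichment was calculated, so the approach, the Ct values and the function names are all illustrative assumptions.

```python
def relative_enrichment(ct_origin: float, ct_background: float,
                        efficiency: float = 1.0) -> float:
    """Fold enrichment of an origin amplicon over a background amplicon (delta-Ct).

    Each cycle multiplies the template by (1 + efficiency), so a difference of
    d cycles corresponds to (1 + efficiency) ** d. efficiency=1.0 assumes perfect
    doubling per cycle.
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return (1.0 + efficiency) ** (ct_background - ct_origin)


def enrichment_over_control(sample: float, control: float) -> float:
    """Ratio of sample enrichment to control enrichment.

    Equivalent to the delta-delta-Ct method when both measurements use the
    same primer pairs and the same efficiency.
    """
    return sample / control


# Hypothetical Ct values for illustration only.
ns = relative_enrichment(ct_origin=24.0, ct_background=30.0)    # nascent strands
gdna = relative_enrichment(ct_origin=27.0, ct_background=27.5)  # λexn-treated genomic control
print(f"nascent strands: {ns:.1f}x, control: {gdna:.2f}x, "
      f"ratio: {enrichment_over_control(ns, gdna):.1f}x")
```

Normalizing to the λexn-treated genomic DNA control, as in the last function, is one way to express origin enrichment relative to the background control described in this protocol.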
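The peak-combining logic in this pipeline can be sketched in plain Python on BED-style half-open intervals. The sketch keeps MACS2 peaks that overlap SICER peaks in each sample, then pools and merges them across samples, then removes ENCODE-blacklisted regions. It reads "merged" as pooling the per-sample peaks and collapsing overlaps, as `bedtools merge` would. This is an illustration, not the pipeline itself: the real workflow used bedtools, the helper names are hypothetical, and dropping any peak that touches a blacklisted region (the behavior of `bedtools intersect -v`) is an assumption, since `bedtools subtract` would instead trim only the overlapping bases.

```python
from typing import Iterable, List, Sequence, Tuple

# (chrom, start, end) with half-open coordinates, as in BED files.
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same chromosome share at least 1 bp."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def keep_overlapping(peaks: Iterable[Interval], others: Sequence[Interval]) -> List[Interval]:
    """Keep each peak that overlaps any interval in `others` (like `bedtools intersect -u`)."""
    return [p for p in peaks if any(overlaps(p, o) for o in others)]


def drop_overlapping(peaks: Iterable[Interval], others: Sequence[Interval]) -> List[Interval]:
    """Keep only peaks that overlap nothing in `others` (like `bedtools intersect -v`)."""
    return [p for p in peaks if not any(overlaps(p, o) for o in others)]


def merge_intervals(intervals: Iterable[Interval]) -> List[Interval]:
    """Merge overlapping or book-ended intervals (like `bedtools merge` on sorted input)."""
    merged: List[Interval] = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            last = merged[-1]
            merged[-1] = (chrom, last[1], max(last[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


def build_origin_list(samples: Sequence[Tuple[List[Interval], List[Interval]]],
                      blacklist: Sequence[Interval]) -> List[Interval]:
    """samples: one (macs2_peaks, sicer_peaks) pair per SNS-seq sample."""
    supported: List[Interval] = []
    for macs2_peaks, sicer_peaks in samples:
        supported.extend(keep_overlapping(macs2_peaks, sicer_peaks))
    return drop_overlapping(merge_intervals(supported), blacklist)


# Toy example with made-up coordinates.
sample_a = ([("chr1", 100, 300), ("chr1", 1000, 1200)], [("chr1", 250, 600)])
sample_b = ([("chr1", 280, 500), ("chr2", 420, 460)], [("chr1", 300, 700), ("chr2", 400, 800)])
origins = build_origin_list([sample_a, sample_b], blacklist=[("chr2", 440, 445)])
print(origins)
```

The overlap checks are a quadratic scan, which is fine for illustration. A genome-wide implementation would use sorted sweeps or an interval index instead.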
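Splitting a ranked region list into ten equal groups is straightforward. With 25,168 regions per quantile, the mouse list implies 10 × 25,168 = 251,680 regions in total. The Python sketch below assumes that regions are ranked by a per-region score (for example, peak signal), with the strongest regions in the first quantile. The text states neither the ranking criterion nor its direction, and the function name and data layout are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Region = Tuple[str, float]  # (region_id, score); hypothetical layout


def split_into_quantiles(regions: Sequence[Region], n: int = 10, prefix: str = "Q",
                         score: Callable[[Region], float] = lambda r: r[1]
                         ) -> Dict[str, List[Region]]:
    """Rank regions by descending score and cut them into n equal-size groups.

    Group <prefix>1 holds the highest-scoring regions (an assumed convention).
    Requires len(regions) to be a multiple of n so that every group is the same size.
    """
    if n <= 0 or len(regions) % n != 0:
        raise ValueError("number of regions must be a positive multiple of n")
    ranked = sorted(regions, key=score, reverse=True)
    size = len(ranked) // n
    return {f"{prefix}{i + 1}": ranked[i * size:(i + 1) * size] for i in range(n)}


# Toy data: 20 regions with scores 0..19, split into mQ1..mQ10 of 2 regions each.
toy = [(f"region{i}", float(i)) for i in range(20)]
quantiles = split_into_quantiles(toy, prefix="mQ")
print({label: [rid for rid, _ in group] for label, group in quantiles.items()})

# Group size for the mouse list: 251,680 regions / 10 quantiles.
print(251_680 // 10)
```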