Libraries for buffy coat DNA (100 ng) and ccfDNA (10 ng or the quantity equivalent to ccfDNA from 1 mL of plasma, whichever was greater) were prepared using the Kapa Biosystems Hyper Prep Kit for end repair, A-tailing, and ligation of truncated custom IDT adapters that contained an eight base-pair random barcode (i.e., unique molecular identifier) in the index 2 position of a standard Illumina adapter. Initial library amplification used the Kapa HiFi 2x master mix with a truncated-length adapter primer; full-length indexing primers were used during subsequent PCR amplification steps. Full-length buffy coat DNA and ccfDNA libraries were enriched for regions of interest using a custom-designed IDT Xgen capture probe set (Integrated DNA Technologies) containing full exonic or hotspot coverage of 128 genes (128 kb; S3 Table). Paired-end sequencing (2x125 bp) of libraries was performed on an Illumina HiSeq 2500. Reads in FASTQ files were aligned to the GRCh37 reference genome, and those with the same unclipped alignment start position were grouped into families based on >0.875 molecular barcode similarity. Read sequences were extracted from each family and a consensus was called at each base position: positions with >0.66 concordance were assigned the predominant base, and all others were assigned an N. See the consensus aligned workflow for a description of the applications [30], settings, and steps taken to generate the consensus alignments (S13 Fig; S1 and S2 Files). Fragment length was derived from paired-end alignment information according to the SAM format. Wild-type and variant alleles were identified by a 100% match to an 11 bp string within aligned consensus sequences at the location corresponding to each known variant (S4 Table). Lastly, aligned base error rates and the occurrence of localized false positive variants were calculated using our open-source EstimateErrorRates and MpileupParser applications available in the USeq package [30].
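The barcode comparison and per-position consensus call described above can be illustrated with a minimal Python sketch. It is not the USeq implementation: the function names `barcode_similarity` and `consensus` are hypothetical, similarity is assumed to be the fraction of matching barcode positions, and the reads of a family are assumed to be already trimmed to the same aligned coordinates and length. Only the thresholds (>0.875 and >0.66) come from the text.

```python
from collections import Counter


def barcode_similarity(a: str, b: str) -> float:
    """Fraction of positions at which two equal-length barcodes agree.

    Assumption: similarity is a simple per-position match fraction.
    """
    if not a or len(a) != len(b):
        raise ValueError("barcodes must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def consensus(reads: list[str], min_concordance: float = 0.66) -> str:
    """Call a consensus base at each position across the reads of one family.

    A position receives the predominant base only if its fraction of
    observations is strictly greater than min_concordance; otherwise 'N'.
    """
    if not reads:
        raise ValueError("family is empty")
    if len({len(r) for r in reads}) != 1:
        raise ValueError("reads must have equal length")
    called = []
    for column in zip(*reads):  # one tuple of bases per position
        base, count = Counter(column).most_common(1)[0]
        called.append(base if count / len(column) > min_concordance else "N")
    return "".join(called)
```

In this sketch, a family of three reads in which two agree at a position (2/3 ≈ 0.667) still yields a called base, whereas a family of two discordant reads yields an N at that position.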
The USeq EstimateErrorRates application calculates base-level error rates observed in quality alignments (≥MQ20) from normal germline sequencing datasets. It parses a Samtools mpileup alignment stack for regions of 7 adjacent bases with adequate read depth (≥100 Q20 bases), no observed indels, and no indication of heterozygous or homozygous SNVs (allele frequencies ≤0.1). Good quality (≥Q20), non-reference, center base observations in each passing region are tabulated. These are used to calculate error rates for each base as well as the total error observed from quality alignments and quality bases. The USeq MpileupParser works in a similar fashion by parsing a Samtools mpileup alignment stack covering bases in a BED file of the 128 kb capture panel coverage with 25 base pair padding. Only quality alignments (≥MQ20) and quality bases (≥Q20) are counted. Locations with evidence of a heterozygous or homozygous allele (AF > 0.1) are ignored. It outputs a BED file of each passing base with its observed non-reference allele frequencies. At family size (FS) ≥ 1, allele frequencies were binned (<0.1%, 0.1% to 1.0%, and 1% to 2.0%) and then tracked for presence/absence at subsequent family sizes.
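The window-based filtering used for error estimation can be sketched as follows. This is a simplified illustration, not the USeq EstimateErrorRates code. The `Site` record and both function names are hypothetical. The input is assumed to be a list of per-position summaries for contiguous bases on one chromosome rather than a parsed mpileup stack, windows are assumed to slide one base at a time, and all non-reference observations at a site are pooled into a single allele frequency. The thresholds (7-base window, ≥100 Q20 bases, no indels, allele frequency ≤0.1) are taken from the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Site:
    depth: int    # number of >=Q20 base observations from >=MQ20 alignments
    non_ref: int  # of those, observations that differ from the reference
    indels: int   # insertions/deletions observed at this position


def site_passes(site: Site, min_depth: int = 100, max_af: float = 0.1) -> bool:
    """True if the site has adequate depth, no indels, and no likely SNV."""
    if site.depth < min_depth or site.indels > 0:
        return False
    return site.non_ref / site.depth <= max_af


def estimate_error_rate(sites: list[Site], window: int = 7) -> Optional[float]:
    """Pooled non-reference rate at the center base of every passing window.

    Returns None when no window passes the filters.
    """
    half = window // 2
    errors = total = 0
    for start in range(len(sites) - window + 1):
        region = sites[start:start + window]
        if all(site_passes(s) for s in region):
            center = region[half]
            errors += center.non_ref
            total += center.depth
    return errors / total if total else None
```

Requiring every base in the window to pass, while counting only the center base, keeps observations near indels, low-coverage positions, or germline variants out of the error tally.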