A detailed description of the DArTseq™ methodology can be found in Jaccoud et al. (2001). The method typically produces sequence tags 69 base pairs (bp) long. Genotyping of multiple loci was performed using DArTseq™ (Diversity Arrays Technology Pty Ltd., Canberra, Australian Capital Territory, Australia). Two marker types were scored: SNP loci, and in silico DArT loci based on the presence/absence (PA) of restriction fragments in the representation (PA loci). These were used to identify candidate sex-specific loci that differ between male and female individuals. Approximately 100 ng of DNA from each sample was used for the development of DArTseq™ arrays. DNA samples were subjected to digestion/ligation reactions as described by Kilian et al. (2012), using PstI and a second restriction endonuclease (SphI). Ligation reactions were performed using two adaptors. The first was a PstI-compatible adaptor consisting of an Illumina flow-cell attachment sequence, a sequencing primer sequence, and a unique barcode sequence. The second was a SphI-compatible adaptor consisting of an Illumina flow-cell attachment region. Ligated fragments were then amplified by PCR using the following parameters: initial denaturation at 94 °C for 1 min, followed by 30 cycles of 94 °C for 20 s, 58 °C for 30 s, and 72 °C for 45 s, with a final extension at 72 °C for 7 min. Equimolar amounts of amplification products from each individual were pooled and subjected to Illumina's proprietary cBot bridge PCR, followed by sequencing on the Illumina HiSeq 2000 platform. Single-read sequencing was run for 77 cycles.
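Writing the cycling program down as data makes it easy to check or transfer to a thermocycler. The Python sketch below encodes the PCR steps listed above and sums their hold times. The `Step` class and helper names are illustrative only and are not part of any DArT or instrument software. Ramp times between temperatures are ignored, so the actual run time on an instrument will be longer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """A single thermocycler hold: temperature (°C) and duration (s)."""
    name: str
    temp_c: float
    seconds: int


# PCR program as described in the text (names are illustrative).
INITIAL = [Step("initial denaturation", 94, 60)]
CYCLE = [
    Step("denaturation", 94, 20),
    Step("annealing", 58, 30),
    Step("extension", 72, 45),
]
N_CYCLES = 30
FINAL = [Step("final extension", 72, 420)]


def expand_program(initial, cycle, n_cycles, final):
    """Flatten the program into the ordered list of holds the cycler executes."""
    return list(initial) + list(cycle) * n_cycles + list(final)


def total_hold_seconds(steps):
    """Sum of hold times only; ramp times between temperatures are ignored."""
    return sum(s.seconds for s in steps)


if __name__ == "__main__":
    program = expand_program(INITIAL, CYCLE, N_CYCLES, FINAL)
    secs = total_hold_seconds(program)
    print(f"{len(program)} holds, {secs} s (~{secs / 60:.1f} min) excluding ramps")
```

Running it prints `92 holds, 3330 s (~55.5 min) excluding ramps`. The total is 60 s of initial denaturation, 30 × 95 s of cycling, and 420 s of final extension.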
Sequences were processed using proprietary DArTseq™ analytical pipelines (Ren et al., 2015). First, the HiSeq 2000 output (FASTQ files) was filtered to remove poor-quality sequences, applying two different quality thresholds. The barcode region allows sequences to be parsed into sample-specific libraries. For this region, stringent criteria were applied (minimum Phred pass score of 30, minimum pass length percentage of 75). For the remainder of the sequence, relaxed thresholds were applied (minimum Phred pass score of 10, minimum pass length percentage of 50). Approximately 2,000,000 sequences per individual were identified and used in marker calling. Identical sequences were then collapsed into "fastqcoll" files. These files were used in the secondary proprietary pipeline (DArTsoft14) for SNP and PA locus calling, using the "reference-free" algorithm implemented in DArTsoft14. The resulting sequence clusters were parsed into SNP and in silico DArTseq™ markers using a range of metadata parameters derived from the quantity and distribution of each sequence across all samples in the analysis. Multiple libraries of the same individual were included in the genotyping process, enabling reproducibility scores to be calculated for each candidate marker. DArTsoft14 outputs were then filtered on four criteria: reproducibility; average read count for each sequence (sequencing depth); balance of average counts between the two alleles of each SNP; and call rate (the proportion of samples for which the marker was scored).
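The DArTseq™ pipelines and DArTsoft14 are proprietary, so the Python sketch below is not their implementation. It only illustrates the two-tier read quality filter, the collapsing of identical reads, and two of the marker-level statistics described above. Several assumptions are made:

- quality strings use Phred+33 encoding;
- the barcode length of each read is known;
- "pass length percentage" means the percentage of positions at or above the minimum Phred score;
- call rate and replicate reproducibility use simple illustrative definitions.

All function names are hypothetical.

```python
from collections import Counter

PHRED_OFFSET = 33  # assumption: Phred+33 (Sanger / Illumina 1.8+) encoding


def phred_scores(qual: str) -> list[int]:
    """Decode an ASCII quality string into integer Phred scores."""
    return [ord(c) - PHRED_OFFSET for c in qual]


def passes(scores: list[int], min_phred: int, min_pass_pct: float) -> bool:
    """True if at least min_pass_pct % of positions have Phred >= min_phred.

    Assumed interpretation of "minimum pass length percentage".
    """
    if not scores:
        return False
    n_ok = sum(1 for q in scores if q >= min_phred)
    return 100 * n_ok >= min_pass_pct * len(scores)


def read_passes(qual: str, barcode_len: int) -> bool:
    """Stringent filter on the barcode, relaxed filter on the rest of the read."""
    scores = phred_scores(qual)
    barcode, rest = scores[:barcode_len], scores[barcode_len:]
    return passes(barcode, 30, 75) and passes(rest, 10, 50)


def collapse(seqs) -> Counter:
    """Merge identical sequences, keeping a count per unique sequence."""
    return Counter(seqs)


def call_rate(calls) -> float:
    """Proportion of samples for which the marker was scored (None = missing)."""
    return sum(c is not None for c in calls) / len(calls)


def reproducibility(rep_pairs) -> float:
    """Illustrative definition: fraction of technical-replicate pairs,
    both scored, that received identical calls."""
    scored = [(a, b) for a, b in rep_pairs if a is not None and b is not None]
    if not scored:
        return 0.0
    return sum(a == b for a, b in scored) / len(scored)
```

For example, with a 6-base barcode, `read_passes("IIIIII" + "+" * 10, 6)` returns `True`, because `I` encodes Phred 40 and `+` encodes Phred 10. By contrast, `read_passes("III###" + "I" * 10, 6)` returns `False`, because only half of the barcode positions reach Phred 30. The actual thresholds on reproducibility, depth, allele balance, and call rate are not given in the text and would need to be chosen for each study.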