VirDetect is available at https://github.com/dmarron/virdetect. RNA-seq reads were aligned to hg38 (without chrEBV) using STAR v2.4.2a (1,080 multi-maps, 10 mismatches). Unmapped reads were then aligned to a masked viral FASTA using STAR v2.4.2a (52 multi-maps, 5 mismatches). The vertebrate viral FASTA (1,894 viruses) was downloaded from GenBank and masked to increase specificity. Regions were masked in two ways. (i) Viral reads of length 75 were simulated from the entire viral FASTA and then mapped to hg38 using STAR v2.4.2a (1,080 multi-maps, 5 mismatches). Viral regions whose simulated reads mapped to the human genome were masked in the viral FASTA. (ii) Areas of low complexity (9 or more repeating single nucleotides, 7 or more repeating double nucleotides, 4 or more repeating nucleotide patterns of 3, 3 or more repeating nucleotide patterns of 4, or 2 or more repeating nucleotide patterns of 5) were masked. Viruses were then quantified using the resultant SAM file.
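The low-complexity rule in (ii) can be illustrated with a short Python sketch. This is not the VirDetect implementation: the function name `mask_low_complexity`, the `MIN_REPEATS` table, and the choice to replace masked bases with `N` are assumptions made for illustration, and only the repeat thresholds come from the text.

```python
import re

# Minimum number of consecutive copies per repeat-unit length, as listed in
# the text (unit length -> minimum copies).
MIN_REPEATS = {1: 9, 2: 7, 3: 4, 4: 3, 5: 2}


def mask_low_complexity(seq, mask_char="N"):
    """Return seq with tandem repeats that meet the thresholds masked.

    Assumption: masking means overwriting the bases with mask_char;
    the source does not say how masked regions are encoded.
    """
    masked = list(seq)
    upper = seq.upper()
    for unit_len, min_rep in MIN_REPEATS.items():
        # A lookahead makes the matches zero-width, so repeats that start
        # inside an earlier repeat are still found. Group 2 is the repeat
        # unit; group 1 is the whole run of unit copies.
        pattern = re.compile(
            r"(?=(([ACGT]{%d})\2{%d,}))" % (unit_len, min_rep - 1)
        )
        for match in pattern.finditer(upper):
            for i in range(match.start(1), match.end(1)):
                masked[i] = mask_char
    return "".join(masked)


if __name__ == "__main__":
    print(mask_low_complexity("ACGT" + "A" * 9 + "CGTC"))
    print(mask_low_complexity("TT" + "ACGTG" * 2 + "CC"))
```

Because each rule is applied independently, a short unit can be caught by a longer-unit rule: six copies of `AC` fall below the 7-copy threshold for double nucleotides but are masked as three copies of `ACAC`.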