We downloaded data from the NCBI30 repository on March 15th, 2020, with the following query: gene = “ORF1ab” AND host = “homo sapiens” AND “complete genome”. The query resulted in 407 non-repeated sequences (Table 3 (right)). We call this dataset NCBI-A, where 68 sequences belong to SARS-CoV-2. Then, we applied the procedure to translate the data into the set of sequence features, and we run the same state-of-the-art feature selection algorithm45. The result is a list of 10 different sequences (Table 1), for which just checking for their presence is enough to differentiate between SARS-CoV-2 and other viruses in the dataset, with a 100% accuracy. Each of the sequences, in fact, only appears in SARS-CoV-2 samples.

Note: The content above has been extracted from a research article, so it may not display correctly.

Please log in to submit your questions online.
Your question will be posted on the Bio-101 website. We will send your questions to the authors of this protocol and Bio-protocol community members who are experienced with this method. you will be informed using the email address associated with your Bio-protocol account.

We use cookies on this site to enhance your user experience. By using our website, you are agreeing to allow the storage of cookies on your computer.