We developed an analysis pipeline to identify bona fide lncRNAs from the newly generated silkworm transcriptome (Fig 1). (1) Transcripts that overlapped any protein-coding exon in the sense orientation were removed; (2) transcripts that were shorter than 200 bp, were single-exon, or had read coverage < 0.8 or FPKM < 0.1 were eliminated; (3) transcripts with a predicted large ORF (> 100 aa) were filtered out; (4) transcripts with predicted protein-coding potential were removed (protein-coding potential criteria: CPC score > 0, CPAT score > 0.345, and CNCI score > 0) [48–50]; (5) transcripts with similarity to known protein sequences in the Swiss-Prot database (E-value < 1e-6) [51] or to known protein-coding domains in the Pfam (AB) database (E-value < 1e-6) [52] were discarded; (6) transcripts located within 2 kb of a scaffold end were excluded; (7) finally, transcripts with class code ‘i’, ‘u’, or ‘x’ were retained as bona fide silkworm lncRNAs.
FPKM, fragments per kilobase of transcript per million mapped reads; ORF, open reading frame; CPC, coding potential calculator; CPAT, RNA coding potential assessment tool; CNCI, coding-non-coding index.
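The seven filtering steps can be illustrated with a minimal Python sketch. It is not the code used in this study. The `Transcript` record, its field names, and the function `first_failed_step` are hypothetical, and the values are assumed to have been computed beforehand by the respective tools. Two readings of the text are assumed: in step (2) any one of the four criteria removes a transcript, and in step (4) a coding call from any one of the three tools removes it.

```python
from dataclasses import dataclass, replace
from typing import Optional

EVALUE_CUTOFF = 1e-6          # Swiss-Prot and Pfam cutoff, step (5)
SCAFFOLD_END_MARGIN_BP = 2000  # step (6)
RETAINED_CLASS_CODES = {"i", "u", "x"}  # step (7)


@dataclass(frozen=True)
class Transcript:
    """Hypothetical per-transcript summary; field names are illustrative."""
    length_bp: int
    exon_count: int
    read_coverage: float
    fpkm: float
    longest_orf_aa: int
    cpc_score: float
    cpat_score: float
    cnci_score: float
    class_code: str
    overlaps_coding_exon_sense: bool = False
    swissprot_evalue: Optional[float] = None  # None means no hit
    pfam_evalue: Optional[float] = None       # None means no hit
    distance_to_scaffold_end_bp: int = 1_000_000


def first_failed_step(t: Transcript) -> Optional[int]:
    """Return the number of the first step that removes t, or None if kept."""
    if t.overlaps_coding_exon_sense:
        return 1
    # Assumption: failing any single criterion is enough for removal.
    if (t.length_bp < 200 or t.exon_count == 1
            or t.read_coverage < 0.8 or t.fpkm < 0.1):
        return 2
    if t.longest_orf_aa > 100:
        return 3
    # Assumption: a coding call from any one tool is enough for removal.
    if t.cpc_score > 0 or t.cpat_score > 0.345 or t.cnci_score > 0:
        return 4
    for evalue in (t.swissprot_evalue, t.pfam_evalue):
        if evalue is not None and evalue < EVALUE_CUTOFF:
            return 5
    if t.distance_to_scaffold_end_bp < SCAFFOLD_END_MARGIN_BP:
        return 6
    if t.class_code not in RETAINED_CLASS_CODES:
        return 7
    return None


# Made-up example values, for illustration only.
candidate = Transcript(
    length_bp=850, exon_count=3, read_coverage=0.95, fpkm=2.4,
    longest_orf_aa=60, cpc_score=-1.1, cpat_score=0.02, cnci_score=-0.3,
    class_code="u",
)
single_exon = replace(candidate, exon_count=1)

print(first_failed_step(candidate))    # None: retained as a candidate lncRNA
print(first_failed_step(single_exon))  # 2: removed by the single-exon filter
```

Returning the step number rather than a plain boolean makes it easy to tally how many transcripts each filter removes.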