LncLOOM takes as input a set of sequences, each typically a putative homolog from a different species. Currently, we work with only one sequence isoform per species, though adaptations to cases where multiple sequences exist per species, e.g., alternative splicing products, are possible. The input sequences are typically constructed through manual inspection of RNA-seq and EST data and existing annotations. The sequences used as LncLOOM inputs are available with the LncLOOM implementation: https://github.com/LncLOOM/LncLOOM. Some of the input sequences might be incomplete, and our framework contains specific steps to accommodate such cases. Prior to graph building, the set is filtered to remove identical sequences. The user can additionally choose to remove sequences whose percentage identity exceeds a threshold; in this case, LncLOOM uses a MAFFT MSA [44] to compute the percentage identity between each pair of sequences and retains, among similar sequences, the one that appears first in the input dataset.
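The identity-based filtering step can be illustrated with a minimal Python sketch. It is not taken from the LncLOOM code base. It assumes the sequences have already been aligned (for example, rows of a MAFFT MSA) and are passed in input order. The function names `percent_identity` and `filter_by_identity`, the toy sequences, and the identity definition (matching columns divided by all columns in which at least one of the two rows is not a gap) are assumptions made for illustration; LncLOOM may define identity and resolve ties differently.

```python
from typing import List, Tuple


def percent_identity(a: str, b: str) -> float:
    """Percent identity between two rows of the same alignment.

    Columns where both rows hold a gap are skipped. The denominator is the
    number of remaining columns (an assumed definition, not LncLOOM's).
    """
    if len(a) != len(b):
        raise ValueError("aligned rows must have equal length")
    matches = 0
    compared = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" and y == "-":
            continue  # column is a gap in both rows; ignore it
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def filter_by_identity(aligned: List[Tuple[str, str]], threshold: float) -> List[str]:
    """Return the names of retained sequences, in input order.

    A sequence is dropped if its identity to any earlier retained sequence
    is above `threshold`, so the first of a group of similar sequences wins.
    """
    kept: List[Tuple[str, str]] = []
    for name, row in aligned:
        if all(percent_identity(row, kept_row) <= threshold for _, kept_row in kept):
            kept.append((name, row))
    return [name for name, _ in kept]


# Hypothetical toy alignment (names and rows are made up for illustration).
toy_alignment = [
    ("human", "ACGTACGT-A"),
    ("chimp", "ACGTACGTTA"),  # 90% identical to "human"
    ("mouse", "AC-TTCGAGA"),  # 60% identical to "human"
]
print(filter_by_identity(toy_alignment, threshold=85.0))  # ['human', 'mouse']
```

In this sketch each sequence is compared only against sequences that have already been retained, which is one way to realize "keep the one that appears first"; comparing against every earlier sequence, retained or not, would be an equally plausible reading of the text.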