Open reading frame prediction and building a proteome database for MS sequence analysis

Open reading frames (ORFs) were extracted from the Trinity contigs (see above) using a custom pipeline, OkORF. We generated two sets of ORFs, sets A and B, with different filtering criteria. Set A comprises 71,654 predicted ORFs that remained after low-quality models were filtered out, whereas set B comprises 294,515 predicted ORFs and captures all possible ORFs without quality filtering. To build a comprehensive proteome database for MS analysis, we chose set B as the reference sequences, accepting false-positive ORF predictions in order to maximize the sensitivity of peptide identification by MS. Partial ORFs (i.e., those missing a start codon or a stop codon) were allowed. We added pig trypsin and human keratin to the reference proteome database as common contaminants.

Anal and thoracic light organs were dissected from 6 specimens and used for total RNA extraction with TRIzol (Invitrogen). Messenger RNA was purified from total RNA using the Oligo-dT30 super mRNA purification kit (Takara). Cell-free in vitro protein expression was performed according to the manufacturer's instructions. For Wheat Germ Extract (Promega), mRNA was translated in a 20 µl reaction mix at 25 °C for 2 h at final concentrations of 20 µg/ml for the anal light organ, 10 µg/ml for the thoracic light organ, and 10 µg/ml for the firefly luciferase control. For the Rabbit Reticulocyte Lysate System (Promega), mRNA was translated in a 20 µl reaction mix at 30 °C for 90 min at a final concentration of 4 µg/ml.
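To illustrate the ORF extraction step described earlier in this section, the following is a minimal sketch of a six-frame ORF scan that retains partial ORFs. It is not the OkORF implementation, whose internals are not described here. The function names, the `min_aa` length cutoff, and the rule that a reading frame open at the 5' edge of a contig is reported as lacking a confirmed start codon are all assumptions made for illustration; the standard genetic code is assumed.

```python
from typing import Iterator, NamedTuple

# Standard genetic code, codons enumerated in TCAG order (assumed here).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


class Orf(NamedTuple):
    strand: str        # "+" or "-"
    offset: int        # frame offset 0, 1 or 2 on that strand
    protein: str       # translated sequence without the stop symbol
    start_found: bool  # False: start codon missing (5'-partial)
    stop_found: bool   # False: stop codon missing (3'-partial)


def reverse_complement(nt: str) -> str:
    return nt.translate(COMPLEMENT)[::-1]


def translate(nt: str) -> str:
    # Codons containing ambiguous bases are translated as "X".
    return "".join(
        CODON_TABLE.get(nt[i:i + 3], "X") for i in range(0, len(nt) - 2, 3)
    )


def find_orfs(nt: str, min_aa: int = 10) -> Iterator[Orf]:
    """Yield complete and partial ORFs from all six reading frames.

    min_aa is a hypothetical length cutoff, not a value from the text.
    """
    nt = nt.upper()
    for strand, seq in (("+", nt), ("-", reverse_complement(nt))):
        for offset in range(3):
            segments = translate(seq[offset:]).split("*")
            for idx, seg in enumerate(segments):
                # Every segment except the last is terminated by a stop codon.
                stop_found = idx < len(segments) - 1
                if idx > 0:
                    # Downstream of a stop codon: begin at the first Met.
                    start = seg.find("M")
                    if start == -1:
                        continue
                    seg = seg[start:]
                # The first segment has no upstream stop, so the true start
                # may lie outside the contig; keep it whole as 5'-partial.
                start_found = idx > 0
                if len(seg) >= min_aa:
                    yield Orf(strand, offset, seg, start_found, stop_found)
```

Under these assumptions, an unfiltered set analogous to set B would correspond to keeping every record this scan yields, while a filtered set analogous to set A would apply additional quality criteria that are not specified in this section.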

