Protein sequences of five patchouli TPS genes were downloaded from NCBI (AY508726.1, AY508728.1, AY508729.1, AY508730.1, and AY508727.1). The longest ORF at each gene locus in the patchouli gene set was selected as the representative sequence, and the representative sequences were then searched with BLAST against the five reference TPS proteins using an e-value cutoff of 1e-2. BLAST hits were further annotated against the Pfam database using InterProScan 5 (IPRSCAN5). A candidate containing both TPS-related domains (PF03936: terpene synthase family, metal binding domain; PF01397: terpene synthase, N-terminal domain) was classified as full length, whereas a candidate containing only one of them was classified as partial. A similar approach was used to identify TPS genes in the other eight species: A. thaliana, Mi. guttatus, Sa. miltiorrhiza, Se. indicum, So. lycopersicum, So. tuberosum, U. gibba, and V. vinifera (Supplementary Table S17). The protein sequences of the full-length TPS genes identified above were aligned using MUSCLE v3.8.31 [53] with default parameters, and the corresponding CDS alignments were back-translated from the protein alignments using PAL2NAL [60]. RAxML [56] was used to infer maximum-likelihood trees under the GTR + I + Γ model with 100 bootstrap replicates. Trees were plotted with iTOL (https://itol.embl.de/).

The protein-coding gene annotations were updated with UTRs and alternative splicing models using the PASA pipeline (https://pasapipeline.github.io/). Gene- and transcript-level expression was then quantified using align_and_estimate_abundance.pl from the Trinity [61] package (version 2.2.0). Pearson correlations between samples were calculated (Supplementary Figure S4).
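The full-length versus partial classification of TPS candidates described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' script: it assumes the domain annotations are available as InterProScan-style tab-separated lines in which column 1 is the protein ID, column 4 the analysis name, and column 5 the signature accession. The function names and the example gene IDs (`geneA`, `geneB`, `geneC`) are hypothetical.

```python
from collections import defaultdict

# The two TPS-related Pfam accessions named in the text.
TPS_DOMAINS = {"PF03936", "PF01397"}


def pfam_hits(tsv_lines):
    """Collect Pfam accessions per protein from tab-separated annotation lines.

    Assumed layout (an assumption, not taken from the source):
    column 1 = protein ID, column 4 = analysis name, column 5 = accession.
    """
    hits = defaultdict(set)
    for line in tsv_lines:
        fields = line.rstrip("\n").split("\t")
        # Skip malformed lines and annotations from analyses other than Pfam.
        if len(fields) < 5 or fields[3] != "Pfam":
            continue
        hits[fields[0]].add(fields[4])
    return hits


def classify(hits):
    """Label candidates: both TPS domains = full length, exactly one = partial.

    Candidates with neither domain are left out of the result.
    """
    labels = {}
    for protein, accessions in hits.items():
        found = accessions & TPS_DOMAINS
        if found == TPS_DOMAINS:
            labels[protein] = "full length"
        elif found:
            labels[protein] = "partial"
    return labels


# Hypothetical annotation lines for three candidate genes.
example = [
    "geneA\tmd5\t550\tPfam\tPF03936\tmetal binding domain\t250\t510",
    "geneA\tmd5\t550\tPfam\tPF01397\tN-terminal domain\t30\t200",
    "geneB\tmd5\t310\tPfam\tPF03936\tmetal binding domain\t40\t300",
    "geneC\tmd5\t480\tPfam\tPF00067\tunrelated domain\t20\t460",
]
labels = classify(pfam_hits(example))
print(labels)
```

Representing each candidate's domains as a set makes the rule a simple intersection with the two TPS accessions, so repeated hits to the same domain within one protein do not affect the outcome.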