The distribution of P. tricornutum genes across transcriptional modules was compared with the distribution of orthologous gene models (Phatr2.0 genome annotation) across the microarray-derived transcriptional clusters generated by the DiatomPortal project (Ashworth et al., 2016). Only gene models with a one-to-one mapping between the version 2 (Phatr2) and version 3 (Phatr3) annotations of the P. tricornutum genome (Bowler et al., 2008; Rastogi et al., 2018) were considered, i.e., gene models that were neither split nor merged between versions; gene models that were truncated or extended were retained.
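The one-to-one filter and the module-versus-cluster comparison can be sketched in Python as below. This is a minimal illustration, not the authors' code. It assumes a hypothetical list of (Phatr2 ID, Phatr3 ID) correspondence pairs and hypothetical dictionaries giving the PhaeoNet module of each Phatr3 gene and the DiatomPortal cluster of each Phatr2 gene; the helper names are also hypothetical. A pair is kept only if neither ID appears in any other pair, which excludes split and merged models while retaining truncated or extended ones.

```python
from collections import Counter

def one_to_one_pairs(pairs):
    """Keep (phatr2_id, phatr3_id) pairs in which each ID occurs only once.

    A Phatr2 model linked to several Phatr3 models was split; a Phatr3 model
    linked to several Phatr2 models came from a merge. Both are discarded.
    Truncated or extended models still map one-to-one and are kept.
    """
    pairs = set(pairs)  # ignore duplicated rows
    n_v2 = Counter(v2 for v2, _ in pairs)
    n_v3 = Counter(v3 for _, v3 in pairs)
    return {v2: v3 for v2, v3 in pairs if n_v2[v2] == 1 and n_v3[v3] == 1}

def cross_tabulate(v2_to_v3, module_of_v3, cluster_of_v2):
    """Count genes per (PhaeoNet module, DiatomPortal cluster) combination."""
    table = Counter()
    for v2, v3 in v2_to_v3.items():
        # Genes lacking either a module or a cluster assignment are skipped.
        if v3 in module_of_v3 and v2 in cluster_of_v2:
            table[(module_of_v3[v3], cluster_of_v2[v2])] += 1
    return table
```

For example, a Phatr2 model listed against two Phatr3 models is dropped, as is any Phatr3 model listed against two Phatr2 models; the resulting counts can then feed the same contingency-table analysis used for the functional enrichments.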
Biological functions within the merged modules were identified using gene functional annotations from the Phatr3 annotation of the P. tricornutum genome (Bowler et al., 2008; Rastogi et al., 2018). These included: GO terms, using the R package topGO (Aibar et al., 2015); PFAM domains and biological processes (Rastogi et al., 2018); probable evolutionary affinities inferred by BLAST top-hit analyses (Rastogi et al., 2018); histone and DNA modifications associated with cells grown in replete media (Veluchamy et al., 2013, 2015); Polycomb group protein marks (Zhao et al., 2020); and KEGG orthology predictions, obtained with the BlastKOALA, KofamKOALA and GhostKOALA servers (Moriya et al., 2007; Kanehisa, 2017; Aramaki et al., 2019; Kanehisa and Sato, 2020). In silico targeting predictions were performed for all N-complete protein sequences (i.e., protein sequences inferred to start with a methionine) within the dataset, using HECTAR (Gschloessl et al., 2008); ASAFind v2.0 (Gruber et al., 2015) in conjunction with SignalP v3.0 (Bendtsen et al., 2004); MitoFates, with a detection threshold of 0.35 (Fukasawa et al., 2015; Dorrell et al., 2017); and WolfPSort, taking the consensus best-scoring prediction across the animal, fungi and plant reference datasets (Horton et al., 2007). Enrichments in each category were assessed both qualitatively, by manual inspection, and quantitatively, using a pivot table and a chi-squared test. Tabulated lists of all annotations are presented in Supplementary Table 2.
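The pivot-table and chi-squared step could look like the following standard-library Python sketch. It assumes annotations are available as hypothetical (module, category) records, and the helper names are illustrative rather than taken from the study. The p-value is computed from the regularized incomplete gamma function (a series expansion below the mean, a continued fraction above it), a standard numerical approach. Where SciPy is available, `scipy.stats.chi2_contingency` returns the same Pearson statistic, except that by default it applies Yates' continuity correction when there is only one degree of freedom.

```python
import math
from collections import Counter

def pivot(records):
    """Module x category count table from (module, category) records."""
    counts = Counter(records)
    rows = sorted({m for m, _ in counts})
    cols = sorted({c for _, c in counts})
    return rows, cols, [[counts[(r, c)] for c in cols] for r in rows]

def _upper_gamma_q(s, z):
    """Regularized upper incomplete gamma Q(s, z), for s > 0 and z > 0."""
    log_prefactor = -z + s * math.log(z) - math.lgamma(s)
    if z < s + 1.0:
        # Series for the lower function P(s, z); Q = 1 - P.
        term = total = 1.0 / s
        a = s
        for _ in range(10_000):
            a += 1.0
            term *= z / a
            total += term
            if abs(term) < abs(total) * 1e-15:
                break
        return max(0.0, 1.0 - total * math.exp(log_prefactor))
    # Continued fraction for Q, evaluated with the modified Lentz method.
    tiny = 1e-300
    b = z + 1.0 - s
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 10_000):
        an = -i * (i - s)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return math.exp(log_prefactor) * h

def chi2_sf(x, dof):
    """Upper-tail probability P(X >= x) for X ~ chi-squared(dof)."""
    return 1.0 if x <= 0 else _upper_gamma_q(dof / 2.0, x / 2.0)

def chi2_independence(table):
    """Pearson chi-squared test of independence: (statistic, dof, p-value)."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    n = sum(row_tot)
    dof = (len(row_tot) - 1) * (len(col_tot) - 1)
    if dof == 0:
        raise ValueError("need at least two modules and two categories")
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n  # count under independence
            stat += (obs - expected) ** 2 / expected
    return stat, dof, chi2_sf(stat, dof)
```

As with any chi-squared test, the approximation becomes unreliable when many expected counts are small (a common rule of thumb is below 5), which is one reason to pair it with manual inspection.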
Core chloroplast- and mitochondria-associated functions were assembled from a list of 524 KEGG ortholog numbers based on previously identified chloroplast and mitochondrial functions in photosynthetic eukaryotes (Dorrell et al., 2017; Nonoyama et al., 2019; Novák Vanclová et al., 2020). Where multiple candidate proteins were detected, each protein was assigned to the chloroplast, the mitochondrion, or both (dual chloroplast/mitochondrial targeting; Gile et al., 2015; Dorrell et al., 2017) based on in silico targeting predictions. Where no clear targeting prediction could be obtained, localization was inferred from BLAST similarity to orthologous chloroplast- or mitochondria-targeted proteins from other algal and stramenopile species (Dorrell et al., 2017; Río Bártulos et al., 2018). After disregarding 135 query proteins encoded by organellar genomes in diatoms (Yu et al., 2018) and 17 query proteins encoded by nuclear genes with no assigned PhaeoNet module, the final set comprised 372 unique proteins targeted to the chloroplast and/or mitochondrion, encoded by nuclear genes that belong to one of the 28 merged modules. The main metabolic pathways and complexes, and quantitative pathway associations, are presented in Supplementary Table 3.
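A localization call combining the targeting predictors could be sketched as follows. The text does not specify how conflicting predictions were reconciled, so the rule here (any plastid call means chloroplast, any mitochondrial call means mitochondrion, both means dual) is an assumption, as are the normalized labels in the hypothetical `Predictions` record and the `>=` comparison. Only the N-complete requirement and the MitoFates threshold of 0.35 come from the text. Proteins returning `None` would fall through to the BLAST-similarity step, which is not implemented here.

```python
from dataclasses import dataclass

MITOFATES_THRESHOLD = 0.35  # detection threshold stated in the text

@dataclass
class Predictions:
    """Hypothetical per-protein summary of predictor outputs, pre-normalized."""
    sequence: str
    hectar: str             # assumed labels: "chloroplast", "mitochondrion", "other"
    asafind_plastid: bool   # assumed: ASAFind (with SignalP 3.0) calls a plastid target
    mitofates_prob: float   # MitoFates presequence probability
    wolfpsort: str          # consensus best-scoring WolfPSort class, e.g. "mito"

def is_n_complete(seq):
    """N-complete: the inferred protein starts with methionine."""
    return seq[:1].upper() == "M"

def assign_compartment(p):
    """Return 'chloroplast', 'mitochondrion', 'dual', or None (unresolved).

    Assumed rule, not the authors' documented procedure.
    """
    if not is_n_complete(p.sequence):
        return None  # targeting cannot be judged without the true N-terminus
    plastid = p.asafind_plastid or p.hectar == "chloroplast"
    mito = (p.mitofates_prob >= MITOFATES_THRESHOLD
            or p.hectar == "mitochondrion"
            or p.wolfpsort == "mito")
    if plastid and mito:
        return "dual"
    if plastid:
        return "chloroplast"
    if mito:
        return "mitochondrion"
    return None  # unresolved: fall back to BLAST similarity with targeted orthologs
```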
A complete list of P. tricornutum transcription factors (TFs) was assembled from a previous dataset (Rayko et al., 2010) and an updated list of aureochromes (Banerjee et al., 2016); these were mapped to the version 3 genome annotation by BLASTp analysis. A total of 188 candidates from 18 TF families (HSF, Myb, Zn_finger_C2H2, bZIP, Zn_finger_CCCH, bHLH, Sigma-70, Zn_finger_TAZ, CBF/NF, E2F-DP, CSF, Aureochrome, TRF, CCAAT-binding, AP2-EREBP, TAF9, CXC, Homeobox) corresponded to genes assigned to a PhaeoNet merged module (Figure 5 and Supplementary Table 4). Given that the regulation of gene expression by transcription factors plays a key role in cell growth and cell-cycle progression, the distribution within the merged modules of genes implicated in the cell cycle (cyclins) and in light perception (e.g., phytochrome, cryptochrome) was additionally investigated, as was the distribution of genes implicated in transcription and histone-related processes (Figure 5 and Supplementary Table 4; Huysman et al., 2013; Annunziata et al., 2019).
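The BLASTp mapping step is not described in detail. One common approach is reciprocal best hits, sketched below as an assumption rather than the authors' procedure. The code reads standard BLAST tabular output (`-outfmt 6`, whose 12th column is the bit score) for searches run in both directions, and a hypothetical `families_per_module` helper then tallies TF families per merged module, skipping genes without a module as in the count of 188 candidates.

```python
from collections import Counter

def best_hits(blast_tab):
    """Best subject per query (highest bit score) from BLAST -outfmt 6 text."""
    best = {}
    for line in blast_tab.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.split("\t")
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {q: s for q, (s, _) in best.items()}

def reciprocal_best_hits(old_vs_new, new_vs_old):
    """Map IDs from the earlier TF list to Phatr3 IDs by reciprocal best hit."""
    forward = best_hits(old_vs_new)
    reverse = best_hits(new_vs_old)
    return {old: new for old, new in forward.items() if reverse.get(new) == old}

def families_per_module(tf_family, module_of):
    """Count TFs per (merged module, TF family); genes without a module are skipped."""
    counts = Counter()
    for gene, family in tf_family.items():
        if gene in module_of:
            counts[(module_of[gene], family)] += 1
    return counts
```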
Phylogenetic and transcriptional dynamics of P. tricornutum sigma factors. This figure shows the unrooted best-scoring tree topology for an 86-taxon × 453-aa alignment of subsampled diatom and non-diatom sigma factors, inferred with MrBayes v3.2.7a using the Jones substitution matrix, 600,000 generations, two starting chains and a burn-in fraction of 0.5 (Huelsenbeck and Ronquist, 2001), and with RAxML v8.2 using the PROTGAMMAJTT substitution model and 300 bootstrap replicates (Stamatakis, 2014). Chloroplast-targeting predictions were performed using ASAFind with SignalP v3.0 (Gruber et al., 2015) and HECTAR (Gschloessl et al., 2008) under default settings. Branches are colored by phylogenetic affiliation, and bootstrap values are shown for nodes recovered with > 40% support. Eight P. tricornutum sigma factors are labeled with their PhaeoNet merged module assignment and with the chloroplast-targeting sequences predicted by HECTAR or ASAFind (Gruber et al., 2015).
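To apply the same display rule (support shown only above 40%) to a RAxML bipartitions tree, where support values are stored as internal-node labels in Newick format, a small regular-expression filter is enough. This is an illustrative sketch with a hypothetical helper name; it assumes plain numeric labels written directly after a closing parenthesis and does not handle MrBayes-style bracketed annotations.

```python
import re

# A support value is a number written right after ")" in Newick,
# e.g. "(A:0.1,B:0.2)85:0.05" gives that clade a support of 85.
_SUPPORT_LABEL = re.compile(r"\)(\d+(?:\.\d+)?)")

def hide_low_support(newick, threshold=40.0):
    """Remove internal-node support labels that do not exceed `threshold`."""
    def keep_or_drop(match):
        value = float(match.group(1))
        return match.group(0) if value > threshold else ")"
    return _SUPPORT_LABEL.sub(keep_or_drop, newick)
```

Branch lengths are untouched because they follow a colon rather than a closing parenthesis.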