Structural Alignment and Covariation Analysis of RNA Sequences

[Abstract] RNA molecules adopt defined structural conformations that are essential to exert their function. During the course of evolution, the structure of a given RNA can be maintained via compensatory base-pair changes that occur among covarying nucleotides in paired regions. Therefore, for comparative, structural, and evolutionary studies of RNA molecules, numerous computational tools have been developed to incorporate structural information into sequence alignments and a number of tools have been developed to study covariation. The bioinformatic protocol presented here explains how to use some of these tools to generate a secondary-structure-aware multiple alignment of RNA sequences and to annotate the alignment to examine the conservation and covariation of structural elements among the sequences.

accuracies. The running time of TurboFold is considerably longer than that of MAFFT (Katoh and Toh, 2008; Tan et al., 2017). In this protocol we therefore use MAFFT, because its speed, which can be further increased by parallel processing (Katoh and Standley, 2013), allows alignment of a large number (>100) of sequences in a limited amount of time. Like several of the other tools, MAFFT employs an iterative strategy in which pairwise structural alignments are first computed and are then progressively combined into a multiple alignment through several rounds of refinement.
Because of the tight structure-function relationship, functional RNAs are under selective pressure to maintain their structures (Nowick et al., 2019). This is reflected by the occurrence of covarying mutations, either consistent or compensatory, in paired nucleotides that can be observed in sequence alignments.
Covariation data are therefore very valuable and have been used to validate or predict the secondary, and even tertiary, structure of RNAs and to understand their evolution (Michel and Westhof, 1990; Cannone et al., 2002). A number of software tools are available for examining covariation within alignments, such as the structural alignment editors RALEE (Griffiths-Jones, 2005) and 4SALE; R-chie (Lai et al., 2012), a tool that scores and annotates covariation; and complex programs that include methods for performing statistical analysis of covariation with or without a phylogenetic framework, such as R-scape (Rivas et al., 2017) and CoMap (Dutheil, 2012). R-chie highlights base pairs and employs arc diagrams to represent the secondary structure alongside the alignment, and can generate highly customizable figures.
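To make the notion of covariation concrete, the following minimal Python sketch classifies the base pair formed by two alignment columns in each sequence relative to a reference sequence. It assumes the common convention that a consistent mutation changes one nucleotide of the pair and a compensatory mutation changes both, with pairing preserved in either case. The function name, the labels, and the toy alignment are illustrative assumptions and do not reproduce the scoring used by R-chie or any other tool cited here.

```python
# Canonical Watson-Crick pairs plus the G-U wobble pair (RNA alphabet).
VALID_PAIRS = {"AU", "UA", "GC", "CG", "GU", "UG"}


def classify_pair(alignment, i, j, ref_index=0):
    """Label the pair at columns i and j (0-based) for every sequence.

    Labels (illustrative, not taken from any tool):
      conserved    - same pair as the reference sequence
      consistent   - one nucleotide changed, pairing preserved
      compensatory - both nucleotides changed, pairing preserved
      invalid      - the two nucleotides cannot form a canonical pair
      gap          - at least one of the two positions is a gap
    """
    ref = alignment[ref_index]
    ref_pair = (ref[i] + ref[j]).upper()
    labels = []
    for seq in alignment:
        pair = (seq[i] + seq[j]).upper()
        if "-" in pair:
            labels.append("gap")
        elif pair not in VALID_PAIRS:
            labels.append("invalid")
        else:
            # Count how many of the two positions differ from the reference.
            changed = sum(a != b for a, b in zip(ref_pair, pair))
            labels.append(("conserved", "consistent", "compensatory")[changed])
    return labels


# Toy alignment: columns 1 and 7 are assumed to be paired in the reference.
toy = ["GGGAAACCC", "GAGAAACUC", "GGGAAACUC", "GGGAAACAC"]
labels = classify_pair(toy, 1, 7)
```

In this toy case the reference pair is G-C; the second sequence carries A-U (both positions changed), the third G-U (one position changed), and the fourth G-A, which cannot pair.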
In the protocol below, we explain how to use MAFFT to compute a structural alignment of multiple RNA sequences, and how to use R-chie to annotate the alignment with conservation and covariation information. All collected sequences must be put in a single file in the commonly used FASTA format (https://en.wikipedia.org/wiki/FASTA_format; Figure 1).
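As an illustration of the input requirement, the sketch below formats a set of named sequences as a single multi-FASTA string that can then be written to one file. The function name, the 60-character line width, and the example records are assumptions made for illustration; the protocol itself only requires that all sequences end up in one FASTA file.

```python
def to_fasta(records, width=60):
    """Format (name, sequence) pairs as one multi-FASTA string."""
    names = [name for name, _ in records]
    if len(set(names)) != len(names):
        raise ValueError("sequence names must be unique")
    lines = []
    for name, seq in records:
        # Remove any whitespace inside the sequence and normalise case.
        seq = "".join(seq.split()).upper()
        if not seq:
            raise ValueError(f"empty sequence for {name!r}")
        lines.append(f">{name}")
        # Wrap the sequence into fixed-width lines.
        lines.extend(seq[k:k + width] for k in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


# Hypothetical records; real input would be the collected RNA sequences.
example = to_fasta([("seq_a", "ACGU"), ("seq_b", "gg cc")], width=3)
```

The resulting string can be saved with, for example, `open("sequences.fasta", "w").write(example)`, where the file name is a placeholder.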

mafft_alignment_details.log
The alignment generated will be in FASTA format (Figure 2).
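Before annotating the alignment, it can be useful to confirm that the output is a well-formed aligned FASTA file, that is, that every row has the same length once gaps are included. The sketch below is one possible check; the function names and the inline example text are assumptions, not part of MAFFT or of the protocol.

```python
def read_fasta(text):
    """Parse FASTA text into a dict mapping sequence name to sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = []
        elif name is None:
            raise ValueError("sequence data found before the first header")
        else:
            records[name].append(line)
    # Join the wrapped lines of each record into one string.
    return {n: "".join(parts) for n, parts in records.items()}


def alignment_length(records):
    """Return the common row length, or raise if the rows differ."""
    lengths = {len(seq) for seq in records.values()}
    if len(lengths) != 1:
        raise ValueError(f"rows are not aligned; lengths found: {sorted(lengths)}")
    return lengths.pop()


# Hypothetical aligned FASTA text with wrapped lines and gap characters.
aligned_text = ">s1\nGG-AA\nCC\n>s2\nGGGAA\n-C\n"
rows = read_fasta(aligned_text)
n_columns = alignment_length(rows)
```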

(Optional) Predict a reference secondary structure:
In order to reveal covariation, a reference secondary structure is needed. It can be the structure of one of the RNA sequences included in the analysis, a consensus structure inferred from the alignment, or an external structural model. In the absence of a known or experimentally determined model, the structure needs to be predicted. Prediction of the 2D structure of a single RNA sequence can be done with widely used tools such as MFOLD (Zuker, 2003).
c. R-chie is highly customizable. Options "--legend1", "--legend3", "--msaspecies", "--msagrid", and "--msatext" can be invoked or omitted at will in order to turn on or off legends, sequence names, and other graphical settings. Colors for every element in the image can be specified by additional options such as "--colour1", "--palette1", or "--msacol"; run "rchie.R --help" for details.
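When many figures are generated with different settings, the on/off options listed above can be assembled programmatically. The following sketch only handles the five toggle options named in the text and treats each as a flag that is either present or absent. The helper name is hypothetical, and the required input and output arguments of "rchie.R" are deliberately not shown because they should be taken from "rchie.R --help".

```python
# Toggle options named in the protocol text; each is either passed or omitted.
TOGGLE_FLAGS = ("--legend1", "--legend3", "--msaspecies", "--msagrid", "--msatext")


def rchie_toggle_args(enabled):
    """Return the requested toggle flags in a fixed, reproducible order."""
    unknown = set(enabled) - set(TOGGLE_FLAGS)
    if unknown:
        raise ValueError(f"not a known toggle option: {sorted(unknown)}")
    return [flag for flag in TOGGLE_FLAGS if flag in enabled]


# Example: request sequence names and the alignment grid only.
flags = rchie_toggle_args({"--msagrid", "--msaspecies"})
```

The returned list can be appended to the rest of the command line, for instance when calling the script through `subprocess.run`.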

Data analysis
The protocol described here has been used to examine the covariation within metastable regions in a structural alignment of 107 mRNAs of type I toxin-antitoxin systems from bacteria of the genus

Notes
In this protocol, we employ MAFFT to compute structural alignments, but the procedure may be performed similarly using any other tool that produces structure-aware sequence alignments.