Benchmark
This protocol is extracted from research article:
Conserved long-range base pairings are associated with pre-mRNA processing of human genes
Nat Commun, Apr 16, 2021;

Procedure

We compared the accuracy and runtime of PrePH with those of other programs: IntaRNA2.0 [121], RIsearch2 [122], RNAplex [123], DuplexFold [124], and bifold [124]. PrePH was run with the following parameters: k-mer length, 5 nt; maximal distance between complementary regions, 10,000 nt; minimal length of the aligned regions, 10 nt; energy threshold, −15 kcal/mol; and maximal number of wobble pairs in a k-mer, 2. The parameters for the other programs are listed below.
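To make the k-mer and wobble-pair parameters concrete, the following minimal sketch checks whether two k-mers can form a gapless antiparallel duplex with a bounded number of G-U wobble pairs. It is an illustration only and not PrePH's actual implementation; the function name `kmers_pair`, the RNA alphabet, and the pairing tables are assumptions introduced here.

```python
# Illustrative sketch only; not taken from the PrePH source code.
# Assumption: sequences use the RNA alphabet (A, C, G, U).
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def kmers_pair(kmer_5p: str, kmer_3p: str, max_wobble: int = 2) -> bool:
    """Return True if the two k-mers pair base by base (antiparallel)
    using Watson-Crick pairs and at most `max_wobble` G-U pairs."""
    if len(kmer_5p) != len(kmer_3p):
        return False
    wobble = 0
    # Antiparallel duplex: the first base of one k-mer faces the last
    # base of the other, so the second k-mer is read in reverse.
    for a, b in zip(kmer_5p, reversed(kmer_3p)):
        if (a, b) in WATSON_CRICK:
            continue
        if (a, b) in WOBBLE:
            wobble += 1
            if wobble > max_wobble:
                return False
        else:
            return False  # mismatch: the k-mers are not complementary
    return True


if __name__ == "__main__":
    print(kmers_pair("GGACU", "AGUCC"))  # fully Watson-Crick 5-mers
    print(kmers_pair("GGACU", "AGUUU"))  # two G-U wobble pairs
    print(kmers_pair("GGGCU", "AGUUU"))  # three G-U wobble pairs
```

With the default `max_wobble=2`, the first two calls are accepted and the third is rejected, mirroring the threshold of 2 wobble pairs per 5-nt k-mer stated above.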

To benchmark time efficiency, we used a set of 1000 pairs of randomly chosen conserved intronic sequences from the human genome. The sequences were 50 to 500 nt long and contained nearly perfect sequence complementarity. All programs were run with the energy threshold set to −15 kcal/mol. For IntaRNA2.0, the outOverlap parameter was set to B, which allows the interacting subsequences to overlap in both target and query; the n parameter was set to 100 to limit the maximal number of suboptimal structures; and qAcc and tAcc were set to N to omit the computation of accessibility. For RIsearch2, the seed length was set to 5 and the length of flanking sequences considered for seed extension was set to 50. For RNAplex, the fast-folding parameter was set to f2 so that the structure is computed based on the approximated model. For DuplexFold and bifold, the maximum loop/bulge size was set to two. All other parameters were left at their default values. The computations were carried out on an Intel(R) Core(TM) i5-8250U CPU at 1.60 GHz. PrePH was the fastest of the programs tested (191.4 s) (Table S2A). In addition, the equilibrium free energies of the predictions by PrePH correlated reasonably well with those of RNAplex, IntaRNA, and DuplexFold (Fig. S13).
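A runtime comparison of this kind can be sketched as a small timing harness that runs one predictor over all sequence pairs and records the wall-clock time. This is a hedged illustration, not the benchmarking script used in the article; `time_predictor` and `toy_predictor` are hypothetical names, and the toy predictor merely stands in for a call to one of the real programs.

```python
import time
from typing import Callable, Iterable, List, Tuple


def time_predictor(
    predict: Callable[[str, str], object],
    pairs: Iterable[Tuple[str, str]],
) -> Tuple[float, List[object]]:
    """Run `predict` on every sequence pair.

    Returns (elapsed wall-clock seconds, list of per-pair results).
    """
    start = time.perf_counter()
    results = [predict(seq_a, seq_b) for seq_a, seq_b in pairs]
    elapsed = time.perf_counter() - start
    return elapsed, results


def toy_predictor(seq_a: str, seq_b: str) -> int:
    """Hypothetical stand-in for a real tool: returns the summed length."""
    return len(seq_a) + len(seq_b)


if __name__ == "__main__":
    example_pairs = [("GGACUAGC", "GCUAGUCC"), ("AUGGCC", "GGCCAU")]
    seconds, outputs = time_predictor(toy_predictor, example_pairs)
    print(f"{len(outputs)} pairs processed in {seconds:.6f} s")
```

In practice, `predict` would wrap an external invocation of each program with the parameters listed above, so that every tool is timed on the same 1000 pairs under the same conditions.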

For the comparison of minimum free energy (MFE) between different methods, we used simulated data consisting of 1000 pairs of sequences with nearly perfect complementarity, each 10 to 50 nt long. All programs were run with the energy threshold set to −15 kcal/mol. The other parameters were as before. Pearson’s correlation coefficients were computed between the energies of the predicted optimal structures. To compare the predictions of PrePH with those of the other programs at the level of individual base pairs, we computed the following metric:

$$\frac{|S_1 \cap S_2|}{|S_1|},$$

where $S_1$ is the set of base pairs predicted by PrePH, $S_2$ is the set of base pairs predicted by the other program, and $S_1 \cap S_2$ is the common set of base pairs ($|S|$ denotes the cardinality of a set). That is, the number of base pairs common to PrePH and each of the other programs, as a fraction of the number of base pairs predicted by PrePH, was used as a measure of specificity (Table S2B). PrePH showed specificity >80% with respect to all other programs except bifold; however, the predictions of bifold were also not in agreement with those of the other programs.
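The two comparisons described above, the fraction of PrePH base pairs shared with another program and Pearson's correlation between predicted energies, can be sketched as follows. This is a minimal illustration under stated assumptions: base pairs are represented as `(i, j)` index tuples, and the names `specificity` and `pearson_r` as well as the example values are introduced here and are not taken from the article.

```python
from math import sqrt
from typing import Sequence, Set, Tuple

# Assumption: a base pair is a tuple (i, j) of paired positions.
BasePair = Tuple[int, int]


def specificity(s1: Set[BasePair], s2: Set[BasePair]) -> float:
    """Fraction of base pairs in s1 (reference tool) also found in s2."""
    if not s1:
        raise ValueError("s1 must contain at least one base pair")
    return len(s1 & s2) / len(s1)


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's correlation coefficient between two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (>= 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


if __name__ == "__main__":
    # Made-up example structures: three of four pairs are shared.
    pairs_a = {(1, 20), (2, 19), (3, 18), (4, 17)}
    pairs_b = {(1, 20), (2, 19), (3, 18), (5, 16)}
    print(specificity(pairs_a, pairs_b))

    # Made-up energies (kcal/mol) for the same duplexes from two tools.
    energies_a = [-15.2, -18.4, -22.1, -30.5]
    energies_b = [-16.0, -19.1, -21.7, -31.2]
    print(pearson_r(energies_a, energies_b))
```

Note that the metric is asymmetric: swapping the two sets changes the denominator, so the value measures agreement relative to the first tool's predictions only.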

We conclude that PrePH allows for computationally efficient detection of PCCRs without significant loss of accuracy compared to other methods. The computation time of PrePH on the complete dataset of CIRs was 4 h (15 threads, 1200 MHz CPUs each).
