For a given RNA sequence, ParasoR exactly computes various expected values from the Boltzmann ensemble of secondary structures under a maximal pair-distance constraint. It avoids numerical errors by dealing with only the ratios of DP variables, which do not change in magnitude as the sequence length N changes. To allow distributed computing, ParasoR divides the DP matrices into smaller pieces without losing their mutual dependencies. The computational complexities are given by either
(i) 𝒪(NW^2/K + NW) time, 𝒪(N/K + W^2) memory for each node, and 𝒪(NW) disk space, or
(ii) 𝒪(NW^2/K + KW^2) time, 𝒪(N/K + W^2) memory for each node, and 𝒪(N + KW^2) disk space, which requires less disk space than (i) but twice the computational time 𝒪(NW^2/K) for the construction of the DP matrices.
Here, N denotes the input sequence length, W denotes the maximal span of base pairs, and K denotes the number of available computer nodes. We first analyze how the DP variables depend on N and then rewrite the expected values using the ratios of DP variables, which do not change in scale with N. Next, we describe how the computation of these variables is distributed across different computer nodes. For brevity, the explicit algorithms are described in Additional file 1: Chapter 1.
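As a rough illustration of the trade-off between the two complexity options, the leading terms can be evaluated for concrete values of N, W, and K. This is only a sketch: the function name `cost_estimates` is hypothetical, all constant factors hidden by the 𝒪 notation are ignored (including the factor of two in the DP construction time of the second option), and the returned numbers are unitless growth terms, not predictions of runtime or storage.

```python
def cost_estimates(N, W, K):
    """Evaluate the leading terms of the two complexity options.

    N: sequence length, W: maximal span of base pairs, K: number of nodes.
    Constant factors are ignored, so only the relative growth is meaningful.
    """
    memory_per_node = N / K + W**2  # identical for both options
    option_i = {
        "time": N * W**2 / K + N * W,
        "memory_per_node": memory_per_node,
        "disk": N * W,
    }
    option_ii = {
        # The DP construction term N*W^2/K carries an extra constant
        # factor of two in practice, which the O-notation hides.
        "time": N * W**2 / K + K * W**2,
        "memory_per_node": memory_per_node,
        "disk": N + K * W**2,
    }
    return option_i, option_ii


if __name__ == "__main__":
    # Hypothetical example values chosen only for illustration.
    opt_i, opt_ii = cost_estimates(N=1000, W=10, K=10)
    print("option (i): ", opt_i)
    print("option (ii):", opt_ii)
```

With these illustrative values, the disk term drops from NW to N + KW^2, which is the saving the second option trades against the repeated DP construction.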
Any secondary structure ζ of sequence x is specified by a list of base pairs. In the conventional Turner energy model, a base pair of x_k and x_l must be one of the canonical base pairs {AU, UA, CG, GC, GU, UG}, and the distance between them must satisfy 5 ≤ (l−k+1). We designate a position pair (i,j) an outermost pair if (x_{i+1}, x_j) forms a base pair and there is no base pair that encloses (i,j) in ζ. Since we impose the maximal span constraint, the outermost pair (i,j) also satisfies (j−i) ≤ W. The structure ζ is then uniquely decomposed into a set of non-overlapping substructures, each enclosed by an outermost pair, and the fragments of exterior loops between or flanking them. We define the set of potential outermost pairs of x as P = {(i,j) | (x_{i+1}, x_j) is one of the canonical base pairs and 5 ≤ j−i ≤ W}.
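The definition of P can be made concrete with a short enumeration. The following is a minimal sketch, not ParasoR's implementation: the function name `potential_outermost_pairs` is hypothetical, and it assumes that i runs from 0 to N−1 and that sequence positions are 1-based, so that x_{i+1} is the first base of the candidate pair and x_j the last.

```python
# Canonical base pairs allowed in the Turner energy model.
CANONICAL = {("A", "U"), ("U", "A"), ("C", "G"),
             ("G", "C"), ("G", "U"), ("U", "G")}


def potential_outermost_pairs(x, W):
    """Return P = {(i, j) : (x_{i+1}, x_j) is canonical and 5 <= j - i <= W}.

    x is an RNA string; positions in the definition are 1-based, so
    x_{i+1} is x[i] and x_j is x[j - 1] in Python's 0-based indexing.
    """
    n = len(x)
    pairs = set()
    for i in range(n):
        # j - i must lie in [5, W], and j cannot exceed the sequence length.
        for j in range(i + 5, min(i + W, n) + 1):
            if (x[i], x[j - 1]) in CANONICAL:
                pairs.add((i, j))
    return pairs


if __name__ == "__main__":
    print(sorted(potential_outermost_pairs("GAAAAU", W=6)))
```

Because j−i is bounded by W, the enumeration visits at most NW candidate pairs, which matches the NW term in the complexities stated earlier.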