Let [R] denote the matrix of raw, unordered expression data that serves as input. For each tissue, CIRCUST proceeds as follows. Fig B in S1 Text shows an outline of the methodology.

In this sequence, [N] is the matrix of preprocessed, normalized (and still unordered) expression data, [X] is a preliminary ordered gene expression matrix, and [X_k^TOP] is the k-th expression matrix with the ordered expression data of the tissue-specific TOP genes, i.e. the highly rhythmic genes of each tissue, for k = 1, …, K, with K a prefixed integer value (see below). Defining the two latter (ordered) matrices requires addressing the temporal order problem. The output of CIRCUST is [M^TOP], a matrix that contains robust estimates (medians) of the main FMM parameters computed for the TOP genes in [X_k^TOP], k = 1, …, K. FMM parameters are meaningfully interpretable and characterize rhythmicity; see S1 Text (Section 3.1). The CIRCUST steps are described below.

Genes with zero read counts in more than 30% of samples are discarded [35]. The expression values of each gene are then rescaled to [-1, 1] by min-max normalization [16]. The preprocessed expression matrix is denoted by [N].
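The preprocessing step can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes [R] is stored as a genes-by-samples NumPy array, and the function name `preprocess` and its parameter `max_zero_frac` are hypothetical.

```python
import numpy as np

def preprocess(R, max_zero_frac=0.30):
    """Filter sparse genes and rescale each gene to [-1, 1].

    R is assumed to be a genes x samples array of raw counts.
    Returns the normalized matrix N and the boolean mask of kept genes.
    """
    R = np.asarray(R, dtype=float)
    # Fraction of samples with zero counts, per gene (row).
    zero_frac = (R == 0).mean(axis=1)
    kept = zero_frac <= max_zero_frac
    R = R[kept]
    lo = R.min(axis=1, keepdims=True)
    hi = R.max(axis=1, keepdims=True)
    span = hi - lo
    # Assumption: constant genes (zero range) are mapped to -1
    # instead of raising a division-by-zero warning.
    span[span == 0] = 1.0
    N = 2.0 * (R - lo) / span - 1.0
    return N, kept
```

After outlier samples are removed in the next step, the same function could be applied again to renormalize the remaining samples.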

A core information set is defined, consisting of 12 genes: PER1, PER2, PER3, CRY1, CRY2, ARNTL, CLOCK, NR1D1, RORA, DBP, TEF and STAT3. In the following, we refer to them as seed genes. There is no gold standard for seed gene selection; however, the expression patterns of these genes generally display marked circadian signals in most mammalian tissues, and they were also considered as circadian benchmarks in previous works [1, 12, 14, 15, 27]. In particular, STAT3 is included because it has been identified as rhythmic in several human tissues [36, 37]. The stability of the results with respect to seed gene selection is assessed in S1 Text (see Section 3.4).

CPCA is computed on the sub-matrix of the 12 seed genes from [N], and its role at this point is twofold. First, CPCA allows outlier samples to be detected along the lines described in S1 Text (see Section 3.3). Outlier samples are removed from [N] for all genes, and the expression data are normalized again. Second, CPCA provides a solution to the temporal order identification problem (including the choice of starting point and direction) from the same seed-gene sub-matrix, as detailed above. Then, [N] is ordered according to the circular order obtained as the CPCA solution. We refer to the resulting matrix as [X]. If the median R²_FMM of the seed genes after this preliminary ordering is lower than 0.3, the subsequent analysis may be inaccurate.
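The ordering idea can be illustrated with a short sketch. It rests on an assumption about CPCA that is not spelled out in this section: each sample is projected onto the first two principal components of the seed-gene sub-matrix, and the angle of that projection is taken as its circular time. Outlier detection and the choice of starting point and direction are omitted, and the function name `cpca_order` is hypothetical.

```python
import numpy as np

def cpca_order(N_seed):
    """Estimate a circular order of samples from seed-gene expression.

    N_seed is assumed to be a seed genes x samples array.
    Returns the angle of each sample in [0, 2*pi) and the sample
    indices sorted by that angle.
    """
    X = np.asarray(N_seed, dtype=float)
    # Center each gene across samples.
    Xc = X - X.mean(axis=1, keepdims=True)
    # Rows of Vt are the principal directions over samples (eigengenes).
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    pc1, pc2 = Vt[0], Vt[1]
    # Angle of each sample in the plane of the first two components.
    theta = np.mod(np.arctan2(pc2, pc1), 2 * np.pi)
    order = np.argsort(theta)
    return theta, order
```

The recovered order is only defined up to a rotation and a reflection of the circle, which is why the starting point and direction have to be fixed separately.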

Rhythmicity models are used at this stage to predict gene expression patterns. First, the computationally efficient ORI model [16] is used to discard potentially non-rhythmic genes in [X], namely those with R²_ORI < 0.5. R² is the goodness-of-fit measure of a rhythmicity model and takes values from 0 to 1; the closer to 1, the higher the rhythmicity. Details are given in S1 Text (see Section 3.5). Then, the tissue-specific TOP rhythmic genes are defined, based on the FMM model predictions, as those which are: i) non-spiked (ω̂ > 0.1); ii) highly rhythmic (R²_FMM > 0.5); and iii) whose peak phases (t̂_U) cover all the quarters of the unit circle ([0, 2π)). This definition follows from the meaningful interpretation of the FMM parameters ω and t_U; see S1 Text (Section 3.1) and [20] for details. The 12 seed genes are usually among the TOP genes; if not, they are forced to be included. [X^TOP] denotes the sub-matrix of [X] restricted to the TOP genes.
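A minimal sketch of the TOP gene filter is given below. It assumes the FMM estimates are already available as one-dimensional arrays with one entry per gene, and it treats criterion iii) as a check on the selected set rather than as a per-gene filter, which is one possible reading of the definition. The function name `select_top` and its parameters are hypothetical.

```python
import numpy as np

def select_top(r2_fmm, omega, t_u, seed_mask=None,
               r2_min=0.5, omega_min=0.1):
    """Select TOP genes from per-gene FMM estimates.

    Returns a boolean mask of TOP genes and a flag telling whether
    their peak phases fall in all four quarters of [0, 2*pi).
    """
    r2_fmm = np.asarray(r2_fmm, dtype=float)
    omega = np.asarray(omega, dtype=float)
    t_u = np.asarray(t_u, dtype=float)
    # Criteria i) non-spiked and ii) highly rhythmic.
    top = (omega > omega_min) & (r2_fmm > r2_min)
    # Seed genes are forced into the TOP set when a mask is given.
    if seed_mask is not None:
        top = top | np.asarray(seed_mask, dtype=bool)
    # Criterion iii): quarter index (0..3) of each selected peak phase.
    quarters = np.floor(np.mod(t_u[top], 2 * np.pi) / (np.pi / 2))
    quarters = np.minimum(quarters.astype(int), 3)
    covers_all = set(quarters.tolist()) == {0, 1, 2, 3}
    return top, covers_all
```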

Next, random selections containing 2/3 of the TOP genes are considered. The CPCA solution for the temporal order is recomputed for each of the sub-matrices obtained by restricting [X^TOP] to the selected genes. The process is repeated until a prefixed number K of random gene collections is obtained that satisfy two conditions: (a) the angular values in θ are distributed along more than half of the unit circle; and (b) the maximum distance between two consecutive angular values in θ does not exceed the distances observed between consecutive angular values in the preliminary order, given by the vector θ obtained in the preliminary ordering step. Condition (a) pursues robustness in the peak estimates, and condition (b) avoids spurious gaps not detected from the seed genes. Both improve the quality of the orders.
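The resampling and the two acceptance conditions can be sketched as follows. The interpretation is an assumption: condition (a) is read as "the largest empty arc is shorter than π", and condition (b) as "the largest gap between consecutive angles does not exceed the largest gap in the preliminary θ". The helper names `draw_subset`, `max_circular_gap` and `accept_order` are hypothetical.

```python
import numpy as np

def draw_subset(n_top, rng):
    """Draw indices for a random 2/3 subset of the TOP genes."""
    size = int(round(2 * n_top / 3))
    return rng.choice(n_top, size=size, replace=False)

def max_circular_gap(theta):
    """Largest arc between consecutive angles on the unit circle."""
    th = np.sort(np.mod(np.asarray(theta, dtype=float), 2 * np.pi))
    # Close the circle by appending the first angle plus a full turn.
    gaps = np.diff(np.append(th, th[0] + 2 * np.pi))
    return gaps.max()

def accept_order(theta_new, theta_prelim):
    """Check conditions (a) and (b) for a candidate set of angles."""
    gap_new = max_circular_gap(theta_new)
    # (a) the occupied arc covers more than half of the circle.
    cond_a = (2 * np.pi - gap_new) > np.pi
    # (b) no gap larger than the largest gap of the preliminary order.
    cond_b = gap_new <= max_circular_gap(theta_prelim)
    return bool(cond_a and cond_b)
```

In a full loop, subsets would be drawn and reordered repeatedly, keeping only the accepted ones until K orders are collected.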

Hence, K circular orders o_k, k = 1, …, K, are defined. For each of them, [X^TOP] is reordered to obtain [X_k^TOP], the k-th matrix of TOP genes ordered by o_k.

FMM predictions for the TOP genes in [X_k^TOP], k = 1, …, K, are computed. For each TOP gene, there are K FMM parameter estimates and K rhythmicity measures (R²_FMM). Robust FMM parameter estimates are then computed as the medians over the K orders. [M^TOP] is the matrix that contains, for each TOP gene, the medians of the FMM features R², t_U and ω, which are key to assessing and comparing rhythmicity across tissues.
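The aggregation over the K orders can be sketched as below, assuming the estimates are stored as K-by-genes arrays. The text only specifies medians; using a circular median for the peak phase t_U (so that angles near 0 and 2π are treated as close) is an assumption of this sketch, and the names `circular_median` and `summarize_top` are hypothetical.

```python
import numpy as np

def circular_median(angles):
    """Sample angle minimizing the total circular distance to the others."""
    a = np.mod(np.asarray(angles, dtype=float), 2 * np.pi)
    d = np.abs(a[:, None] - a[None, :])
    d = np.minimum(d, 2 * np.pi - d)  # shortest arc between each pair
    return a[np.argmin(d.sum(axis=1))]

def summarize_top(r2, omega, t_u):
    """Median FMM features per gene over K orders (inputs are K x genes)."""
    r2 = np.asarray(r2, dtype=float)
    omega = np.asarray(omega, dtype=float)
    t_u = np.asarray(t_u, dtype=float)
    return {
        "R2": np.median(r2, axis=0),
        "omega": np.median(omega, axis=0),
        # Assumption: peak phases are angles, so a circular median is used.
        "tU": np.array([circular_median(t_u[:, g])
                        for g in range(t_u.shape[1])]),
    }
```

For a gene whose peak phases over three orders are 0.1, 6.2 and 0.2 radians, the circular median is 0.1, whereas an ordinary median would return 0.2.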
