Latent Dirichlet allocation (LDA), as implemented in the topicmodels R package (v0.2-6), was used to generate probabilistic representations of the cell clusters and gene sets present in the dataset, referred to as Cell-States and Gene-States. The Cell-State model took as input a counts matrix with cells as columns and genes as rows; for the Gene-State model, the same matrix was transposed (i.e., genes were columns and cells were rows). Models were fit using the variational expectation–maximization (VEM) algorithm with the following parameters: nstart = 5, seed = 12345, estimate.alpha = TRUE, estimate.beta = TRUE. The parameter k set the number of Cell-States or Gene-States estimated by the model. The optimal value of k was determined by fivefold cross-validation and evaluation of model perplexity. For the Gene-State model, cells were randomly partitioned into “training” (80%) and “test” (20%) sets, whereas for the Cell-State model, genes were randomly partitioned into training (80%) and test (20%) sets. Models were then fit to the training set, and perplexity was computed on the held-out test set to evaluate model fit. Fifty iterations of this process were performed for each k from 2 to 50, the mean perplexity was calculated at each k, and the k with the lowest mean perplexity was selected as the optimal value (i.e., k.opt), which was k = 13 for the Cell-State model and k = 19 for the Gene-State model (fig. S6).
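The selection of k described above can be sketched in a library-agnostic way. The following Python outline is a minimal illustration, not the authors' code: the function names `split_80_20`, `select_k`, and `toy_perplexity` are hypothetical, the model-fitting step is represented by a caller-supplied function, and reusing one random partition for every k within an iteration is an assumption that the text does not specify.

```python
import random
from statistics import mean


def split_80_20(items, rng, train_frac=0.8):
    """Randomly partition items (cells or genes) into training and test sets."""
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = int(round(train_frac * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def select_k(items, k_values, heldout_perplexity, n_iter=50, seed=12345):
    """Return (k_opt, mean perplexity per k) over repeated random 80/20 splits.

    heldout_perplexity(train, test, k) is a hypothetical callable that fits a
    model with k states on `train` and returns its perplexity on `test`.
    """
    rng = random.Random(seed)
    k_values = list(k_values)
    scores = {k: [] for k in k_values}
    for _ in range(n_iter):
        # Assumption: one partition per iteration, shared by all values of k.
        train, test = split_80_20(items, rng)
        for k in k_values:
            scores[k].append(heldout_perplexity(train, test, k))
    mean_perplexity = {k: mean(values) for k, values in scores.items()}
    # The optimal k is the one with the lowest mean held-out perplexity.
    k_opt = min(mean_perplexity, key=mean_perplexity.get)
    return k_opt, mean_perplexity


def toy_perplexity(train, test, k):
    """Stand-in scoring function for demonstration only; it fits no model."""
    return 100.0 + (k - 13) ** 2


if __name__ == "__main__":
    cells = [f"cell{i}" for i in range(100)]
    k_opt, _ = select_k(cells, range(2, 51), toy_perplexity, n_iter=50)
    print("k.opt =", k_opt)
```

In practice, `toy_perplexity` would be replaced by a function that fits the LDA model to the training partition and evaluates perplexity on the held-out partition. For the Gene-State model the items would be cells, and for the Cell-State model they would be genes, as described above.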
