Following the methods of Tian et al. [26], we used the Python package SCANPY [48] (version 1.4.4) to preprocess the raw scRNA-seq read count data. First, we filtered out genes with zero counts in every cell. Second, we calculated a size factor for each cell and normalized the read counts by library size, so that the total counts are the same across cells. Formally, let s_i denote the library size (i.e., the total number of read counts) of cell i; the size factor of cell i is then s_i/median(s), where the median is taken over all cells. Finally, we log-transformed the normalized counts and scaled them so that each gene has zero mean and unit variance. The transformed read count matrix is the input to our denoising ZINB model-based autoencoder, whereas the ZINB loss is calculated on the raw count matrix [20,22,26].
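The three preprocessing steps can be illustrated with a minimal NumPy sketch. It is not the SCANPY code used in this work, and it runs without SCANPY installed. The function name `preprocess_counts`, the toy matrix, the use of log(1 + x) as the log transformation, and the population standard deviation for scaling are all assumptions made for illustration; the sketch also assumes every cell has at least one count.

```python
import numpy as np


def preprocess_counts(raw):
    """Illustrative preprocessing of a cells x genes raw count matrix.

    Hypothetical helper, not part of SCANPY. Returns the scaled matrix,
    the per-cell size factors, and a boolean mask of retained genes.
    """
    raw = np.asarray(raw, dtype=float)

    # Step 1: drop genes that have zero counts in every cell.
    kept_genes = raw.sum(axis=0) > 0
    counts = raw[:, kept_genes]

    # Step 2: library size s_i per cell, size factor s_i / median(s).
    # Assumes no cell has a library size of zero.
    lib_size = counts.sum(axis=1)
    size_factors = lib_size / np.median(lib_size)
    normalized = counts / size_factors[:, None]

    # Step 3: log transform (log(1 + x) is an assumption), then scale
    # each gene to zero mean and unit variance.
    logged = np.log1p(normalized)
    mean = logged.mean(axis=0)
    std = logged.std(axis=0)  # population std (ddof=0), an assumption
    std[std == 0] = 1.0       # leave constant genes at zero after centering
    scaled = (logged - mean) / std

    return scaled, size_factors, kept_genes


# Toy example: 3 cells x 4 genes; genes 2 and 4 are never detected.
raw = np.array([
    [1, 0, 3, 0],
    [2, 0, 6, 0],
    [4, 0, 4, 0],
])
scaled, size_factors, kept_genes = preprocess_counts(raw)
```

In this toy example the library sizes are 4, 8, and 8, so dividing by the size factors brings every cell to the median total of 8. The raw matrix `raw` is left unmodified, so it remains available for a count-based loss such as the ZINB loss.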