The ProjecTILs pipeline

Massimo Andreatta; Jesus Corria-Osorio; Sören Müller; Rafael Cubas; George Coukos; Santiago J. Carmona

This method is extracted from research article: Nat Commun, May 2021

Interpretation of T cell states from single-cell transcriptomics data using reference atlases

DOI: 10.1038/s41467-021-23324-4

The essential input to the ProjecTILs pipeline is an expression matrix in which genes are rows and cells are columns. If raw counts (e.g., UMI counts) are provided, each entry x in the matrix is normalized using the formula log(1 + 10,000 x / S), where S is the sum of all counts for that cell and log is the natural logarithm. To ensure that only T cells are included in the query dataset, TILPRED-1.0 is applied by default to predict the composition of the query, and all cells annotated as “Non-T cells” or “unknown” are removed. The user can optionally disable this filter. Then, a reference atlas of annotated cell states (by default the TIL atlas) is loaded into memory, together with its cell embeddings in gene, PCA and UMAP spaces and all associated metadata. To bring the query data into the same representation spaces as the reference map, batch-effect correction is applied to the normalized gene-by-cell expression matrix of the query set using the anchor-finding and integration algorithms implemented in STACAS and Seurat, where the genes used for integration are the intersection of the variable genes of the reference map and all genes from the query. After batch-effect correction, the PCA rotation matrix pre-calculated on the reference atlas (i.e., the coefficients that transform reference gene space into PCA space) is applied to the normalized, batch-corrected query matrix. In the same way, the predict function of the “umap” package is used to transform the PCA embeddings into UMAP coordinates. By this means, the query data are embedded into the original, unaltered coordinate spaces of the reference atlas, enabling joint visualization as well as classification of the query cells into T cell subtypes.
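
The normalization formula and the application of a pre-calculated rotation matrix described above can be illustrated with a minimal NumPy sketch. This is not the ProjecTILs implementation (which is an R package); the function names log_normalize and project_to_pca are hypothetical, and centering the query on the reference gene means before applying the rotation is an assumption borrowed from standard PCA practice rather than a detail stated in the text.

```python
import numpy as np


def log_normalize(counts, scale=10_000):
    """Return log(1 + scale * x / S) for a genes-by-cells count matrix.

    S is the total count of each cell (each column); log is the natural log.
    """
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=0, keepdims=True)       # S, one value per cell
    safe_totals = np.where(totals == 0, 1.0, totals)  # empty cells stay all-zero
    return np.log1p(scale * counts / safe_totals)


def project_to_pca(expr, rotation, gene_means):
    """Embed a genes-by-cells matrix into a PCA space fitted on a reference.

    rotation   : genes-by-components matrix computed on the reference only
    gene_means : per-gene means of the reference (assumed centering step)
    Returns a cells-by-components matrix of PCA coordinates.
    """
    centered = np.asarray(expr, dtype=float) - np.asarray(gene_means)[:, None]
    return centered.T @ rotation


# Toy reference: 5 genes x 20 cells of simulated counts (illustration only).
rng = np.random.default_rng(0)
reference = log_normalize(rng.poisson(5.0, size=(5, 20)))

# Fit the rotation on the reference alone, via SVD of the centered matrix.
ref_means = reference.mean(axis=1)
_, _, vt = np.linalg.svd((reference - ref_means[:, None]).T, full_matrices=False)
rotation = vt[:2].T                                   # keep two components

# Toy query: 5 genes x 3 cells, normalized and then projected with the
# reference rotation; the reference coordinates themselves are not altered.
query = log_normalize(rng.poisson(5.0, size=(5, 3)))
query_pca = project_to_pca(query, rotation, ref_means)
print(query_pca.shape)
```

Because the rotation is computed once on the reference and only applied to the query, adding new query cells never changes the reference embedding, which is the property the pipeline relies on. The sketch omits the T cell filtering and batch-effect correction steps.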

ProjecTILs is implemented as a modular R package, with several functions that aid interpretation and analysis. The make.projection function is the core utility that implements the projection algorithm described above. It can also be run in “direct” mode, in which case the PCA and UMAP rotations are applied directly, without batch-effect correction. This may be useful for very small datasets, for which alignment and integration algorithms are not applicable. To project human data onto a murine reference atlas, the user must set the flag “human.ortho = TRUE”, which automatically converts human genes to their mouse orthologs before projection. The plot.projection function visualizes the query dataset as density level curves superimposed on the reference atlas. The cellstate.predict function implements a nearest-neighbor classifier, which predicts the state of each query cell by a majority vote of its annotated nearest neighbors (either in PCA or UMAP space) in the reference map. The find.discriminant.genes function performs differential expression analysis for specific cell states/subtypes between two paired conditions, or alternatively between one condition and the reference map. The find.discriminant.dimensions function analyses PCA and ICA embeddings (described below) to identify dimensions where the query deviates significantly from the reference map. Several additional functions allow visualizing multiple aspects of the reference and projected datasets and aid the biological interpretation of the results. The code and description of the package, together with tutorials and applications to the analysis of public datasets, can be found at: https://github.com/carmonalab/ProjecTILs.
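
The majority-vote idea behind the nearest-neighbor classifier can be sketched in a few lines of Python. This is an illustration of the general technique only, not the code of cellstate.predict: the function name knn_majority_vote, the choice of Euclidean distance, the default k, the tie-breaking rule and the labels state_A and state_B are all assumptions made for the example.

```python
from collections import Counter

import numpy as np


def knn_majority_vote(ref_coords, ref_labels, query_coords, k=5):
    """Label each query cell by majority vote of its k nearest reference cells.

    ref_coords   : reference cells x dimensions (e.g. PCA or UMAP coordinates)
    ref_labels   : one annotated state per reference cell
    query_coords : query cells x dimensions, in the same space as ref_coords
    """
    ref_coords = np.asarray(ref_coords, dtype=float)
    query_coords = np.asarray(query_coords, dtype=float)
    predictions = []
    for cell in query_coords:
        # Euclidean distance from this query cell to every reference cell.
        distances = np.linalg.norm(ref_coords - cell, axis=1)
        nearest = np.argsort(distances, kind="stable")[:k]
        votes = Counter(ref_labels[i] for i in nearest)
        # On a tied vote, the label of the closest tied neighbor wins.
        predictions.append(votes.most_common(1)[0][0])
    return predictions


# Hypothetical two-dimensional reference with two annotated states.
ref_coords = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
ref_labels = ["state_A", "state_A", "state_A", "state_B", "state_B", "state_B"]
query_coords = [[0.2, 0.2], [10.5, 10.5]]

print(knn_majority_vote(ref_coords, ref_labels, query_coords, k=3))
```

Since the query has already been embedded in the coordinate space of the reference, the vote can be taken directly against the annotated reference cells without retraining anything when a new query arrives.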

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
