TCRα sequencing and analysis

Philippa Marrack
Sai Harsha Krovi
Daniel Silberman
Janice White
Eleanor Kushnir
Maki Nakayama
James Crooks
Thomas Danhorn
Sonia Leach
Randy Anselment
James Scott-Browne
Laurent Gapin
John Kappler

RNA was isolated from purified naïve CD4 T cells, amplified by PCR to expand Tcra sequences, and sequenced as described in Silberman et al. (2016). Post-sequencing analysis was performed to identify the Trav and Traj genes for each sequence along with its corresponding CDR3. Trav family and subfamily members were assigned based on the IMGT designations, with modifications based on our own analysis of expressed TRAV sequences in B6 mice. IMGT has identified two gene duplication events in the B6 Trav locus. The ‘original’ genes, most of which are closest to the TRAJ locus, are designated by their family number and a number indicating their subfamily membership. Here, for ease of analysis, we have added the letter ‘A’ to their designation, e.g. TRAV01-1A. For TRAV subfamily members among the IMGT-designated duplicated ‘D’ and new ‘N’ genes, we add the letter ‘D’ or ‘N’, e.g. TRAV07-6D or TRAV07-6N. In some cases the entire nucleotide sequences of subfamily members are identical and, therefore, indistinguishable by our analyses. In these cases the subfamily members are designated to include all possible source genes, e.g. TRAV06-3ADN or TRAV06-6AD.
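The naming convention above can be illustrated with a short sketch. This is not the authors' in-house software (which was written in Python 2.7); it is a minimal Python 3 illustration, and the function names, the input tuple layout, and the choice to merge only members of the same family and subfamily are assumptions made for the example.

```python
from collections import defaultdict

COPY_ORDER = "ADN"  # original (A), duplicated (D), new (N)

def merged_label(family, subfamily, copies):
    """Build a label such as TRAV06-3ADN from family 6, subfamily 3, copies {A, D, N}."""
    suffix = "".join(c for c in COPY_ORDER if c in copies)
    return "TRAV{:02d}-{}{}".format(family, subfamily, suffix)

def label_genes(genes):
    """Map designation -> nucleotide sequence.

    `genes` is an iterable of (family, subfamily, copy, sequence) tuples
    (hypothetical layout). Subfamily members whose whole nucleotide
    sequences are identical share one label listing every possible source.
    """
    groups = defaultdict(set)
    for family, subfamily, copy, seq in genes:
        groups[(family, subfamily, seq)].add(copy)
    return {merged_label(f, s, copies): seq
            for (f, s, seq), copies in groups.items()}
```

With toy sequences, three identical copies of family 6, subfamily 3 would collapse to the single label TRAV06-3ADN, while two distinct copies of family 7, subfamily 6 would keep separate labels TRAV07-6D and TRAV07-6N.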

Errors occur during sequencing reactions and accumulate as the number of sequences acquired increases (Bolotin et al., 2012; Liu et al., 2014). The sequences were all corrected for errors in the Trav and Traj elements, which do not somatically mutate. However, because the amino acids in and flanking the non-germline-encoded portions of CDR3 regions could not be corrected, sequences with errors in these elements are bound to appear at some low frequency and cause a gradual rise in the species accumulation curves. To eliminate these misreads, we included in our analyses only those TCRα sequences that occurred more than once in each sample. To correct for sequencing errors within the CDR3, the sequences were modified by replacing erroneous nucleotides with the appropriate germline-encoded nucleotides whenever a discrepancy was observed. Such correction was possible only when a nucleotide difference could be resolved by aligning to the germline Trav and/or Traj genes. To avoid making inappropriate changes to the potentially non-germline-encoded portions of CDR3α, such corrections were applied only if the change from the germline sequence occurred more than three nucleotides before the predicted end of the Trav gene or more than three nucleotides after the predicted end of the Traj gene.

Finally, the amino acid usage within the CDR3α was determined for each sequence to identify any patterns in the CDR3 regions of sequences belonging to T cells from one MHC haplotype versus another. All of the analysis was performed using in-house programs developed in Python 2.7. Software and sequences used to analyze and correct TCRα sequences are available at the lab webpage https://www.nationaljewish.org/research-science/programs-depts/biomedical-research/labs/kappler-marrack-research-lab/protocols or on request to PM, SHK or JWK. The raw and analyzed sequences used in this paper are available at GEO under accession GSE105129.
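The germline correction and the more-than-once filter described above can be sketched as follows. This is a hedged Python 3 illustration, not the authors' program: it assumes an ungapped alignment in which each read starts at the first Trav nucleotide, it handles only the Trav side (the Traj side would mirror it), and the names `correct_trav`, `keep_repeated`, and `GUARD` are hypothetical.

```python
from collections import Counter

GUARD = 3  # nucleotides next to the predicted Trav end that are never altered

def correct_trav(read, germline_trav, trav_end, guard=GUARD):
    """Replace mismatches with germline bases, except near the Trav end.

    Assumes read[i] pairs with germline_trav[i] (ungapped alignment) and
    that trav_end is the number of read nucleotides assigned to Trav.
    Positions at or beyond trav_end - guard are left as sequenced, since
    they may not be germline encoded.
    """
    limit = min(trav_end - guard, len(germline_trav), len(read))
    return "".join(germline_trav[i] if i < limit else base
                   for i, base in enumerate(read))

def keep_repeated(sequences):
    """Keep only sequences seen more than once in a sample, with their counts."""
    counts = Counter(sequences)
    return {seq: n for seq, n in counts.items() if n > 1}
```

Applying the correction before counting means that reads differing only by a correctable Trav error collapse onto the same sequence, so they contribute to its count rather than appearing as separate singletons.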

To represent the differential Trav and Traj gene usage in TCRs sequenced from different mouse samples, we used the edgeR package from R/Bioconductor. A threshold of p < 0.05 was used to identify the genes that were most significantly differentially expressed between samples.

Euclidean distances for TRAVs and TRAJs were calculated from log2-transformed counts per 10^4 sequences.
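One way to compute such a distance is sketched below in Python 3. The source does not say how genes with zero counts were handled, so the pseudocount added before the log2 transform is an assumption, as are the function names and the dictionary input format.

```python
import math

def log2_per_10k(counts, genes, pseudocount=1.0):
    """Scale gene counts to counts per 10^4 sequences, then log2-transform.

    `counts` maps gene name -> number of sequences in one sample.
    `pseudocount` (an assumption, not from the source) avoids log2(0).
    """
    total = sum(counts.values())
    return [math.log2(counts.get(g, 0) * 1e4 / total + pseudocount)
            for g in genes]

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def sample_distance(counts_a, counts_b, pseudocount=1.0):
    """Distance between two samples over the union of their genes."""
    genes = sorted(set(counts_a) | set(counts_b))
    return euclidean(log2_per_10k(counts_a, genes, pseudocount),
                     log2_per_10k(counts_b, genes, pseudocount))
```

Because counts are scaled to a common depth first, two samples with the same proportional gene usage but different sequencing depths give a distance of zero under this sketch.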
