To evaluate the phasing result in the highly polymorphic MHC region (28.5 Mb to 33.5 Mb on Chromosome 6 in the GRCh38 human genome assembly), a comparison analysis of nine relatively well-characterized HLA genes (HLA-A, HLA-C, HLA-B, HLA-DRA, HLA-DRB1, HLA-DQA1, HLA-DQB1, HLA-DPA1, and HLA-DPB1) was performed for the NA12878 sample. The six-digit allele types of these genes were known for NA12878. To obtain exact nucleotide sequences for comparison with the TELL-seq data, we assigned 01 as the seventh and eighth digits of each allele type (Supplemental Table S5) and used the corresponding sequences as the “reference data.” Briefly, the haplotype allele sequences in FASTA format for these nine genes in NA12878 were downloaded from ftp://ftp.ebi.ac.uk/pub/databases/ipd/imgt/hla/fasta/ and extracted according to the allele types in Supplemental Table S5. These sequences were aligned to human Chromosome 6 from the GRCh38 assembly with the standalone BLASTN program (https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/). The BLAST results were parsed to identify the alignments most likely to represent the genomic origins of the alleles. Ambiguous alignments (those with weak identity, short length, or a different location and/or strand) were ignored to avoid introducing noise into the downstream analyses. The paternal and maternal alleles from NA12878 were then mapped to each other using GRCh38 coordinates as the reference. This mapping was used to extract the heterozygous SNVs in NA12878, which served as the reference set for our comparison. The heterozygous SNVs detected by the TELL-seq phasing analysis were compared with this reference set, and the recall and precision rates were calculated from the number of SNVs shared between the TELL-seq data and the reference data.
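The recall and precision calculation can be illustrated with a minimal Python sketch. It assumes the conventional definitions (recall is the shared SNV count divided by the reference SNV count, and precision is the shared count divided by the TELL-seq SNV count), which the text does not spell out. The function name `recall_precision`, the SNV key layout, and the toy positions and alleles are hypothetical and are not taken from the NA12878 data.

```python
def recall_precision(reference_snvs, called_snvs):
    """Return (recall, precision) for two collections of heterozygous SNVs.

    Each SNV is a hashable key; here it is (GRCh38 position, frozenset of the
    two alleles), so the order of the alleles does not affect the comparison.
    """
    reference = set(reference_snvs)
    called = set(called_snvs)
    overlap = reference & called  # SNVs present in both data sets
    # Assumed definitions: the source only says the rates are based on overlap.
    recall = len(overlap) / len(reference) if reference else 0.0
    precision = len(overlap) / len(called) if called else 0.0
    return recall, precision


# Hypothetical toy data; positions and alleles are invented for illustration.
reference = {
    (100, frozenset("AG")),
    (250, frozenset("CT")),
    (300, frozenset("GT")),
    (410, frozenset("AC")),
}
called = {
    (100, frozenset("AG")),
    (250, frozenset("CT")),
    (500, frozenset("AT")),
}

recall, precision = recall_precision(reference, called)
print(f"recall={recall:.3f} precision={precision:.3f}")
```

Keying each SNV on both position and allele pair means a call at the correct position with the wrong alleles is not counted as shared; whether the original analysis applied that stricter criterion is not stated.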
The TELL-seq phasing analysis did not provide paternal or maternal information for the phased haplotypes. During the comparison, the parental origin of each TELL-seq haplotype was determined from the parental information of the reference alleles.
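One way such an assignment could be made is sketched below. The text does not describe the actual procedure, so the majority-vote rule used here is an assumption, and the function name `assign_parental_origin`, the dictionary layout, and the toy bases are hypothetical.

```python
def assign_parental_origin(hap1, hap2, paternal, maternal):
    """Label two phased haplotypes as paternal or maternal.

    Each argument maps a GRCh38 position to the base carried at a
    heterozygous SNV site. Returns a (label_for_hap1, label_for_hap2) tuple,
    or None when both labelings are equally supported.
    """
    def matches(hap, ref):
        # Count the sites where the haplotype carries the reference allele base.
        return sum(1 for pos, base in hap.items() if ref.get(pos) == base)

    # Assumed rule: choose the labeling with more matching sites overall.
    direct = matches(hap1, paternal) + matches(hap2, maternal)
    swapped = matches(hap1, maternal) + matches(hap2, paternal)
    if direct == swapped:
        return None
    return ("paternal", "maternal") if direct > swapped else ("maternal", "paternal")


# Hypothetical toy data; hap2 carries one discordant base at position 300.
paternal = {100: "A", 250: "C", 300: "G"}
maternal = {100: "G", 250: "T", 300: "T"}
hap1 = {100: "G", 250: "T", 300: "T"}
hap2 = {100: "A", 250: "C", 300: "T"}

labels = assign_parental_origin(hap1, hap2, paternal, maternal)
print(labels)
```

Scoring both labelings jointly, rather than each haplotype on its own, keeps the two haplotypes from being assigned to the same parent.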