CNV calls

Masataka Kikuchi
Kaori Kobayashi
Nao Nishida
Hiromi Sawai
Masaya Sugiyama
Masashi Mizokami
Katsushi Tokunaga
Akihiro Nakaya

CNVs were called from the samples using PennCNV [15]. PennCNV uses the log R ratio (LRR) and the B allele frequency (BAF) of each SNP to infer that SNP's copy number state. The LRR is a normalized measure of the total signal intensity of the B and A alleles and directly reflects an increase or decrease in copy number. The BAF is a normalized measure of the relative signal intensity ratio of the B and A alleles and helps to distinguish copy number states (e.g., copy-neutral loss of heterozygosity regions from normal-state regions). Given the LRR and BAF of each SNP, PennCNV uses a hidden Markov model (HMM) to calculate the probability of observing each copy number state.

A population frequency of the B allele (PFB) file and a GC model file were generated from 1831 healthy controls using compile_pfb.pl and cal_gc_snp.pl in PennCNV. An HMM file was provided by Thermo Fisher Scientific, Inc. Only samples with a standard deviation of the normalized intensity (LRR) <0.35, a BAF drift value <0.01, and a wave factor between −0.05 and 0.05 were analyzed.

Adjacent CNVs separated by a gap of <20% of the combined length of the two CNVs were merged, and merging was repeated until no gaps of <20% remained. CNVs based on fewer than 5 markers were excluded. In this process, we examined four cutoffs for the number of markers included in a CNV: >5, >10, >15, and >20 markers.

Several genomic regions are known to harbor spurious CNV calls. We therefore excluded centromeric, telomeric, segmental duplication, immunoglobulin, and repeat-masked regions. The coordinates of these regions were provided by PennCNV (http://penncnv.openbioinformatics.org/en/latest/misc/faq/). The immunoglobulin regions comprised four regions (chr2:88937989–89411302, chr14:21159897–22090937, chr14:105065301–106352275, and chr22:20715572–21595082).
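The sample QC thresholds, the merging rule, and the marker-count cutoffs described above can be sketched in a few lines of Python. This is only an illustration, not the PennCNV code or the authors' pipeline. The `Cnv` record and all function names are hypothetical. The sketch assumes that "combined length" means the sum of the two CNV lengths (the text does not say whether the gap is included), that only calls on the same chromosome with the same copy number are merged, that the wave factor bounds are exclusive, and that calls from a single sample are processed at a time. Markers lying inside a merged gap are not counted.

```python
from dataclasses import dataclass


@dataclass
class Cnv:
    """Hypothetical record for one CNV call from a single sample."""
    chrom: str
    start: int        # 1-based, inclusive
    end: int          # 1-based, inclusive
    copy_number: int
    n_markers: int


def passes_sample_qc(lrr_sd: float, baf_drift: float, wave_factor: float) -> bool:
    # Thresholds from the text; exclusive wave factor bounds are an assumption.
    return lrr_sd < 0.35 and baf_drift < 0.01 and -0.05 < wave_factor < 0.05


def gap_fraction(a: Cnv, b: Cnv) -> float:
    # Gap between two calls relative to the summed length of both calls
    # (assumed reading of "combined length of the two CNVs").
    gap = max(0, b.start - a.end - 1)
    combined = (a.end - a.start + 1) + (b.end - b.start + 1)
    return gap / combined


def merge_adjacent(cnvs, max_fraction=0.20):
    # Repeat merging passes until no pair of neighbours qualifies any more.
    calls = sorted(cnvs, key=lambda c: (c.chrom, c.start))
    changed = True
    while changed:
        changed = False
        merged = []
        for c in calls:
            prev = merged[-1] if merged else None
            if (prev is not None and prev.chrom == c.chrom
                    and prev.copy_number == c.copy_number
                    and gap_fraction(prev, c) < max_fraction):
                merged[-1] = Cnv(prev.chrom, prev.start, max(prev.end, c.end),
                                 prev.copy_number, prev.n_markers + c.n_markers)
                changed = True
            else:
                merged.append(c)
        calls = merged
    return calls


def filter_by_markers(cnvs, cutoff):
    # Keep calls supported by more than `cutoff` markers (>5, >10, >15, >20).
    return [c for c in cnvs if c.n_markers > cutoff]


calls = [
    Cnv("chr1", 100, 199, 1, 6),
    Cnv("chr1", 220, 319, 1, 7),    # gap of 20 bp, 10% of 200 bp: merged
    Cnv("chr1", 1000, 1099, 1, 4),  # gap far above 20%: kept separate
]
merged = merge_adjacent(calls)
by_cutoff = {k: filter_by_markers(merged, k) for k in (5, 10, 15, 20)}
```

Keeping the cutoff as a parameter makes it easy to compare the four marker thresholds on the same merged call set.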
The excluded regions listed above were converted from the hg18 reference genome to hg19 using the UCSC LiftOver tool (http://genome.ucsc.edu/cgi-bin/hgLiftOver). We also excluded the T-cell receptor (TCR) and immunoglobulin heavy chain (IGH) genomic regions from our analyses because these regions undergo V(D)J recombination in lymphocytes and can therefore yield somatic rather than germline CNVs [16]. These regions comprised TCR alpha and delta on chromosome 14 (chr14:22090057–23021075 and chr14:22891537–22935569, respectively), TCR beta and gamma on chromosome 7 (chr7:141998851–142510972 and chr7:38279625–38407656, respectively), and IGH regions on chromosomes 14 and 16 (chr14:106032614–107288051 and chr16:33740716–33741266).

Individuals of unknown sex were excluded. After CNV calling, 1830 healthy controls and 1031 HBV patients remained. Only autosomes were analyzed.

PennCNV classifies CNV events according to six state definitions: state 1 = deletion of two copies (copy number: 0), state 2 = deletion of one copy (copy number: 1), state 3 = two-copy state (copy number: 2), state 4 = two-copy state with loss of heterozygosity (copy number: 2), state 5 = duplication of one copy (copy number: 3), and state 6 = duplication of two copies (copy number: 4). A copy number of two was considered normal. CNVs with a copy number >2 were defined as duplications, and those with a copy number <2 as deletions. The distribution of CNVs was drawn using the R package “RIdeogram” [17].
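The state definitions, the autosome restriction, and the TCR/IGH exclusion above can be expressed as a small lookup, sketched here in Python. The state-to-copy-number mapping and the region coordinates are taken directly from the text. The function names are hypothetical, and the rule that any overlap with a listed region leads to exclusion is an assumption, since the text does not state the overlap criterion that was used.

```python
# PennCNV HMM state -> total copy number, as listed in the text.
STATE_TO_COPY_NUMBER = {1: 0, 2: 1, 3: 2, 4: 2, 5: 3, 6: 4}

AUTOSOMES = {f"chr{i}" for i in range(1, 23)}

# TCR and IGH regions quoted in the text: (chromosome, start, end).
TCR_IGH_REGIONS = [
    ("chr14", 22090057, 23021075),    # TCR alpha
    ("chr14", 22891537, 22935569),    # TCR delta
    ("chr7", 141998851, 142510972),   # TCR beta
    ("chr7", 38279625, 38407656),     # TCR gamma
    ("chr14", 106032614, 107288051),  # IGH
    ("chr16", 33740716, 33741266),    # IGH
]


def classify_state(state: int) -> str:
    """Label a PennCNV state as deletion, duplication, or normal."""
    copy_number = STATE_TO_COPY_NUMBER[state]
    if copy_number > 2:
        return "duplication"
    if copy_number < 2:
        return "deletion"
    return "normal"  # states 3 and 4 both have two copies


def is_autosomal(chrom: str) -> bool:
    return chrom in AUTOSOMES


def overlaps_excluded(chrom: str, start: int, end: int) -> bool:
    # Assumption: any overlap with a listed region excludes the call.
    return any(
        chrom == r_chrom and start <= r_end and end >= r_start
        for r_chrom, r_start, r_end in TCR_IGH_REGIONS
    )


labels = [classify_state(s) for s in range(1, 7)]
```

Note that state 4 (copy-neutral loss of heterozygosity) maps to "normal" under the copy-number rule quoted above, because its copy number is two.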
