Copy number variation

Susanne U Franssen; Caroline Durrant; Olivia Stark; Bettina Moser; Tim Downing; Hideo Imamura; Jean-Claude Dujardin; Mandy J Sanders; Isabel Mauricio; Michael A Miles; Lionel F Schnur; Charles L Jaffe; Abdelmajeed Nasereddin; Henk Schallig; Matthew Yeo; Tapan Bhattacharyya; Mohammad Z Alam; Matthew Berriman; Thierry Wirth; Gabriele Schönian; James A Cotton



This method is extracted from research article: eLife, Mar 2020

Global genome diversity of the Leishmania donovani complex

DOI: 10.7554/eLife.51243


To identify large copy number variants (CNVs), realigned BAM files for each sample were filtered for proper pairs, and PCR or optical duplicates were removed, using samtools view (RRID:SCR_002105, v1.3, Li et al., 2009). Coverage was then determined using bedtools genomecov (RRID:SCR_006646, v2.17.0) with parameters ‘-d -split’ (Quinlan and Hall, 2010). Large duplications and deletions were identified using custom scripts in R (R Development Core Team, 2013): genome coverage was determined for 5 kb non-overlapping windows along the genome, and each window was normalized by the haploid chromosome coverage of the respective chromosome and sample (i.e. median chromosome coverage divided by the somy of the respective chromosome and sample). Large CNVs were identified as stretches of consecutive windows with a somy-normalized median coverage >= 0.5 or <= −0.5 for duplications and deletions, respectively, a minimum length of 25 kb and a median normalized coverage difference across windows >= 0.9 (Supplementary file 6).

To identify large CNVs shared across samples, we grouped CNVs of the same variant type whose start and end positions each agreed to within 10 kb between samples (i.e. up to two 5 kb windows of difference) (Supplementary file 7).

CNVs of individual genes were determined from the same filtered BAM files (see genome coverages) with bedtools coverage (RRID:SCR_006646, v2.17.0) using parameters ‘-d -split’ (Quinlan and Hall, 2010), and gene coverages were analysed in R (R Development Core Team, 2013). The coverage of each gene was approximated by its median coverage and normalized by the haploid coverage of the respective chromosome and sample (Supplementary file 9).
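The window normalization, stretch calling and cross-sample grouping described above can be illustrated with a minimal Python sketch. It is not the authors' R code. All function and constant names are hypothetical, and one point is an assumption on our part: because the deletion threshold is negative, the sketch reads the "somy-normalized" value of a window as its deviation from the expected copy number, i.e. window median coverage divided by haploid coverage, minus somy. The same deviation is used for the 0.9 stretch-level filter.

```python
from statistics import median

WINDOW = 5_000         # window size in bp, as stated in the text
MIN_LENGTH = 25_000    # minimum CNV length in bp
WINDOW_CUTOFF = 0.5    # per-window threshold (+ duplication, - deletion)
STRETCH_CUTOFF = 0.9   # required median deviation across a stretch
GROUP_TOLERANCE = 10_000  # max start/end difference when grouping samples


def window_deviation(depths, somy, window=WINDOW):
    """Per-window copy-number deviation for one chromosome of one sample.

    depths: per-base coverage (e.g. the depth column of `bedtools genomecov -d`).
    Assumption: deviation = window median / haploid coverage - somy.
    A trailing partial window is ignored.
    """
    haploid = median(depths) / somy  # median chromosome coverage / somy
    return [
        median(depths[start:start + window]) / haploid - somy
        for start in range(0, len(depths) - window + 1, window)
    ]


def call_cnvs(devs, window=WINDOW):
    """Return (start, end, kind, median_deviation) for qualifying stretches.

    Coordinates are 0-based, half-open, relative to the chromosome start.
    """
    calls = []
    i = 0
    while i < len(devs):
        if devs[i] >= WINDOW_CUTOFF:
            sign = 1
        elif devs[i] <= -WINDOW_CUTOFF:
            sign = -1
        else:
            i += 1
            continue
        # Extend the stretch while windows pass the same-direction cutoff.
        j = i
        while j < len(devs) and sign * devs[j] >= WINDOW_CUTOFF:
            j += 1
        med = median(devs[i:j])
        if (j - i) * window >= MIN_LENGTH and abs(med) >= STRETCH_CUTOFF:
            kind = "duplication" if sign > 0 else "deletion"
            calls.append((i * window, j * window, kind, med))
        i = j
    return calls


def same_cnv(a, b, tolerance=GROUP_TOLERANCE):
    """True if two calls have the same type and start/end within tolerance."""
    return (a[2] == b[2]
            and abs(a[0] - b[0]) <= tolerance
            and abs(a[1] - b[1]) <= tolerance)


# Toy disomic chromosome: 20 windows at depth 40, with a 30 kb stretch at
# depth 60 and a 10 kb stretch at depth 20 (too short to be reported).
depths = [40] * 100_000
depths[20_000:50_000] = [60] * 30_000
depths[70_000:80_000] = [20] * 10_000
calls = call_cnvs(window_deviation(depths, somy=2))
```

In this toy example only the 30 kb gain is reported; the 10 kb loss passes the per-window cutoff but fails the 25 kb length filter. Gene-level values could be obtained analogously by taking the median depth over each gene's coordinates instead of over fixed windows.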

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.
