Single-cell DNA sequencing datasets

Furui Liu; Fangyuan Shi; Zhenhua Yu

This method is extracted from research article: BMC Genomics, Jan 2024

Inferring single-cell copy number profiles through cross-cell segmentation of read counts

DOI: 10.1186/s12864-023-09901-5

To fully assess the effectiveness of DeepCNA, we generate various datasets that emulate different cell ploidies. The simulation pipeline follows our previous study [19] and consists of three steps: 1) construct a clonal tree following the approach adopted in [24]; 2) generate the genome sequence of each cell based on the simulated CNAs; and 3) produce reads for each cell given biological and technological parameters. When generating the clonal tree, we assume that the ancestral cell state is homogeneous with the defined ploidy and that all clones evolve from that state. Nodes of the clonal tree represent tumor clones, and edges are labeled with CNAs. The size of simulated CNAs ranges from 3 to 20 Mb, the number of clones is set to 4, and the number of cells is set to 100. Given the CNAs of each cell, the SCSsim tool [25] is employed to generate reads at a sequencing coverage of 0.02. Read alignments are obtained using BWA [26] with default parameters and further processed with SAMtools [27] to generate BAM files. We generate diploid, triploid and tetraploid datasets to examine the ability of DeepCNA to distinguish between different tumor ploidies. For each tumor ploidy, the simulation is repeated 10 times, resulting in 30 datasets for benchmarking.
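The clonal-tree step and the overall benchmark layout described above can be illustrated with a minimal Python sketch. This is not the authors' pipeline (which relies on the approach of [24] and on SCSsim); it only mirrors the stated settings: CNA sizes of 3 to 20 Mb, 4 clones, 100 cells, and 10 repeats for each of three ploidies. Everything else is an assumption made for illustration: the toy chromosome lengths in `CHROM_LENGTHS`, the number of CNAs per edge, the copy-number range, uniform assignment of cells to clones, treating the ancestral state as a root that is not counted among the 4 clones, and all function and class names. Overlaps between CNAs are not resolved.

```python
import random
from dataclasses import dataclass
from typing import List, Optional

MB = 1_000_000
# Assumption: a toy two-chromosome genome with approximate lengths.
CHROM_LENGTHS = {"chr1": 248_000_000, "chr2": 242_000_000}


@dataclass
class CNA:
    chrom: str
    start: int
    end: int
    copy_number: int


@dataclass
class Clone:
    name: str
    parent: Optional["Clone"]
    cnas: List[CNA]  # CNAs accumulated along the path from the root


def random_cna(rng, ploidy):
    """Draw one CNA of 3-20 Mb whose copy number differs from the ploidy."""
    chrom = rng.choice(sorted(CHROM_LENGTHS))
    size = rng.randint(3 * MB, 20 * MB)
    start = rng.randint(0, CHROM_LENGTHS[chrom] - size)
    # Assumption: copy numbers range from 0 to ploidy + 2.
    copy_number = rng.choice([c for c in range(ploidy + 3) if c != ploidy])
    return CNA(chrom, start, start + size, copy_number)


def simulate_dataset(ploidy, seed, n_clones=4, n_cells=100, cnas_per_edge=2):
    """Build a random clonal tree and assign cells to its clones."""
    rng = random.Random(seed)
    root = Clone("root", None, [])  # homogeneous ancestral state, no CNAs
    clones = []
    for i in range(n_clones):
        # Each new clone descends from the root or an earlier clone;
        # the CNAs drawn here label the edge from parent to child.
        parent = rng.choice([root] + clones)
        edge_cnas = [random_cna(rng, ploidy) for _ in range(cnas_per_edge)]
        clones.append(Clone(f"clone{i + 1}", parent, parent.cnas + edge_cnas))
    # Assumption: cells are spread uniformly at random over the clones.
    cells = [rng.choice(clones) for _ in range(n_cells)]
    return root, clones, cells


# Diploid, triploid and tetraploid datasets, 10 repeats each.
datasets = {
    (ploidy, rep): simulate_dataset(ploidy, seed=1000 * ploidy + rep)
    for ploidy in (2, 3, 4)
    for rep in range(10)
}
print(len(datasets))
```

In a full pipeline, the per-cell CNA lists produced this way would be handed to a read simulator, and the resulting reads aligned and converted to BAM files as described above; those steps are outside the scope of this sketch.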

Two real datasets are employed in this study. The first consists of 100 single cells from a breast ductal carcinoma patient [28]; the sequencing data can be downloaded from NCBI SRA under accession number SRA018951. The second is a 10X Genomics dataset containing 2053 cells from a triple-negative ductal carcinoma [16]; the sequencing data are freely available at https://support.10xgenomics.com/single-cell-dna/datasets/1.0.0/breast_tissue_E_2k.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
