Published: January 5, 2019, Volume 9, Issue 1 DOI: 10.21769/BioProtoc.3129
Reviewed by: Prashanth N Suravajhala, Tiratha Raj Singh, Ambily Sivadas
Abstract
Recent advances in genomics present new opportunities for enhancing knowledge about gene regulation and function across a wide spectrum of organisms and species. Understanding and evaluating this information at the individual gene level is challenging: it requires not only extracting, collating, and interpreting data from public genetic repositories, but also recognizing that much of the information has been generated by computational exon-calling algorithms and thus may be inaccurate. Moreover, because these data usually have not been validated experimentally, results may also be incomplete or incorrect. This has created a quality-control problem for scientists who want to use individual gene-specific information in their research. Here, I describe a simple experimental strategy that takes advantage of the large amounts of untapped primary experimental data for characterizing gene expression that have been deposited in the Sequence Read Archive of the National Center for Biotechnology Information. The approach consists of a readily adaptable pipeline that may be used to confirm exons, to define 5’ and 3’ untranslated regions and the beginnings and ends of individual genes, and to quantify alternative RNA splicing. The series of experimental strategies described offers effective replacements for older molecular biological methods, and can rapidly and reproducibly resolve major gene mapping problems.
Keywords: Gene structure
Background
Much of the information in genome browsers regarding the structure of individual genes in the genomes of different organisms has been developed by implementing exon-calling algorithms, coupled with mapping by homology with genes from other species. In general, this information has not been validated with experimental data, and as a result it is often incomplete, inaccurate, or incorrect (see Figure 3). This has led to a quality-control problem for scientists who want to use this gene-specific information in their research. To address this issue, I have developed a simple experimental strategy that takes advantage of the large amounts of untapped gene expression data that have been deposited in the Sequence Read Archive of the National Center for Biotechnology Information (SRA NCBI), a searchable public resource that as of November 20, 2018 contains 8,548,792,923,294,171 nucleotides of open-access information from many different species of animals and plants. These data have been obtained from investigators who have used a variety of ‘next-generation’ DNA sequencing platforms to individually generate computer files containing tens of millions of base pairs of ‘RNA-sequencing’ results from a wide range of organisms, organs and tissues, developmental stages, and experimental paradigms. Here, I have taken advantage of the easy access to these data to describe a computationally based approach for mapping the 5’ and 3’ ends of genes, and for quantifying alternative RNA splicing.
This series of experimental strategies offers effective replacements for older molecular biological methods, including combinations of cDNA cloning and PCR-generated approaches such as 5’ and 3’ RACE [rapid amplification of cDNA ends (Frohman et al., 1988)], and other more traditional assays [e.g., S1-nuclease and ribonuclease-protection mapping (Zinn et al., 1983)], and can be performed rapidly and reproducibly to help resolve the gene mapping problems noted above.
Equipment
Software
On-line databases and accompanying software:
Genomes and individual genes may be identified using the Ensembl Genome Browser (www.ensembl.org) and the UCSC Genome Browser (https://genome.ucsc.edu). For this study, the Igf1 gene of the frog, Xenopus tropicalis, was examined using genome assembly JGI 4.2, as was the IGF1 gene of the chimpanzee, Pan troglodytes, using genome assembly Pan_tro_3.0. RNA-sequencing information was extracted from the Sequence Read Archive of the National Center for Biotechnology Information (SRA NCBI; www.ncbi.nlm.nih.gov/sra) by querying the following datasets with specific 60-nucleotide DNA fragments: Xenopus tropicalis (liver, SRR5412275) and Pan troglodytes (liver, SRR4444973; kidney, SRR1758922; skeletal muscle, SRR1758929; and heart, SRR6706810). Searches were performed using the megablast option (optimized for highly similar sequences; maximum target sequences: 500 (may be set from 50 to 20,000); expect threshold: 10; word size: 11; match/mismatch scores: 2, -3; gap costs: existence 5, extension 2; low-complexity regions filtered; see: https://blast.ncbi.nlm.nih.gov/Blast.cgi; also see screen shots in Figure 2).
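As an illustration of how the 60-nucleotide queries described above might be prepared, the following is a minimal Python sketch. It assumes a made-up placeholder sequence, and the function names (`make_query_fragments`, `to_fasta`) and dictionary keys are hypothetical labels chosen for this example; they are not part of the protocol or of any BLAST interface. The sketch only cuts a sequence into fragments and records the search settings listed in the text. The searches themselves are run through the NCBI BLAST web page against the chosen SRA dataset.

```python
# Search settings as listed in the text. The keys are illustrative labels
# (an assumption), not official BLAST parameter names.
MEGABLAST_SETTINGS = {
    "program": "megablast",
    "max_target_sequences": 500,  # the text notes this may be set from 50 to 20,000
    "expect_threshold": 10,
    "word_size": 11,
    "match_score": 2,
    "mismatch_score": -3,
    "gap_existence_cost": 5,
    "gap_extension_cost": 2,
    "filter_low_complexity": True,
}

FRAGMENT_LENGTH = 60  # query length used in the text


def make_query_fragments(sequence, length=FRAGMENT_LENGTH, step=FRAGMENT_LENGTH):
    """Cut a DNA sequence into fixed-length query fragments.

    Returns a list of (start, fragment) tuples, where start is the 1-based
    position of the fragment in the input. A trailing piece shorter than
    `length` is dropped. A `step` smaller than `length` gives overlapping
    fragments.
    """
    if length <= 0 or step <= 0:
        raise ValueError("length and step must be positive")
    # Remove whitespace and line breaks, then normalize case.
    cleaned = "".join(sequence.split()).upper()
    invalid = set(cleaned) - set("ACGTN")
    if invalid:
        raise ValueError(f"unexpected characters in sequence: {sorted(invalid)}")
    fragments = []
    for start in range(0, len(cleaned) - length + 1, step):
        fragments.append((start + 1, cleaned[start:start + length]))
    return fragments


def to_fasta(fragments, name):
    """Format (start, fragment) tuples as FASTA text for pasting into a query box."""
    records = []
    for start, fragment in fragments:
        end = start + len(fragment) - 1
        records.append(f">{name}_{start}-{end}\n{fragment}")
    return "\n".join(records)


if __name__ == "__main__":
    # Placeholder sequence for demonstration only; not a real exon.
    demo_sequence = "ACGT" * 40
    queries = make_query_fragments(demo_sequence)
    print(to_fasta(queries, "demo_exon"))
```

Passing a `step` smaller than 60 tiles the region with overlapping fragments, which may help when a boundary of interest falls near the end of a fragment; whether overlap is useful depends on the gene and dataset being examined.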
Procedure
Article Information
Copyright
© 2019 The Authors; exclusive licensee Bio-protocol LLC.
How to cite
Readers should cite both the Bio-protocol article and the original research article where this protocol was used:
Category
Molecular Biology > RNA > RNA Sequencing