Gene Mapping by RNA-sequencing: A Direct Way to Characterize Genes and Gene Expression through Targeted Queries of Large Public Databases

Peter Rotwein

doi:10.21769/BioProtoc.3129

Peer-reviewed



Published: January 5, 2019, Vol. 9, Issue 1. DOI: 10.21769/BioProtoc.3129

Reviewed by: Prashanth N Suravajhala, Tiratha Raj Singh, Ambily Sivadas


The authors used this protocol in: The Journal of Biological Chemistry, Oct 2018


Abstract

Recent advances in genomics present new opportunities for enhancing knowledge about gene regulation and function across a wide spectrum of organisms and species. Understanding and evaluating this information at the individual gene level is challenging. It requires not only extracting, collating and interpreting data from public genetic repositories, but also recognizing that much of the information has been developed through implementation of computationally based exon-calling algorithms, and thus may be inaccurate. Moreover, as these data usually have not been validated experimentally, results may also be incomplete or incorrect. This has created a quality-control problem for scientists who want to use individual gene-specific information in their research. Here, I describe a simple experimental strategy that takes advantage of the large amounts of untapped primary experimental data for characterizing gene expression that have been deposited in the Sequence Read Archive of the National Center for Biotechnology Information. The approach consists of a readily adaptable pipeline that may be used to confirm exons, to define 5’ and 3’ untranslated regions and the beginnings and ends of individual genes, and to quantify alternative RNA splicing. The series of experimental strategies described offers effective replacements for older molecular biological methods, and can rapidly and reproducibly resolve major gene mapping problems.

Keywords: Gene structure, Gene expression, Genomics, Genetic databases, Gene annotation, Gene mapping, Gene characterization, RNA-sequencing, Bio-informatics

Background

Much of the information in genome browsers regarding the structure of individual genes in the genomes of different organisms has been developed through strategies involving the implementation of exon-calling algorithms, coupled with mapping by homology with genes from other species. In general, this information has not been validated with experimental data, and as a result it is often incomplete, inaccurate, or incorrect (see Figure 3). This has led to a quality-control problem for scientists who want to use this gene-specific information in their research. To address this issue, I have developed a simple experimental strategy that takes advantage of the large amounts of untapped gene expression data that have been deposited in the Sequence Read Archive of the National Center for Biotechnology Information (SRA NCBI), a searchable public resource that as of November 20, 2018 contains 8,548,792,923,294,171 nucleotides of open-access information from many different species of animals and plants. These data have been obtained from investigators who have used a variety of ‘next-generation’ DNA sequencing platforms to individually generate computer files containing tens of millions of base pairs of ‘RNA-sequencing’ results from a wide range of organisms, organs and tissues, developmental stages, and experimental paradigms. Here, I have taken advantage of the easy access to these data to describe a computationally based approach for mapping the 5’ and 3’ ends of genes, and for quantifying alternative RNA splicing.
This series of experimental strategies offers effective replacements for older molecular biological methods, including combinations of cDNA cloning and PCR-generated approaches such as 5’ and 3’ RACE [rapid amplification of cDNA ends (Frohman et al., 1988)], and other more traditional assays [e.g., S1-nuclease and ribonuclease-protection mapping (Zinn et al., 1983)], and can be performed rapidly and reproducibly to help resolve the gene mapping problems noted above.

Equipment

Internet-connected computer
An internet-connected computer is needed to access the online resources listed in the Software section. No other specialized computer hardware or software is needed, as all of the programs will run within the online computer servers.

Software

Online databases and accompanying software:
Genomes and individual genes may be identified using the Ensembl Genome Browser (www.ensembl.org) and the UCSC Genome Browser (https://genome.ucsc.edu). For this study, the Igf1 gene of the frog, Xenopus tropicalis, was examined using genome assembly JGI 4.2, and the IGF1 gene of the chimpanzee, Pan troglodytes, was examined using genome assembly Pan_tro_3.0. RNA-sequencing information was extracted from the Sequence Read Archive of the National Center for Biotechnology Information (SRA NCBI; www.ncbi.nlm.nih.gov/sra) by querying the following datasets with specific 60-nucleotide DNA fragments: Xenopus tropicalis (liver, SRR5412275) and Pan troglodytes (liver, SRR4444973; kidney, SRR1758922; skeletal muscle, SRR1758929; and heart, SRR6706810). Searches were performed using the megablast option (optimized for highly similar sequences) with the following settings: maximum target sequences = 500 (may be set from 50 to 20,000); expect threshold = 10; word size = 11; match/mismatch scores = 2, -3; gap costs = existence 5, extension 2; low-complexity regions filtered (see https://blast.ncbi.nlm.nih.gov/Blast.cgi; also see screenshots in Figure 2).
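As a rough illustration of how the query inputs described above might be organized before they are pasted into the online search form, the Python sketch below records the stated search settings and SRA run accessions, and cuts a longer sequence into 60-nucleotide fragments. It is only a sketch under assumptions: the function name make_probes, the dictionary key names, and the idea of tiling a sequence at a fixed step are illustrative choices that are not part of the protocol, which states only that specific 60-nucleotide fragments were used. The code does not contact NCBI.

```python
# Illustrative bookkeeping for the searches described in the text.
# All names below (make_probes, the dictionary keys) are hypothetical.

PROBE_LENGTH = 60  # query fragment length stated in the protocol

# Search settings as listed in the text; key names are descriptive labels,
# not parameters of any real API.
MEGABLAST_SETTINGS = {
    "program": "megablast",
    "max_target_sequences": 500,  # the text notes 50 to 20,000 may be used
    "expect_threshold": 10,
    "word_size": 11,
    "match_score": 2,
    "mismatch_score": -3,
    "gap_existence_cost": 5,
    "gap_extension_cost": 2,
    "filter_low_complexity": True,
}

# SRA runs named in the text, keyed by (species, tissue).
SRA_RUNS = {
    ("Xenopus tropicalis", "liver"): "SRR5412275",
    ("Pan troglodytes", "liver"): "SRR4444973",
    ("Pan troglodytes", "kidney"): "SRR1758922",
    ("Pan troglodytes", "skeletal muscle"): "SRR1758929",
    ("Pan troglodytes", "heart"): "SRR6706810",
}


def make_probes(sequence, length=PROBE_LENGTH, step=PROBE_LENGTH):
    """Return (start, fragment) pairs of fixed-length fragments.

    Assumption: fragments are tiled from the start of the sequence at a
    fixed step; any trailing bases shorter than `length` are dropped.
    """
    seq = "".join(sequence.split()).upper()  # strip whitespace and newlines
    invalid = set(seq) - set("ACGTN")
    if invalid:
        raise ValueError(f"unexpected characters in sequence: {sorted(invalid)}")
    if length <= 0 or step <= 0:
        raise ValueError("length and step must be positive")
    return [
        (start, seq[start:start + length])
        for start in range(0, len(seq) - length + 1, step)
    ]


if __name__ == "__main__":
    # Made-up 160-nucleotide sequence, used only to show the call.
    demo = "ACGT" * 40
    for start, fragment in make_probes(demo):
        print(start, fragment)
```

Each fragment returned by make_probes could then be submitted by hand against one of the listed SRA runs using the settings recorded in MEGABLAST_SETTINGS. A smaller step (for example, step=30) would give overlapping fragments, which is one possible way to walk across a suspected exon boundary.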

Procedure


How to cite

Readers should cite both the Bio-protocol article and the original research article where this protocol was used: