Systems Biology


0 Q&A 2556 Views Sep 20, 2021

Genome-wide sequencing of RNA (RNA-seq) has become an inexpensive tool to gain key insights into cellular and disease mechanisms. Sample preparation and sequencing are streamlined and allow the acquisition of hundreds of gene expression profiles in a few days; however, data processing, curation, and analysis in particular involve numerous steps that can be overwhelming to non-experts. Here, the sample preparation, sequencing, and data processing workflow for RNA-seq expression analysis in yeast is described. While this protocol covers only a small portion of the RNA-seq landscape, it describes the principal workflow common to such experiments, allowing the reader to adapt the protocol where necessary.

Graphic abstract:

Basic workflow of RNA-seq expression analysis.

0 Q&A 4132 Views Nov 5, 2020
Association mapping is the process of linking phenotypes with genotypes. In genome-wide association studies (GWAS), individuals are first genotyped using microarrays or by aligning sequenced reads to reference genomes. However, both of these approaches rely on reference genomes, which limits their application to organisms with no or incomplete reference genomes. To address this, reference-free association mapping methods have been developed. Here we present the protocol of an alignment-free method for association studies, which is based on counting k-mers in sequenced reads, testing for associations between k-mers and the phenotype of interest, and locally assembling the statistically significant k-mers. The method can map associations of categorical phenotypes to sequence and structural variations without requiring prior sequencing of reference genomes.
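The first two steps described above (counting k-mers and testing each one against a categorical phenotype) can be sketched in a few lines of Python. This is a minimal illustration and not the published tool: the function names, the toy reads, and the choice of a two-sided Fisher exact test on k-mer presence/absence are all assumptions made for the example, reverse complements are ignored, and the local assembly step is not covered.

```python
from collections import defaultdict
from math import comb


def kmers(read, k):
    """Yield every k-mer of a read (no reverse-complement handling)."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def presence_by_sample(samples, k):
    """Map each k-mer to the set of sample names whose reads contain it."""
    seen = defaultdict(set)
    for name, reads in samples.items():
        for read in reads:
            for kmer in kmers(read, k):
                seen[kmer].add(name)
    return seen


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    probs = (prob(x) for x in range(lo, hi + 1))
    # Sum all tables that are at least as extreme as the observed one.
    return sum(p for p in probs if p <= p_obs * (1 + 1e-9))


def associate(samples, phenotype, k):
    """Return {k-mer: p-value} for presence/absence versus case status."""
    cases = {name for name, is_case in phenotype.items() if is_case}
    controls = set(phenotype) - cases
    results = {}
    for kmer, carriers in presence_by_sample(samples, k).items():
        a = len(carriers & cases)      # cases carrying the k-mer
        c = len(carriers & controls)   # controls carrying the k-mer
        results[kmer] = fisher_two_sided(a, len(cases) - a, c, len(controls) - c)
    return results


# Hypothetical toy data: cases and controls differ by one base.
samples = {
    "case1": ["AAGATTACAGG"], "case2": ["AAGATTACAGG"], "case3": ["AAGATTACAGG"],
    "ctrl1": ["AAGATCACAGG"], "ctrl2": ["AAGATCACAGG"], "ctrl3": ["AAGATCACAGG"],
}
phenotype = {"case1": True, "case2": True, "case3": True,
             "ctrl1": False, "ctrl2": False, "ctrl3": False}

p_values = associate(samples, phenotype, k=4)
for kmer, p in sorted(p_values.items(), key=lambda item: item[1])[:4]:
    print(kmer, round(p, 3))
```

With only three cases and three controls, a perfectly separating k-mer cannot reach a two-sided p-value below 0.1, which is one reason real studies need larger cohorts and a correction for the very large number of k-mers tested.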
0 Q&A 4797 Views Sep 20, 2020
Gene transcription in bacteria often starts some nucleotides upstream of the start codon. Identifying the specific transcriptional start site (TSS) is essential for genetic manipulation, as in many cases the region upstream of the start codon contains sequence elements involved in the regulation of gene expression. Taking into account the classical gene structure, two kinds of transcriptional start site can be identified: primary and secondary. A primary transcriptional start site is located some nucleotides upstream of the translational start site, while a secondary transcriptional start site is located within the coding sequence of the gene.

Here, we present a step-by-step protocol for genome-wide determination of transcriptional start sites by differential RNA-sequencing (dRNA-seq), using the enteric pathogen Shigella flexneri serotype 5a strain M90T as a model. The method can, however, be employed in any other bacterial species of choice. In the first steps, total RNA is purified from bacterial cultures using the hot phenol method. Ribosomal RNA (rRNA) is specifically depleted via hybridization probes using a commercial kit. An RNA library treated with a 5′-monophosphate-dependent exonuclease (TEX), and therefore enriched in primary transcripts, is then prepared for comparison with a library that has not undergone TEX treatment. This is followed by ligation of an RNA linker adaptor of known sequence, which allows the TSS to be determined with single-nucleotide precision. Finally, the RNA is processed for Illumina sequencing library preparation and sequenced as a purchased service. TSS are identified by in-house bioinformatic analysis.

Our protocol is cost-effective as it minimizes the use of commercial kits and employs freely available software.
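The comparison between the TEX-treated and untreated libraries can be illustrated with a small Python sketch. This is not the in-house analysis mentioned above, whose details are not given here. It assumes that read 5′-end counts per genomic position have already been extracted and depth-normalized, and the thresholds (`min_reads`, `min_ratio`), the pseudocount, and all positions are hypothetical. The classification follows the primary/secondary definition given earlier, for a single plus-strand gene.

```python
def call_tss(tex_plus, tex_minus, min_reads=10, min_ratio=2.0, pseudocount=1.0):
    """Return positions whose 5'-end count is enriched in the TEX-treated library.

    tex_plus / tex_minus map a genomic position to the number of read
    5' ends observed there (assumed to be depth-normalized already).
    """
    sites = []
    for pos, plus in sorted(tex_plus.items()):
        minus = tex_minus.get(pos, 0)
        # Pseudocount avoids division by zero at positions absent from TEX-.
        ratio = (plus + pseudocount) / (minus + pseudocount)
        if plus >= min_reads and ratio >= min_ratio:
            sites.append(pos)
    return sites


def classify(pos, cds_start, cds_end):
    """Label a plus-strand TSS relative to one coding sequence."""
    if pos < cds_start:
        return "primary"      # upstream of the translational start site
    if pos <= cds_end:
        return "secondary"    # within the coding sequence
    return "unassigned"


# Hypothetical 5'-end counts for one region.
tex_plus = {120: 80, 300: 15, 450: 40}
tex_minus = {120: 20, 300: 30, 450: 10}

sites = call_tss(tex_plus, tex_minus)
labels = {pos: classify(pos, cds_start=150, cds_end=900) for pos in sites}
print(labels)
```

Position 300 is dropped because its signal decreases after TEX treatment, which is what would be expected of a processed (5′-monophosphate) transcript end rather than a primary transcript.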
0 Q&A 14952 Views Jun 20, 2018
Plant roots associate with a wide diversity of bacteria and archaea across the root-soil spectrum. The rhizosphere microbiota, the communities of microbes in the soil adjacent to the root, can contain up to 10 billion bacterial cells per gram of soil (Raynaud and Nunan, 2014) and can play important roles for the fitness of the host plant. Subsets of the rhizospheric microbiota can colonize the root surface (rhizoplane) and the root interior (endosphere), forming an intimate relationship with the host plant. Compositional analysis of these communities is important to develop tools in order to manipulate root-associated microbiota for increased crop productivity. Due to the reduced cost and increasing throughput of next-generation sequencing, major advances in deciphering these communities have recently been achieved, mainly through the use of amplicon sequencing of the 16S rRNA gene. Here we first present a protocol for dissecting the microbiota from various root compartments, developed using rice as a model. We next present a method for amplifying fragments of the 16S rRNA gene using a dual index approach. Finally, we present a simple workflow for analyzing the resulting sequencing data to make ecological inferences.
0 Q&A 9152 Views Dec 5, 2017
Next-generation sequencing (NGS) offers unparalleled resolution for untargeted organism detection and characterization. However, the majority of NGS analysis programs require users to be proficient in programming and command-line interfaces. EDGE bioinformatics was developed to offer scientists with little to no bioinformatics expertise a point-and-click platform for analyzing sequencing data in a rapid and reproducible manner. EDGE (Empowering the Development of Genomics Expertise) v1.0, released in January 2017, is an intuitive web-based bioinformatics platform engineered for the analysis of microbial and metagenomic NGS-based data (Li et al., 2017). The EDGE bioinformatics suite combines vetted, publicly available tools and tracks settings to ensure reliable and reproducible analysis workflows. To execute the EDGE workflow, only raw sequencing reads and a project ID are necessary. Users can access in-house data or run analyses on samples deposited in the Sequence Read Archive. Default settings offer a robust first glance and are often sufficient for novice users. All analyses are modular; users can easily turn workflows on or off and modify parameters to cater to project needs. Results are compiled and available for download in a PDF-formatted report containing publication-quality figures. We caution that interpreting results still requires in-depth scientific understanding; however, report visuals are often informative, even to novice users.
0 Q&A 9325 Views May 20, 2017
The diversity of species represents a diversity of special biological abilities, yet many of the genes that encode those abilities remain unstudied, leaving an untapped gold mine of genetic information. Despite current advances in genome bioinformatics, annotation of that genetic information is incomplete in most species, except for well-established model organisms such as human, mouse, or yeast. A guide RNA (gRNA) library using the clustered regularly interspaced short palindromic repeats (CRISPR)/Cas9 (CRISPR-associated protein 9) system can be used for the phenotypic screening of uncharacterized genes by forward genetics. The construction of a gRNA library usually requires an abundance of chemically synthesized oligos designed from annotated genes. If one wants to convert mRNA into gRNA without prior knowledge of the target DNA sequences, the major challenges are finding the sequences flanking the protospacer adjacent motif (PAM) and cutting out the 20-bp fragment. Recently, I developed a molecular biology-based technique to convert mRNA into a gRNA library (Arakawa, 2016) (Figure 1). Here I describe the detailed protocol for constructing a gRNA library from mRNA.

Figure 1. A method to convert mRNA into a gRNA library (Sanjana et al., 2014). The scheme of the method is summarized. Each of steps D-O is described in detail in the Procedure. Bg, BglII; Xb, XbaI; Bs, BsmBI; Aa, AatII. PCR, polymerase chain reaction; lentiCRISPR v2, lentiCRISPR version 2.
0 Q&A 9460 Views Mar 5, 2017
Herein we describe a detailed protocol for DNA virome analysis of low-input human stool samples (Monaco et al., 2016). This protocol is divided into four main steps: 1) stool samples are pulverized to evenly distribute microbial matter; 2) stool is enriched for virus-like particles and DNA is extracted by phenol-chloroform; 3) purified DNA is amplified by multiple displacement amplification (MDA) and fragmented; and 4) libraries are constructed and sequenced using Illumina MiSeq. Subsequent sequence analysis for viral sequence identification should be sensitive but stringent.
0 Q&A 10308 Views Nov 5, 2016
Sequencing of virus genomes during disease outbreaks can provide valuable information for diagnostics, epidemiology, and evaluation of potential countermeasures. However, particularly in remote areas, logistical and technical challenges can be significant. Nanopore sequencing provides an alternative to classical Sanger and next-generation sequencing methods, and has been used successfully under outbreak conditions (Hoenen et al., 2016; Quick et al., 2016). Here we describe a protocol for sequencing Ebola virus under outbreak conditions using Nanopore technology, which we successfully implemented at the CDC/NIH diagnostic laboratory (de Wit et al., 2016) located at the ELWA-3 Ebola virus Treatment Unit in Monrovia, Liberia, during the recent Ebola virus outbreak in West Africa.
0 Q&A 11295 Views Jul 5, 2016
Relative chromosome dosage, i.e., increases or decreases in the number of copies of specific chromosome regions in one sample versus another, can be determined using aligned read counts from Illumina sequencing (Henry et al., 2010). The following protocol was used to identify the different classes of aneuploids that result from uniparental genome elimination in Arabidopsis thaliana, including chromosomes that have undergone chromothripsis (Tan et al., 2015). Uniparental genome elimination results in the production of haploid progeny from crosses to specific strains called “haploid inducers” (Ravi et al., 2014). Chromothripsis, which was first discovered in cancer genomes, is a phenomenon that results in clustered, highly rearranged chromosomes. In plants, chromothripsis has been observed as a result of genome elimination (Tan et al., 2015). Detecting variation in chromosome dosage has multiple applications besides those linked to genome elimination. For example, a dosage variant population of poplar hybrids was created by gamma-irradiation of pollen grains. Hundreds of dosage lesions (insertions and deletions) were identified using this technique and provide a way to associate loci with the phenotypic consequences observed in this population (Henry et al., 2015).

This method has been successfully used to detect changes in chromosome dosage in many different species, including Arabidopsis thaliana (Tan et al., 2015), Arabidopsis suecica (Ravi et al., 2014), rice (Henry et al., 2010) and poplar (Henry et al., 2015). It is important to note that dosage plots always indicate dosage variation relative to the control sample used (Note 1). Therefore, this approach is not suitable to detect ploidy variants (diploid vs triploid, for example). Similarly, this technique does not allow the detection of balanced chromosomal rearrangements such as reciprocal translocations.
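The idea of relative dosage can be made concrete with a short Python sketch. It is an illustration under stated assumptions rather than the protocol's own scripts: reads are assumed to have been counted in consecutive genomic bins for the sample and for the control, each library is normalized by its total read count, and the ratio is scaled by an assumed baseline of two copies. The function name and the toy counts are hypothetical.

```python
def relative_dosage(sample_counts, control_counts, expected_copies=2.0):
    """Per-bin dosage of a sample relative to a control.

    Each library is normalized by its total read count, so the result
    reflects differences between bins, not differences in overall ploidy.
    Bins with no control reads are returned as None.
    """
    if len(sample_counts) != len(control_counts):
        raise ValueError("sample and control must have the same number of bins")
    sample_total = sum(sample_counts)
    control_total = sum(control_counts)
    if sample_total == 0 or control_total == 0:
        raise ValueError("both libraries need at least one read")

    dosage = []
    for s, c in zip(sample_counts, control_counts):
        if c == 0:
            dosage.append(None)
            continue
        ratio = (s / sample_total) / (c / control_total)
        dosage.append(expected_copies * ratio)
    return dosage


# Hypothetical counts: the third bin carries extra copies in the sample.
gain = relative_dosage([100, 100, 150, 100], [100, 100, 100, 100])

# A uniformly higher-ploidy sample gives 1.5x the reads in every bin,
# which normalization cancels out completely.
uniform = relative_dosage([150, 150, 150], [100, 100, 100])
print(gain, uniform)
```

The second call shows why the approach cannot distinguish ploidy variants: a uniform change in copy number across all bins disappears after normalization, so every bin is reported at the baseline value.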
1 Q&A 14572 Views Aug 5, 2015
The construction of a physical collection of open reading frames (ORFeomes) for genes of any model organism is a useful tool for the exploration of gene function, gene regulation, and protein-protein interaction. Here we describe in detail a protocol that has been used to develop the first collection of transcription factor (TF) and co-regulator (CR) open reading frames (TFome) in maize (Burdo et al., 2014). This TFome is being used to establish the architecture of gene regulatory networks (GRNs) responsible for the control of transcription of all genes in an organism. The protocol outlined here describes how to proceed when only an incomplete genome with partial annotation is available. TFome clones are made in a recombination-ready vector of the Gateway system, allowing for the facile transfer of the ORFs to other Gateway-compatible vectors, such as those suitable for expression in other host species. Although this protocol was developed for the maize TFome, it can readily be applied to the generation of complete ORFeome collections in other eukaryotic species.

[Protocol overview] An important aspect of successful TFome generation is the initial effort spent to establish a reliable set of gene models so that they can be subsequently amplified or synthesized. An actual TFome construction protocol for a particular species will depend on available resources such as a full-length cDNA (flcDNA) collection and a reliable reference genome (Figure 1).

In the case of maize, a flcDNA collection and a draft genome were available, but the former provided only 30% of the needed clones, and the latter contained gaps and some erroneous gene models. In order to develop a near-complete set of target gene models for maize TFs, a bioinformatics pipeline was developed as described by Yilmaz et al. (2009). In brief, a two-pronged search process was developed. The first prong involved collecting protein sequences of TFs from other species, available from databases such as PlantTFDB, PlnTFDB and DBDTF. These sequences were then used to search gene models from the draft maize genome using BLASTP. The second prong involved developing a collection of domains that define TF families and that are mostly annotated in the PFAM database (Finn et al., 2014). These domains were then used to search the draft maize genome using BLASTX. The number of TF families that exist and their naming are subject to change as new members are discovered and studied. Table 1 provides a list of known TF families with alternative names, along with the respective PFAM domains whose presence or absence defines each TF family. HMM models for each domain can be obtained from the PFAM database. Following the BLAST search, redundant models are eliminated; then, based on the TF motifs present in each gene model, gene models are assigned to a TF or Co-Regulator (CR) family according to the criteria specified in Table 1. Lastly, it is recommended to set up a database to store information on each TF family. The GRASSIUS website was established to access the stored information on TF gene models for maize, sorghum, rice, Brachypodium, sugarcane and other grasses (Burdo et al., 2014). In the following section, it is assumed that at least a draft genome or draft transcriptome is available, together with a set of gene models that have been determined ab initio or with additional manual annotation. Familiarity with the use of Perl scripts is advantageous for the gene model assembly phase.
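The family assignment step, in which the presence or absence of domains determines the TF family of a gene model, can be sketched as follows. Python is used here only for illustration; the original pipeline relies on Perl scripts. The rules below are illustrative placeholders and do not reproduce Table 1, the gene model identifiers are hypothetical, and the domain hits are assumed to have been obtained beforehand from the domain searches.

```python
# Illustrative rules only; the actual criteria are those listed in Table 1.
FAMILY_RULES = {
    "RAV": {"required": {"AP2", "B3"}, "forbidden": set()},
    "AP2/ERF": {"required": {"AP2"}, "forbidden": {"B3"}},
    "WRKY": {"required": {"WRKY"}, "forbidden": set()},
}


def assign_family(domains, rules=FAMILY_RULES):
    """Return the families whose required domains are all present
    and whose forbidden domains are all absent."""
    hits = set(domains)
    return sorted(
        family
        for family, rule in rules.items()
        if rule["required"] <= hits and not (rule["forbidden"] & hits)
    )


# Hypothetical gene models with the domains detected in each.
gene_models = {
    "model_001": ["AP2"],
    "model_002": ["AP2", "B3"],
    "model_003": ["WRKY"],
    "model_004": ["Pkinase"],
}

assignments = {name: assign_family(doms) for name, doms in gene_models.items()}
for name, families in assignments.items():
    print(name, families or "not a TF under these rules")
```

Keeping the rules in a single table-like structure makes it straightforward to update them when family definitions or names change, which the text notes happens as new members are discovered.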

Figure 1. Flowchart for the generation of a TFome project. Flowchart outlining the general strategy for template identification, PCR amplification and cloning of transcription factor (TF) full length (FL) open reading frames (ORFs). (modified from Burdo et al., 2014)
