Genomics - Systems Biology - BIO-PROTOCOL

Annotated Bioinformatic Pipelines for Genome Assembly and Annotation of Mitochondrial Genomes

Jessica C. Winn*, Aletta E. Bester-van der Merwe, Simo N. Maduna*

Mar 5, 2025

Mitochondrial genomes (mitogenomes) display relatively rapid mutation rates, low sequence recombination, high copy numbers, and maternal inheritance patterns, rendering them valuable blueprints for mapping lineages, uncovering historical migration patterns, understanding intraspecific population dynamics, and investigating how environmental pressures shape traits underpinned by genetic variation. Here, we present the bioinformatic pipeline and code used to assemble and annotate the complete mitogenomes of five houndsharks (Chondrichthyes: Triakidae) and compare them to the mitogenomes of other closely related species. We demonstrate the value of a combined assembly approach for detecting deviations in mitogenome structure and describe how to select an assembly approach that best suits the sequencing data. The datasets required to run our analyses are available on the GitHub and Dryad repositories.

Annotated Bioinformatic Pipelines for Phylogenomic Placement of Mitochondrial Genomes

Jessica C. Winn*, Aletta E. Bester-van der Merwe, Simo N. Maduna*

Mar 5, 2025

The limited standards for the rigorous and objective use of mitochondrial genomes (mitogenomes) can lead to uncertainties regarding the phylogenetic relationships of taxa under varying evolutionary constraints. The mitogenome exhibits heterogeneity in base composition, and evolutionary rates may vary across different regions, which can cause empirical data to violate the assumptions of the applied evolutionary models. Consequently, the unique evolutionary signatures of the dataset must be carefully evaluated before selecting an appropriate approach for phylogenomic inference. Here, we present the bioinformatic pipeline and code used to expand the mitogenome phylogeny of the order Carcharhiniformes (groundsharks), with a focus on houndsharks (Chondrichthyes: Triakidae). We present a rigorous approach for addressing difficult-to-resolve phylogenies, incorporating multispecies coalescent modelling (MSCM) to address gene/species tree discordance. The protocol describes carefully designed approaches for preparing alignments, partitioning datasets, assigning models of evolution, inferring phylogenies based on traditional site-homogeneous concatenation approaches as well as under multispecies coalescent and site-heterogeneous models, and generating statistical data for comparing different topological outcomes. The datasets required to run our analyses are available in the GitHub and Dryad repositories.
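
As a rough illustration of what "heterogeneity in base composition" means in practice, the sketch below computes per-taxon base frequencies and the spread in GC content across a toy alignment. It is not taken from the protocol's code; the taxon names and sequences are invented placeholders, and a real analysis would rely on the dedicated tools the protocol describes.

```python
from collections import Counter


def base_frequencies(seq):
    """Return A/C/G/T proportions, ignoring gaps and ambiguity codes."""
    counts = Counter(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return {base: counts[base] / total for base in "ACGT"}


def gc_content(seq):
    """Return the fraction of unambiguous bases that are G or C."""
    freqs = base_frequencies(seq)
    return freqs["G"] + freqs["C"]


# Hypothetical toy alignment; names and sequences are placeholders only.
alignment = {
    "taxon_A": "ATGCGCGCTA--ATGC",
    "taxon_B": "ATATATGCTANNATAT",
}

gc_by_taxon = {name: gc_content(seq) for name, seq in alignment.items()}

# A large spread between taxa hints at compositional heterogeneity.
gc_spread = max(gc_by_taxon.values()) - min(gc_by_taxon.values())

for name, gc in gc_by_taxon.items():
    print(f"{name}: GC = {gc:.3f}")
print(f"GC spread across taxa = {gc_spread:.3f}")
```

A wide GC spread is only a first hint; whether it actually violates a model's stationarity assumption needs a formal test.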

Phylogenomics of Plant NLR Immune Receptors to Identify Functionally Conserved Sequence Motifs

Toshiyuki Sakai, AmirAli Toghani, Hiroaki Adachi*

Jul 5, 2024

In recent years, the increase in genome sequencing across diverse plant species has provided a significant advantage for phylogenomics studies, allowing the analysis of one of the most diverse gene families in plants: nucleotide-binding leucine-rich repeat receptors (NLRs). However, due to the sequence diversity of the NLR gene family, identifying key molecular features and functionally conserved sequence patterns is challenging through multiple sequence alignment. Here, we present a step-by-step protocol for a computational pipeline designed to identify evolutionarily conserved motifs in plant NLR proteins. In this protocol, we use a large-scale NLR dataset, including 1,862 NLR genes annotated from monocot and dicot species, to predict conserved sequence motifs, such as the MADA and EDVID motifs, within the coiled-coil (CC)-NLR subfamily. Our pipeline can be applied to identify molecular signatures that have remained conserved in the gene family over evolutionary time across plant species.

Classification of a Massive Number of Viral Genomes and Estimation of Time of Most Recent Common Ancestor (tMRCA) of SARS-CoV-2 Using Phylodynamic Analysis

Xiaowen Hu, Siqin Guan, Yiliang He, Guohui Yi, Lei Yao*, Jiaming Zhang*

Mar 20, 2024

Estimating the time of the most recent common ancestor (tMRCA) is important for tracing the origin of pathogenic viruses. The analysis relies on the genetic diversity accumulated over a given time period. Thousands of mutated sites have arisen in SARS-CoV-2 genomes since the COVID-19 pandemic started; six highly linked mutation sites arose early, before the start of the pandemic, and can be used to classify the genomes into three main haplotypes. Tracing the origin of these three haplotypes may help to understand the origin of SARS-CoV-2. In this article, we present a complete protocol for classifying SARS-CoV-2 genomes and calculating the tMRCA using a Bayesian phylodynamic method. This protocol may also be used to analyze other viral genomes.

Key features

• Filtering and alignment of a massive number of viral genomes using custom scripts and ViralMSA.

• Classification of genomes based on highly linked sites using custom scripts.

• Phylodynamic analysis of viral genomes using Bayesian evolutionary analysis sampling trees (BEAST).

• Visualization of the posterior distribution of tMRCA using Tracer v1.7.2.

• Optimized for SARS-CoV-2.

Graphical overview

Graphical workflow of time of most recent common ancestor (tMRCA) estimation process
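
The classification step described above (assigning each genome to a haplotype from its alleles at a handful of linked sites) can be sketched as follows. This is a minimal illustration, not the authors' custom script: the six column indices and the three allele patterns are hypothetical placeholders chosen for a toy alignment, not the real SARS-CoV-2 marker sites.

```python
# Hypothetical 0-based alignment columns for six linked sites (placeholders).
LINKED_SITES = [0, 2, 4, 6, 8, 10]

# Hypothetical allele patterns defining three haplotypes (placeholders).
HAPLOTYPE_PATTERNS = {
    "hap_1": "AAAAAA",
    "hap_2": "CCCCCC",
    "hap_3": "AAACCC",
}


def classify_genome(aligned_seq, sites=LINKED_SITES, patterns=HAPLOTYPE_PATTERNS):
    """Return the haplotype whose pattern matches the alleles at the linked sites."""
    if len(aligned_seq) <= max(sites):
        raise ValueError("aligned sequence is shorter than the last linked site")
    alleles = "".join(aligned_seq[i].upper() for i in sites)
    for name, pattern in patterns.items():
        if alleles == pattern:
            return name
    # Recombinants, sequencing errors, or gaps at a marker site end up here.
    return "unclassified"


# Toy aligned genomes (invented for illustration).
genomes = {
    "genome_1": "ATATATATATAT",
    "genome_2": "CTCTCTCTCTCT",
    "genome_3": "ATATATCTCTCT",
    "genome_4": "GTATATATATAT",
}

assignments = {name: classify_genome(seq) for name, seq in genomes.items()}
for name, haplotype in assignments.items():
    print(name, haplotype)
```

Genomes that match none of the patterns are kept in a separate "unclassified" bin rather than forced into the nearest haplotype, so ambiguous sequences can be inspected or filtered before the phylodynamic analysis.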

Phylogenetic Inference of Homologous/Orthologous Genes among Distantly Related Plants

Zilong Xu, Wenyan Sun, Ziqiang Zhu, Bojian Zhong, Zhenhua Zhang*

Dec 5, 2023

The recent surge in plant genomic and transcriptomic data has laid a foundation for reconstructing evolutionary scenarios and inferring potential functions of key genes related to plants’ development and stress responses. The classical scheme for identifying homologous genes is sequence similarity–based searching, under the crucial assumption that homologous sequences are more similar to each other than they are to any non-homologous sequence. Advances in plant phylogenomics and computational algorithms have enabled us to systematically identify homologs/orthologs and reconstruct their evolutionary histories among distantly related lineages. Here, we present a comprehensive pipeline for homologous sequence identification, phylogenetic relationship inference, and potential functional profiling of genes in plants.

Key features

• Identification of orthologs using large-scale genomic and transcriptomic data.

• This protocol is generalized for analyzing the evolution of plant genes.

A Microfluidic Platform for Screening Gene Expression Dynamics across Yeast Strain Libraries

Elizabeth Stasiowski, Richard O’Laughlin, Shayna Holness, Nicholas Csicsery, Jeff Hasty, Nan Hao*

Nov 20, 2023

The relative ease of genetic manipulation in S. cerevisiae is one of its greatest strengths as a model eukaryotic organism. Researchers have leveraged this quality of the budding yeast to study the effects of a variety of genetic perturbations, such as deletion or overexpression, in a high-throughput manner. This has been accomplished by producing a number of strain libraries that can contain hundreds or even thousands of distinct yeast strains with unique genetic alterations. While these strategies have led to enormous increases in our understanding of the functions and roles that genes play within cells, the techniques used to screen genetically modified libraries of yeast strains typically rely on plate or sequencing-based assays that make it difficult to analyze gene expression changes over time. Microfluidic devices, combined with fluorescence microscopy, can allow gene expression dynamics of different strains to be captured in a continuous culture environment; however, these approaches often have significantly lower throughput compared to traditional techniques. To address these limitations, we have developed a microfluidic platform that uses an array pinning robot to allow for up to 48 different yeast strains to be transferred onto a single device. Here, we detail a validated methodology for constructing and setting up this microfluidic device, starting with the photolithography steps for constructing the wafer, then the soft lithography steps for making polydimethylsiloxane (PDMS) microfluidic devices, and finally the robotic arraying of strains onto the device for experiments. We have applied this device for dynamic screens of a protein aggregation library; however, this methodology has the potential to enable complex and dynamic screens of yeast libraries for a wide range of applications.

Key features

• Major steps of this protocol require access to specialized equipment (i.e., microfabrication tools typically found in a cleanroom facility and an array pinning robot).

• Construction of microfluidic devices with multiple different feature heights using photolithography and soft lithography with PDMS.

• Robotic spotting of up to 48 different yeast strains onto microfluidic devices.

Simultaneous Profiling of Chromosome Conformation and Gene Expression in Single Cells

Yujie Chen, Heming Xu, Zhiyuan Liu, Dong Xing*

Nov 20, 2023

Rapid development in single-cell chromosome conformation capture technologies has provided valuable insights into the importance of spatial genome architecture for gene regulation. However, a long-standing technical gap remains in the simultaneous characterization of three-dimensional genomes and transcriptomes in the same cell. We have described an assay named Hi-C and RNA-seq employed simultaneously (HiRES), which integrates in situ reverse transcription and chromosome conformation capture (3C) for the parallel analysis of chromatin organization and gene expression. Here, we provide a detailed implementation of the assay, using mouse embryos and cerebral cortices as examples. The versatility of this method extends beyond these two samples, with the potential to be used in various other cell types.

Key features

• A multi-omics sequencing approach to profile 3D genome structure and gene expression simultaneously in single cells.

• Compatible with animal tissues.

• One-tube amplification of both DNA and RNA components.

• Requires three days to complete.

Graphical overview

Schematic illustration for the Hi-C and RNA-seq employed simultaneously (HiRES) workflow

Workflow for High-throughput Screening of Enzyme Mutant Libraries Using Matrix-assisted Laser Desorption/Ionization Mass Spectrometry Analysis of Escherichia coli Colonies

Kisurb Choe, Jonathan V. Sweedler*

Nov 5, 2023

High-throughput molecular screening of microbial colonies and DNA libraries is a critical procedure that enables applications such as directed evolution, functional genomics, microbial identification, and creation of engineered microbial strains to produce high-value molecules. A promising chemical screening approach is the measurement of products directly from microbial colonies via optically guided matrix-assisted laser desorption/ionization mass spectrometry (MALDI-MS). Measuring the compounds from microbial colonies bypasses liquid culture with a screen that takes approximately 5 s per sample. We describe a protocol combining a dedicated informatics pipeline and sample preparation method that can prepare up to 3,000 colonies in under 3 h. The screening protocol starts from colonies grown on Petri dishes, which are then transferred onto MALDI plates via imprinting. The target plate with the colonies is imaged by a flatbed scanner, and the colonies are located via custom software. The target plate is coated with MALDI matrix, MALDI-MS analyzes the colony locations, and data analysis identifies colonies with the desired biochemical properties. This workflow screens thousands of colonies per day without requiring additional automation. The wide chemical coverage and the high sensitivity of MALDI-MS enable diverse screening projects such as modifying enzymes and functional genomics surveys of gene activation/inhibition libraries.

Key features

• Mass spectrometry analyzes a range of compounds from E. coli colonies as a proxy for liquid culture testing enzyme mutant libraries.

• Colonies are transferred to a MALDI target plate by a simple imprinting method.

• The screen compares the ratio among several products or searches for the qualitative presence of specific compounds.

• The protocol requires a MALDI mass spectrometer.

Graphical overview

Overview of the MALDI-MS analysis of microbial colonies for screening mutant libraries. Microbial cells containing a mutant library for enzymes/metabolic pathways are first grown in agar. The colonies are then imprinted onto a MALDI target plate using a filter paper intermediate. An optical image of the MALDI target plate is analyzed by custom software to find the locations of individual colonies and direct subsequent MALDI-MS analyses to the selected colonies. After applying MALDI matrix onto the target plate, MALDI-MS analysis of the colonies is performed. Colonies showing the desired product profiles are found by data analysis via the software, and the colonies are picked for downstream analysis.
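
The timing figures quoted in the abstract (about 5 s per sample, up to 3,000 colonies prepared per batch) can be turned into a quick back-of-the-envelope throughput estimate. The calculation below is an illustrative sketch only; it assumes acquisition runs back to back and ignores plate changes, matrix coating, and other overhead, none of which are quantified in the text.

```python
# Figures taken from the abstract.
SECONDS_PER_SAMPLE = 5        # approximate MALDI-MS time per colony
COLONIES_PER_BATCH = 3000     # colonies prepared in one batch (under 3 h)

# Acquisition time for one full batch, in hours.
acquisition_hours = SECONDS_PER_SAMPLE * COLONIES_PER_BATCH / 3600

# Upper bound on colonies measured in 24 h of uninterrupted acquisition
# (an idealized assumption; real runs include overhead).
max_colonies_per_day = 24 * 3600 // SECONDS_PER_SAMPLE

print(f"Acquisition time per batch: {acquisition_hours:.2f} h")
print(f"Idealized upper bound per day: {max_colonies_per_day} colonies")
```

Under these idealized assumptions one batch needs a little over 4 h of instrument time, which is consistent with the stated throughput of thousands of colonies per day.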

Testing for Allele-specific Expression from Human Brain Samples

Maria E. Diaz-Ortiz, Nimansha Jain, Michael D. Gallagher, Marijan Posavi, Travis L. Unger, Alice S. Chen-Plotkin*

Oct 5, 2023

Many single nucleotide polymorphisms (SNPs) identified by genome-wide association studies exert their effects on disease risk as expression quantitative trait loci (eQTL) via allele-specific expression (ASE). While databases for probing eQTLs in tissues from normal individuals exist, one may wish to ascertain eQTLs or ASE in specific tissues or disease states not characterized in these databases. Here, we present a protocol to assess ASE of two possible target genes (GPNMB and KLHL7) of a known genome-wide association study (GWAS) Parkinson’s disease (PD) risk locus in postmortem human brain tissue from PD and neurologically normal individuals. This was done using a sequence of RNA isolation, cDNA library generation, enrichment for transcripts of interest using customizable cDNA capture probes, paired-end RNA sequencing, and subsequent analysis. This method provides increased sensitivity relative to traditional bulk RNA-seq-based approaches and offers a blueprint that can be extended to the study of other genes, tissues, and disease states.

Key features

• Analysis of GPNMB allele-specific expression (ASE) in brain lysates from cognitively normal controls (NC) and Parkinson’s disease (PD) individuals.

• Builds on the ASE protocol of Mayba et al. (2014) and extends application from cells to human tissue.

• Increased sensitivity by enrichment for desired transcript via RNA CaptureSeq (Mercer et al., 2014).

• Optimized for human brain lysates from cingulate gyrus, caudate nucleus, and cerebellum.
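
One common way to test for allele-specific expression at a heterozygous SNP is an exact binomial test of the reference and alternate read counts against an expected 50:50 ratio. The sketch below illustrates that general idea only; it is an assumption, not the statistical procedure of this protocol or of Mayba et al. (2014), and the read counts are invented.

```python
from math import exp, lgamma, log


def binomial_two_sided_p(ref_count, alt_count, p_ref=0.5):
    """Exact two-sided binomial p-value for allelic imbalance.

    Sums the probabilities of all outcomes that are no more likely than the
    observed one under the null proportion p_ref (which must lie in (0, 1)).
    """
    if not 0.0 < p_ref < 1.0:
        raise ValueError("p_ref must be strictly between 0 and 1")
    n = ref_count + alt_count
    if n == 0:
        raise ValueError("no reads cover this site")

    def pmf(k):
        # Log-space binomial probability to stay stable at high read depth.
        log_p = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
                 + k * log(p_ref) + (n - k) * log(1.0 - p_ref))
        return exp(log_p)

    observed = pmf(ref_count)
    # Small relative tolerance so ties are counted despite rounding.
    total = sum(p for p in (pmf(k) for k in range(n + 1))
                if p <= observed * (1 + 1e-7))
    return min(1.0, total)


# Invented example counts at one heterozygous SNP.
p_balanced = binomial_two_sided_p(52, 48)
p_skewed = binomial_two_sided_p(80, 20)
print(f"52 vs 48 reads: p = {p_balanced:.3f}")
print(f"80 vs 20 reads: p = {p_skewed:.2e}")
```

In practice, reference-mapping bias can shift the null proportion away from 0.5, and per-individual tests are usually combined or corrected for multiple testing; both are outside the scope of this sketch.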


Revised iCLIP-seq Protocol for Profiling RNA–protein Interaction Sites at Individual Nucleotide Resolution in Living Cells

Syed Nabeel-Shah, Jack F. Greenblatt*

Jun 5, 2023

Individual nucleotide resolution UV cross-linking and immunoprecipitation followed by high-throughput sequencing (iCLIP-seq) is a powerful technique used to identify the binding sites of RNA-binding proteins (RBPs) on target RNAs and to characterize the molecular basis of posttranscriptional regulatory pathways. Several variants of CLIP have been developed to improve its efficiency and simplify the protocol [e.g., iCLIP2 and enhanced CLIP (eCLIP)]. We have recently reported that the transcription factor SP1 functions in the regulation of alternative cleavage and polyadenylation through direct RNA binding. We utilized a modified iCLIP method to identify RNA-binding sites for SP1 and several of the cleavage and polyadenylation complex subunits, including CFIm25, CPSF7, CPSF100, CPSF2, and Fip1. Our revised protocol takes advantage of several features of the eCLIP procedure and also improves on certain steps of the original iCLIP method, including optimization of cDNA circularization. Herein, we describe a step-by-step procedure for our revised iCLIP-seq protocol, which we designate iCLIP-1.5, and provide alternative approaches for certain difficult-to-CLIP proteins.

Key features

• Identification of RNA-binding sites of RNA-binding proteins (RBPs) at nucleotide resolution.

• iCLIP-seq provides precise positional and quantitative information on the RNA-binding sites of RBPs in living cells.

• iCLIP facilitates the identification of sequence motifs recognized by RBPs.

• Allows quantitative analysis of genome-wide changes in protein-RNA interactions.

• Revised iCLIP-1.5 protocol is more efficient and highly robust; it provides higher coverage even for low-input samples.
