Systems Biology-BIO-PROTOCOL

Classification of a Massive Number of Viral Genomes and Estimation of Time of Most Recent Common Ancestor (tMRCA) of SARS-CoV-2 Using Phylodynamic Analysis

Xiaowen Hu, Siqin Guan, Yiliang He, Guohui Yi, Lei Yao*, Jiaming Zhang*

Mar 20, 2024

Estimating the time of the most recent common ancestor (tMRCA) is important for tracing the origin of pathogenic viruses. This analysis is based on the genetic diversity accumulated over a given time period. Thousands of mutation sites have arisen in SARS-CoV-2 genomes since the COVID-19 pandemic began. Six highly linked mutation sites arose before the pandemic started, and these can be used to classify the genomes into three main haplotypes. Tracing the origin of these three haplotypes may help us understand the origin of SARS-CoV-2. In this article, we present a complete protocol for classifying SARS-CoV-2 genomes and calculating tMRCA using a Bayesian phylodynamic method. This protocol may also be used to analyze other viral genomes.
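The haplotype classification described above can be illustrated with a minimal Python sketch. It assumes the genomes have already been aligned to a common reference (for example with ViralMSA), so that each linked site corresponds to a fixed alignment column. The six column indices and the three haplotype base patterns are hypothetical placeholders. They are not the sites or haplotype definitions used in the protocol.

```python
from collections import Counter

# HYPOTHETICAL alignment columns (0-based) standing in for the six linked sites.
# The real site coordinates are not listed in this summary.
LINKED_SITES = (10, 20, 30, 40, 50, 60)

# HYPOTHETICAL base patterns defining three haplotypes at those six sites.
HAPLOTYPES = {
    "H1": "CTTACG",
    "H2": "TCCGTA",
    "H3": "CTCGTA",
}


def site_pattern(aligned_seq: str) -> str:
    """Return the (upper-cased) bases of an aligned genome at the linked sites."""
    return "".join(aligned_seq[i].upper() for i in LINKED_SITES)


def classify(aligned_seq: str) -> str:
    """Assign a genome to the haplotype whose pattern matches all six sites exactly."""
    pattern = site_pattern(aligned_seq)
    for name, ref in HAPLOTYPES.items():
        if pattern == ref:
            return name
    # Ambiguous bases (N), gaps, or unexpected combinations are left unassigned.
    return "unclassified"


def toy_genome(pattern: str, length: int = 70) -> str:
    """Build a synthetic aligned sequence carrying `pattern` at the linked sites."""
    seq = ["A"] * length
    for pos, base in zip(LINKED_SITES, pattern):
        seq[pos] = base
    return "".join(seq)


genomes = {
    "g1": toy_genome("CTTACG"),
    "g2": toy_genome("TCCGTA"),
    "g3": toy_genome("CTCGTA"),
    "g4": toy_genome("CTNGTA"),  # an N at the third linked site
}
calls = {gid: classify(seq) for gid, seq in genomes.items()}
counts = Counter(calls.values())
print(calls)
print(counts)
```

Requiring an exact match at all six sites is a deliberately strict choice. Genomes with missing data at any linked site are set aside rather than guessed.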

Key features

• Filtering and alignment of a massive number of viral genomes using custom scripts and ViralMSA.

• Classification of genomes based on highly linked sites using custom scripts.

• Phylodynamic analysis of viral genomes using Bayesian evolutionary analysis sampling trees (BEAST).

• Visualization of the posterior distribution of tMRCA using Tracer v1.7.2.

• Optimized for SARS-CoV-2.

Graphical overview

Graphical workflow of time of most recent common ancestor (tMRCA) estimation process
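BEAST estimates tMRCA by Bayesian sampling of time-scaled trees, which cannot be condensed into a few lines. A much simpler related check, commonly run beforehand, is root-to-tip regression (the idea behind tools such as TempEst). This method works as follows:

- Root-to-tip genetic distances from a rooted tree are regressed on sampling dates.
- The slope approximates the substitution rate.
- The x-intercept gives a rough point estimate of the tMRCA.

The Python sketch below implements only this regression. It assumes the root-to-tip distances (substitutions/site) are already available. The data are synthetic, and this is not a substitute for the Bayesian analysis in the protocol.

```python
from datetime import date


def decimal_year(d: date) -> float:
    """Convert a calendar date to a decimal year (e.g., 2020-07-02 -> 2020.5)."""
    start = date(d.year, 1, 1)
    days_in_year = (date(d.year + 1, 1, 1) - start).days
    return d.year + (d - start).days / days_in_year


def root_to_tip_tmrca(times, distances):
    """Fit distance = rate * (time - tMRCA) by ordinary least squares.

    Returns (rate, tmrca); tmrca is the x-intercept of the fitted line.
    """
    n = len(times)
    if n < 2:
        raise ValueError("need at least two dated samples")
    mean_t = sum(times) / n
    mean_d = sum(distances) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("sampling dates must not all be identical")
    sxy = sum((t - mean_t) * (d - mean_d) for t, d in zip(times, distances))
    rate = sxy / sxx
    if rate <= 0:
        raise ValueError("no positive temporal signal in the data")
    intercept = mean_d - rate * mean_t
    return rate, -intercept / rate


sampling = [date(2020, 1, 1), date(2020, 7, 2), date(2021, 1, 1), date(2021, 7, 2)]
times = [decimal_year(d) for d in sampling]
# Synthetic distances generated with rate 1e-3 subs/site/year and tMRCA 2019.8.
dists = [1e-3 * (t - 2019.8) for t in times]
rate, tmrca = root_to_tip_tmrca(times, dists)
print(f"rate = {rate:.2e} subs/site/year, tMRCA = {tmrca:.2f}")
```

With real genomes the points scatter around the line. A weak or negative slope is a warning that the data carry little temporal signal for any dated analysis.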

Phylogenetic Inference of Homologous/Orthologous Genes among Distantly Related Plants

Zilong Xu, Wenyan Sun, Ziqiang Zhu, Bojian Zhong, Zhenhua Zhang*

Dec 5, 2023

The recent surge in plant genomic and transcriptomic data has laid a foundation for reconstructing evolutionary scenarios and inferring potential functions of key genes related to plant development and stress responses. The classical scheme for identifying homologous genes is sequence similarity–based searching. It rests on the crucial assumption that homologous sequences are more similar to each other than to any non-homologous sequence. Advances in plant phylogenomics and computational algorithms have enabled us to systematically identify homologs/orthologs and reconstruct their evolutionary histories among distantly related lineages. Here, we present a comprehensive pipeline for homologous sequence identification, phylogenetic relationship inference, and potential functional profiling of genes in plants.

Key features

• Identification of orthologs using large-scale genomic and transcriptomic data.

• This protocol is generalized for analyzing the evolution of plant genes.
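The similarity-based assumption above is often put into practice as reciprocal best hits (RBH). Two genes from different species are treated as putative orthologs when each is the other's top-scoring hit. The Python sketch below assumes BLAST tabular output (`-outfmt 6`, default 12 columns, bitscore in the last column). It illustrates the heuristic only and is not necessarily the tool chain used in this pipeline. Gene IDs and scores are hypothetical.

```python
def best_hits(blast_tab_lines):
    """Map each query to its highest-bitscore subject from BLAST -outfmt 6 lines."""
    best = {}
    for line in blast_tab_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        query, subject, bitscore = cols[0], cols[1], float(cols[11])
        # On ties, the first hit encountered is kept.
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {q: s for q, (s, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (geneA, geneB) pairs that are each other's best hit."""
    ab = best_hits(a_vs_b)
    ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in ab.items() if ba.get(b) == a)


def row(query, subject, bitscore):
    """Build a 12-column tabular line; only columns 1, 2 and 12 matter here."""
    filler = ["90.0", "100", "10", "0", "1", "100", "1", "100", "1e-50"]
    return "\t".join([query, subject, *filler, str(bitscore)])


a_vs_b = [row("A1", "B1", 200), row("A1", "B2", 150), row("A2", "B2", 180)]
b_vs_a = [row("B1", "A1", 210), row("B2", "A1", 190), row("B2", "A2", 170)]
print(reciprocal_best_hits(a_vs_b, b_vs_a))
```

In this toy case A2's best hit is B2, but B2's best hit is A1. Only the pair (A1, B1) is therefore reported. RBH trades sensitivity for specificity: it misses paralog-rich relationships, which is one reason phylogeny-based inference follows the similarity search.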

A Microfluidic Platform for Screening Gene Expression Dynamics across Yeast Strain Libraries

Elizabeth Stasiowski, Richard O’Laughlin, Shayna Holness, Nicholas Csicsery, Jeff Hasty, Nan Hao*

Nov 20, 2023

The relative ease of genetic manipulation in S. cerevisiae is one of its greatest strengths as a model eukaryotic organism. Researchers have leveraged this quality of budding yeast to study the effects of a variety of genetic perturbations, such as deletion or overexpression, in a high-throughput manner. This has been accomplished by producing strain libraries that can contain hundreds or even thousands of distinct yeast strains with unique genetic alterations. These strategies have greatly increased our understanding of the roles that genes play within cells. However, the techniques used to screen genetically modified yeast libraries typically rely on plate- or sequencing-based assays, which make it difficult to analyze gene expression changes over time. Microfluidic devices combined with fluorescence microscopy can capture the gene expression dynamics of different strains in a continuous culture environment, but these approaches often have substantially lower throughput than traditional techniques. To address these limitations, we have developed a microfluidic platform that uses an array pinning robot to transfer up to 48 different yeast strains onto a single device. Here, we detail a validated methodology for constructing and setting up this microfluidic device. It starts with the photolithography steps for fabricating the wafer, continues with the soft lithography steps for making polydimethylsiloxane (PDMS) microfluidic devices, and ends with the robotic arraying of strains onto the device for experiments. We have applied this device to dynamic screens of a protein aggregation library; however, this methodology has the potential to enable complex, dynamic screens of yeast libraries for a wide range of applications.

Key features

• Major steps of this protocol require access to specialized equipment (i.e., microfabrication tools typically found in a cleanroom facility and an array pinning robot).

• Construction of microfluidic devices with multiple different feature heights using photolithography and soft lithography with PDMS.

• Robotic spotting of up to 48 different yeast strains onto microfluidic devices.

Simultaneous Profiling of Chromosome Conformation and Gene Expression in Single Cells

Yujie Chen, Heming Xu, Zhiyuan Liu, Dong Xing*

Nov 20, 2023

Rapid developments in single-cell chromosome conformation capture technologies have provided valuable insights into the importance of spatial genome architecture for gene regulation. However, a long-standing technical gap remains in characterizing three-dimensional genomes and transcriptomes simultaneously in the same cell. We previously described an assay named Hi-C and RNA-seq employed simultaneously (HiRES), which integrates in situ reverse transcription and chromosome conformation capture (3C) for the parallel analysis of chromatin organization and gene expression. Here, we provide a detailed implementation of the assay, using mouse embryos and cerebral cortices as examples. The method is not limited to these two sample types and could potentially be applied to various other cell types.

Key features

• A multi-omics sequencing approach to profile 3D genome structure and gene expression simultaneously in single cells.

• Compatible with animal tissues.

• One-tube amplification of both DNA and RNA components.

• Requires three days to complete.

Graphical overview

Schematic illustration for the Hi-C and RNA-seq employed simultaneously (HiRES) workflow
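On the Hi-C side of such data, a common first step after calling a cell's chromatin contacts is to bin them into a contact matrix. The Python sketch below is a generic illustration, not part of the HiRES pipeline itself. It assumes each contact is given as (chromosome, position, chromosome, position) with 0-based positions. The bin size, chromosome length, and contacts are hypothetical.

```python
def contact_matrix(contacts, chrom, chrom_length, bin_size):
    """Bin intra-chromosomal contacts of one cell into a symmetric count matrix."""
    n_bins = (chrom_length + bin_size - 1) // bin_size
    mat = [[0] * n_bins for _ in range(n_bins)]
    for chrom1, pos1, chrom2, pos2 in contacts:
        if chrom1 != chrom or chrom2 != chrom:
            continue  # keep only contacts within the chosen chromosome
        if not (0 <= pos1 < chrom_length and 0 <= pos2 < chrom_length):
            continue  # ignore positions outside the stated chromosome length
        i, j = pos1 // bin_size, pos2 // bin_size
        mat[i][j] += 1
        if i != j:
            mat[j][i] += 1  # mirror so the matrix stays symmetric
    return mat


# HYPOTHETICAL contacts on a 1 kb toy chromosome, binned at 250 bp.
contacts = [
    ("chr1", 10, "chr1", 900),
    ("chr1", 300, "chr1", 320),
    ("chr2", 5, "chr2", 50),
    ("chr1", 260, "chr1", 10),
]
mat = contact_matrix(contacts, "chr1", chrom_length=1000, bin_size=250)
for row in mat:
    print(row)
```

Single-cell contact maps are very sparse, so real analyses typically use much coarser bins than bulk Hi-C.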

Workflow for High-throughput Screening of Enzyme Mutant Libraries Using Matrix-assisted Laser Desorption/Ionization Mass Spectrometry Analysis of Escherichia coli Colonies

Kisurb Choe, Jonathan V. Sweedler*

Nov 5, 2023

High-throughput molecular screening of microbial colonies and DNA libraries is critical for applications such as directed evolution, functional genomics, microbial identification, and the creation of engineered microbial strains that produce high-value molecules. A promising chemical screening approach is to measure products directly from microbial colonies via optically guided matrix-assisted laser desorption/ionization mass spectrometry (MALDI-MS). Measuring compounds directly from colonies bypasses liquid culture, and the screen takes approximately 5 s per sample. We describe a protocol that combines a dedicated informatics pipeline with a sample preparation method that can prepare up to 3,000 colonies in under 3 h. The workflow proceeds as follows:

1. Colonies are grown on Petri dishes and then transferred onto MALDI target plates via imprinting.
2. The target plate with the colonies is imaged by a flatbed scanner, and the colonies are located via custom software.
3. The target plate is coated with MALDI matrix, and MALDI-MS is performed at the colony locations.
4. Data analysis identifies colonies with the desired biochemical properties.

This workflow screens thousands of colonies per day without requiring additional automation. The wide chemical coverage and high sensitivity of MALDI-MS enable diverse screening projects, such as modifying enzymes and functional genomics surveys of gene activation/inhibition libraries.

Key features

• Mass spectrometry analyzes a range of compounds from E. coli colonies as a proxy for liquid-culture testing of enzyme mutant libraries.

• Colonies are transferred to a MALDI target plate by a simple imprinting method.

• The screen compares the ratio among several products or searches for the qualitative presence of specific compounds.

• The protocol requires a MALDI mass spectrometer.

Graphical overview

Overview of the MALDI-MS analysis of microbial colonies for screening mutant libraries. Microbial cells containing a mutant library for enzymes/metabolic pathways are first grown on agar. The colonies are then imprinted onto a MALDI target plate using a filter paper intermediate. An optical image of the MALDI target plate is analyzed by custom software to find the locations of individual colonies and direct subsequent MALDI-MS analyses to the selected colonies. After MALDI matrix is applied to the target plate, MALDI-MS analysis of the colonies is performed. Colonies showing the desired product profiles are identified by data analysis via the software and picked for downstream analysis.
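The ratio-based screen listed in the key features can be sketched as follows. Each colony's spectrum is assumed to be a list of (m/z, intensity) peaks. The two product m/z values, the ±0.5 tolerance, and both thresholds are hypothetical placeholders. The sketch does not reproduce the protocol's custom software.

```python
def peak_intensity(spectrum, target_mz, tol=0.5):
    """Largest intensity within +/- tol of target_mz; 0.0 if no peak is found."""
    return max((inten for mz, inten in spectrum if abs(mz - target_mz) <= tol), default=0.0)


def screen_colonies(spectra, mz_a, mz_b, min_fraction, min_signal=1.0):
    """Rank colonies by the fraction of product A among products A and B."""
    hits = []
    for colony, spectrum in spectra.items():
        a = peak_intensity(spectrum, mz_a)
        b = peak_intensity(spectrum, mz_b)
        if a + b < min_signal:
            continue  # too little product signal to make a call
        fraction = a / (a + b)
        if fraction >= min_fraction:
            hits.append((colony, round(fraction, 3)))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# HYPOTHETICAL product masses and peak lists for four colonies.
MZ_A, MZ_B = 300.2, 316.2
spectra = {
    "c1": [(300.2, 800.0), (316.2, 200.0)],
    "c2": [(300.1, 100.0), (316.3, 900.0)],
    "c3": [(300.2, 0.2), (316.2, 0.3)],
    "c4": [(300.2, 600.0), (316.2, 400.0), (500.0, 5000.0)],
}
print(screen_colonies(spectra, MZ_A, MZ_B, min_fraction=0.5))
```

Using a within-spectrum ratio rather than absolute intensity helps because MALDI signal varies from spot to spot, for example with colony size and matrix crystallization.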

Testing for Allele-specific Expression from Human Brain Samples

Maria E. Diaz-Ortiz, Nimansha Jain, Michael D. Gallagher, Marijan Posavi, Travis L. Unger, Alice S. Chen-Plotkin*

Oct 5, 2023

Many single nucleotide polymorphisms (SNPs) identified by genome-wide association studies exert their effects on disease risk as expression quantitative trait loci (eQTLs) via allele-specific expression (ASE). Databases for probing eQTLs in tissues from normal individuals exist, but one may wish to ascertain eQTLs or ASE in specific tissues or disease states not characterized in these databases. Here, we present a protocol to assess ASE of two possible target genes (GPNMB and KLHL7) of a known genome-wide association study (GWAS) Parkinson’s disease (PD) risk locus. The analysis uses postmortem human brain tissue from PD and neurologically normal individuals. It proceeds through RNA isolation, cDNA library generation, enrichment for transcripts of interest using customizable cDNA capture probes, paired-end RNA sequencing, and subsequent analysis. This method provides increased sensitivity relative to traditional bulk RNA-seq-based approaches, as well as a blueprint that can be extended to the study of other genes, tissues, and disease states.

Key features

• Analysis of GPNMB allele-specific expression (ASE) in brain lysates from cognitively normal controls (NC) and Parkinson’s disease (PD) individuals.

• Builds on the ASE protocol of Mayba et al. (2014) and extends application from cells to human tissue.

• Increased sensitivity by enrichment for desired transcript via RNA CaptureSeq (Mercer et al., 2014).

• Optimized for human brain lysates from cingulate gyrus, caudate nucleus, and cerebellum.
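At its simplest, ASE at a heterozygous SNP is tested by comparing read counts for the two alleles against the 50:50 expectation. The Python sketch below uses an exact two-sided binomial test as a simplified stand-in. It ignores reference-mapping bias, overdispersion, and phasing across multiple SNPs. It is not the statistical method of this protocol or of Mayba et al. (2014). Read counts are hypothetical.

```python
from math import comb


def binom_two_sided_half(k, n):
    """Exact two-sided binomial p-value for k of n reads under p = 0.5.

    Binomial(n, 0.5) is symmetric, so doubling the smaller tail (capped at 1)
    gives the exact two-sided p-value.
    """
    if n == 0:
        return 1.0
    tail = sum(comb(n, i) for i in range(min(k, n - k) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def allelic_imbalance(ref_reads, alt_reads):
    """Return (reference-allele fraction, p-value) for one heterozygous SNP."""
    n = ref_reads + alt_reads
    fraction = ref_reads / n if n else float("nan")
    return fraction, binom_two_sided_half(ref_reads, n)


# HYPOTHETICAL read counts at one heterozygous SNP in two samples.
for sample, (ref, alt) in {"sample_1": (30, 10), "sample_2": (20, 20)}.items():
    frac, p = allelic_imbalance(ref, alt)
    print(f"{sample}: ref fraction = {frac:.2f}, p = {p:.3g}")
```

This sketch also shows why enrichment matters. The p-value depends directly on read depth, so capture-based enrichment of the transcript of interest improves the power to detect a given allelic ratio.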


Computational Analysis of Plasma Lipidomics from Mice Fed Standard Chow and Ketogenic Diet

Amy L. Seufert, James W. Hickman, Jaewoo Choi, Brooke A. Napier*

Sep 20, 2023

Dietary saturated fatty acids (SFAs) are elevated in the blood circulation following digestion. A variety of circulating lipid species have been implicated in metabolic and inflammatory diseases. However, serum and plasma lipid concentrations vary extremely in human studies, so established reference ranges, lipid specificity, and diagnostic biomarkers are still lacking. Mass spectrometry is widely used to identify lipid species in plasma, and sample extraction methods differ considerably across the literature. We used ultra-high performance liquid chromatography (UPLC) coupled to high-resolution hybrid triple quadrupole-time-of-flight (QToF) mass spectrometry (MS) to compare the relative peak abundance of specific lipid species within the following lipid classes: free fatty acids (FFAs), triglycerides (TAGs), phosphatidylcholines (PCs), and sphingolipids (SGs). Plasma came from mice fed a standard chow (SC; low in SFAs) or a ketogenic diet (KD; high in SFAs) for two weeks. In this protocol, we used principal component analysis (PCA) in R to visualize how individual mice clustered according to their diet. KD-fed mice displayed unique blood profiles for many lipid species within each lipid class compared to SC-fed mice. We conclude that two weeks of KD feeding is sufficient to significantly alter circulating lipids. PCs were the most altered lipid class, followed by SGs, TAGs, and FFAs, including palmitic acid (PA) and PA-saturated lipids. This protocol is intended to advance knowledge of how SFA-enriched diets affect blood concentrations of specific lipids known to be associated with metabolic and inflammatory diseases.

Key features

• Analysis of relative plasma lipid concentrations from mice on different diets using R.

• Lipidomics data collected via ultra-high performance liquid chromatography (UPLC) coupled to a high-resolution hybrid triple quadrupole-time-of-flight (QToF) mass spectrometry (MS).

• Allows for a comprehensive comparison of diet-dependent plasma lipid profiles, including a variety of specific lipid species within several different lipid classes.

• Accumulation of certain free fatty acids, phosphatidylcholines, triglycerides, and sphingolipids is associated with metabolic and inflammatory diseases, and their plasma concentrations may be clinically useful.
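The PCA step can be illustrated independently of the software used. The protocol itself works in R; the Python sketch below only shows the underlying computation. It finds the first principal component of a samples × lipids matrix by power iteration on the covariance matrix, then projects each mouse onto it. The log-scale abundances and diet labels are hypothetical. Real analyses typically also scale features and inspect further components.

```python
import math
import random


def pc1_scores(rows, n_iter=500, seed=0):
    """Scores of each sample (row) on the first principal component.

    rows: samples x features (e.g., log-transformed lipid peak abundances).
    PC1 is found by power iteration on the feature covariance matrix.
    """
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [[sum(x[a] * x[b] for x in centered) / (n - 1) for b in range(p)]
           for a in range(p)]
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(p)]  # positive random start vector
    for _ in range(n_iter):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break  # no variance left to explain
        v = [x / norm for x in w]
    # The sign of a principal component is arbitrary; only relative positions matter.
    return [sum(x[j] * v[j] for j in range(p)) for x in centered]


# HYPOTHETICAL log10 peak abundances for three lipid species in four mice.
mice = ["SC_1", "SC_2", "KD_1", "KD_2"]
abundances = [
    [5.0, 4.0, 3.0],
    [5.1, 4.1, 3.0],
    [6.0, 4.9, 3.1],
    [6.1, 5.0, 3.0],
]
for mouse, score in zip(mice, pc1_scores(abundances)):
    print(f"{mouse}: PC1 = {score:+.3f}")
```

Mice on the same diet land on the same side of PC1 because the first two lipids shift together between diets. That shared shift is the dominant direction of variance.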


Controlled Level of Contamination Coupled to Deep Sequencing (CoLoC-seq) Probes the Global Localisation Topology of Organelle Transcriptomes

Anna Smirnova, Damien Jeandard, Alexandre Smirnov*

Sep 20, 2023

Information on RNA localisation is essential for understanding physiological and pathological processes that involve RNA transactions at the level of membrane-less or membrane-bounded organelles and extracellular vesicles. Examples include gene expression, cell reprogramming, host–pathogen interactions, and signalling pathways. In many cases, it is important to assess the topology of RNA localisation, i.e., to distinguish the transcripts encapsulated within an organelle of interest from those merely attached to its surface. This makes it possible to establish which RNAs can, in principle, engage in local molecular interactions and which are prevented from interacting by membranes or other physical barriers. The most widely used techniques for interrogating RNA localisation topology treat isolated organelles with RNases and then identify the surviving transcripts by northern blotting, qRT-PCR, or RNA-seq. However, this approach produces inconsistent results and many false positives. Here, we describe Controlled Level of Contamination coupled to deep sequencing (CoLoC-seq), a more refined subcellular transcriptomics approach that overcomes these pitfalls. CoLoC-seq starts with the purification of organelles of interest. These are then either left intact or lysed and subjected to a gradient of RNase concentrations to produce unique RNA degradation dynamics profiles, which can be monitored by northern blotting or RNA-seq. Through straightforward mathematical modelling, CoLoC-seq distinguishes true membrane-enveloped transcripts from degradable and non-degradable contaminants of any abundance. The method has been implemented in the mitochondria of HEK293 cells, where it outperformed alternative subcellular transcriptomics approaches. It is applicable to other membrane-bounded organelles, e.g., plastids, single-membrane organelles of the vesicular system, extracellular vesicles, or viral particles.

Key features

• Tested on human mitochondria; potentially applicable to cell cultures, non-model organisms, extracellular vesicles, enveloped viruses, tissues; does not require genetic manipulations or highly pure organelles.

• In the case of human cells, the required amount of starting material is ~2,500 cm² of 80% confluent cells (or ~3 × 10⁸ HEK293 cells).

• CoLoC-seq implements a special RNA-seq strategy to selectively capture intact transcripts, which requires RNases generating 5′-hydroxyl and 2′/3′-phosphate termini (e.g., RNase A, RNase I).

• Relies on nonlinear regression software with customisable exponential functions.
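The idea of fitting degradation profiles across an RNase gradient can be illustrated with the simplest exponential model, abundance = A*exp(-k*c) for RNase concentration c. This model is fitted by linear least squares on log-abundance. It is an assumption for illustration only: the customisable exponential functions actually used in CoLoC-seq are not reproduced here. The concentrations, rates, and units are hypothetical. The sketch contrasts a transcript degraded slowly in intact organelles with the same transcript degraded quickly after lysis.

```python
import math


def fit_decay_rate(concentrations, abundances):
    """Fit ln(abundance) = ln(A) - k * concentration by least squares; return (A, k)."""
    pts = [(c, math.log(y)) for c, y in zip(concentrations, abundances) if y > 0]
    n = len(pts)
    if n < 2:
        raise ValueError("need at least two positive abundance values")
    mean_c = sum(c for c, _ in pts) / n
    mean_log = sum(log_y for _, log_y in pts) / n
    sxx = sum((c - mean_c) ** 2 for c, _ in pts)
    if sxx == 0:
        raise ValueError("need at least two distinct RNase concentrations")
    slope = sum((c - mean_c) * (log_y - mean_log) for c, log_y in pts) / sxx
    return math.exp(mean_log - slope * mean_c), -slope


# HYPOTHETICAL RNase gradient (arbitrary units) and synthetic, noise-free abundances.
rnase = [0.0, 0.1, 1.0, 10.0]
intact = [100 * math.exp(-0.05 * c) for c in rnase]  # protected inside the organelle
lysed = [100 * math.exp(-2.0 * c) for c in rnase]    # exposed after lysis
_, k_intact = fit_decay_rate(rnase, intact)
_, k_lysed = fit_decay_rate(rnase, lysed)
print(f"k_intact = {k_intact:.3f}, k_lysed = {k_lysed:.3f}, ratio = {k_lysed / k_intact:.1f}")
```

A large gap between the two rates is the kind of signature expected for an enveloped transcript. A surface-attached contaminant would degrade quickly in both conditions. Noisy real data with plateaus generally call for true nonlinear regression rather than this log-linear shortcut.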


T Cell Clonal Analysis Using Single-cell RNA Sequencing and Reference Maps

Massimo Andreatta, Paul Gueguen, Nicholas Borcherding, Santiago J. Carmona*

Aug 20, 2023

T cells are endowed with T-cell antigen receptors (TCRs) that give them the capacity to recognize specific antigens and mount antigen-specific adaptive immune responses. Because TCR sequences are distinct in each naïve T cell, they serve as molecular barcodes to track T cells with clonal relatedness and shared antigen specificity through proliferation, differentiation, and migration. Single-cell RNA sequencing provides coupled information on TCR sequence and transcriptional state in individual cells, enabling T-cell clonotype-specific analyses. In this protocol, we outline a computational workflow for T-cell state and clonal analysis of scRNA-seq data based on the R packages Seurat, ProjecTILs, and scRepertoire. Given an scRNA-seq T-cell dataset with TCR sequence information, cell states are automatically annotated by reference projection using the ProjecTILs method. TCR information is used to track individual clonotypes and assess several properties: clonal expansion, proliferation rates, bias towards specific differentiation states, and clonal overlap between T-cell subtypes. We provide fully reproducible R code to conduct these analyses and generate useful visualizations that can be adapted to the needs of the protocol user.

Key features

• Computational analysis of paired scRNA-seq and scTCR-seq data

• Characterizing T-cell functional state by reference-based analysis using ProjecTILs

• Exploring T-cell clonal structure using scRepertoire

• Linking T-cell clonality to transcriptomic state to study relationships between clonal expansion and functional phenotype
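Two of the clonotype-level quantities mentioned above reduce to simple counting once each cell carries a clonotype label and an annotated state: clonal expansion and clonal overlap between T-cell subtypes. The protocol does this in R with scRepertoire; the Python sketch below only illustrates the arithmetic. The following are all assumptions for illustration:

- the cell barcodes, clonotype IDs, and state labels;
- the expansion cutoff of two cells;
- the choice of the overlap coefficient |A ∩ B| / min(|A|, |B|).

```python
from collections import Counter


def clone_sizes(cells):
    """Count cells per clonotype; cells without a recovered TCR (None) are skipped."""
    return Counter(clonotype for _, clonotype, _ in cells if clonotype)


def expanded_clonotypes(cells, min_size=2):
    """Clonotypes observed in at least `min_size` cells."""
    return {clono for clono, size in clone_sizes(cells).items() if size >= min_size}


def overlap_coefficient(cells, state_a, state_b):
    """|A & B| / min(|A|, |B|) for the clonotype sets found in two cell states."""
    a = {clono for _, clono, state in cells if state == state_a and clono}
    b = {clono for _, clono, state in cells if state == state_b and clono}
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


# HYPOTHETICAL (barcode, clonotype, annotated state) triples.
cells = [
    ("bc1", "cloneX", "CD8_Tex"),
    ("bc2", "cloneX", "CD8_Tex"),
    ("bc3", "cloneX", "CD8_EM"),
    ("bc4", "cloneY", "CD8_EM"),
    ("bc5", "cloneZ", "CD8_Tex"),
    ("bc6", None, "CD8_EM"),  # no TCR recovered for this cell
]
print(clone_sizes(cells))
print(expanded_clonotypes(cells))
print(overlap_coefficient(cells, "CD8_Tex", "CD8_EM"))
```

Here cloneX is the only expanded clonotype, and it is shared between the two states. Sharing of this kind is how clonality can be linked to possible differentiation relationships between states.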


Chromatin-RNA in situ Reverse Transcription Sequencing (CRIST-seq) Approach to Profile the Non-coding RNA Interaction Network

Shilin Zhang, Xue Wen, Lei Zhou, Hui Li, Wei Li, Andrew R. Hoffman*, Ji-Fan Hu*, Jiuwei Cui*

Jul 20, 2023

Non-coding RNAs (ncRNAs) are RNAs that do not encode proteins, but many ncRNAs can regulate gene expression. These ncRNAs play a critical role in the epigenetic regulation of various physiological and pathological processes through diverse biochemical mechanisms. However, existing screening methods for identifying regulatory ncRNAs focus only on whole-cell expression levels and do not capture every ncRNA that targets a given gene. We describe a new method, chromatin-RNA in situ reverse transcription sequencing (CRIST-seq), that can identify all the ncRNAs associated with the regulation of any given gene. In this article, we targeted the ncRNAs associated with the pluripotency gene Sox2, allowing us to catalog the ncRNA regulatory network of pluripotency maintenance. This methodology is universally applicable to the study of epigenetic regulation of any gene and requires only simple changes to the CRISPR-dCas9 gRNAs.

Key features

• This method provides a new technique for screening ncRNAs and establishing chromatin interaction networks.

• The target gene for this method can be any gene of interest and any site in the entire genome.

• This method can be further extended to detect RNAs, DNAs, and proteins that interact with target genes.

