Computational Biology and Bioinformatics


Protocols in Current Issue
May 20, 2025

Normative mapping is a framework for mapping population-level features of health-related variables. It is widely used in neuroscience research, but the literature lacks established protocols for modalities that do not support healthy-control measurements, such as intracranial electroencephalography (icEEG). An icEEG normative map would allow researchers to study population-level brain activity and enable the comparison of individual data against these norms to identify abnormalities. Currently, no standardised guide exists for transforming clinical data into a normative, regional icEEG map. Papers often cite different software and numerous articles to summarise the lengthy method, making it laborious for other researchers to understand or apply the process. Our protocol seeks to fill this gap by providing a dataflow guide and key decision points that summarise existing methods. This protocol has been used extensively in published work from our own lab (twelve peer-reviewed journal publications). Briefly, we take as input the icEEG recordings and neuroimaging data of people with epilepsy who are undergoing evaluation for resective surgery. As final outputs, we obtain a normative icEEG map comprising signal properties localised to brain regions. Optionally, we can also process new subjects through the same pipeline and obtain their z-scores (or centiles) in each brain region for abnormality detection and localisation. To date, a single, cohesive dataflow pipeline for generating normative icEEG maps, along with abnormality mapping, has not been available. We envisage that this dataflow guide will not only increase understanding and application of normative mapping methods but will also improve the consistency and quality of studies in the field.
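
As an illustration of the final abnormality-mapping step, the sketch below computes regional z-scores and centiles for a new subject against a normative map. It assumes the map is stored as per-region means and standard deviations of a signal property (e.g., relative band power); the region names and values are hypothetical, not taken from the protocol.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical normative map: per-region mean and standard deviation of a
# signal property (e.g., log relative band power), estimated from the cohort.
normative_map = {
    "hippocampus_L": {"mean": -1.20, "sd": 0.35},
    "amygdala_L":    {"mean": -0.95, "sd": 0.40},
    "insula_L":      {"mean": -1.05, "sd": 0.30},
}

# A new subject's regional values, localised to the same atlas regions.
subject = {"hippocampus_L": -0.10, "amygdala_L": -1.00, "insula_L": -1.90}

for region, value in subject.items():
    mu, sd = normative_map[region]["mean"], normative_map[region]["sd"]
    z = (value - mu) / sd          # z-score relative to the normative cohort
    centile = norm.cdf(z) * 100    # centile under a normal assumption
    print(f"{region}: z = {z:+.2f}, centile = {centile:5.1f}")
```

Regions whose |z| exceeds a chosen threshold (e.g., 2) would then be flagged as candidate abnormal regions.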

May 5, 2025

The accurate quantification of nucleic acid–based biomarkers, including long non-coding RNAs (lncRNAs), messenger RNAs (mRNAs), and microRNAs (miRNAs), is essential for disease diagnostics and risk assessment across the biological spectrum. Quantitative reverse transcription PCR (qRT-PCR) is the gold-standard assay for measuring RNA expression levels, but its reliability depends on selecting stable reference targets for normalization. Yet there is no consensus on a universally accepted reference gene for a given sample type or species, even though such a reference is necessary for accurate quantification; this presents a challenge to the broad application of such biomarkers. Various tools are currently used to identify a stably expressed gene from qRT-PCR data for a small panel of candidate normalizer genes. However, existing tools for normalizer gene selection suffer from both statistical limitations and inadequate graphical user interfaces for data visualization. gQuant, the tool presented here, overcomes these limitations. The tool is structured in two key components: a preprocessing component and a data analysis component. The preprocessing component addresses missing values in the dataset through imputation strategies. After preprocessing, normalizer genes are ranked using democratic strategies that integrate predictions from multiple statistical methods. The effectiveness of gQuant was validated on publicly available data as well as on in-house urinary exosomal miRNA expression datasets. Comparative analysis against existing tools demonstrated that gQuant delivers more stable and consistent rankings of normalizer genes. With its promising performance, gQuant enhances precision and reproducibility in the identification of normalizer genes across diverse research scenarios, addressing key limitations of RNA biomarker–based translational research.
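
gQuant's exact statistical methods are not detailed in this abstract, but the "democratic" ranking idea can be sketched: score each candidate normalizer under several stability measures, rank genes per measure, and aggregate the ranks. The two measures below (standard deviation and coefficient of variation of Cq values) are illustrative stand-ins, not gQuant's actual method set.

```python
import numpy as np

# Rows: candidate normalizer genes; values: Cq measurements across samples.
cq = {
    "GAPDH": np.array([20.1, 20.5, 19.8, 21.0, 20.3]),
    "ACTB":  np.array([18.2, 18.3, 18.1, 18.4, 18.2]),
    "RNU6":  np.array([25.0, 23.8, 26.1, 24.5, 25.7]),
}

# Two illustrative stability measures (lower = more stable).
def sd_score(x): return np.std(x, ddof=1)
def cv_score(x): return np.std(x, ddof=1) / np.mean(x)

methods = [sd_score, cv_score]
genes = list(cq)

# Rank genes under each method, then aggregate by mean rank ("democratic" vote).
ranks = np.zeros((len(methods), len(genes)))
for i, m in enumerate(methods):
    scores = np.array([m(cq[g]) for g in genes])
    ranks[i] = scores.argsort().argsort() + 1   # rank 1 = most stable

mean_rank = ranks.mean(axis=0)
for g, r in sorted(zip(genes, mean_rank), key=lambda t: t[1]):
    print(f"{g}: mean rank {r:.1f}")
```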

May 5, 2025

Quantitative proteomic analysis plays a crucial role in understanding microbial co-culture systems. Traditional techniques, such as label-free quantification (LFQ) and label-based proteomics, provide valuable insights into the interactions and metabolic exchanges of microbial species. However, the complexity of microbial co-culture systems often leads to challenges in data normalization, especially for comparative LFQ data where the ratios of the different organisms can vary across experiments. This protocol describes the application of LFQRatio normalization, a novel normalization method designed to improve the reliability and accuracy of quantitative proteomics data obtained from microbial co-cultures. The method was developed following an analysis of the factors that affect both protein identification and quantitative accuracy in co-culture proteomics: peptide physicochemical characteristics such as isoelectric point (pI), molecular weight (MW), and hydrophobicity, as well as dynamic range, proteome size, and peptides shared between species. We then created a normalization method based on LFQ intensity values, named LFQRatio normalization. The approach was demonstrated by analysis of a synthetic co-culture of two bacteria, Synechococcus elongatus cscB/SPS and Azotobacter vinelandii ΔnifL. Results showed improved accuracy in identifying differentially expressed proteins, allowing more reliable biological interpretation. This protocol provides a reliable and effective tool that can be applied to other co-culture systems to study microbial interactions.
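
The published LFQRatio formula is not reproduced in this abstract; the sketch below only illustrates the underlying idea under a simple assumption: scale each protein's LFQ intensity by the inverse of its organism's share of the total LFQ signal in that sample, so that shifts in the co-culture ratio are not mistaken for differential expression. All protein, species, and column names are hypothetical.

```python
import pandas as pd

# Hypothetical LFQ intensities for one sample: proteins from two organisms.
df = pd.DataFrame({
    "protein": ["SeP1", "SeP2", "AvP1", "AvP2"],
    "species": ["S_elongatus", "S_elongatus", "A_vinelandii", "A_vinelandii"],
    "lfq":     [2.0e8, 1.0e8, 4.0e8, 2.0e8],
})

# Each organism's share of the total LFQ signal in this sample.
total = df["lfq"].sum()
share = df.groupby("species")["lfq"].sum() / total

# Divide each protein by its organism's share so intensities remain comparable
# across samples with different co-culture ratios (illustrative assumption).
df["lfq_ratio_norm"] = df["lfq"] / df["species"].map(share)
print(df)
```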

May 5, 2025

RNA sequencing (RNA-Seq) has transformed transcriptomic research, enabling researchers to perform large-scale inspection of mRNA levels in living cells. With the growing applicability of this technique to many scientific investigations, the analysis of next-generation sequencing (NGS) data becomes an important yet challenging task, especially for researchers without a bioinformatics background. This protocol offers a beginner-friendly, step-by-step guide to analyzing NGS data (starting from raw .fastq files), providing the required code with an explanation of the different steps and software used. We outline a computational workflow that includes quality control, read trimming, read alignment to the genome, and gene quantification, ultimately enabling researchers to identify differentially expressed genes and gain insights into mRNA levels. Multiple approaches to visualizing these data using statistical and graphical tools in R are also described, allowing the generation of heatmaps and volcano plots to represent genes and gene sets of interest.
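
The protocol's visualization steps are written in R; purely to illustrate the volcano-plot logic (shown here in Python for consistency), the sketch below assumes a differential-expression results table with log2 fold changes and adjusted p-values, such as one exported from DESeq2. The gene names and values are made up.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical DE results table (e.g., exported from DESeq2 as CSV).
res = pd.DataFrame({
    "gene":           ["g1", "g2", "g3", "g4", "g5"],
    "log2FoldChange": [2.5, -3.1, 0.2, 1.8, -0.4],
    "padj":           [1e-8, 1e-6, 0.60, 0.03, 0.45],
})

# Significant = small adjusted p-value AND large effect size.
sig = (res["padj"] < 0.05) & (res["log2FoldChange"].abs() > 1)

plt.scatter(res["log2FoldChange"], -np.log10(res["padj"]),
            c=np.where(sig, "red", "grey"))
plt.axhline(-np.log10(0.05), ls="--", lw=0.8)
plt.axvline(-1, ls="--", lw=0.8)
plt.axvline(1, ls="--", lw=0.8)
plt.xlabel("log2 fold change")
plt.ylabel("-log10 adjusted p-value")
plt.title("Volcano plot (illustrative data)")
plt.savefig("volcano.png", dpi=150)
```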

May 5, 2025

Chromatin immunoprecipitation with high-throughput sequencing (ChIP-seq) is a widely used technique for genome-wide analyses of protein–DNA interactions. This protocol provides a guide to ChIP-seq data processing in Saccharomyces cerevisiae, with a focus on signal normalization to address data biases and enable meaningful comparisons within and between samples. Designed for researchers with minimal bioinformatics experience, it includes practical overviews and refers to scripting examples for key tasks, such as configuring computational environments, trimming and aligning reads, processing alignments, and visualizing signals. This protocol employs the sans-spike-in method for quantitative ChIP-seq (siQ-ChIP) and normalized coverage for absolute and relative comparisons of ChIP-seq data, respectively. While spike-in normalization, which is semiquantitative, is addressed for context, siQ-ChIP and normalized coverage are recommended as mathematically rigorous and reliable alternatives.
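
As a sketch of the normalized-coverage idea (a depth-independent track enabling relative comparisons between samples): dividing per-bin read counts by the library total makes each track integrate to one, so any remaining difference reflects the distribution of signal rather than sequencing depth. The bin counts below are invented, and the siQ-ChIP absolute scale additionally requires experimental quantities (e.g., IP efficiency) not shown here.

```python
import numpy as np

# Hypothetical per-bin read counts for two ChIP samples on the same bin grid;
# sample B was sequenced roughly 2.5x deeper than sample A.
counts_a = np.array([120, 340, 90, 15, 800, 60], dtype=float)
counts_b = np.array([300, 900, 250, 40, 2100, 150], dtype=float)

def normalized_coverage(counts):
    # Fraction of reads per bin: the track sums to 1 regardless of depth.
    return counts / counts.sum()

na, nb = normalized_coverage(counts_a), normalized_coverage(counts_b)
print(np.round(na, 4))
print(np.round(nb, 4))   # nearly identical: the depth difference is removed
```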

Apr 20, 2025

Bayesian phylogenetic analysis is essential for elucidating evolutionary relationships among organisms. Traditional methods often rely on fixed models and manual parameter settings, which can limit accuracy and efficiency. This protocol presents an integrated workflow that leverages GUIDANCE2 for rigorous sequence alignment, ProtTest and MrModeltest for robust model selection, and MrBayes for phylogenetic tree estimation through Bayesian inference. By automating key steps and providing detailed command-line instructions, this protocol enhances the reliability and reproducibility of phylogenetic studies.
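
As an illustration of the automation idea, the sketch below generates a MrBayes command block from a few parameters chosen after model selection (here nst=6 with invgamma rates, i.e., GTR+I+G, a common MrModeltest outcome). The file names, generation counts, and burn-in fraction are placeholders, not the protocol's prescribed settings.

```python
# Write a MrBayes batch block to append to a NEXUS alignment (illustrative).
settings = dict(nst=6, rates="invgamma", ngen=2_000_000,
                samplefreq=1000, nchains=4, burninfrac=0.25)

block = f"""begin mrbayes;
  set autoclose=yes nowarn=yes;
  lset nst={settings['nst']} rates={settings['rates']};
  mcmc ngen={settings['ngen']} samplefreq={settings['samplefreq']} nchains={settings['nchains']};
  sump burninfrac={settings['burninfrac']};
  sumt burninfrac={settings['burninfrac']};
end;
"""

with open("mrbayes_block.nex", "w") as fh:
    fh.write(block)
print(block)
```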

Apr 20, 2025

As genotyping costs fall, genome-wide association studies (GWAS) face growing challenges in mapping genes of interest in diverse populations with complex structure. Complex structure demands sophisticated statistical models, and increased marker density and population size require efficient computing tools. Many statistical models and computing tools have been developed, with varied statistical power, computing efficiency, and user-friendliness. Some statistical models were developed with dedicated computing tools, such as efficient mixed model analysis (EMMA), multiple loci mixed model (MLMM), fixed and random model circulating probability unification (FarmCPU), and Bayesian-information and linkage-disequilibrium iteratively nested keyway (BLINK). Other computing tools (e.g., GAPIT) implement multiple statistical models, retain a consistent user interface, and are continually enhanced for input handling and result interpretation. In this study, we developed a protocol utilizing a minimal set of software tools (BEAGLE, BLINK, and GAPIT) to perform a variety of analyses, including file format conversion, missing genotype imputation, GWAS, and interpretation of input data and results. We demonstrate the protocol by reanalyzing data from the Rice 3000 Genomes Project and highlighting advancements in GWAS model development.
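
The protocol itself runs GAPIT (an R package) and BLINK; purely to illustrate the single-marker test that underlies all of these models, the sketch below performs a naive per-SNP linear regression on simulated data. It includes no population-structure or kinship correction, so it is an explanatory toy, not a substitute for the mixed-model or BLINK analyses.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_ind, n_snp = 200, 1000

# Simulated genotypes coded 0/1/2 and a phenotype with one causal SNP (index 42).
geno = rng.integers(0, 3, size=(n_ind, n_snp)).astype(float)
pheno = 0.8 * geno[:, 42] + rng.normal(size=n_ind)

pvals = np.empty(n_snp)
for j in range(n_snp):
    # Simple regression y ~ snp_j; GWAS models add structure/kinship terms.
    result = stats.linregress(geno[:, j], pheno)
    pvals[j] = result.pvalue

print("top SNP:", pvals.argmin(), "p =", pvals.min())
```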

Apr 5, 2025

Confocal microscopy is integral to molecular and cellular biology, enabling high-resolution imaging and colocalization studies to elucidate biomolecular interactions in cells. Despite its utility, challenges in handling large datasets, particularly in preprocessing Z-stacks and calculating colocalization metrics like the Manders coefficient, limit efficiency and reproducibility. Manually processing large volumes of imaging data for colocalization analysis is prone to observer bias and inefficiency. This study presents an automated workflow integrating Python-based preprocessing with Fiji ImageJ's BIOP-JACoP plugin to streamline Z-stack refinement and colocalization analysis. We generated an executable Windows application, publicly available on GitHub (https://github.com/weiyue99/Yue-Colocalization), allowing even those without Python experience to run the Python code required in this protocol. The workflow systematically removes the signal-free Z-slices that sometimes occur at the beginning and/or end of a Z-stack using auto-thresholding, creates refined substacks, and performs batch analysis to calculate the Manders coefficient. It is designed for high-throughput applications, significantly reducing human error and hands-on time. By ensuring reproducibility and adaptability, this protocol addresses critical gaps in confocal image analysis workflows, facilitating efficient handling of large datasets and offering broad applicability in protein colocalization studies.
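
The two core computations can be sketched directly: trimming signal-free slices by auto-thresholding, then computing Manders coefficients, where M1 is the fraction of channel-1 intensity falling where channel 2 is above threshold (and symmetrically for M2). The protocol performs these steps via its packaged Python code and Fiji's BIOP-JACoP; the arrays, slice-keeping rule, and use of Otsu thresholding below are illustrative.

```python
import numpy as np
from skimage.filters import threshold_otsu

rng = np.random.default_rng(1)

# Hypothetical two-channel Z-stacks, shape (z, y, x).
ch1 = rng.random((10, 128, 128))
ch2 = rng.random((10, 128, 128))

t1, t2 = threshold_otsu(ch1), threshold_otsu(ch2)

# Step 1 (illustrative rule): keep slices where some pixels exceed a global
# auto-threshold in either channel, discarding signal-free end slices.
keep = [z for z in range(ch1.shape[0])
        if (ch1[z] > t1).mean() > 0.001 or (ch2[z] > t2).mean() > 0.001]
ch1, ch2 = ch1[keep], ch2[keep]

# Step 2: Manders coefficients on the refined substack.
m1 = ch1[ch2 > t2].sum() / ch1.sum()   # fraction of ch1 signal on ch2+ voxels
m2 = ch2[ch1 > t1].sum() / ch2.sum()   # fraction of ch2 signal on ch1+ voxels
print(f"M1 = {m1:.3f}, M2 = {m2:.3f}")
```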

Apr 5, 2025

Laboratory-developed tests (LDTs) are optimal molecular diagnostic modalities in circumstances such as public health emergencies, rare disease diagnosis, and limited budgets, or where existing commercial alternatives are unavailable, limited in supply, or withdrawn, whether temporarily or permanently. These tests reduce access barriers and promote equitable clinical practice and healthcare delivery. Despite recommendations for the development of nucleic acid amplification tests, procedural details are often insufficient, inconsistent, or arbitrary. This protocol elucidates the methodology used in the development of a fully automated real-time polymerase chain reaction (qPCR)-based test, using the Panther Fusion® Open Access™ functionality, for the detection of Streptococcus agalactiae in pregnant women from selectively enriched rectovaginal swabs. In addition, guidelines are provided for oligonucleotide design (primers and TaqMan probes), in silico and in vitro evaluation of design effectiveness, optimization of the physicochemical conditions of the amplification reaction, and result analysis based on experimental designs and acceptance criteria. Furthermore, recommendations are provided for the analytical and clinical validation of the intended use. Our approach is cost-effective, particularly during the design and optimization phases. We primarily used open-source bioinformatics software and tools for the in silico evaluations in the test design. Subsequently, the process was manually optimized using a CFX96 Dx analyzer, whose technical specifications and performance are comparable to those of the final platform (Panther Fusion®). Unlike the Panther Fusion®, the CFX96 Dx does not require excess volumes of reagents, samples, or evaluation materials (dead volume) to accommodate potential imprecision from robotic handling. Using the CFX96 Dx analyzer is therefore a strategic way to economize on resources and time during LDT optimization.
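
As a sketch of one in silico screening step, the snippet below checks candidate primers against acceptance windows for melting temperature and GC content using Biopython's nearest-neighbour Tm model. The sequences and cut-offs are illustrative placeholders, not the S. agalactiae assay's actual oligonucleotides or criteria.

```python
from Bio.SeqUtils import MeltingTemp as mt

# Hypothetical candidate primers (not the actual assay oligonucleotides).
candidates = {
    "fwd_1":   "ACGGTCATTACCGATGGTTCA",
    "rev_1":   "TTGGCCATAGGCTTCAACGT",
    "probe_1": "CCGTTAGCAGGATCCGTTACAGT",
}

# Illustrative acceptance criteria.
TM_RANGE = (58.0, 62.0)   # degrees C, nearest-neighbour model defaults
GC_RANGE = (0.40, 0.60)   # fraction GC

for name, seq in candidates.items():
    tm = mt.Tm_NN(seq)
    gc = (seq.count("G") + seq.count("C")) / len(seq)
    ok = TM_RANGE[0] <= tm <= TM_RANGE[1] and GC_RANGE[0] <= gc <= GC_RANGE[1]
    print(f"{name}: Tm = {tm:.1f} C, GC = {gc:.2f} -> {'pass' if ok else 'fail'}")
```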

Mar 5, 2025

Non-small cell lung cancer (NSCLC) is the most common type of lung cancer. According to 2020 reports, 2.2 million cases are reported globally every year, with mortality as high as 1.8 million patients. To study NSCLC, systems biology offers mathematical modeling as a tool to understand complex pathways, providing insights for the identification of biomarkers and potential therapeutic targets that aid precision therapy. Mathematical modeling, specifically with ordinary differential equations (ODEs), is used to better understand the dynamics of cancer growth and immunological interactions in the tumor microenvironment. This study highlighted the dual role of the cyclic GMP-AMP synthase–stimulator of interferon genes (cGAS-STING) pathway: its classical involvement in regulating type I interferon (IFN-I) and pro-inflammatory responses promotes tumor regression through senescence and apoptosis, whereas alternative signaling induced by nuclear factor kappa B (NF-κB), mutated tumor protein p53 (p53), and programmed death-ligand 1 (PD-L1) leads to tumor growth. We identified key regulators of cancer progression by simulating the model and validating it with local sensitivity analysis, principal component analysis, metabolite flux rates, and model reduction. Integration of multiple signaling axes revealed that cGAS-STING, phosphoinositide 3-kinase (PI3K), and Ak strain transforming (AKT) may be potential targets for validation in cancer therapy.
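
The study's full reaction network is not reproduced here; as a minimal illustration of the ODE approach, the sketch below simulates a toy two-variable model: logistic tumor growth opposed by an immune kill term whose strength stands in for cGAS-STING-driven IFN-I signaling. All equations and parameter values are hypothetical, not those of the published model.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Toy tumor-immune ODE system (illustrative, not the study's model).
def model(t, y, r, K, k_kill, s, d):
    T, I = y                                    # tumor burden, IFN-I activity
    dT = r * T * (1 - T / K) - k_kill * I * T   # logistic growth minus kill
    dI = s * T - d * I                          # tumor-driven induction/decay
    return [dT, dI]

r, K, k_kill, s, d = 0.3, 1.0, 0.8, 0.2, 0.1    # hypothetical parameters
sol = solve_ivp(model, (0, 100), y0=[0.01, 0.0], args=(r, K, k_kill, s, d),
                t_eval=np.linspace(0, 100, 201))

print("final tumor burden:", round(sol.y[0, -1], 3))
```

Local sensitivity analysis, one of the validation steps named above, then amounts to perturbing each parameter slightly and recording the change in such model outputs.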



