Improve Research Reproducibility A Bio-protocol resource

Systems Biology


May 5, 2025

Chromatin immunoprecipitation with high-throughput sequencing (ChIP-seq) is a widely used technique for genome-wide analyses of protein–DNA interactions. This protocol provides a guide to ChIP-seq data processing in Saccharomyces cerevisiae, with a focus on signal normalization to address data biases and enable meaningful comparisons within and between samples. Designed for researchers with minimal bioinformatics experience, it includes practical overviews and refers to scripting examples for key tasks, such as configuring computational environments, trimming and aligning reads, processing alignments, and visualizing signals. This protocol employs the sans-spike-in method for quantitative ChIP-seq (siQ-ChIP) and normalized coverage for absolute and relative comparisons of ChIP-seq data, respectively. While spike-in normalization, which is semiquantitative, is addressed for context, siQ-ChIP and normalized coverage are recommended as mathematically rigorous and reliable alternatives.
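The idea of normalized coverage for relative comparisons can be illustrated with a small sketch. It assumes that "normalized coverage" means per-base fragment coverage rescaled so that it sums to 1 over the genome, with each fragment contributing the same total weight; that reading, the function name, and the toy intervals are assumptions for illustration and not the protocol's own scripts. It does not attempt the siQ-ChIP scaling.

```python
from typing import List, Tuple


def normalized_coverage(
    fragments: List[Tuple[int, int]], genome_length: int
) -> List[float]:
    """Return per-base coverage scaled so the whole track sums to 1.

    Fragments are 0-based, half-open (start, end) intervals. Each fragment
    contributes a total weight of 1 / number_of_fragments, spread evenly
    over its bases (an assumed definition, see the lead-in).
    """
    if not fragments:
        raise ValueError("at least one fragment is required")

    n_fragments = len(fragments)
    # Difference array: add weight at start, remove it at end.
    diff = [0.0] * (genome_length + 1)
    for start, end in fragments:
        length = end - start
        if length <= 0 or start < 0 or end > genome_length:
            raise ValueError(f"invalid fragment: ({start}, {end})")
        weight = 1.0 / (length * n_fragments)
        diff[start] += weight
        diff[end] -= weight

    # A running sum of the difference array gives per-base coverage.
    coverage = []
    running = 0.0
    for position in range(genome_length):
        running += diff[position]
        coverage.append(running)
    return coverage


if __name__ == "__main__":
    toy_fragments = [(0, 4), (2, 6)]  # hypothetical fragments
    track = normalized_coverage(toy_fragments, genome_length=8)
    print(track, sum(track))
```

Because every track sums to 1 under this definition, two samples with different sequencing depths can be compared by shape, which is the sense of a relative comparison; absolute comparisons need a quantitative scaling such as siQ-ChIP.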

May 5, 2025

The KAS-ATAC assay provides a method to capture genomic DNA fragments that are simultaneously physically accessible and contain single-stranded DNA (ssDNA) bubbles. These are characteristic features of two of the key processes involved in regulating and expressing genes—on one hand, the activity of cis-regulatory elements (cREs), which are typically devoid of nucleosomes when active and occupied by transcription factors, and on the other, the association of RNA polymerases with DNA, which results in the presence of ssDNA structures. Here, we present a detailed protocol for carrying out KAS-ATAC as well as basic processing of KAS-ATAC datasets and discuss the key considerations for its successful application.

Apr 20, 2025

Bayesian phylogenetic analysis is essential for elucidating evolutionary relationships among organisms. Traditional methods often rely on fixed models and manual parameter settings, which can limit accuracy and efficiency. This protocol presents an integrated workflow that leverages GUIDANCE2 for rigorous sequence alignment, ProtTest and MrModeltest for robust model selection, and MrBayes for phylogenetic tree estimation through Bayesian inference. By automating key steps and providing detailed command-line instructions, this protocol enhances the reliability and reproducibility of phylogenetic studies.
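Model selection of the kind mentioned above is commonly based on information criteria such as AIC and BIC. The sketch below shows only that arithmetic; the candidate models, log-likelihoods, parameter counts, and alignment length are made-up illustrative values, and the function names are hypothetical rather than part of ProtTest, MrModeltest, or MrBayes.

```python
import math
from typing import Dict, Tuple


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood: float, n_params: int, n_sites: int) -> float:
    """Bayesian information criterion: k ln(n) - 2 ln L (lower is better)."""
    return n_params * math.log(n_sites) - 2 * log_likelihood


def score_models(
    candidates: Dict[str, Tuple[float, int]], n_sites: int
) -> Dict[str, Dict[str, float]]:
    """Compute AIC and BIC for each model given (lnL, free parameters)."""
    return {
        name: {"AIC": aic(lnl, k), "BIC": bic(lnl, k, n_sites)}
        for name, (lnl, k) in candidates.items()
    }


def best_by(scores: Dict[str, Dict[str, float]], criterion: str) -> str:
    """Return the model name with the lowest value of the criterion."""
    return min(scores, key=lambda name: scores[name][criterion])


if __name__ == "__main__":
    # Hypothetical fits: model -> (log-likelihood, free substitution parameters)
    candidates = {
        "JC69": (-5200.0, 0),
        "HKY85": (-5100.0, 4),
        "GTR+G": (-5090.0, 9),
    }
    scores = score_models(candidates, n_sites=1000)
    print("Best by AIC:", best_by(scores, "AIC"))
    print("Best by BIC:", best_by(scores, "BIC"))
```

With these toy numbers the two criteria disagree, because BIC penalizes extra parameters more heavily than AIC for an alignment of this length. That is one reason to record which criterion was used when reporting the selected model.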

Apr 20, 2025

As genotyping costs fall, genome-wide association studies (GWAS) are increasingly applied to diverse populations with complex structures, which makes mapping genes of interest more challenging. Complex population structure demands sophisticated statistical models, and increased marker density and population size require efficient computing tools. Many statistical models and computing tools have been developed, differing in statistical power, computing efficiency, and user-friendliness. Some statistical models were developed with dedicated computing tools, such as efficient mixed model analysis (EMMA), multiple loci mixed model (MLMM), fixed and random model circulating probability unification (FarmCPU), and Bayesian-information and linkage-disequilibrium iteratively nested keyway (BLINK). Other computing tools (e.g., GAPIT) implement multiple statistical models behind a consistent user interface and are continually enhanced for interpreting input data and results. In this study, we developed a protocol that uses a minimal set of software tools (BEAGLE, BLINK, and GAPIT) to perform a variety of analyses, including file format conversion, missing genotype imputation, GWAS, and interpretation of input data and results. We demonstrated the protocol by reanalyzing data from the Rice 3000 Genomes Project and highlighting advancements in GWAS model development.
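Before imputation and association testing, it is common to summarize each marker's missing rate and minor allele frequency (MAF). The sketch below illustrates that summary in plain Python, assuming genotypes are coded as 0, 1, or 2 copies of the alternate allele with `None` for missing calls. The coding, the function name, and the toy data are assumptions for illustration; this is not how BEAGLE, BLINK, or GAPIT are invoked.

```python
from typing import List, Optional, Tuple


def marker_summary(
    genotypes: List[Optional[int]],
) -> Tuple[float, Optional[float]]:
    """Return (missing_rate, minor_allele_frequency) for one marker.

    Genotypes are assumed to be 0/1/2 alternate-allele counts in diploid
    individuals, with None marking a missing call.
    """
    if not genotypes:
        raise ValueError("no genotypes supplied")

    called = [g for g in genotypes if g is not None]
    missing_rate = 1.0 - len(called) / len(genotypes)
    if not called:
        # Nothing was called, so the allele frequency is undefined.
        return missing_rate, None

    # Each diploid individual carries two alleles.
    alt_frequency = sum(called) / (2 * len(called))
    maf = min(alt_frequency, 1.0 - alt_frequency)
    return missing_rate, maf


if __name__ == "__main__":
    toy_marker = [0, 1, 2, None, 2, 2]  # hypothetical calls for six samples
    print(marker_summary(toy_marker))
```

Markers with a high missing rate are the ones imputation has to work hardest on, and markers with very low MAF tend to have little power in an association test, so both numbers are useful when interpreting input data.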

Mar 20, 2025

This manuscript details two modified protocols for the isolation of long-stranded or high molecular weight (HMW) DNA from Magnaporthaceae (Ascomycota) fungal mycelium intended for whole genome sequencing. The Cytiva Nucleon PhytoPure and the Macherey-Nagel NucleoBond HMW DNA kits were selected because the former requires lower amounts of starting material and the latter utilizes gentler methods to maximize DNA length, albeit at a higher requirement for input material. The Cytiva Nucleon PhytoPure kit successfully recovered HMW DNA for half of our fungal species by increasing the amount of RNase A treatment and adding in a proteinase K treatment. To reduce the impact of pigmentation development, which occurs toward later stages of culturing, extractions were run in quadruplicate to increase overall DNA concentration. We also adapted the Macherey-Nagel NucleoBond HMW DNA kit for high-quality HMW DNA by grinding the sample to a fine powder, overnight lysis, and splitting the sample before washing the precipitated DNA. For both kits, precipitated DNA was spooled out pre-washing, ensuring a higher percentage of high-integrity long strands. The Macherey-Nagel protocol offers advantages over the first through the utilization of gravity columns that provide gentler treatment, yielding >50% of high-purity DNA strands exceeding 40 kbp. The limitation of this method is the requirement for a large quantity of starting material (1 g). By triaging samples based on the rate of growth relative to the accumulation of secondary metabolites, our methodologies hold promise for yielding reliable and high-quality HMW DNA from a variety of fungal samples, improving sequencing outcomes.

Mar 20, 2025

Zebrafish genetic mutants have emerged as a valuable model system for studying various aspects of disease and developmental biology. Mutant zebrafish embryos are generally identified based on phenotypic defects at later developmental stages, making it difficult to investigate underlying molecular mechanisms at earlier stages. This protocol presents a PCR-based genotyping method that enables the identification of wild-type, heterozygous, and homozygous zebrafish genetic mutants at any developmental stage, even when they are phenotypically indistinguishable. The approach involves the amplification of specific genomic regions using carefully designed primers, followed by gel electrophoresis. This genotyping method facilitates the investigation of the molecular mechanisms driving phenotypic defects that are observed at later timepoints. This protocol allows researchers to perform analyses such as immunofluorescence, RT-PCR, RNA sequencing, and other molecular experiments on early developmental stages of mutants. The availability of this protocol expands the utility of zebrafish genetic mutants for elucidating the molecular underpinnings of various biological processes throughout development.
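The logic of reading genotypes from a gel can be sketched in code. The example assumes the simplest design, in which the wild-type and mutant alleles give amplicons of clearly different sizes with one primer pair; the amplicon sizes, tolerance, and function name are hypothetical and are not taken from the protocol.

```python
from typing import Iterable


def call_genotype(
    band_sizes_bp: Iterable[float],
    wt_size_bp: float,
    mutant_size_bp: float,
    tolerance_bp: float = 10.0,
) -> str:
    """Call a genotype from the band sizes observed in one gel lane.

    Assumes wild-type and mutant amplicons differ in size by more than
    twice the tolerance, so a band cannot match both alleles.
    """
    if abs(wt_size_bp - mutant_size_bp) <= 2 * tolerance_bp:
        raise ValueError("amplicon sizes are too close to resolve")

    bands = list(band_sizes_bp)
    has_wt = any(abs(b - wt_size_bp) <= tolerance_bp for b in bands)
    has_mutant = any(abs(b - mutant_size_bp) <= tolerance_bp for b in bands)

    if has_wt and has_mutant:
        return "heterozygous"
    if has_wt:
        return "wild-type"
    if has_mutant:
        return "homozygous mutant"
    return "no call"  # failed PCR or unexpected product


if __name__ == "__main__":
    # Hypothetical design: 300 bp wild-type amplicon, 250 bp mutant amplicon.
    for lane in ([302], [248, 301], [251], []):
        print(lane, "->", call_genotype(lane, 300, 250))
```

Returning an explicit "no call" keeps failed reactions from being silently counted as one of the three genotypes.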

Mar 5, 2025

Mitochondrial genomes (mitogenomes) display relatively rapid mutation rates, low sequence recombination, high copy numbers, and maternal inheritance patterns, rendering them valuable blueprints for mapping lineages, uncovering historical migration patterns, understanding intraspecific population dynamics, and investigating how environmental pressures shape traits underpinned by genetic variation. Here, we present the bioinformatic pipeline and code used to assemble and annotate the complete mitogenomes of five houndsharks (Chondrichthyes: Triakidae) and compare them to the mitogenomes of other closely related species. We demonstrate the value of a combined assembly approach for detecting deviations in mitogenome structure and describe how to select an assembly approach that best suits the sequencing data. The datasets required to run our analyses are available on the GitHub and Dryad repositories.

Mar 5, 2025

The limited standards for the rigorous and objective use of mitochondrial genomes (mitogenomes) can lead to uncertainties regarding the phylogenetic relationships of taxa under varying evolutionary constraints. The mitogenome exhibits heterogeneity in base composition, and evolutionary rates may vary across different regions, which can cause empirical data to violate assumptions of the applied evolutionary models. Consequently, the unique evolutionary signatures of the dataset must be carefully evaluated before selecting an appropriate approach for phylogenomic inference. Here, we present the bioinformatic pipeline and code used to expand the mitogenome phylogeny of the order Carcharhiniformes (groundsharks), with a focus on houndsharks (Chondrichthyes: Triakidae). We present a rigorous approach for addressing difficult-to-resolve phylogenies, incorporating multispecies coalescent modelling (MSCM) to address gene/species tree discordance. The protocol describes carefully designed approaches for preparing alignments, partitioning datasets, assigning models of evolution, inferring phylogenies based on traditional site-homogeneous concatenation approaches as well as under multispecies coalescent and site-heterogeneous models, and generating statistical data for comparison of different topological outcomes. The datasets required to run our analyses are available in the GitHub and Dryad repositories.

Jul 5, 2024

In recent years, the increase in genome sequencing across diverse plant species has provided a significant advantage for phylogenomics studies, allowing the analysis of one of the most diverse gene families in plants: nucleotide-binding leucine-rich repeat receptors (NLRs). However, due to the sequence diversity of the NLR gene family, identifying key molecular features and functionally conserved sequence patterns is challenging through multiple sequence alignment. Here, we present a step-by-step protocol for a computational pipeline designed to identify evolutionarily conserved motifs in plant NLR proteins. In this protocol, we use a large-scale NLR dataset, including 1,862 NLR genes annotated from monocot and dicot species, to predict conserved sequence motifs, such as the MADA and EDVID motifs, within the coiled-coil (CC)-NLR subfamily. Our pipeline can be applied to identify molecular signatures that have remained conserved in the gene family over evolutionary time across plant species.

Mar 20, 2024

Estimating the time to the most recent common ancestor (tMRCA) is important for tracing the origin of pathogenic viruses. The analysis is based on the genetic diversity accumulated over a given time period. Thousands of mutation sites have arisen in SARS-CoV-2 genomes since the COVID-19 pandemic started; six highly linked mutation sites arose early, before the start of the pandemic, and can be used to classify the genomes into three main haplotypes. Tracing the origin of those three haplotypes may help to understand the origin of SARS-CoV-2. In this article, we present a complete protocol for classifying SARS-CoV-2 genomes and calculating tMRCA using a Bayesian phylodynamic method. This protocol may also be used to analyze other viral genomes.


Key features

• Filtering and alignment of a massive number of viral genomes using custom scripts and ViralMSA.

• Classification of genomes based on highly linked sites using custom scripts.

• Phylodynamic analysis of viral genomes using Bayesian evolutionary analysis sampling trees (BEAST).

• Visualization of the posterior distribution of tMRCA using Tracer v1.7.2.

• Optimized for SARS-CoV-2.
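Classifying aligned genomes by their alleles at a small set of linked sites can be sketched as follows. The site positions, allele patterns, haplotype labels, and toy sequences are placeholders chosen for illustration; the actual six sites and the protocol's custom scripts are not reproduced here.

```python
from typing import Dict, Sequence

# Hypothetical 0-based alignment columns for six linked sites.
LINKED_SITES = [1, 3, 5, 7, 9, 11]

# Hypothetical allele patterns (one base per linked site) for three haplotypes.
HAPLOTYPES: Dict[str, str] = {
    "CCCCCC": "haplotype_1",
    "TTTTTT": "haplotype_2",
    "CCCTTT": "haplotype_3",
}


def classify_genome(
    aligned_sequence: str,
    sites: Sequence[int] = LINKED_SITES,
    haplotypes: Dict[str, str] = HAPLOTYPES,
) -> str:
    """Return the haplotype label for one aligned genome.

    Genomes whose allele pattern matches no defined haplotype (including
    patterns with gaps or ambiguous bases) are labelled "other".
    """
    if max(sites) >= len(aligned_sequence):
        raise ValueError("sequence is shorter than the last linked site")
    pattern = "".join(aligned_sequence[i].upper() for i in sites)
    return haplotypes.get(pattern, "other")


if __name__ == "__main__":
    toy_alignment = {  # hypothetical aligned sequences
        "genome_a": "ACACACACACAC",
        "genome_b": "ATATATATATAT",
        "genome_c": "ACACACATATAT",
        "genome_d": "ANACACACACAC",
    }
    for name, seq in toy_alignment.items():
        print(name, classify_genome(seq))
```

The sketch relies on all genomes sharing one coordinate system, which is why alignment comes before classification in the workflow.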


Graphical overview



Graphical workflow of the time to the most recent common ancestor (tMRCA) estimation process



