Published: Feb 5, 2024 DOI: 10.21769/BioProtoc.4943 Views: 373
Abstract
DNA methylation is a conserved repressive epigenetic modification in eukaryotic organisms, which involves the transfer of a methyl group to the C5 position of cytosine by DNA methyltransferase. In plants, DNA methylation occurs in CG and non-CG (including CHG and CHH, H = A, T, C) sequence contexts. It is widespread in the genome and involved in various biological processes to regulate gene expression and genome stability. Nanopore sequencing now enables the direct detection of DNA modification on native single-molecule long-read DNA, overcoming the limitation of short-read bisulfite sequencing. To facilitate Nanopore-based DNA methylation analysis in plants, in this protocol, we provide guidance on the software DeepSignal-plant, which can accurately call 5mC in all three contexts of CG, CHG, and CHH with high correlation with bisulfite sequencing in plants.
Keywords: Plant
Background
DNA methylation is one of the most important epigenetic modifications. It is involved in the regulation of gene expression, gene imprinting, transposon silencing, and chromatin packaging in response to developmental and environmental stimulation [1]. In mammals, DNA methylation mainly occurs in the CG context, and non-CG methylation (CHG and CHH) is only found in specific cell types such as brain cells and pluripotent cells [2]. In plants, however, in addition to CG methylation, CHG and CHH methylation is also widespread throughout the genome and plays important roles in gene silencing [3]. Various strategies exist for the detection of DNA methylation. Among them, Illumina-based whole genome bisulfite sequencing (WGBS) can obtain global patterns at single-base resolution and thus serves as the gold standard for genome-wide DNA methylation analysis [4]. This method uses bisulfite to convert unmethylated cytosine into uracil; the latter is read as thymine after PCR amplification, while cytosine carrying 5mC resists the conversion. The difference can be identified by mapping to the reference genome after sequencing. However, WGBS has shortcomings such as low coverage of repetitive regions due to short-read sequencing and false positives caused by incomplete conversion. Nanopore sequencing offers solutions to these problems. It allows the direct detection of methylation status on native single-molecule DNA without chemical treatment or PCR amplification. By studying the characteristics of the electrical signal (called the squiggle) produced when DNA passes through the Nanopore, the DNA sequence and associated modifications can be decoded [5]. Nanopore long reads can cover large genomic regions and enable the profiling of repetitive and complex regions as well as the phasing of haplotypes [6]. Currently, Nanopore sequencing can detect not only 5mC but also other DNA modifications such as 5hmC, 4mC, and 6mA [7], and has been applied in bacteria [8,9], humans [10], and plants [11].
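The bisulfite logic described above can be illustrated with a small sketch. It is a toy model and not part of WGBS software or DeepSignal-plant: the function names `bisulfite_convert` and `call_methylation`, the example sequence, and the methylated positions are all hypothetical, and the model assumes complete conversion, a forward-strand read, and a perfect alignment to the reference.

```python
def bisulfite_convert(read, methylated_positions):
    """Toy model of bisulfite treatment followed by PCR on a forward-strand read.

    Unmethylated C is read as T; C at a methylated position stays C.
    Assumes complete conversion (no false positives).
    """
    converted = []
    for i, base in enumerate(read):
        if base == "C" and i not in methylated_positions:
            converted.append("T")
        else:
            converted.append(base)
    return "".join(converted)


def call_methylation(reference, converted_read):
    """Compare a converted read with the reference at every reference C.

    Returns {position: True/False}, True meaning the C was retained
    (methylated) and False meaning it was read as T (unmethylated).
    Assumes the read aligns to the reference without gaps.
    """
    calls = {}
    for i, (ref_base, read_base) in enumerate(zip(reference, converted_read)):
        if ref_base != "C":
            continue
        if read_base == "C":
            calls[i] = True
        elif read_base == "T":
            calls[i] = False
    return calls


# Hypothetical example: cytosines at 0-based positions 1, 4, 5, and 8,
# of which positions 1 and 8 are assumed to carry 5mC.
reference = "ACGTCCAGCTG"
converted = bisulfite_convert(reference, {1, 8})
calls = call_methylation(reference, converted)
print(converted)
print(calls)
```

In real data, incomplete conversion would leave some unmethylated cytosines as C, which this comparison would report as methylated; that is the false-positive source mentioned above.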
Many algorithms have been developed to decode the modification signal from Nanopore data [7,12]. However, these methods failed to capture 5mC in the CHG and CHH contexts with acceptable accuracy, which hinders their application in plant genome research. To meet this need, DeepSignal-plant was developed. It uses deep learning to call 5mC in all contexts and achieves a high correlation with WGBS results [13]. In this protocol, we introduce the data analysis process of DeepSignal-plant for methylation studies in plants.
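To make the three sequence contexts concrete, the sketch below classifies each forward-strand cytosine as CG, CHG, or CHH, with H = A, T, or C as defined in the abstract. The function name `cytosine_context` and the example sequence are hypothetical; this is an illustration only and does not reproduce how DeepSignal-plant assigns contexts internally. Reverse-strand cytosines would need the same check on the reverse complement.

```python
def cytosine_context(seq, i):
    """Return 'CG', 'CHG', 'CHH', or None for the base at 0-based index i.

    Only the forward strand is considered. None is returned when the base
    is not C, when too few downstream bases exist, or when a downstream
    base is ambiguous (for example N).
    """
    seq = seq.upper()
    if seq[i] != "C":
        return None
    downstream = seq[i + 1:i + 3]
    if len(downstream) >= 1 and downstream[0] == "G":
        return "CG"
    if len(downstream) < 2:
        return None
    h_bases = {"A", "T", "C"}
    if downstream[0] in h_bases:
        if downstream[1] == "G":
            return "CHG"
        if downstream[1] in h_bases:
            return "CHH"
    return None


# Hypothetical example sequence with one cytosine in each context.
sequence = "CGACAGCTTC"
contexts = {}
for pos in range(len(sequence)):
    ctx = cytosine_context(sequence, pos)
    if ctx is not None:
        contexts[pos] = ctx
print(contexts)
```

The final C in the example has no downstream bases, so it is left unclassified instead of being guessed.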
Equipment
Linux version 3.10.0-862.el7.x86_64 (Red Hat 4.8.5-28) with 48 CPU (2*Intel Gold 6140, 18 cores, 2.3 GHz) and GPU (2*Nvidia V100, 640 cores, 32 GB).
Software and datasets
Guppy (v4.0.11; https://timkahlke.github.io/LongRead_tutorials/BS_G.html) (Release date, June 18, 2020)
DeepSignal-plant (v0.1.5; https://github.com/PengNi/DeepSignal-plant) (Release date, March 31, 2022)
The pipeline for DeepSignal-plant depends on software listed as follows:
ont_fast5_api (v4.0.2; https://github.com/nanoporetech/ont_fast5_api) (Release date, March 22, 2017)
tombo (v1.5.1; https://github.com/nanoporetech/tombo) (Release date, February 20, 2020)
Conda (v23.3.1, https://docs.conda.io/en/latest/) (Release date, March 28, 2023)
The h5ls tool, which we use to preview the FAST5 files, should be installed automatically with conda.
Mamba (v1.4.2, https://mamba.readthedocs.io/en/latest/) (Release date, April 6, 2023)
The Integrative Genomics Viewer (IGV) (v2.6.1; https://software.broadinstitute.org/software/igv/) (Release date, July 26, 2019)
Python (3.7.12) (Release date, September 4, 2021)
Numpy (v1.20.3)
Pandas (1.3.4)
Click (8.1.3)
Seaborn (0.11.1)
Matplotlib (3.4.1)
hurry.filesize (0.9)
Data generated from Nanopore direct DNA sequencing in FAST5 format.
Reference genome in fasta format.
The annotation file of Arabidopsis in gff3 format.
Pre-trained model for plant 5mC calling.
Procedure
Category
Plant Science > Plant molecular biology > DNA
Molecular Biology > DNA > DNA sequencing