Bioinformatic analyses

Chloe Goldsmith
Jesús Rafael Rodríguez-Aguilera
Ines El-Rifai
Adrien Jarretier-Yuste
Valérie Hervieu
Olivier Raineteau
Pierre Saintigny
Victoria Chagoya de Sánchez
Robert Dante
Gabriel Ichim
Hector Hernandez-Vargas

Our analyses are divided into two parts: an initial CpG methylation pipeline that runs Nanopolish in parallel with Guppy + Medaka for each sample, and a second pipeline based on Megalodon that calls 5mC in all contexts (i.e. CpG and CpH).

For our initial CpG methylation analyses, basecalling was performed with Guppy version 4.0.15 (ONT). We first determined the methylation status of each CpG site on every read using the widely used tool Nanopolish [23], as recently applied in [51]. For validation, we also called DNA methylation with the newer tool Medaka (https://github.com/nanoporetech/medaka). Medaka creates a consensus sequence from nanopore sequencing data using neural networks applied to a pileup of individual sequencing reads against a draft assembly. It outperforms graph-based methods operating on basecalled data and can be competitive with state-of-the-art signal-based methods, while being much faster. Both tools have recently been benchmarked [22]. PycoQC (https://github.com/a-slide/pycoQC) was used for data inspection and quality control, and methplotlib (https://github.com/wdecoster/methplotlib) for read-level visualizations. Called CpG sites in the FU control were used to determine a baseline of methylation, calculated as: FalsePositiveRate = (# called methylated cytosines in FU) / (# called cytosines in FU).
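The baseline calculation above can be sketched in a few lines of Python. This is only an illustration of the ratio as stated in the text: the function name and the example counts are hypothetical and do not come from the study.

```python
def false_positive_rate(methylated_calls: int, total_calls: int) -> float:
    """Fraction of called cytosines in the FU control that were called methylated."""
    if total_calls <= 0:
        raise ValueError("total_calls must be a positive integer")
    if not 0 <= methylated_calls <= total_calls:
        raise ValueError("methylated_calls must lie between 0 and total_calls")
    return methylated_calls / total_calls


# Hypothetical counts, for illustration only (not data from the study).
fpr = false_positive_rate(methylated_calls=150, total_calls=10_000)
print(f"False positive rate: {fpr:.3f}")
```

Any methylation level measured in a test sample can then be compared against this baseline.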

To call 5mC in all contexts, we first demultiplexed the reads at the raw (fast5) level using Deepbinner (v.0.2.0) [52], before basecalling and extracting base modification information using the most recent model from the Rerio repository (https://github.com/nanoporetech/rerio), implemented through the Megalodon tool (v.2.2.9) (https://github.com/nanoporetech/megalodon). Megalodon extracts high-accuracy modified base and sequence variant calls from raw nanopore reads by anchoring the information-rich basecalling neural network output to a reference genome. Specifically, we used the corresponding MinION (res_dna_r941_min_modbases_5mC_v001.cfg v.4.2.2) and PromethION (res_dna_r941_prom_modbases_5mC_v001.cfg v.4.2.2) models, with a --mod-binary-threshold of 0.8, as recommended. To validate the CpH analyses, we used Megalodon (as described above) to re-basecall raw fast5 data from a published dataset [53] (https://www.ebi.ac.uk/ena/browser/view/PRJEB33258). As these libraries were prepared after Cas9 targeting of selected genes (three gRNA designs, resulting in 19 hg38 genomic regions), we pooled the reads from several samples so that they could be plotted as enriched heatmaps (Fig. 6D,E).
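To illustrate what a binary threshold of 0.8 means in practice, the sketch below converts per-read modification probabilities at one site into methylated, canonical, or ambiguous calls. It is a simplified stand-in written for this explanation, not Megalodon's own code: the function names, the use of `>=` at the boundary, and the example probabilities are all assumptions.

```python
from typing import Iterable, Optional, Tuple


def binarize_call(p_mod: float, threshold: float = 0.8) -> Optional[bool]:
    """Return True (modified), False (canonical) or None (ambiguous, ignored)."""
    if not 0.0 <= p_mod <= 1.0:
        raise ValueError("p_mod must be a probability between 0 and 1")
    if p_mod >= threshold:
        return True
    if 1.0 - p_mod >= threshold:
        return False
    return None  # neither state reaches the confidence threshold


def site_counts(p_mods: Iterable[float], threshold: float = 0.8) -> Tuple[int, int]:
    """Count (modified reads, confidently called reads) at one genomic site."""
    calls = [binarize_call(p, threshold) for p in p_mods]
    n_mod = sum(1 for c in calls if c is True)
    n_canonical = sum(1 for c in calls if c is False)
    return n_mod, n_mod + n_canonical


# Hypothetical per-read probabilities of 5mC at a single cytosine.
n_mod, n_called = site_counts([0.95, 0.85, 0.5, 0.1, 0.05, 0.7])
print(n_mod, n_called)
```

Reads whose probability falls between the two confidence bounds contribute to neither count, so a stricter threshold trades coverage for confidence.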

For differential methylation analyses, we used DSS (Dispersion Shrinkage for Sequencing data) [54], adapted for nanopore sequencing [51]. Briefly, DSS tests for differential methylation at single CpG sites, using a Wald test on the coefficients of a beta-binomial regression of count data with an 'arcsine' link function. To set minimum requirements for the DSS analysis, we performed an internal comparison of biological replicates of differentiated HepaRG cells. This allowed us to characterise the background and determine the minimum smoothing and delta values. These were set at a delta of 0.05 with a P-value threshold of 0.05. For transcription factor binding site analyses, we used the Bioconductor packages MIRA [21] for methylation data aggregation and LOLA for dataset selection [55]. The Mann–Whitney test was used for pairwise comparisons of 5mCpG distribution.
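As a rough illustration of how the delta and P-value cut-offs act on per-site results, the sketch below filters a list of sites by both values. It is a simplification and not the DSS procedure itself: DSS is an R package and, as far as we understand, applies delta through a posterior probability rather than a hard cut on the estimated difference. The `SiteResult` record, the function name, and the example values are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SiteResult:
    chrom: str
    pos: int
    diff: float  # estimated methylation difference between groups (0 to 1 scale)
    pval: float  # P-value of the per-site test


def filter_sites(results: List[SiteResult],
                 delta: float = 0.05,
                 p_threshold: float = 0.05) -> List[SiteResult]:
    """Keep sites whose absolute difference reaches delta and whose P-value is below the threshold."""
    return [r for r in results if abs(r.diff) >= delta and r.pval < p_threshold]


# Hypothetical per-site results, for illustration only.
sites = [
    SiteResult("chr1", 10_468, diff=0.20, pval=0.001),   # kept
    SiteResult("chr1", 10_470, diff=0.02, pval=0.001),   # difference too small
    SiteResult("chr2", 55_120, diff=-0.30, pval=0.200),  # not significant
    SiteResult("chr2", 55_300, diff=-0.10, pval=0.010),  # kept
]
for site in filter_sites(sites):
    print(site.chrom, site.pos, site.diff, site.pval)
```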
