Data analysis

Lirik Behluli; Alyssa M. Fontanilla; Laura Andessner-Angleitner; Nikolas Tolar; Julia M. Molina; Lenka Gahurova


This method is extracted from research article: Epigenetics Chromatin, Nov 2023

Expression analysis suggests that DNMT3L is required for oocyte de novo DNA methylation only in Muridae and Cricetidae rodents

DOI: 10.1186/s13072-023-00518-2


The expression of DNMTs was visually inspected and quantified in SeqMonk v1.47.2 (https://www.bioinformatics.babraham.ac.uk/projects/seqmonk/). To confirm that Dnmt3l is annotated at the correct position in the naked mole-rat genome, Dnmt3l sequences for mouse, human and naked mole-rat from the NCBI database were aligned by BLAST against the naked mole-rat HetGla_female_1.0 genome using NCBI Genome Workbench. A custom Python script was prepared to identify the lengths of regions with continuous sequence identity between Dnmt3l and each of Dnmt3a and Dnmt3b. Expression levels were log₂(RPKM + 1)-transformed in R and plotted using ggplot2 [61].
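The custom script itself is not reproduced here. As a minimal sketch of what measuring continuous sequence identity can look like, assuming the two sequences have already been pairwise aligned to equal length with `-` as the gap character, the following collects the length of every uninterrupted run of identical columns. The function name and the toy sequences are hypothetical and are not taken from the original script.

```python
def identity_run_lengths(aligned_a, aligned_b, gap="-"):
    """Return the lengths of all maximal runs of identical, non-gap columns.

    Assumes both inputs come from the same pairwise alignment and therefore
    have the same length. Comparison is case-insensitive.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    runs = []
    current = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == b and a != gap:
            # Column is identical and not a gap: extend the current run.
            current += 1
        else:
            # Mismatch or gap: close the current run, if any.
            if current:
                runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs


# Toy aligned fragments (hypothetical, for illustration only).
runs = identity_run_lengths("ATGCC-GTTA", "ATGAC-GTTA")
longest = max(runs) if runs else 0
```

Treating a gap as a break in identity is a design choice; a script that only compares ungapped windows would give different run lengths.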

Aire intron sequences for the genome versions listed above and in Additional file 6: Tables S1 and S2 were downloaded from the Ensembl genome database and converted to their reverse complements. Multiple sequence alignments of the original sequences and of the reverse complements were performed using the MUltiple Sequence Comparison by Log-Expectation (MUSCLE) tool [30], with ClustalW output format and default parameters. For species without an identified homologous sequence for the predicted TBPL2/TFIIA binding site, the alignment and the individual sequences were further checked manually for the presence of such a sequence, based on its sequence similarity and on the similarity of the surrounding sequences. PhastCons analyses were performed using the PhastWeb interface [37], with expected length = 7 and default target coverage and rho values. Multiple sequence alignments and Newick trees generated by ClustalW were used as input files, and the mouse sequence was consistently used as the reference sequence. PhastCons scores were visualised using ggplot2 [61].
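The reverse-complement conversion mentioned above can be illustrated with a few lines of standard-library Python. This is a sketch only; the tool actually used for the conversion is not stated, and the function name is a hypothetical choice. Bases other than A, C, G, T and N (for example other IUPAC ambiguity codes) are left unchanged in this simplified version.

```python
# Translation table for the complement of unambiguous bases plus N,
# preserving upper/lower case (lower case often marks soft-masked repeats).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence string."""
    # Complement each base, then reverse the whole string.
    return seq.translate(_COMPLEMENT)[::-1]


# Toy sequence (hypothetical, for illustration only).
original = "ATGCN"
revcomp = reverse_complement(original)
```

Applying the function twice returns the original sequence, which is a quick sanity check before aligning both orientations.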

dN/dS ratios were computed for each of the Dnmt3 genes using the CODEML program of PAML v4.9j [64] by running a null model (M0) that assumes equal dN/dS ratios across all branches of the phylogeny, and two branch models that assume differences in dN/dS ratios between foreground and background branches. For the first branch model, Hystricognathi were designated as foreground branches, whereas for the second only Fukomys damarensis and Heterocephalus glaber were set as foreground branches. Chi-squared tests were conducted at 1% and 5% significance levels using the LRT statistics from each model [1]. The phylogenies and multiple sequence alignments used as CODEML input were generated using phytools [40] and MACSE v2.07 [38, 39], respectively. Rooted and unrooted consensus trees were created based on a subset of 1,000 node-dated phylogenetic trees downloaded from VertLife [55]. Sequences included in the alignments (listed in Additional file 6: Table S4) were downloaded from the NCBI Nucleotide database. For the alignment runs, internal frameshifts were replaced with “---”, while internal stop codons were replaced with “NNN”. The alignments were also trimmed so that the first and last columns each contain a nucleotide in at least 80% of the sequences.
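The likelihood ratio test (LRT) comparing a branch model with M0 can be sketched as follows. The statistic is twice the difference in log-likelihood between the alternative and the null model. The sketch assumes one degree of freedom, which holds when the branch model adds a single extra dN/dS ratio for the foreground branches; the source does not state the degrees of freedom explicitly. The log-likelihood values below are hypothetical placeholders, not results from the study.

```python
import math


def lrt_statistic(lnl_null, lnl_alt):
    """Likelihood ratio test statistic: 2 * (lnL_alternative - lnL_null)."""
    return 2.0 * (lnl_alt - lnl_null)


def chi2_pvalue_df1(stat):
    """Upper-tail p-value of a chi-squared distribution with 1 degree of freedom.

    For df = 1 the survival function has the closed form erfc(sqrt(x / 2)),
    so no external statistics library is needed.
    """
    if stat <= 0:
        return 1.0
    return math.erfc(math.sqrt(stat / 2.0))


# Hypothetical log-likelihoods, as would be read from CODEML output files.
lnl_m0 = -1000.0      # null model: one dN/dS ratio for all branches
lnl_branch = -997.0   # branch model: separate foreground dN/dS ratio

stat = lrt_statistic(lnl_m0, lnl_branch)
p_value = chi2_pvalue_df1(stat)
significant_5pct = p_value < 0.05
significant_1pct = p_value < 0.01
```

With these placeholder values the branch model would be preferred at the 5% level but not at the 1% level, which shows why reporting both thresholds can be informative.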

Expression of TE subfamilies (raw counts) was quantified using TEtranscripts v2.2.3 [24] with the GRCm39 mouse genome and the corresponding TE annotation available on the TEtranscripts website (https://labshare.cshl.edu/shares/mhammelllab/www-data/TEtranscripts/TE_GTF/). Differentially expressed TEs were then identified using NOISeq-sim within NOISeq [52]. For quantification of individual TE insertion expression, only elements with > 50 insertions outside gene annotation with > 5 reads (mapped using Hisat2 v2.0.5, reporting one random position for multimapping reads) in either condition were considered. TE annotation for GRCm38 was obtained from the UCSC Genome Browser.
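The insertion-level filter can be read in more than one way. The sketch below follows one plausible reading: a TE element type is kept when more than 50 of its insertions located outside gene annotation have more than 5 reads in at least one of the two conditions. The record layout (`subfamily`, `in_gene`, `reads_a`, `reads_b`) and the function name are hypothetical and only serve the illustration.

```python
from collections import Counter


def expressed_subfamilies(insertions, min_reads=5, min_insertions=50):
    """Return TE subfamilies passing the insertion-level expression filter.

    `insertions` is an iterable of dicts with the (hypothetical) keys
    'subfamily', 'in_gene', 'reads_a' and 'reads_b'. Both thresholds are
    strict, matching the "> 5 reads" and "> 50 insertions" wording.
    """
    counts = Counter()
    for ins in insertions:
        if ins["in_gene"]:
            # Skip insertions overlapping annotated genes.
            continue
        if ins["reads_a"] > min_reads or ins["reads_b"] > min_reads:
            # Expressed in either condition: count towards its subfamily.
            counts[ins["subfamily"]] += 1
    return {name for name, n in counts.items() if n > min_insertions}


# Tiny usage example with lowered thresholds and made-up records.
demo = [
    {"subfamily": "TE_x", "in_gene": False, "reads_a": 9, "reads_b": 0},
    {"subfamily": "TE_x", "in_gene": False, "reads_a": 0, "reads_b": 7},
    {"subfamily": "TE_y", "in_gene": True, "reads_a": 20, "reads_b": 20},
]
kept = expressed_subfamilies(demo, min_reads=5, min_insertions=1)
```

If the intended criterion is instead applied per insertion rather than per element type, the counting step would need to change accordingly.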

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
