Data analysis

Lirik Behluli
Alyssa M. Fontanilla
Laura Andessner-Angleitner
Nikolas Tolar
Julia M. Molina
Lenka Gahurova

The expression of DNMTs was visually inspected and quantified in SeqMonk v1.47.2 (https://www.bioinformatics.babraham.ac.uk/projects/seqmonk/). To confirm that Dnmt3l is annotated at the correct position in the naked mole-rat genome, Dnmt3l sequences from the NCBI database for mouse, human and naked mole-rat were aligned by BLAST against the naked mole-rat HetGla_female_1.0 genome using NCBI Genome Workbench. A custom Python script was prepared to determine the lengths of regions of continuous sequence identity between Dnmt3l and Dnmt3a, and between Dnmt3l and Dnmt3b. Expression levels were log2(RPKM + 1)-transformed in R and plotted using ggplot2 [61].
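The custom script itself is not reproduced here. As a minimal sketch of the idea, the following assumes that the two sequences have already been aligned to equal length with "-" as the gap character, and reports the length of every maximal run of identical, non-gap columns. The function name and the toy sequences are hypothetical and are not real Dnmt3 data.

```python
def identical_run_lengths(seq_a, seq_b, gap="-"):
    """Return the lengths of all maximal runs of identical, non-gap columns.

    Assumes seq_a and seq_b are two rows of the same pairwise alignment.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    runs = []
    current = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == b and a != gap:
            current += 1          # extend the current identical run
        else:
            if current:
                runs.append(current)
            current = 0           # a mismatch or gap ends the run
    if current:
        runs.append(current)      # close a run that reaches the end
    return runs


# Toy aligned sequences (hypothetical, for illustration only)
dnmt3l_toy = "ATGGCC-TTAGGA"
dnmt3a_toy = "ATGGCCATTCGGA"

runs = identical_run_lengths(dnmt3l_toy, dnmt3a_toy)
longest = max(runs, default=0)
print(runs, longest)
```

Treating a gap as a break in identity is a design choice in this sketch; a script that works on unaligned sequences would need a different approach, such as a longest-common-substring search.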

Aire intron sequences for the genome versions listed above and in Additional file 6: Tables S1 and S2 were downloaded from the Ensembl genome database and were converted to their reverse complements. Multiple sequence alignments of the original sequences and of the reverse complements were performed using the MUltiple Sequence Comparison by Log-Expectation (MUSCLE) tool [30], with ClustalW output format and default parameters. The alignment and the individual sequences of species without an identified homologue of the predicted TBPL2/TFIIA binding site were further checked manually for the presence of such a sequence, based on its sequence similarity and the similarity of the surrounding sequences. PhastCons analyses were performed using the PhastWeb interface [37], with expected length = 7 and default target coverage and rho values. Multiple sequence alignments and Newick trees generated by ClustalW were used as input files, and the mouse sequence was consistently used as the reference sequence. PhastCons scores were visualised using ggplot2 [61].
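The reverse-complement conversion mentioned above can be sketched in a few lines. The source does not state which tool was used for this step, so the function below is only an illustrative assumption. It handles A, C, G, T and N in either case, and leaves any other character (for example other IUPAC ambiguity codes) unchanged.

```python
# Translation table for the complement of A/C/G/T/N, preserving case.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence string."""
    return seq.translate(_COMPLEMENT)[::-1]


example = "ATGCN"  # toy sequence, not a real intron
print(reverse_complement(example))
```

Applying the function twice returns the original sequence, which is a quick sanity check before running the alignments.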

dN/dS ratios were computed for each of the Dnmt3 genes using the CODEML program of PAML v4.9j [64] by running a null model (M0) that assumes equal dN/dS ratios across all branches of the phylogeny, and two branch models that assume differences in dN/dS ratios between foreground and background branches. For the first branch model, Hystricognathi were designated as foreground branches, whereas only Fukomys damarensis and Heterocephalus glaber were set as foreground branches for the second. Chi-squared tests were conducted at the 1% and 5% significance levels using the likelihood ratio test (LRT) statistics from each model [1]. The phylogenies and multiple sequence alignments used as CODEML input were generated using phytools [40] and MACSE v2.07 [38, 39], respectively. Rooted and unrooted consensus trees were created based on a subset of 1,000 node-dated phylogenetic trees downloaded from VertLife [55]. Sequences included in the alignments (listed in Additional file 6: Table S4) were downloaded from the NCBI Nucleotide database. For the alignment runs, internal frameshifts were replaced with “---”, while internal stop codons were replaced with “NNN”. The alignments were also trimmed so that, in the first and last columns, at least 80% of the sequences have a nucleotide.
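The likelihood ratio test comparing a branch model with M0 can be sketched as follows. This assumes that each branch model adds exactly one dN/dS parameter relative to M0 (a two-ratio model), so the statistic 2(lnL_branch - lnL_M0) is compared with a chi-squared distribution with one degree of freedom. The function name and the log-likelihood values are hypothetical and are not results from the study.

```python
import math


def lrt_one_df(lnl_null, lnl_alt):
    """Likelihood ratio test for nested models differing by one parameter.

    Returns (statistic, p_value). For one degree of freedom, the chi-squared
    survival function equals erfc(sqrt(x / 2)), so no external library
    is needed.
    """
    stat = max(2.0 * (lnl_alt - lnl_null), 0.0)  # guard against tiny negatives
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical log-likelihoods as they might be read from CODEML output
lnl_m0 = -1000.0
lnl_branch = -996.0

stat, p_value = lrt_one_df(lnl_m0, lnl_branch)
significant_5pct = p_value < 0.05
significant_1pct = p_value < 0.01
print(stat, p_value, significant_5pct, significant_1pct)
```

If a branch model had more than one additional parameter, the degrees of freedom would change and a general chi-squared survival function would be needed instead of the one-degree-of-freedom shortcut used here.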

Expression of TE subfamilies (raw counts) was quantified using TEtranscripts v2.2.3 [24], using the GRCm39 mouse genome and the corresponding TE annotation available on the TEtranscripts website (https://labshare.cshl.edu/shares/mhammelllab/www-data/TEtranscripts/TE_GTF/). This was followed by the identification of differentially expressed TEs using NOISeq-sim within NOISeq [52]. For quantification of individual TE insertion expression, only elements with > 50 insertions outside gene annotation with > 5 reads (mapped using Hisat2 v2.0.5, reporting one random position for multimapping reads) in either condition were considered. TE annotation was obtained from the UCSC Genome Browser for GRCm38.
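The insertion-level filter can be read in more than one way. The sketch below follows one reading, stated here as an assumption: an element (subfamily) is kept if more than 50 of its insertions lie outside gene annotation and each of those insertions has more than 5 reads in at least one of the two conditions. The record fields ("subfamily", "in_gene", "reads_a", "reads_b") and the toy data are hypothetical.

```python
from collections import Counter


def expressed_subfamilies(insertions, min_reads=5, min_insertions=50):
    """Return the set of subfamilies passing the insertion-level filter.

    Each insertion is a dict with hypothetical keys:
    'subfamily', 'in_gene' (bool), 'reads_a', 'reads_b'.
    """
    counts = Counter()
    for ins in insertions:
        if ins["in_gene"]:
            continue  # only insertions outside gene annotation count
        if ins["reads_a"] > min_reads or ins["reads_b"] > min_reads:
            counts[ins["subfamily"]] += 1
    return {name for name, n in counts.items() if n > min_insertions}


def toy(subfamily, n, in_gene, reads_a, reads_b):
    """Build n identical toy insertion records."""
    return [
        {"subfamily": subfamily, "in_gene": in_gene,
         "reads_a": reads_a, "reads_b": reads_b}
        for _ in range(n)
    ]


toy_insertions = (
    toy("L1_toy", 51, False, 6, 0)       # passes: 51 intergenic, > 5 reads
    + toy("SINE_toy", 50, False, 0, 10)  # fails: exactly 50, not > 50
    + toy("ERV_toy", 60, True, 20, 20)   # fails: all inside genes
)

kept = expressed_subfamilies(toy_insertions)
print(kept)
```

Both thresholds are strict inequalities, matching the "> 50" and "> 5" in the text.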
