WGBS data analyses

ZC Zhiyuan Chen
DH Darren E. Hagen
TJ Tieming Ji
CE Christine G. Elsik
RR Rocío M. Rivera
request Request a Protocol
ask Ask a question
Favorite

Following adaptor removal and quality trimming, WGBS read pairs were aligned to the bovine reference genome assembly UMD3.1 using Bismark (version: 0.15.0)50 with the default parameters. Only uniquely aligned reads were used for subsequent analyses. In order to distinguish the C > T SNPs from C > T substitutions that were caused by sodium bisulfite conversion in the F1 fetuses, Bis-SNP (version: 0.82.2)35 was used to perform the SNP calling. Only CpG sites that showed consensus CpG context for both parental alleles (i.e., do not overlap any SNPs identified by Bis-SNP) were used for the subsequent analyses. The methylation level was determined using the reads from both forward and reverse strands covering the same symmetric CpG site. The methylation level was calculated as: (number of “C” reads)/(number of “C” reads + number of “T” reads).

In order to identify ASM regions, the WGBS read pairs overlapping SNPs were assigned to their parental origin based on the genotype of the B. t. indicus sire33. Further, the allelic WGBS reads were pooled from four controls to determine the sequencing depth and only CpGs that had at least 4 × coverage of each allele were used for Fisher’s exact test. To estimate the FDR, WGBS read pairs overlapping SNPs were randomly assigned to be either B. t. indicus or B. t. taurus in origin in order to generate a randomized dataset of ASM sites36. As an initial screen, ASM candidate CpGs were identified using a p-value of 0.01, resulting in 109,794 ASM sites with a FDR of ~5% (compared to the 5507 CpGs identified as ASM sites in the permutated datasets). To minimize false positives, the identified ASM candidate sites were clustered into regions as previously described36. Methylkit R package51 was used to perform the hierarchical clustering of the methylation profiles. The DMRs between each LOS fetus and all four controls were identified using the Bsseq R package30.

To estimate the expected number of DEGs associated with DMRs, the identified DMRs were randomly shuffled within each chromosome using “shuffleBed” function from BedTools52. The UMD3.1 genome assembly gaps from UCSC genome Browser were excluded from the potential shuffled positions. Shuffling was performed 1000 times and for each dataset, a DEG was considered to be associated with a DMR if the DMR falls into the DEG coordinates that had been extended by 5 kb or 20 kb. The averaged DEG counts of the 1000 datasets were used as the expected count for chi-square test.

Do you have any questions about this protocol?

Post your question to gather feedback from the community. We will also invite the authors of this article to respond.

0/150

tip Tips for asking effective questions

+ Description

Write a detailed description. Include all information that will help others answer your question including experimental processes, conditions, and relevant images.

post Post a Question
0 Q&A