The HOMER motif database contains 332 motif matrices and is based mostly on analyses of public ChIP-seq datasets (http://homer.ucsd.edu/homer/motif/motifDatabase.html). We used this motif collection for both TF enrichment and TF-binding site prediction. The 13,171 OCRs that were less accessible in monoallelic indel samples than in unedited samples were subjected to motif enrichment analysis using findMotifsGenome.pl from HOMER (v4.10) with the parameter setting -size given. Motifs were ranked using the cumulative binomial distribution. Enriched TFs (Benjamini–Hochberg [BH]-adjusted P < 0.05) were removed from the ranking if they were not expressed or were minimally expressed in unedited samples (mean transcripts per million < 1).
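The ranking and filtering logic described above can be illustrated with a minimal Python sketch. It is not HOMER's implementation and does not read HOMER output files; the function names, the input layout (a list of TF name and P value pairs, and a dictionary of mean TPM values in unedited samples), and the use of an upper-tail binomial probability are assumptions made for illustration only.

```python
import math


def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed in log space for stability."""
    if k <= 0:
        return 1.0
    if k > n or p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    log_terms = [
        math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
        + i * math.log(p) + (n - i) * math.log1p(-p)
        for i in range(k, n + 1)
    ]
    m = max(log_terms)
    return min(1.0, math.exp(m) * sum(math.exp(t - m) for t in log_terms))


def benjamini_hochberg(pvals):
    """Return BH-adjusted P values in the same order as the input."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def rank_expressed_tfs(motifs, mean_tpm, alpha=0.05, min_tpm=1.0):
    """Keep TFs with BH-adjusted P < alpha and mean TPM >= min_tpm.

    motifs:   list of (tf_name, p_value) pairs (hypothetical input layout)
    mean_tpm: dict mapping tf_name to mean TPM in unedited samples
    Returns (tf_name, p_value, q_value) tuples sorted by P value.
    """
    qvals = benjamini_hochberg([p for _, p in motifs])
    kept = [
        (name, p, q)
        for (name, p), q in zip(motifs, qvals)
        # TFs missing from the expression table are treated as not expressed.
        if q < alpha and mean_tpm.get(name, 0.0) >= min_tpm
    ]
    return sorted(kept, key=lambda row: row[1])
```

Here `binom_upper_tail(k, n, p)` would be called with `k` target regions containing a motif, `n` target regions in total, and `p` the fraction of background regions containing the motif; how HOMER selects background regions is outside the scope of this sketch.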
Protein interaction quantification (PIQ; Sherwood et al., 2014) was used to predict TF binding sites from the assembly gap-masked genome sequence, as described at https://github.com/orzechoj/piq-single. Briefly, HOMER motifs were first converted to JASPAR format using the R package universalmotif (http://bioconductor.org/packages/release/bioc/html/universalmotif.html) and were then used to generate position weight matrix (PWM) hits across the masked genome. PIQ was run separately for unedited, monoallelic, and biallelic samples after merging the BAM files within each condition. A candidate binding site was defined as a PWM hit that passed a purity score cutoff of 0.7 in at least one condition and overlapped a precalled OCR.
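The candidate definition in the last sentence can be sketched as a small Python filter. This is an illustration under stated assumptions, not the PIQ code: the site records (a dictionary with `chrom`, `start`, `end`, and a per-condition `purity` mapping), the half-open coordinate convention, and the choice of "greater than or equal to" for the 0.7 cutoff are all hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict

CONDITIONS = ("unedited", "monoallelic", "biallelic")


def build_ocr_index(ocrs):
    """Index half-open OCR intervals (chrom, start, end) by chromosome."""
    by_chrom = defaultdict(list)
    for chrom, start, end in ocrs:
        by_chrom[chrom].append((start, end))
    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        starts = [s for s, _ in intervals]
        # Running maximum of end coordinates, so nested OCRs are handled.
        max_ends, current = [], -1
        for _, end in intervals:
            current = max(current, end)
            max_ends.append(current)
        index[chrom] = (starts, max_ends)
    return index


def overlaps_ocr(index, chrom, start, end):
    """True if [start, end) overlaps at least one indexed OCR."""
    if chrom not in index:
        return False
    starts, max_ends = index[chrom]
    # Count OCRs that start before the site ends.
    j = bisect_right(starts, end - 1)
    # One of them must also end after the site starts.
    return j > 0 and max_ends[j - 1] > start


def is_candidate(site, index, cutoff=0.7):
    """Purity >= cutoff in at least one condition, and overlap with an OCR."""
    passes_purity = any(
        site["purity"].get(cond, 0.0) >= cutoff for cond in CONDITIONS
    )
    return passes_purity and overlaps_ocr(
        index, site["chrom"], site["start"], site["end"]
    )
```

Building the index once and reusing it for every site keeps each overlap check to a single binary search, which matters when genome-wide PWM hits number in the millions.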