Motif enrichment and TF binding site prediction

Carole Le Coz
David N. Nguyen
Chun Su
Brian E. Nolan
Amanda V. Albrecht
Suela Xhani
Di Sun
Benjamin Demaree
Piyush Pillarisetti
Caroline Khanna
Francis Wright
Peixin Amy Chen
Samuel Yoon
Amy L. Stiegler
Kelly Maurer
James P. Garifallou
Amy Rymaszewski
Steven H. Kroft
Timothy S. Olson
Alix E. Seif
Gerald Wertheim
Struan F.A. Grant
Linda T. Vo
Jennifer M. Puck
Kathleen E. Sullivan
John M. Routes
Viktoria Zakharova
Anna Shcherbina
Anna Mukhina
Natasha L. Rudy
Anna C.E. Hurst
T. Prescott Atkinson
Titus J. Boggon
Hakon Hakonarson
Adam R. Abate
Joud Hajjar
Sarah K. Nicholas
James R. Lupski
James Verbsky
Ivan K. Chinn
Michael V. Gonzalez
Andrew D. Wells
Alex Marson
Gregory M.K. Poon
Neil Romberg

The HOMER motif database contains 332 motif matrices, most of them derived from analyses of public ChIP-seq datasets (http://homer.ucsd.edu/homer/motif/motifDatabase.html). We used this motif collection for both TF enrichment and TF-binding site prediction. The 13,171 OCRs that were less accessible in samples with monoallelic indels than in unedited samples were subjected to motif enrichment analysis using findMotifsGenome.pl from HOMER (v4.10) with the parameter -size given. Motifs were ranked using the cumulative binomial distribution. Any enriched TFs (Benjamini–Hochberg [BH]-adjusted P < 0.05) were removed from the ranking if they were not expressed or were minimally expressed in unedited samples (mean transcripts per million [TPM] < 1).
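The ranking and filtering logic described above can be illustrated with a minimal Python sketch. This is not the HOMER implementation: the function names, the record fields (`hits`, `n`, `bg`), and the toy motif and gene names are hypothetical, and the background motif frequency is assumed to be supplied as a simple fraction. The sketch computes an upper-tail cumulative binomial P value per motif, applies a BH adjustment, and drops TFs whose mean TPM in unedited samples is below 1.

```python
import math


def binom_sf(k, n, p):
    """Upper-tail cumulative binomial P(X >= k), X ~ Binomial(n, p), in log space."""
    if k <= 0:
        return 1.0
    if k > n or p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    log_terms = [
        math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
        + i * math.log(p) + (n - i) * math.log1p(-p)
        for i in range(k, n + 1)
    ]
    m = max(log_terms)
    return min(1.0, math.exp(m) * sum(math.exp(t - m) for t in log_terms))


def benjamini_hochberg(pvalues):
    """Return BH-adjusted P values in the same order as the input."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def rank_motifs(records, mean_tpm, alpha=0.05, min_tpm=1.0):
    """Rank enriched motifs by P value, keeping only TFs that are expressed.

    records: dicts with hypothetical keys 'motif', 'gene', 'hits' (target
    regions containing the motif), 'n' (total target regions), and 'bg'
    (fraction of background regions containing the motif).
    mean_tpm: gene -> mean TPM in unedited samples.
    """
    pvals = [binom_sf(r["hits"], r["n"], r["bg"]) for r in records]
    qvals = benjamini_hochberg(pvals)
    kept = [
        dict(r, p=p, q=q)
        for r, p, q in zip(records, pvals, qvals)
        if q < alpha and mean_tpm.get(r["gene"], 0.0) >= min_tpm
    ]
    return sorted(kept, key=lambda r: r["p"])


# Toy input (hypothetical values, not data from the study).
toy_records = [
    {"motif": "MotifA", "gene": "GENE_A", "hits": 60, "n": 200, "bg": 0.10},
    {"motif": "MotifB", "gene": "GENE_B", "hits": 55, "n": 200, "bg": 0.10},
    {"motif": "MotifC", "gene": "GENE_C", "hits": 21, "n": 200, "bg": 0.10},
]
toy_tpm = {"GENE_A": 12.5, "GENE_B": 0.3, "GENE_C": 8.0}
ranked = rank_motifs(toy_records, toy_tpm)
```

In this toy input, MotifB is enriched but its TF falls below the expression threshold, and MotifC is not enriched, so only MotifA remains in the ranking.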

Protein interaction quantitation (PIQ; Sherwood et al., 2014) was used to predict TF binding sites from the assembly gap-masked genome sequence, as described in https://github.com/orzechoj/piq-single. Briefly, HOMER motifs were first converted to JASPAR format using the R package universalmotif (http://bioconductor.org/packages/release/bioc/html/universalmotif.html) and were then used to generate position weight matrix (PWM) hits across the masked genome. PIQ was run separately for unedited, monoallelic, and biallelic samples after merging the BAM files within each condition. A binding site candidate was defined as a PWM hit that passed a purity score cutoff of 0.7 in at least one condition and overlapped a precalled OCR.
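The final candidate definition can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' pipeline or PIQ's own output handling: the site dictionary layout, the function names, the half-open coordinate convention, and the choice of an inclusive cutoff (purity >= 0.7) are all assumptions made for the example.

```python
from bisect import bisect_right
from collections import defaultdict

PURITY_CUTOFF = 0.7  # cutoff stated in the text; inclusive comparison is an assumption


def build_ocr_index(ocrs):
    """Merge OCRs per chromosome into sorted, non-overlapping intervals.

    ocrs: iterable of (chrom, start, end) with half-open coordinates (assumed).
    Returns chrom -> (starts, ends) lists suitable for binary search.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in ocrs:
        by_chrom[chrom].append((start, end))
    index = {}
    for chrom, intervals in by_chrom.items():
        merged = []
        for start, end in sorted(intervals):
            if merged and start <= merged[-1][1]:
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        index[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return index


def overlaps_ocr(index, chrom, start, end):
    """True if the half-open interval [start, end) overlaps any OCR on chrom."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    # Last merged OCR that begins before the site ends.
    i = bisect_right(starts, end - 1) - 1
    return i >= 0 and ends[i] > start


def is_candidate(site, index, cutoff=PURITY_CUTOFF):
    """Apply the purity and OCR-overlap criteria to one predicted site.

    site: dict with hypothetical keys 'chrom', 'start', 'end', and 'purity',
    where 'purity' maps each condition name to its purity score.
    """
    passes_purity = any(score >= cutoff for score in site["purity"].values())
    return passes_purity and overlaps_ocr(
        index, site["chrom"], site["start"], site["end"]
    )


# Toy input (hypothetical values, not data from the study).
toy_index = build_ocr_index(
    [("chr1", 100, 200), ("chr1", 150, 300), ("chr2", 50, 80)]
)
toy_site = {
    "chrom": "chr1",
    "start": 120,
    "end": 135,
    "purity": {"unedited": 0.9, "monoallelic": 0.4, "biallelic": 0.2},
}
toy_result = is_candidate(toy_site, toy_index)
```

Merging the OCRs first keeps each overlap query to a single binary search, which matters when PWM hits are scored genome-wide; a site that passes the cutoff in any one of the three conditions is retained even if its purity is low in the others.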
