To determine whether a CPE showed local overrepresentation, we partitioned the promoter sequences into two sets. The set CPE+ contained promoters in which the CPE is present at the expected location or up to two nucleotides upstream or downstream of it (the functional window; Figure 1, Supplementary Table S2); CPE− contained the remaining sequences. P-values for the cardinality n = |CPE+| were computed using the Gaussian and binomial distributions. To determine the standard score and the expected occurrence probability, a 5-nucleotide window was shifted in 1-bp steps across the promoter sequences, from position [−500, −495) relative to the TSS to (+195, +200]. At each window position, we recorded the number of promoters in which a CPE start position fell inside the window. We then used the mean and standard deviation of these counts over all window positions that did not overlap the CPE's functional window.
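The background model and the two p-values described above can be sketched in Python as follows. This is a minimal sketch under assumptions the text does not state: each promoter is given as a list of CPE start positions relative to the TSS on a plain integer axis (no special handling of a missing position 0); the functional window is the expected position ±2 nt; the standard deviation is the population SD; the expected occurrence probability is the mean background count divided by the number of promoters; and both p-values are one-sided upper tails. The helper names `count_in_window` and `cpe_enrichment` and their inputs are hypothetical.

```python
import math
import statistics
from typing import List, Sequence, Tuple

WINDOW = 5          # window width in nt, as in the text
FIRST_START = -500  # first window [-500, -495)
LAST_START = 196    # last window [196, 201), i.e. (+195, +200]
FLANK = 2           # functional window: expected location +/- 2 nt


def count_in_window(promoters: Sequence[Sequence[int]], lo: int, hi: int) -> int:
    """Number of promoters with at least one CPE start in [lo, hi)."""
    return sum(1 for starts in promoters if any(lo <= s < hi for s in starts))


def cpe_enrichment(
    promoters: Sequence[Sequence[int]], expected: int
) -> Tuple[int, float, float, float, float]:
    """Return (n, z, p0, p_gauss, p_binom) for one CPE (hypothetical helper)."""
    func_lo, func_hi = expected - FLANK, expected + FLANK + 1  # half-open
    n = count_in_window(promoters, func_lo, func_hi)  # n = |CPE+|

    # Background: slide the 5-nt window in 1-bp steps, skipping every
    # window position that overlaps the functional window.
    background: List[int] = []
    for lo in range(FIRST_START, LAST_START + 1):
        hi = lo + WINDOW
        if lo < func_hi and hi > func_lo:
            continue
        background.append(count_in_window(promoters, lo, hi))

    mu = statistics.fmean(background)
    sd = statistics.pstdev(background)  # population SD (an assumption)
    if sd > 0:
        z = (n - mu) / sd  # standard score of n against the background
    else:
        z = math.inf if n > mu else 0.0

    # Assumed expected probability that a promoter has a CPE start in one window.
    big_n = len(promoters)
    p0 = mu / big_n

    # One-sided p-values: normal upper tail of z, and P(X >= n) with X ~ Bin(N, p0).
    p_gauss = 0.5 * math.erfc(z / math.sqrt(2))
    p_binom = math.fsum(
        math.comb(big_n, k) * p0**k * (1 - p0) ** (big_n - k)
        for k in range(n, big_n + 1)
    )
    return n, z, p0, p_gauss, p_binom


if __name__ == "__main__":
    # Toy input: ten promoters, six with a CPE start within -30 +/- 2.
    toy = [[-30], [-31], [-29], [-30, 100], [-28], [-32], [-400], [150], [], []]
    n, z, p0, p_gauss, p_binom = cpe_enrichment(toy, expected=-30)
    print(n, round(z, 1), p_binom)
```

Counting promoters rather than individual CPE occurrences in each window keeps the background counts on the same scale as n = |CPE+|, and because the functional window is also 5 nt wide, n can be compared directly with the per-window background.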
CPEs show localized overrepresentation relative to the TSS and can be represented by position weight matrices (PWMs). The sequence logo representing each PWM and the IUPAC consensus sequence built from the most frequent nucleotides are shown (details in Supplementary Table S1).
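The caption does not specify how the IUPAC consensus is derived from a PWM. One simple, hypothetical rule is sketched below: at each position, keep every nucleotide whose relative frequency is at least a threshold (an assumed 0.25, always retaining the most frequent base) and map that set of nucleotides to its IUPAC ambiguity code. The threshold, the function name `iupac_consensus`, and the input layout (one dict of nucleotide counts or frequencies per PWM column) are assumptions, not the authors' procedure.

```python
from typing import Dict, List

# IUPAC nucleotide codes keyed by the set of bases they represent.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D", frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def iupac_consensus(pwm: List[Dict[str, float]], min_freq: float = 0.25) -> str:
    """Hypothetical rule: per column, keep bases with relative frequency >= min_freq."""
    letters = []
    for column in pwm:
        # Normalize counts or frequencies so each column sums to 1.
        total = sum(column.get(b, 0.0) for b in "ACGT")
        freqs = {b: column.get(b, 0.0) / total for b in "ACGT"}
        chosen = {b for b, f in freqs.items() if f >= min_freq}
        chosen.add(max(freqs, key=freqs.get))  # never map an empty set
        letters.append(IUPAC[frozenset(chosen)])
    return "".join(letters)
```

A per-base frequency cut-off avoids the arbitrary tie-breaking that a cumulative-frequency rule would need for near-uniform columns, which here simply map to N.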