The human reference genome (hg19) was used to extract the DNA sequences around TSSs for TFBEA. Gene coordinates were obtained from the Ensembl BioMart tool (57), and for each gene we extracted the region spanning 2000 bp upstream to 1000 bp downstream of the TSS. TF motifs were obtained from JASPAR, and the extracted sequence of each target gene was submitted to JASPAR and analyzed against its corresponding TF. Using the TF's position weight matrix (PWM), JASPAR applied a modified Needleman-Wunsch algorithm to align the motif with the target sequence, scanning the input sequence to determine whether the motif is enriched. The output is an enrichment score for the input TF in the designated target genes.
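The two computational steps in this paragraph, cutting a strand-aware window around each TSS and scanning it with a PWM, can be illustrated with the minimal Python sketch below. It rests on several assumptions. The TSS is passed as a 0-based coordinate and counted as the first downstream base. Ensembl reports 1-based coordinates, so subtract 1 before calling. The scan uses standard sliding-window log2-odds PWM scoring against a uniform background, with an illustrative pseudocount. This is a swapped-in, commonly used technique and does not reproduce the modified Needleman-Wunsch procedure or the enrichment statistic that JASPAR computes. The helper names (`promoter_window`, `log_odds_pwm`, `scan_both_strands`) are hypothetical.

```python
import math

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def promoter_window(tss0, strand, upstream=2000, downstream=1000):
    """Return a 0-based, half-open (start, end) window around a TSS.

    ASSUMPTION: tss0 is the 0-based TSS coordinate and the TSS base is the
    first downstream base. On the '-' strand, upstream lies at higher
    coordinates, so the window is mirrored. Start is clamped at 0.
    """
    if strand == "+":
        start, end = tss0 - upstream, tss0 + downstream
    elif strand == "-":
        start, end = tss0 - downstream + 1, tss0 + upstream + 1
    else:
        raise ValueError("strand must be '+' or '-'")
    return max(0, start), end


def reverse_complement(seq):
    """Reverse complement of a DNA string (N is preserved)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def log_odds_pwm(counts, pseudocount=0.8, background=0.25):
    """Convert a frequency matrix {base: [count per position]} into
    log2-odds scores against a uniform background.

    ASSUMPTION: pseudocount and uniform background are illustrative choices.
    """
    width = len(counts["A"])
    pwm = []
    for i in range(width):
        total = sum(counts[b][i] for b in BASES) + pseudocount
        pwm.append({
            b: math.log2((counts[b][i] + pseudocount * background) / total / background)
            for b in BASES
        })
    return pwm


def scan_both_strands(seq, pwm, threshold):
    """Slide the PWM over seq and its reverse complement; return
    (forward-strand start, strand, score) for windows scoring >= threshold."""
    seq = seq.upper()
    width = len(pwm)
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for i in range(len(s) - width + 1):
            window = s[i:i + width]
            if any(base not in BASES for base in window):
                continue  # skip windows containing N or other ambiguity codes
            score = sum(pwm[j][base] for j, base in enumerate(window))
            if score >= threshold:
                # map reverse-strand offsets back to forward coordinates
                start = i if strand == "+" else len(s) - i - width
                hits.append((start, strand, score))
    return hits
```

For a plus-strand gene with a 0-based TSS at 10000, `promoter_window(10000, "+")` gives the 3000 bp interval (8000, 11000). The score threshold passed to `scan_both_strands` is arbitrary here. An enrichment score would additionally compare hit rates in the target sequences against a background set, which this sketch does not attempt.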

We used PIQ to assess local TF occupancy footprints from ATAC-seq data (33, 34). We extracted the corresponding BED files for TF footprint analysis using the PIQ R package, and each footprint was annotated from the TF matrix with the name of its corresponding TF, which was recorded in the BED files. For each sample, footprints were generated at three PIQ purity thresholds (0.7, 0.8, and 0.9, corresponding to FDRs of 0.3, 0.2, and 0.1, respectively). The corresponding files were then extracted using the MR list, and the peak names and coordinates containing the TCF4 gene were collected as a subset of the original BED file. These subsets of genomic coordinates were then annotated with the peak annotation tool included in the HOMER package, using the hg19 reference genome.
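The purity-to-FDR correspondence and the TCF4 subsetting step can be sketched in Python as follows. The text implies FDR = 1 - purity, which is all the sketch encodes for the thresholds. ASSUMPTION: the TF name sits in the BED name column (column 4) as a whole token, for example `TCF4_...` or `...;TCF4`. PIQ's actual output layout may differ, and the MR-list filtering step is not modeled. The function names (`purity_to_fdr`, `subset_footprints`, `to_bed`) are hypothetical.

```python
import re

# Purity thresholds from the text; a PIQ purity p corresponds to FDR = 1 - p.
PURITY_LEVELS = (0.7, 0.8, 0.9)


def purity_to_fdr(purity):
    """FDR implied by a PIQ purity threshold (rounded to hide float noise)."""
    return round(1.0 - purity, 10)


def subset_footprints(bed_lines, tf_name):
    """Keep BED records whose name field (column 4) contains tf_name as a
    whole token.

    ASSUMPTION: the TF name is stored in the BED name column; matching on
    whole tokens avoids, e.g., 'TCF4' matching a longer name such as 'TCF4L'.
    """
    kept = []
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank, comment, and UCSC header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # no name column, so nothing to match against
        tokens = re.split(r"[^A-Za-z0-9]+", fields[3])
        if tf_name in tokens:
            kept.append((fields[0], int(fields[1]), int(fields[2]), fields[3]))
    return kept


def to_bed(records):
    """Serialize (chrom, start, end, name) tuples as BED4 text."""
    return "".join(f"{c}\t{s}\t{e}\t{n}\n" for c, s, e, n in records)
```

Running `subset_footprints` once per purity-threshold file yields one TCF4 subset per FDR level. Each subset can then be written out with `to_bed` and annotated against hg19 with HOMER.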

