Binding profiles of 746 human transcription factors (TFs), represented as Position Specific Scoring Matrices (PSSMs), were downloaded from the JASPAR database78. The JASPAR-format PSSMs were converted to the TRANSFAC format to ease handling of results. To assess the effect of each SNP on the gain or loss of putative TF binding sites (TFBS), flanking sequences 50 bases upstream and downstream of the SNPs were extracted. The Regulatory Sequence Analysis Tools (RSAT) matrix-scan58 was used to search for potential TFBS in the ancestral and patient-specific mutant alleles. The background model was estimated from residue probabilities in the GRCh38.p7 sequences of all promoters, taken as the 5 kb before the transcription start site (TSS) from the UCSC genome table browser79, and of all enhancers from the HEDD database80. A Markov order of 1 was used to calculate the background probabilities. Both strands of the sequences were searched. Hits with a P-value ≤1e-05 were considered binding sites. Other parameters were left at their default values.
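The flanking-sequence step described above can be sketched as follows. This is a minimal illustration rather than the pipeline actually used: the function name `flanking_alleles`, the 0-based coordinate convention and the in-memory chromosome string are all assumptions made for the example.

```python
def flanking_alleles(chrom_seq, snp_pos, ref, alt, flank=50):
    """Return (ancestral, mutant) windows around a SNP.

    chrom_seq : chromosome sequence as a string (assumed already loaded)
    snp_pos   : 0-based position of the SNP in chrom_seq (assumed convention)
    ref, alt  : ancestral and mutant single-base alleles
    flank     : number of bases kept on each side of the SNP
    """
    if chrom_seq[snp_pos].upper() != ref.upper():
        raise ValueError("ancestral allele does not match the sequence")
    # Clip the window at the sequence ends instead of wrapping around.
    start = max(0, snp_pos - flank)
    end = min(len(chrom_seq), snp_pos + flank + 1)
    upstream = chrom_seq[start:snp_pos]
    downstream = chrom_seq[snp_pos + 1:end]
    return upstream + ref + downstream, upstream + alt + downstream


if __name__ == "__main__":
    toy = "A" * 60 + "C" + "G" * 60  # toy sequence, not real genomic data
    ancestral, mutant = flanking_alleles(toy, 60, "C", "T")
    print(len(ancestral), ancestral[50], mutant[50])
```

With `flank=50` and a SNP far from the sequence ends, each window is 101 bases long with the SNP at index 50. The two windows would then be scanned separately with the motif scanner.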
FIMO60 was used as a complementary TF binding site prediction algorithm. FIMO predicts transcription factor target sites using a matrix-based sequence scanning algorithm without a hidden Markov model, unlike RSAT matrix-scan. It calculates log-odds scores comparing random and test sequences, followed by a Benjamini-Hochberg-based false discovery correction of the P-values. The false discovery rate cut-off was 0.1.
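To make the false discovery step concrete, the following is a minimal sketch of a Benjamini-Hochberg adjustment followed by the 0.1 cut-off. It is a generic illustration of the procedure and not FIMO's internal code; the function names and the `(hit_name, p_value)` input format are assumptions made for the example.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(p_values)
    # Indices of the P-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant_hits(hits, fdr=0.1):
    """Keep (name, adjusted P) pairs whose adjusted P-value is <= fdr.

    hits is a list of (hit_name, p_value) tuples (hypothetical format).
    """
    adjusted = benjamini_hochberg([p for _, p in hits])
    return [(name, q) for (name, _), q in zip(hits, adjusted) if q <= fdr]


if __name__ == "__main__":
    # Toy P-values with made-up motif labels, for illustration only.
    toy_hits = [("motif_a", 0.01), ("motif_b", 0.04),
                ("motif_c", 0.03), ("motif_d", 0.5)]
    for name, q in significant_hits(toy_hits):
        print(name, round(q, 4))
```

The adjustment multiplies each P-value by the number of tests divided by its rank, then takes a running minimum from the largest P-value downwards so that the adjusted values never decrease with rank.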
To increase the coverage of the TF binding sites, enhancer regions were added using the Human Enhancer Disease Database (HEDD)81. HEDD contains the enhancers from ENCODE82, FANTOM583,84 and the Epigenomics RoadMap85.
To assess the effect of the SNPs on miRNA target sites (miRNA-TSs), the 22 bp sequences of mature miRNAs were retrieved from miRBase86,87. The flanking sequences of SNPs were assessed for the presence of miRNA-TSs using miRanda88. miRanda was chosen because it predicts and characterises miRNA binding sites using entropy-based binding energy scores instead of traditional conservation-based methods88. Hits occurring in the seed region (nucleotides 2–8) of the miRNAs, with alignment scores ≥90 and an energy threshold ≤ −16 kcal/mol, were considered TSs. Other parameters were left at their default settings. TSs in the coding region or in the first intronic region were kept.
Gain or loss of regulatory interactions between TFs and protein-coding genes was also considered where the protein-coding gene was in the promoter or in the enhancer region. We defined the promoter region as extending from 5 kb upstream of the transcription start site downstream to the first exon of the gene. This information was retrieved using the feature retrieval function of the UCSC genome table browser79. A final manual check was performed to ensure that the SNPs overlapped with the predicted TFBS or miRNA target sites. The effect of SNPs on the uncovered TFBS or miRNA-TSs was classified as a gain of a binding site/target site, a loss, or a neutral change. Only those sites identified as a loss or gain relative to the ancestral allele were considered for subsequent analysis. We referred to genes corresponding to such SNPs as ‘SNP-affected genes’.
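The gain/loss/neutral classification can be illustrated with a short sketch. It assumes that a predicted site is identified by the name of its regulator (TF or miRNA) and that the hits for each allele have already been filtered; the text does not state how sites were matched between alleles, so this matching rule and all function names are assumptions. The miRanda thresholds in `passes_miranda_thresholds` are the ones stated above.

```python
def passes_miranda_thresholds(score, energy, min_score=90.0, max_energy=-16.0):
    """Apply the stated cut-offs: score >= 90 and energy <= -16 kcal/mol."""
    return score >= min_score and energy <= max_energy


def classify_sites(ancestral_hits, mutant_hits):
    """Compare regulators predicted on the ancestral and mutant alleles.

    Both arguments are iterables of regulator identifiers (assumed format).
    Returns a dict mapping each regulator to 'gain', 'loss' or 'neutral'.
    """
    ancestral, mutant = set(ancestral_hits), set(mutant_hits)
    effects = {}
    for regulator in ancestral | mutant:
        if regulator in ancestral and regulator in mutant:
            effects[regulator] = "neutral"  # site present on both alleles
        elif regulator in mutant:
            effects[regulator] = "gain"     # site only on the mutant allele
        else:
            effects[regulator] = "loss"     # site only on the ancestral allele
    return effects


def affected_only(effects):
    """Drop neutral changes, keeping only gains and losses."""
    return {r: e for r, e in effects.items() if e != "neutral"}


if __name__ == "__main__":
    # Placeholder regulator labels, not real predictions.
    effects = classify_sites(["TF_A", "TF_B"], ["TF_B", "TF_C"])
    print(sorted(affected_only(effects).items()))
```

Only the entries returned by `affected_only` would be carried forward, mirroring the restriction to sites gained or lost relative to the ancestral allele.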