Each conserved intronic and exonic pattern in the 25 selected NRS genes was extracted by BCSDm and scanned for putative transcription factor binding sites (TFBS) [31] in the CIS-BP Database [32]. The search was restricted with the species parameter Homo sapiens, because the aim was to assess the relevance of these patterns for the human species only. The motif model was set to the standard scoring option, position weight matrices (PWMs) with log-odds scores [33]. To tolerate fewer mismatches and increase the likelihood that predicted TFBS are genuine, the log-odds threshold was set to 10. To remove sparse sequences, only matching sequences of ≥10 consecutive nucleotides were considered potential TFBS.
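To illustrate how a log-odds threshold of 10 and a minimum length of 10 nucleotides restrict predictions, the following Python sketch scores every window of a sequence against a PWM. It is not the CIS-BP scanning code: it assumes log2 odds against a uniform background (0.25 per base) with a pseudocount of 0.25, and the count format and the function names `log_odds_pwm` and `scan` are hypothetical.

```python
import math

BASES = "ACGT"


def log_odds_pwm(counts, pseudocount=0.25, background=0.25):
    """Convert per-position base counts into a log2-odds PWM.

    counts: one dict per motif position, e.g. {"A": 8, "C": 0, "G": 1, "T": 1}
    (hypothetical input format).
    """
    pwm = []
    for column in counts:
        total = sum(column.get(b, 0) for b in BASES) + 4 * pseudocount
        pwm.append({
            b: math.log2((column.get(b, 0) + pseudocount) / total / background)
            for b in BASES
        })
    return pwm


def scan(sequence, pwm, threshold=10.0, min_length=10):
    """Return (start, site, score) for each window scoring >= threshold.

    Motifs shorter than min_length are skipped, mirroring the rule that only
    matches of >= 10 consecutive nucleotides count as potential TFBS.
    """
    width = len(pwm)
    if width < min_length:
        return []
    seq = sequence.upper()
    hits = []
    for start in range(len(seq) - width + 1):
        site = seq[start:start + width]
        if any(b not in BASES for b in site):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(pwm[i][b] for i, b in enumerate(site))
        if score >= threshold:
            hits.append((start, site, score))
    return hits


# Hypothetical 10 bp consensus, as if observed in 10 identical aligned sites.
consensus = "TGACGTCATT"
pwm = log_odds_pwm([{base: 10} for base in consensus])
hits = scan("AAAA" + consensus + "AAAA", pwm)
```

Under these assumptions, a perfect match to this hypothetical motif scores about 19.0, a site with one mismatch about 13.6 and a site with two mismatches about 8.3, so a threshold of 10 admits at most one mismatch. The score scale used by CIS-BP itself may differ.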

After potential TFBS were identified in the conserved sequences, the transcription factor binding domain (TFBD) family of the transcription factors inferred to bind each TFBS was analyzed, and each TFBD was classified by domain family type. Thirty-seven different domain family types were identified, and the ten with the highest numbers of TFBS were selected for further analysis of the introns. Only a small fraction of all analyzed TFBDs were NR binding domains. TFBS identified as NR binding sites were further analyzed and compared between introns and exons.
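Selecting the ten most frequent TFBD families, and measuring the share of NR binding domains, amounts to a frequency count over (site, family) assignments. A minimal Python sketch follows; it assumes each predicted TFBS has already been mapped to a single domain-family label, and the input format, function names and example labels are hypothetical.

```python
from collections import Counter


def family_counts(hits):
    """Count TFBS per TFBD family.

    hits: iterable of (tfbs_id, domain_family) pairs (hypothetical format).
    """
    return Counter(family for _, family in hits)


def top_families(counts, n=10):
    """Return the n families with the most TFBS, most frequent first."""
    return [family for family, _ in counts.most_common(n)]


def family_fraction(counts, family):
    """Fraction of all TFBS assigned to one family (e.g. NR binding domains)."""
    total = sum(counts.values())
    return counts[family] / total if total else 0.0


# Hypothetical assignments; real labels would come from the database annotation.
hits = [
    ("site1", "C2H2 ZF"), ("site2", "C2H2 ZF"), ("site3", "C2H2 ZF"),
    ("site4", "Homeodomain"), ("site5", "Homeodomain"),
    ("site6", "Nuclear receptor"),
]
counts = family_counts(hits)
```

With 37 families, `top_families(counts)` would return the ten retained for the intron analysis. `Counter.most_common` breaks ties by first appearance, a detail the selection described above does not specify.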

The conserved intronic patterns were further analyzed with the Human Splicing Finder (HSF) database to determine whether their sequences contained splice site (SS) motifs [34]. This tool predicts potential donor and acceptor sites in an input sequence. The analysis used the default prediction algorithms (HSF and MaxEnt). The HSF consensus value threshold was raised from 65 to 75 to require higher similarity to the consensus and, thus, greater confidence that a predicted site is a true splice site [34,35]. Sequences with a consensus value of ≥75 (HSF) and a score of ≥3 (MaxEnt) were therefore classified as containing a splice site.
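The splice-site call combines two cut-offs. The Python sketch below encodes it, assuming (hypothetically) that the HSF output for each conserved pattern has been transcribed into records holding the site type, position, HSF consensus value and MaxEnt score, and that both thresholds must be met by the same predicted site. HSF is a web tool; this code does not query it, and the record layout and function names are illustrative only.

```python
from typing import Iterable, NamedTuple


class SitePrediction(NamedTuple):
    """One predicted donor/acceptor site (hypothetical record layout)."""
    kind: str          # "donor" or "acceptor"
    position: int      # offset within the input sequence
    hsf_cv: float      # HSF consensus value, 0-100
    maxent: float      # MaxEnt score


def is_splice_site(pred: SitePrediction, hsf_min: float = 75.0,
                   maxent_min: float = 3.0) -> bool:
    """Accept a site only if it passes the raised HSF cut-off and MaxEnt >= 3."""
    return pred.hsf_cv >= hsf_min and pred.maxent >= maxent_min


def contains_splice_site(preds: Iterable[SitePrediction]) -> bool:
    """A pattern is classified as containing an SS if any predicted site passes."""
    return any(is_splice_site(p) for p in preds)
```

Calling `is_splice_site` with `hsf_min=65.0` reproduces the default HSF cut-off, which makes it easy to count how many calls the stricter value removes.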

The conserved intronic patterns were then classified according to the regulatory elements they contained. Some patterns contained only TFBS, some only SS, and some both TFBS and SS in the same sequence. Four groups were derived from these results: TFBS, TFBS-SS, SS, and not identified.
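The four-group classification follows from whether a pattern has at least one TFBS hit and at least one SS call. A minimal Python sketch, with hypothetical function names and per-pattern counts as input:

```python
from collections import Counter


def classify_pattern(n_tfbs: int, n_ss: int) -> str:
    """Assign a conserved intronic pattern to one of the four groups."""
    if n_tfbs > 0 and n_ss > 0:
        return "TFBS-SS"
    if n_tfbs > 0:
        return "TFBS"
    if n_ss > 0:
        return "SS"
    return "not identified"


def tally_groups(patterns):
    """patterns: iterable of (n_tfbs, n_ss) pairs, one per conserved pattern."""
    return Counter(classify_pattern(t, s) for t, s in patterns)
```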

As controls, the numbers of TFBS and SS were determined in non-conserved sequences taken from the same intronic regions of the same genes as the conserved sequences. These sequences were scanned against the CIS-BP Database for TFBS hits and with HSF for splicing signals, using the same parameters as in the conserved-sequence analysis. In addition, the TFBD family classification of TFBS in the non-conserved sequences served as a control. To obtain the non-conserved sequences randomly (n = 1,044), the BCSDm program was modified to extract nucleotides whose PSSM conservation was below 100%, requiring at least 15 consecutive nucleotides to form a sequence.
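The modification of BCSDm for the control set is described only in outline. The Python sketch below shows one way to extract runs of positions that are less than 100% conserved and keep only runs of at least 15 nucleotides. It assumes conservation is measured per alignment column as the fraction of sequences sharing the most common base; the function names are hypothetical, the code is not the BCSDm implementation, and it does not perform any random sampling.

```python
from collections import Counter


def column_conservation(alignment):
    """Fraction of aligned sequences sharing the most common base per column.

    alignment: equal-length sequences (hypothetical input, e.g. orthologous
    intronic regions that have already been aligned).
    """
    n = len(alignment)
    return [Counter(col).most_common(1)[0][1] / n for col in zip(*alignment)]


def nonconserved_runs(sequence, conservation, min_length=15):
    """Return (start, subsequence) for maximal runs with conservation < 1.0
    that are at least min_length nucleotides long."""
    if len(sequence) != len(conservation):
        raise ValueError("sequence and conservation must have equal length")
    runs = []
    start = None
    # A sentinel value of 1.0 closes any run that reaches the end.
    for i, value in enumerate(list(conservation) + [1.0]):
        if value < 1.0:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_length:
                runs.append((start, sequence[start:i]))
            start = None
    return runs
```

Applied to each gene's intronic alignment, the returned subsequences would form a candidate pool for control sequences; how the 1,044 controls were drawn from such a pool is not specified here.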
