The strength of machine learning lies in its ability both to find previously unrecognized phenotypes and to “rediscover” known ones, thereby moving research beyond the published literature and clinical studies. Previous studies can therefore serve as a positive control for determining whether machine learning accurately identifies known phenotypes and how far it expands the boundary of our knowledge. To verify our results, we used KinderMiner, a text-mining tool that allowed us to screen the entire published literature on the fragile X premutation available on PubMed and to evaluate our list of phecodes (34). KinderMiner uses keyword matching and document counting to identify associations between the FMR1 premutation and target clinical phenotypes, and it ranks the phenotypes by their co-occurrence proportion. For each target phenotype, KinderMiner returns the number of articles that mention both the target phenotype and the FMR1 premutation, only one of the two, or neither. A one-sided Fisher’s exact test was then performed to assess the significance of each association. We processed 26 million article abstracts and identified 2,070 published articles related to the FMR1 premutation. Table S1 lists the target phenotypes and their level of association with the FMR1 premutation in the published literature.
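The counting and testing steps described above can be illustrated with a minimal sketch. It is not KinderMiner's actual implementation, which queries an index of PubMed abstracts; here we assume plain case-insensitive substring matching over an in-memory list of abstracts, and we assume the co-occurrence proportion is the fraction of articles mentioning the target phenotype that also mention the key phrase. The function names (`count_table`, `cooccurrence_ratio`, `fisher_greater`) are hypothetical. The one-sided Fisher's exact test is computed directly from the hypergeometric distribution of the "both" cell given fixed table margins.

```python
from math import comb


def count_table(abstracts, target_kw, key_kw):
    """Count documents (hypothetical helper) by keyword presence.

    Assumes simple case-insensitive substring matching.
    Returns (both, target_only, key_only, neither).
    """
    both = target_only = key_only = neither = 0
    for text in abstracts:
        lowered = text.lower()
        has_target = target_kw.lower() in lowered
        has_key = key_kw.lower() in lowered
        if has_target and has_key:
            both += 1
        elif has_target:
            target_only += 1
        elif has_key:
            key_only += 1
        else:
            neither += 1
    return both, target_only, key_only, neither


def cooccurrence_ratio(both, target_only):
    """Assumed ranking score: share of target-phenotype articles that also mention the key."""
    total_target = both + target_only
    return both / total_target if total_target else 0.0


def fisher_greater(a, b, c, d):
    """One-sided (greater) Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: target phenotype present / absent; columns: key phrase present / absent.
    Returns P(X >= a), where X is hypergeometric with all margins held fixed.
    """
    row1 = a + b          # articles mentioning the target phenotype
    col1 = a + c          # articles mentioning the key phrase
    n = a + b + c + d     # all articles screened
    denom = comb(n, col1)
    upper = min(row1, col1)
    # comb() returns 0 for impossible cells, so out-of-range terms vanish.
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1))
    return tail / denom


if __name__ == "__main__":
    # Toy abstracts for illustration only; not real PubMed data.
    docs = [
        "FMR1 premutation carriers with tremor",
        "essential tremor in older adults",
        "FMR1 premutation and ovarian function",
        "an unrelated cardiology study",
    ]
    a, b, c, d = count_table(docs, "tremor", "FMR1 premutation")
    print((a, b, c, d), cooccurrence_ratio(a, b), fisher_greater(a, b, c, d))
```

In practice a library routine such as SciPy's `fisher_exact` with `alternative="greater"` gives the same one-sided p-value; the explicit hypergeometric sum is shown only to make the calculation visible.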