A Gaussian mixture model is a probabilistic model that fits data points with a finite number of Gaussian distributions and estimates their probability density [25]. Here, a Gaussian mixture model with a single component performed best in predicting virus hosts (Additional file 1: Figure S2), so it was simplified to a Gaussian model (GM). The GM takes as features the differences in k-mer frequencies between the virus and prokaryotic genomic sequences, and outputs a score for the prokaryote (the logarithm of the probability of its being the viral host). k-mers of 4 nucleotides were selected (Additional file 1: Figure S2), giving 4^4 = 256 features. The GM was built with the GaussianMixture class in scikit-learn [25, 26].
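The feature construction and model set-up described above can be illustrated with a minimal Python sketch. It is not the authors' code, and several details are assumptions that the text does not state: the difference is taken as virus frequency minus host frequency, windows containing ambiguous bases are skipped, the model is fitted on difference vectors from known virus-host pairs, and all GaussianMixture settings other than n_components=1 are left at their scikit-learn defaults. The helper names (kmer_frequencies, difference_features, fit_single_gaussian) are hypothetical.

```python
from itertools import product

K = 4  # k-mer length reported in the text
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]  # 4**4 = 256 k-mers
INDEX = {kmer: i for i, kmer in enumerate(KMERS)}


def kmer_frequencies(seq):
    """Return the 256 relative 4-mer frequencies of a DNA sequence."""
    counts = [0] * len(KMERS)
    seq = seq.upper()
    for i in range(len(seq) - K + 1):
        # Assumption: windows containing N or other symbols are skipped.
        j = INDEX.get(seq[i:i + K])
        if j is not None:
            counts[j] += 1
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


def difference_features(virus_seq, host_seq):
    """Assumed feature definition: virus frequency minus host frequency."""
    v = kmer_frequencies(virus_seq)
    h = kmer_frequencies(host_seq)
    return [a - b for a, b in zip(v, h)]


def fit_single_gaussian(training_features):
    """Fit a one-component GaussianMixture (requires scikit-learn).

    Assumption: training_features holds one 256-dimensional difference
    vector per known virus-host pair.
    """
    from sklearn.mixture import GaussianMixture

    model = GaussianMixture(n_components=1, random_state=0)
    model.fit(training_features)
    return model
```

Under these assumptions, a candidate pair could be scored with `model.score_samples([difference_features(virus_seq, host_seq)])[0]`, which returns the log-density of that feature vector under the fitted Gaussian; a higher value would indicate a more likely host.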
