The machine learning approach: the self-organizing map (SOM)

You-Jia Chen
Emily Nicholson
Su-Ting Cheng

The self-organizing map (SOM) is a type of artificial neural network, usually used as a tool for clustering or data mining45,46. Its unsupervised character makes it useful for producing automatic and unbiased clustering results: the weight vector of each neuron is learned from the input data by assigning every input vector to the neuron at the shortest distance46,47. Because the SOM effectively reduces high-dimensional data to a 2-dimensional map for clustering and visualization, it has been widely used to explore problems in industry, the natural sciences, ecology, and many other fields48–50.
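The shortest-distance matching and weight-vector learning described above can be sketched as a minimal NumPy implementation. This is an illustrative sketch, not the software used in the study; the learning-rate and neighborhood-radius schedules, the Gaussian neighborhood function, and the random initialization are all assumptions for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_som(data, rows, cols, epochs=50, lr0=0.5, sigma0=1.5):
    """Train a rows x cols SOM on data of shape (n_samples, n_features)."""
    n, d = data.shape
    weights = rng.random((rows * cols, d))  # random initial weight vectors (assumption)
    # 2-D grid coordinate of every output neuron, used for the neighborhood.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    for t in range(epochs):
        frac = 1.0 - t / epochs
        lr = lr0 * frac                # linearly decaying learning rate (assumption)
        sigma = sigma0 * frac + 1e-3   # decaying neighborhood radius (assumption)
        for x in data[rng.permutation(n)]:
            # Best-matching unit: the neuron whose weight vector is closest to x.
            bmu = int(np.argmin(np.linalg.norm(weights - x, axis=1)))
            # Gaussian neighborhood on the map grid, centered at the BMU.
            h = np.exp(-np.sum((grid - grid[bmu]) ** 2, axis=1) / (2 * sigma ** 2))
            # Pull every neuron toward x, weighted by its map distance to the BMU.
            weights += lr * h[:, None] * (x - weights)
    return weights, grid

data = rng.random((200, 4))            # synthetic 4-dimensional input vectors
weights, grid = train_som(data, 3, 3)  # 3x3 output map
```

Each data vector is then represented by the 2-D grid position of its BMU, which is what allows the high-dimensional inputs to be visualized on the map.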

During the SOM learning and training process, we inspected the consistency of the results to judge whether convergence was reached. Evaluation was done by calculating the similarity between SOM results using the simple matching coefficient (SMC). For each SOM, a neighborhood matrix is created whose numbers of rows and columns both equal the number of data vectors51, so that each row or column represents one data vector. In this neighborhood matrix, an entry is 1 if the two corresponding data points are assigned to the same neuron or to adjacent neurons in the SOM, and 0 otherwise. When two such matrices are compared, a position that is 1 in both is regarded as a positive similarity, whereas a position that is 0 in both is regarded as a negative similarity. Finally, the SMC is calculated by dividing the number of matches (positive similarities plus negative similarities) by the total number of elements in the matrix51:

$$\mathrm{SMC} = \frac{m_{11} + m_{00}}{n^2}$$

where $m_{11}$ is the number of positive similarities, $m_{00}$ is the number of negative similarities, and $n$ is the number of data vectors (so $n^2$ is the total number of matrix elements).
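The neighborhood-matrix construction and SMC comparison can be sketched as follows. This is a hypothetical example; in particular, "adjacent" is taken here to mean the 8-neighborhood on the map grid (Chebyshev distance ≤ 1), which is an assumption, and the BMU assignments for the two runs are made up to illustrate the calculation.

```python
import numpy as np

def neighborhood_matrix(bmu_idx, grid):
    """n x n matrix with entry 1 when two data points fall on the same
    or adjacent neurons (8-neighborhood assumed), 0 otherwise."""
    coords = grid[bmu_idx]  # grid position of each data point's BMU
    # Chebyshev distance <= 1 means the same neuron or one of its 8 neighbors.
    cheb = np.abs(coords[:, None, :] - coords[None, :, :]).max(axis=2)
    return (cheb <= 1).astype(int)

def smc(m1, m2):
    """Simple matching coefficient: fraction of identical entries."""
    return float(np.mean(m1 == m2))

# Hypothetical BMU assignments of 4 data points from two SOM runs on a 3x3 map.
grid = np.array([(r, c) for r in range(3) for c in range(3)], dtype=float)
run_a = np.array([0, 0, 8, 8])  # run A: points at opposite corners
run_b = np.array([0, 4, 8, 8])  # run B: second point moved to the map center
score = smc(neighborhood_matrix(run_a, grid), neighborhood_matrix(run_b, grid))
```

An SMC close to 1 across repeated runs indicates that the clustering has stabilized, which is how the coefficient is used as a convergence check.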

To determine the optimal number of output neurons of the SOM, we trained the SOM with different map sizes, including 2×2, 3×2, 3×3, …, 5×5, and applied the criteria of quantization error (QE)52 and topographic error (TE)49. In particular, we calculated the QE as the average distance between each input vector and the weight vector of its best-matching unit (BMU)49:

$$\mathrm{QE} = \frac{1}{n}\sum_{i=1}^{n}\left\lVert x_i - u_c \right\rVert$$

where $x_i$ is the $i$-th input vector, $u_c$ is the weight vector of its BMU, and $n$ is the number of data vectors. The TE is the proportion of input vectors whose second-matching unit (SMU) is not adjacent to the BMU49:

$$\mathrm{TE} = \frac{1}{n}\sum_{i=1}^{n} u(x_i)$$

where $u(x_i)$ is set to 1 if the SMU of $x_i$ is not adjacent to its BMU, and 0 otherwise.
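Both error measures are straightforward to compute from the trained weight vectors. The sketch below assumes, as above, an 8-neighborhood for adjacency, and the toy check (neurons placed exactly on the data points) is purely illustrative:

```python
import numpy as np

def quantization_error(data, weights):
    """QE: mean distance from each input vector to its best-matching unit."""
    dist = np.linalg.norm(data[:, None, :] - weights[None, :, :], axis=2)
    return float(dist.min(axis=1).mean())

def topographic_error(data, weights, grid):
    """TE: fraction of inputs whose BMU and second-matching unit (SMU)
    are not adjacent on the map (8-neighborhood assumed)."""
    dist = np.linalg.norm(data[:, None, :] - weights[None, :, :], axis=2)
    order = np.argsort(dist, axis=1)     # neurons ranked by distance per input
    bmu, smu = order[:, 0], order[:, 1]  # best- and second-matching units
    cheb = np.abs(grid[bmu] - grid[smu]).max(axis=1)
    return float(np.mean(cheb > 1))      # non-adjacent pairs count as errors

# Toy check: 2x2 map whose weight vectors coincide with the data points.
grid = np.array([(0, 0), (0, 1), (1, 0), (1, 1)], dtype=float)
weights = grid.copy()
data = grid.copy()
qe = quantization_error(data, weights)        # every input sits on its BMU
te = topographic_error(data, weights, grid)   # every SMU borders the BMU
```

A QE of zero here reflects the degenerate toy setup; on real data, QE shrinks as the map grows, which is exactly why TE is needed as a second criterion.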

Moreover, since QE decreases as the number of output neurons increases, we took the local minimum of TE as the optimal solution49, and also took the shape of the SOM map into consideration for easier visualization. As a result, a square map (i.e., the same number of neurons in length and width) was preferred, since it retains the patterns among input variables however the map is rotated.
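The selection rule above (local minimum of TE, with a preference for square maps) can be sketched as a small search over the candidate sizes. The TE values below are made-up numbers for illustration only; in practice each would come from a SOM trained at that size, and the tie-breaking by lowest TE among square candidates is an assumption.

```python
# Candidate map sizes in increasing order of neuron count, with
# illustrative (made-up) topographic errors for each trained SOM.
sizes = [(2, 2), (3, 2), (3, 3), (4, 3), (4, 4), (5, 4), (5, 5)]
te = [0.12, 0.10, 0.06, 0.08, 0.05, 0.07, 0.09]

# Indices where TE is a local minimum along the ordered candidates.
local_minima = [
    i for i in range(len(te))
    if (i == 0 or te[i] < te[i - 1]) and (i == len(te) - 1 or te[i] < te[i + 1])
]

# Prefer square maps (equal rows and columns) among the local minima,
# breaking ties by the lowest TE (assumption).
square = [i for i in local_minima if sizes[i][0] == sizes[i][1]]
best = min(square or local_minima, key=lambda i: te[i])
```

With these illustrative values, the 3×3 and 4×4 maps are both local minima of TE, both are square, and the 4×4 map wins on the lower TE.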
