The machine learning approach: the self-organizing map (SOM)

You-Jia Chen
Emily Nicholson
Su-Ting Cheng

The self-organizing map (SOM) is a type of artificial neural network, usually used as a tool for clustering or data mining45,46. Its unsupervised character makes it useful for producing automatic and unbiased clustering results: the weight vector of each neuron is learned from the input data by assigning every input vector to the neuron at the shortest distance46,47. Because the SOM effectively reduces high-dimensional data to a 2-dimensional map for clustering and visualization, it has been widely used to explore problems in industry, the natural sciences, ecology, and many other fields48–50.
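The shortest-distance matching and weight-vector learning described above can be sketched as a minimal NumPy implementation. This is an illustrative sketch, not the software used in the study; the learning-rate and neighborhood-radius schedules, the Gaussian neighborhood function, and the random initialization are all assumptions for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_som(data, rows, cols, epochs=50, lr0=0.5, sigma0=1.5):
    """Train a rows x cols SOM on data of shape (n_samples, n_features)."""
    n, d = data.shape
    weights = rng.random((rows * cols, d))  # random initial weight vectors (assumption)
    # 2-D grid coordinate of every output neuron, used for the neighborhood.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    for t in range(epochs):
        frac = 1.0 - t / epochs
        lr = lr0 * frac                # linearly decaying learning rate (assumption)
        sigma = sigma0 * frac + 1e-3   # decaying neighborhood radius (assumption)
        for x in data[rng.permutation(n)]:
            # Best-matching unit: the neuron whose weight vector is closest to x.
            bmu = int(np.argmin(np.linalg.norm(weights - x, axis=1)))
            # Gaussian neighborhood on the map grid, centered at the BMU.
            h = np.exp(-np.sum((grid - grid[bmu]) ** 2, axis=1) / (2 * sigma ** 2))
            # Pull every neuron toward x, weighted by its map distance to the BMU.
            weights += lr * h[:, None] * (x - weights)
    return weights, grid

data = rng.random((200, 4))            # synthetic 4-dimensional input vectors
weights, grid = train_som(data, 3, 3)  # 3x3 output map
```

Each data vector is then represented by the 2-D grid position of its BMU, which is what allows the high-dimensional inputs to be visualized on the map.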

During the SOM learning and training process, we inspected the consistency of the results to judge whether convergence was reached. Evaluation was done by calculating the similarity between SOM results using the simple matching coefficient (SMC). For each SOM, a neighborhood matrix is created whose numbers of rows and columns both equal the number of data vectors51, so that each row or column represents one data vector. In this neighborhood matrix, an entry is 1 if the two corresponding data points are assigned to the same neuron or to adjacent neurons in the SOM, and 0 otherwise. When two such matrices are compared, a position that is 1 in both is regarded as a positive similarity, whereas a position that is 0 in both is regarded as a negative similarity. Finally, the SMC is calculated by dividing the number of matches (positive similarities plus negative similarities) by the total number of elements in the matrix51:

$$\mathrm{SMC} = \frac{m_{11} + m_{00}}{n^2}$$

where $m_{11}$ is the number of positive similarities, $m_{00}$ is the number of negative similarities, and $n$ is the number of data vectors (so $n^2$ is the total number of matrix elements).
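The neighborhood-matrix construction and SMC comparison can be sketched as follows. This is a hypothetical example; in particular, "adjacent" is taken here to mean the 8-neighborhood on the map grid (Chebyshev distance ≤ 1), which is an assumption, and the BMU assignments for the two runs are made up to illustrate the calculation.

```python
import numpy as np

def neighborhood_matrix(bmu_idx, grid):
    """n x n matrix with entry 1 when two data points fall on the same
    or adjacent neurons (8-neighborhood assumed), 0 otherwise."""
    coords = grid[bmu_idx]  # grid position of each data point's BMU
    # Chebyshev distance <= 1 means the same neuron or one of its 8 neighbors.
    cheb = np.abs(coords[:, None, :] - coords[None, :, :]).max(axis=2)
    return (cheb <= 1).astype(int)

def smc(m1, m2):
    """Simple matching coefficient: fraction of identical entries."""
    return float(np.mean(m1 == m2))

# Hypothetical BMU assignments of 4 data points from two SOM runs on a 3x3 map.
grid = np.array([(r, c) for r in range(3) for c in range(3)], dtype=float)
run_a = np.array([0, 0, 8, 8])  # run A: points at opposite corners
run_b = np.array([0, 4, 8, 8])  # run B: second point moved to the map center
score = smc(neighborhood_matrix(run_a, grid), neighborhood_matrix(run_b, grid))
```

An SMC close to 1 across repeated runs indicates that the clustering has stabilized, which is how the coefficient is used as a convergence check.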

To determine the optimal number of output neurons of the SOM, we trained the SOM with different map sizes, including 2×2, 3×2, 3×3, …, 5×5, and applied the criteria of quantization error (QE)52 and topographic error (TE)49. In particular, we calculated the QE as the average distance between each input vector and the weight vector of its best-matching unit (BMU)49:

$$\mathrm{QE} = \frac{1}{n}\sum_{i=1}^{n}\left\lVert x_i - u_c \right\rVert$$

where $x_i$ is the $i$-th input vector, $u_c$ is the weight vector of its BMU, and $n$ is the number of data vectors. The TE is the proportion of input vectors whose second-matching unit (SMU) is not adjacent to the BMU49:

$$\mathrm{TE} = \frac{1}{n}\sum_{i=1}^{n} u(x_i)$$

where $u(x_i)$ is set to 1 if the SMU of $x_i$ is not adjacent to its BMU, and 0 otherwise.
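Both error measures are straightforward to compute from the trained weight vectors. The sketch below assumes, as above, an 8-neighborhood for adjacency, and the toy check (neurons placed exactly on the data points) is purely illustrative:

```python
import numpy as np

def quantization_error(data, weights):
    """QE: mean distance from each input vector to its best-matching unit."""
    dist = np.linalg.norm(data[:, None, :] - weights[None, :, :], axis=2)
    return float(dist.min(axis=1).mean())

def topographic_error(data, weights, grid):
    """TE: fraction of inputs whose BMU and second-matching unit (SMU)
    are not adjacent on the map (8-neighborhood assumed)."""
    dist = np.linalg.norm(data[:, None, :] - weights[None, :, :], axis=2)
    order = np.argsort(dist, axis=1)     # neurons ranked by distance per input
    bmu, smu = order[:, 0], order[:, 1]  # best- and second-matching units
    cheb = np.abs(grid[bmu] - grid[smu]).max(axis=1)
    return float(np.mean(cheb > 1))      # non-adjacent pairs count as errors

# Toy check: 2x2 map whose weight vectors coincide with the data points.
grid = np.array([(0, 0), (0, 1), (1, 0), (1, 1)], dtype=float)
weights = grid.copy()
data = grid.copy()
qe = quantization_error(data, weights)        # every input sits on its BMU
te = topographic_error(data, weights, grid)   # every SMU borders the BMU
```

A QE of zero here reflects the degenerate toy setup; on real data, QE shrinks as the map grows, which is exactly why TE is needed as a second criterion.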

Moreover, since QE decreases as the number of output neurons increases, we took the local minimum of TE as the optimal solution49, and also took the shape of the SOM map into consideration for easier visualization. As a result, a square map (i.e., the same number of neurons in length and width) was preferred, since it retains the patterns among input variables however the map is rotated.
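The selection rule above (local minimum of TE, with a preference for square maps) can be sketched as a small search over the candidate sizes. The TE values below are made-up numbers for illustration only; in practice each would come from a SOM trained at that size, and the tie-breaking by lowest TE among square candidates is an assumption.

```python
# Candidate map sizes in increasing order of neuron count, with
# illustrative (made-up) topographic errors for each trained SOM.
sizes = [(2, 2), (3, 2), (3, 3), (4, 3), (4, 4), (5, 4), (5, 5)]
te = [0.12, 0.10, 0.06, 0.08, 0.05, 0.07, 0.09]

# Indices where TE is a local minimum along the ordered candidates.
local_minima = [
    i for i in range(len(te))
    if (i == 0 or te[i] < te[i - 1]) and (i == len(te) - 1 or te[i] < te[i + 1])
]

# Prefer square maps (equal rows and columns) among the local minima,
# breaking ties by the lowest TE (assumption).
square = [i for i in local_minima if sizes[i][0] == sizes[i][1]]
best = min(square or local_minima, key=lambda i: te[i])
```

With these illustrative values, the 3×3 and 4×4 maps are both local minima of TE, both are square, and the 4×4 map wins on the lower TE.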
