2.6.1. Graph Representation of Molecules

Hong-Yi Zhi; Lu Zhao; Cheng-Chun Lee; Calvin Yu-Chian Chen

This method is extracted from research article: Biomolecules, Mar 2021

A Novel Graph Neural Network Methodology to Investigate Dihydroorotate Dehydrogenase Inhibitors in Small Cell Lung Cancer

DOI: 10.3390/biom11030477

Each molecular graph first had to be transformed into a form suitable as GNN input, so that the model could extract spatial features for learning. Specifically, the graph structure of each molecule $G$ was represented by an edge connection matrix $A$ and a node feature matrix $X$. The edge connection matrix $A \in \mathbb{R}^{2 \times n}$, where $n$ denotes the number of edges in a molecule, stored the connections between atoms in coordinate (COO) format. For example, the $i$-th column $A_{i}$, for $i \in \{1, \dots, n\}$, indicated an edge between two nodes, whose indices were given by the entries $A_{1i}$ and $A_{2i}$, respectively.
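The COO layout described above can be illustrated with a minimal sketch in plain Python. The helper `bonds_to_coo`, the `both_directions` flag, and the ethanol bond list are hypothetical names and data chosen for illustration; they are not taken from the article. Whether the authors stored each bond once or once per direction is not stated, so the flag is an assumption. Note that the code uses 0-based indices, whereas the text uses 1-based subscripts.

```python
def bonds_to_coo(bonds, both_directions=False):
    """Build a 2 x n edge matrix in COO format from a list of bonds.

    bonds: iterable of (u, v) atom-index pairs (0-based).
    both_directions: if True, store each bond as two directed edges
        (an assumption; the source does not say which convention was used).
    Returns [sources, targets], i.e. the two rows of the matrix.
    """
    sources, targets = [], []
    for u, v in bonds:
        sources.append(u)
        targets.append(v)
        if both_directions:
            sources.append(v)
            targets.append(u)
    return [sources, targets]


# Heavy-atom skeleton of ethanol (C-C-O), atoms indexed 0, 1, 2 (illustrative data).
ethanol_bonds = [(0, 1), (1, 2)]
A = bonds_to_coo(ethanol_bonds)

# Column i of A holds the two endpoints of edge i: A[0][i] and A[1][i].
for i in range(len(A[0])):
    print(f"edge {i}: atom {A[0][i]} - atom {A[1][i]}")
```

Storing only the endpoints of existing edges keeps the representation compact for sparse graphs such as molecules, where most atom pairs are not bonded.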

The node feature matrix $X \in \mathbb{R}^{n \times m}$, where $n$ here denotes the number of nodes (rather than the number of edges, as for $A$) and $m$ denotes the number of node features, stored the features of each node. The features were atom symbol, degree, hybridization, valence, formal charge, size of the ring containing the atom, aromaticity, and explicit hydrogen, as described in Table 1. All of these features were one-hot encoded except aromaticity, which was encoded as an integer. For one-hot encoding, the categories of each feature were listed in a fixed, sorted order, and each position was marked as 1 or 0 according to the category of the atom (Figure 5). For example, atom symbol was encoded as a 12-bit vector and degree as a 7-bit vector. If the atom was a carbon atom with two covalent bonds, the first position of the atom symbol vector and the third position of the degree vector were marked as 1, and all other positions in both vectors were marked as 0.
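The carbon example above can be reproduced with a small sketch. Only three of the eight features are covered (atom symbol, degree, aromaticity). The element list `SYMBOLS` is a hypothetical placeholder: the source states only that the vector has 12 bits and implies that carbon occupies the first position. The degree range 0 to 6 is likewise an assumption, chosen to be consistent with a 7-bit vector in which degree 2 is the third position. The function names are illustrative.

```python
# Hypothetical 12-element vocabulary; only "carbon comes first" follows from the text.
SYMBOLS = ["C", "N", "O", "S", "F", "P", "Cl", "Br", "I", "B", "Si", "Se"]
# Assumed degree categories 0..6 (7 bits, so degree 2 is the third position).
DEGREES = list(range(7))


def one_hot(value, categories):
    """Return a 0/1 list with a single 1 at the position of `value`."""
    if value not in categories:
        raise ValueError(f"{value!r} is not one of the known categories")
    return [1 if value == c else 0 for c in categories]


def atom_features(symbol, degree, is_aromatic):
    """Concatenate one-hot symbol, one-hot degree, and an integer aromaticity flag."""
    return one_hot(symbol, SYMBOLS) + one_hot(degree, DEGREES) + [int(is_aromatic)]


# A non-aromatic carbon atom with two covalent bonds, as in the example in the text.
x = atom_features("C", 2, False)
symbol_bits, degree_bits, aromatic_flag = x[:12], x[12:19], x[19]
print(symbol_bits)
print(degree_bits)
print(aromatic_flag)
```

Stacking one such row per atom would give the node feature matrix $X$; in the full method each row would also include the remaining features listed in Table 1.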

Figure 5. Construction of the graph representation and the initial feature matrix of a molecule. Atoms are coded to indicate their corresponding feature vectors.

Table 1. Description of atom features.

Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
