Molecular graphs first had to be transformed into a suitable input for the GNN so that the model could extract spatial features for learning. Specifically, the graph structure of each molecule was described by an edge connection matrix and a node feature matrix. The edge connection matrix stored the connections between atoms in coordinate (COO) format: it had two rows and one column per edge in the molecule. For example, an edge between two nodes was recorded by writing the indices of those two nodes in the two rows of the corresponding column.
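As an illustration of the COO layout, the following minimal sketch builds a two-row edge list from a list of bonded atom pairs. The function name `bonds_to_coo` and the `undirected` flag are hypothetical, and storing each bond in both directions is an assumption that is common in GNN libraries but is not stated in the protocol.

```python
def bonds_to_coo(bonds, undirected=True):
    """Return a two-row edge list (COO format) from (i, j) atom-index pairs."""
    sources, targets = [], []
    for i, j in bonds:
        sources.append(i)
        targets.append(j)
        if undirected:
            # Assumption: each bond is also stored in the reverse direction.
            sources.append(j)
            targets.append(i)
    # Row 0 holds one endpoint of every edge, row 1 the other endpoint.
    return [sources, targets]


# Heavy atoms of ethanol, indexed C0-C1-O2, with bonds C0-C1 and C1-O2.
edge_index = bonds_to_coo([(0, 1), (1, 2)])
```

Each column of `edge_index` describes one edge: the entry in the first row and the entry in the second row of the same column are the two connected atoms.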
The node feature matrix, whose dimensions were the number of nodes by the number of node features, held the feature values of each node. The features were atom symbol, degree, hybridization, valence, formal charge, whether the atom is in a ring of a given size, aromatic, and explicit hydrogen, which are described in Table 1. We used one-hot encoding for most of these features; the exception was aromatic, which was encoded as an integer. For one-hot encoding, all categories of each feature were listed and sorted, and each position was marked as 1 or 0 according to the category of the atom (Figure 5). For example, atom symbol was encoded as a vector of 12 bits, and degree was encoded as a vector of 7 bits. If the atom was a carbon atom with 2 covalent bonds, the first position of the atom symbol vector and the third position of the degree vector were marked as 1, and all other positions in both vectors were marked as 0.
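The carbon example can be sketched in a few lines of Python. This is only an illustration: the 12 atom symbols listed in `ATOM_SYMBOLS` and their order are placeholders (the protocol states only that the vector has 12 bits and that carbon occupies the first position), the degree categories 0 to 6 are inferred from the 7-bit length, and only three of the eight features are shown.

```python
# Placeholder category lists; the actual categories are those given in Table 1.
ATOM_SYMBOLS = ["C", "N", "O", "F", "P", "S", "Cl", "Br", "I", "B", "Si", "Other"]
DEGREES = [0, 1, 2, 3, 4, 5, 6]


def one_hot(value, categories):
    """Return a 0/1 list with a single 1 at the position of `value`."""
    # A value that is not in `categories` yields an all-zero vector.
    return [1 if value == category else 0 for category in categories]


def atom_features(symbol, degree, is_aromatic):
    """Concatenate the encoded features of one atom into one vector."""
    return (
        one_hot(symbol, ATOM_SYMBOLS)   # 12 bits
        + one_hot(degree, DEGREES)      # 7 bits
        + [int(is_aromatic)]            # aromatic is stored as an integer
    )


# Carbon atom with 2 covalent bonds, not aromatic.
x = atom_features("C", 2, False)
```

Stacking one such vector per atom, row by row, gives the node feature matrix of the molecule.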
Figure 5. Construction of the graph representation and initial feature matrix of a molecule. Each atom is encoded as its corresponding feature vector.
Table 1. Description of atom features.