We use Word2Vec [22] to generate word embeddings. Specifically, we use a skip-gram model, which aims to find word representations that are useful for predicting the surrounding words in a given sentence or document consisting of a sequence of words $w_1, w_2, \dots, w_K$. The objective is to maximize the average log probability given by the following formula:
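$$\frac{1}{K}\sum_{k=1}^{K}\;\sum_{-c \le j \le c,\, j \ne 0} \log p\left(w_{k+j} \mid w_k\right)$$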
where $V(w)$ denotes the vector representation of word $w$, the average is taken over the $K$ words of the sequence, and $c$ is the size of the training context. We generated the word embeddings using the default parameter settings of the gensim Word2Vec implementation: a vector size (dimensionality) of 100, a window size of 5, and a minimum occurrence count of 5, with the skip-gram (sg) training algorithm.
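As a minimal sketch (not the authors' exact code), this configuration roughly corresponds to a call such as the one below, assuming gensim 4.x parameter names and a hypothetical tokenized corpus `sentences` (a list of token lists); the preprocessing used in this work is not shown.

```python
# Minimal sketch of the embedding step (assumes gensim 4.x parameter names;
# `sentences` is a hypothetical tokenized corpus: a list of token lists).
from gensim.models import Word2Vec

# Toy corpus; each token occurs at least min_count times so the example runs.
sentences = [["the", "model", "learns", "word", "representations",
              "by", "predicting", "surrounding", "words"]] * 10

model = Word2Vec(
    sentences,
    vector_size=100,  # dimensionality of the word vectors
    window=5,         # training context size c
    min_count=5,      # ignore words occurring fewer than 5 times
    sg=1,             # 1 = skip-gram (gensim's default, 0, is CBOW)
)

vector = model.wv["word"]  # 100-dimensional embedding for the token "word"
print(vector.shape)        # (100,)
```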