First, texts from coursebooks were digitized and manually annotated with meta-attributes. Before extracting text features for the feature-based models, we cleaned the texts of noisy symbols and non-standard punctuation (for example, we replaced “?.” with “?”). Before extracting some features, such as coverage by different word lists, we also lemmatized the texts with the Mystem 3.1 toolkit for Python (Segalovich, 2003). Sentence tokenization was performed with ru_punkt, an NLTK sentence tokenizer for Russian.
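For illustration, below is a minimal sketch of this preprocessing pipeline. It assumes the pymystem3 wrapper for Mystem and an NLTK punkt model for Russian (e.g., provided by the ru_punkt package); the cleaning rules beyond the “?.” → “?” example, as well as the function names, are illustrative assumptions rather than the authors' exact implementation.

```python
import re

from pymystem3 import Mystem              # Python wrapper for Yandex Mystem
from nltk.tokenize import sent_tokenize   # needs a Russian punkt model (e.g. ru_punkt)

mystem = Mystem()


def clean_text(text: str) -> str:
    """Normalize non-standard punctuation, e.g. '?.' -> '?'."""
    text = re.sub(r"\?\.", "?", text)
    text = re.sub(r"!\.", "!", text)  # assumed analogous rule for '!.'
    return text


def lemmatize(text: str) -> list[str]:
    """Return lemmas, e.g. for word-list coverage features."""
    # Mystem.lemmatize returns lemmas interleaved with whitespace/punctuation tokens.
    return [lemma.strip() for lemma in mystem.lemmatize(text) if lemma.strip()]


def split_sentences(text: str) -> list[str]:
    """Sentence tokenization with the Russian punkt model."""
    return sent_tokenize(text, language="russian")


if __name__ == "__main__":
    raw = "Что это?. Это пример текста из учебника."
    cleaned = clean_text(raw)
    print(split_sentences(cleaned))
    print(lemmatize(cleaned))
```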
