Segmentation

Zufeng Zhong

This method is extracted from research article: Disaster Med Public Health Prep, Aug 2020

Internet Public Opinion Evolution in the COVID-19 Event and Coping Strategies

DOI: 10.1017/dmp.2020.299

The sorted texts were cleaned by correcting miswritten characters, removing emoticons, and removing terms with no specific meaning. The cleaned texts were then segmented with the cut function of the Python jieba library, providing the basis for the subsequent topic modeling and sentiment analysis. jieba is a widely used word segmentation tool for Chinese text preprocessing (https://pypi.org/project/jieba/). Using efficient word-graph scanning based on a Trie structure, jieba builds a directed acyclic graph of all the possible words that the Chinese characters in a sentence can form. It then uses dynamic programming to find the maximum-probability path, that is, the segmentation combination with the highest probability based on word frequency.
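The directed acyclic graph and dynamic programming steps described above can be illustrated with a minimal, standard-library-only sketch. It is not jieba's own code and not the authors' pipeline: the dictionary FREQ, its counts, and the function names build_dag, best_route, and segment are hypothetical and chosen only for illustration. jieba itself ships a much larger frequency dictionary and additionally handles out-of-dictionary words with a hidden Markov model, which this sketch omits.

```python
import math

# Toy word-frequency dictionary (hypothetical counts, for illustration only).
FREQ = {
    "我": 5, "来": 5, "到": 5, "来到": 8,
    "北": 1, "京": 1, "北京": 10,
    "大": 3, "学": 3, "大学": 8, "北京大学": 6,
}
TOTAL = sum(FREQ.values())


def build_dag(sentence):
    """Map each start index to the end indices of dictionary words starting there."""
    dag = {}
    n = len(sentence)
    for i in range(n):
        ends = [j for j in range(i, n) if sentence[i:j + 1] in FREQ]
        # Fall back to the single character when no dictionary word matches.
        dag[i] = ends or [i]
    return dag


def best_route(sentence, dag):
    """Right-to-left dynamic programming: best log-probability from each index."""
    n = len(sentence)
    route = {n: (0.0, n)}
    log_total = math.log(TOTAL)
    for i in range(n - 1, -1, -1):
        # Score of a candidate word plus the best score of the remaining suffix.
        # Unknown single characters are given a count of 1 (an assumption).
        route[i] = max(
            (math.log(FREQ.get(sentence[i:j + 1], 1)) - log_total + route[j + 1][0], j)
            for j in dag[i]
        )
    return route


def segment(sentence):
    """Follow the maximum-probability path through the DAG and emit the words."""
    dag = build_dag(sentence)
    route = best_route(sentence, dag)
    words, i = [], 0
    while i < len(sentence):
        j = route[i][1]
        words.append(sentence[i:j + 1])
        i = j + 1
    return words
```

With these toy counts, segment("我来到北京大学") returns ['我', '来到', '北京大学']: the path that treats 北京大学 as one word scores higher than the path that splits it into 北京 and 大学. In practice the same step would be done by calling jieba.cut on each cleaned text, which returns a generator of words.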

This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
