The sorted texts were cleaned by correcting miswritten characters, removing emoticons, and removing terms with no specific meaning (stop words). The cleaned contents were then segmented with the cut function of the Python jieba library, providing the basis for the subsequent topic modeling and sentiment analysis. jieba is a widely used word segmentation tool for Chinese text preprocessing (https://pypi.org/project/jieba/). Using efficient dictionary scanning based on a Trie structure, jieba builds a directed acyclic graph (DAG) of all the possible words that the Chinese characters in a sentence can form. It then uses dynamic programming to find the maximum-probability path through this graph, which gives the segmentation with the highest combined probability based on word frequency.
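The DAG and maximum-probability path described above can be illustrated with a small, self-contained sketch. It is not jieba's actual implementation: the dictionary `FREQ`, its frequencies, and the helper names `build_dag`, `best_route`, and `segment` are hypothetical and chosen only for illustration. In the protocol itself, segmentation is done by calling jieba's cut function on each cleaned text.

```python
import math

# Hypothetical toy dictionary (word -> frequency); NOT jieba's real dictionary.
FREQ = {
    "研究": 50, "研究生": 30, "生命": 40, "的": 100, "起源": 20,
    "研": 2, "究": 2, "生": 10, "命": 5, "起": 5, "源": 3,
}
TOTAL = sum(FREQ.values())


def build_dag(sentence, freq):
    """Map each start index to the end indices of every dictionary word starting there."""
    dag = {}
    n = len(sentence)
    for i in range(n):
        ends = [j for j in range(i, n) if sentence[i:j + 1] in freq]
        # Assumption: an unknown character becomes a single-character word.
        dag[i] = ends or [i]
    return dag


def best_route(sentence, dag, freq, total):
    """Dynamic programming from right to left over the DAG.

    route[i] holds (best log-probability of sentence[i:], end index of the
    first word on that best path).
    """
    n = len(sentence)
    log_total = math.log(total)
    route = {n: (0.0, 0)}
    for i in range(n - 1, -1, -1):
        route[i] = max(
            (math.log(freq.get(sentence[i:j + 1], 1)) - log_total + route[j + 1][0], j)
            for j in dag[i]
        )
    return route


def segment(sentence, freq=FREQ, total=TOTAL):
    """Return the maximum-probability segmentation under the toy dictionary."""
    dag = build_dag(sentence, freq)
    route = best_route(sentence, dag, freq, total)
    words, i = [], 0
    while i < len(sentence):
        j = route[i][1]
        words.append(sentence[i:j + 1])
        i = j + 1
    return words


if __name__ == "__main__":
    print(" / ".join(segment("研究生命的起源")))
```

With these toy frequencies, the sentence "研究生命的起源" is ambiguous between "研究 / 生命" and "研究生 / 命"; the dynamic programming step picks the first because the product of its word probabilities is higher. Working in log-probabilities avoids numerical underflow on long sentences.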