Topic modeling is a widely used text-mining method that aims to uncover latent semantic structures in document sets by identifying potential topics.25 It can statistically capture those topics with different algorithms,22 such as Principal Components Analysis (PCA) and Latent Dirichlet Allocation (LDA). LDA is a popular algorithm in natural language processing (NLP) and was employed in this study. First, a Python toolkit was used to parse the reviews, after which meaningless words (eg, "I" and "we") and high-frequency words (eg, "doctor" and "patient") were excluded from the texts. Using LDA, we selected the optimal number of topics in the corpus based on the perplexity criterion, which measures model quality. We experimented with candidate topic numbers ranging from 2 to 10, running the LDA model 10 times for each candidate in Anaconda 3. After evaluating the perplexity statistics, we found that the minimum perplexity occurred at 3 topics.
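The topic-number search described above can be sketched as follows. This is an illustrative reconstruction, not the study's exact pipeline: the scikit-learn LDA implementation, the toy corpus, and the use of built-in English stop words (rather than the study's custom word lists) are all assumptions.

```python
# Sketch: choosing the LDA topic count by minimum perplexity.
# Assumes scikit-learn's LDA; the real study used its own parsed,
# filtered review corpus rather than this toy stand-in.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Toy stand-in for the cleaned review texts (hypothetical examples).
reviews = [
    "friendly staff short wait",
    "long wait rude staff",
    "accurate diagnosis clear explanation",
    "explanation unclear diagnosis rushed",
    "clean office easy parking",
    "parking difficult office crowded",
    "helpful nurse quick appointment",
    "appointment delayed nurse helpful",
]

# Bag-of-words counts; built-in stop list stands in for the study's
# exclusion of meaningless and high-frequency words.
vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(reviews)

# Fit LDA for each candidate topic number (2 to 10, as in the study)
# and record perplexity on the corpus.
perplexities = {}
for k in range(2, 11):
    lda = LatentDirichletAllocation(n_components=k, random_state=0)
    lda.fit(X)
    perplexities[k] = lda.perplexity(X)

# Select the topic count with the lowest perplexity.
best_k = min(perplexities, key=perplexities.get)
```

With the study's corpus this procedure returned a minimum at 3 topics; on a toy corpus like the one above the winning value will of course differ.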
Data management was performed using Microsoft Excel 2017, while SPSS software version 22.0 was used for all statistical analyses. Given the exploratory nature of our study, stepwise regression was deemed suitable because it can screen for significant independent variables affecting the dependent variable and simplify the regression equation. Specifically, independent variables were introduced only if their partial regression sums of squares were significant, and variables deemed to have little influence on the dependent variable were eliminated to identify the optimal regression subset. Our models were constructed hierarchically: control variables were entered in model 1, independent variables were added in model 2, and interaction terms were added in model 3. All reported P values were 2-sided, and P < .05 was considered statistically significant. The regression equation was expressed as equation (1), where β0 represents the constant term, β1 through β20 represent the regression coefficients, and ε represents the error term. The interaction term Topic_i,t-1 × Spe_dummy_i,t-1 (i = 1, 2, 3) reflects the moderating effect of the specialty.
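The hierarchical model-building logic (models 1-3) can be sketched in code. This is a minimal illustration under assumed, hypothetical variable names (one control, one topic score, one specialty dummy); the study's actual equation (1) contains 20 coefficients, and the analyses were run in SPSS rather than Python. A key property the sketch demonstrates is that R-squared cannot decrease as each block of predictors is added.

```python
# Sketch: hierarchical OLS with three nested models, using plain
# least squares. Variable names and simulated data are hypothetical.
import numpy as np

def ols_r2(X, y):
    """Fit OLS with an intercept and return R-squared."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1.0 - (resid @ resid) / ((y - y.mean()) ** 2).sum()

rng = np.random.default_rng(0)
n = 200
control = rng.normal(size=n)                 # model 1: control variable
topic = rng.normal(size=n)                   # model 2: topic variable (lagged in the study)
spe_dummy = rng.integers(0, 2, size=n).astype(float)  # model 2: specialty dummy
interaction = topic * spe_dummy              # model 3: moderating term

# Simulated outcome (coefficients here are arbitrary illustration values).
y = 1.0 + 0.5 * control + 0.8 * topic + 0.4 * interaction \
    + rng.normal(scale=0.3, size=n)

# Model 1: controls only; model 2: + independent variables;
# model 3: + interaction terms, mirroring the hierarchical design.
r2_m1 = ols_r2(np.column_stack([control]), y)
r2_m2 = ols_r2(np.column_stack([control, topic, spe_dummy]), y)
r2_m3 = ols_r2(np.column_stack([control, topic, spe_dummy, interaction]), y)
```

Comparing the R-squared values (and, in the study, the significance of the added coefficients) across the three nested models shows how much explanatory power each block contributes.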