Machine learning approaches have traditionally suffered from a lack of transparency, including an inability to understand how a model “learns” from the data and subsequently arrives at a decision. The moniker “black-box model” reflects this loss of interpretability of machine learning models relative to traditional statistical models. However, recent methodological advances have begun to address this limitation by providing a means of explaining machine learning model predictions across features at both the global (entire dataset) and local (each data point) level. SHAP (SHapley Additive exPlanations) (Lundberg and Lee, 2017) is one such method, built upon the Shapley values of game theory (Shapley, 1953). Where Shapley values are conceptualized as payouts to players in a cooperative game in proportion to their contribution, SHAP treats the feature values of a sample as the players in a prediction “game”. SHAP therefore explains the prediction for each sample in the dataset by calculating each feature's (e.g., Instagram likes) contribution to that prediction, and the resulting values are interpreted as the relative magnitudes by which features influence prediction outcomes. The SHAP framework is particularly attractive for this analytical pipeline because it is model agnostic and thus applicable across all model types (e.g., linear, tree-based). The iBreakdown package in R was used to compute SHAP values for all features for each individual in the dataset, and SHAP values for the consensus ensemble machine learning model were visualized using the data structures provided in the SHAPforxgboost R package.
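The original code is not given, so the following is a minimal sketch of how this step could look in R, assuming a fitted model `fit`, a feature data frame `X`, and an outcome vector `y` (all hypothetical names). It computes per-individual SHAP values with iBreakdown through a model-agnostic DALEX explainer, then reshapes the contributions into the long format SHAPforxgboost expects for its summary plot; the exact arguments and aggregation may differ from the authors' pipeline.

```r
library(DALEX)           # wraps a fitted model in a model-agnostic explainer
library(iBreakdown)      # shap(): Shapley-value attributions per observation
library(SHAPforxgboost)  # shap.prep() / shap.plot.summary() for visualization

# Assumed objects: `fit` is the fitted consensus ensemble model,
# `X` is the feature data frame, `y` is the outcome vector.
explainer <- DALEX::explain(fit, data = X, y = y, label = "consensus ensemble")

# Local explanation: SHAP values for one individual, estimated by
# averaging contributions over B random feature orderings.
shap_one <- iBreakdown::shap(explainer, new_observation = X[1, ], B = 25)
plot(shap_one)

# Global view: repeat for every individual and average each feature's
# contribution across orderings, giving one SHAP value per feature per person.
shap_mat <- t(sapply(seq_len(nrow(X)), function(i) {
  s <- iBreakdown::shap(explainer, new_observation = X[i, ], B = 25)
  tapply(s$contribution, s$variable_name, mean)
}))
shap_mat <- shap_mat[, colnames(X)]  # align column order with X

# Reuse SHAPforxgboost's data structures for the summary (beeswarm) plot;
# shap.prep() accepts a precomputed contribution matrix via `shap_contrib`.
shap_long <- SHAPforxgboost::shap.prep(shap_contrib = shap_mat,
                                       X_train = as.matrix(X))
SHAPforxgboost::shap.plot.summary(shap_long)
```

Routing the per-individual loop through a DALEX explainer is what keeps this step model agnostic: the same code applies whether `fit` is a linear model, a tree ensemble, or the consensus ensemble described above.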
For added clarity, Figs. 1 and 2 outline the analytical pipeline described above.
Fig. 1. Analytical pipeline of baseline comparison model.
Fig. 2. Analytical pipeline of consensus ensemble model.