Transparency Analysis of Deep Learning Models in Medical Data Using SHAP and LIME

Authors

  • Arka Evander, Department of Computer Science, University of Luxembourg, Luxembourg
  • Lyra Amara Quinn, Department of Computer Science, University of Luxembourg, Luxembourg

Keywords

Deep Learning, Explainable AI, SHAP, LIME, Medical Data

Abstract

The increasing adoption of deep learning models in healthcare has significantly improved the accuracy of medical diagnosis and prediction; however, their lack of transparency remains a critical challenge. These models often operate as “black boxes,” making it difficult for healthcare professionals to understand the reasoning behind their predictions, which raises concerns about trust, safety, and ethical decision-making. This study analyzes the transparency of deep learning models applied to medical data using two widely adopted explainable artificial intelligence (XAI) techniques: SHAP (SHapley Additive exPlanations) and LIME (Local Interpretable Model-Agnostic Explanations). A deep learning model was developed on medical datasets, including clinical (tabular) and/or medical imaging data, and evaluated with accuracy, precision, recall, F1-score, and Area Under the Curve (AUC). To enhance interpretability, SHAP and LIME were applied to explain the model’s predictions at both the global and the local level. The results indicate that the model achieves high predictive performance, with key features such as glucose level, age, blood pressure, and cholesterol exerting the strongest influence on predictions. The comparative analysis shows that SHAP provides more consistent, stable, and comprehensive explanations, making it better suited to global interpretation and clinical decision support, whereas LIME offers simpler, more intuitive local explanations that are useful for understanding individual predictions but may be unstable across samples. This study contributes to the advancement of explainable AI in healthcare by demonstrating how interpretability techniques can bridge the gap between high model performance and practical clinical applicability. Future research should explore more robust and scalable XAI approaches for real-world medical applications.
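
The following Python sketch illustrates the workflow the abstract describes: train a small feed-forward network on tabular data, compute the reported metrics, then derive a global SHAP importance ranking and a local LIME explanation. It is illustrative only; the synthetic data, the feature names (glucose, age, blood_pressure, cholesterol), the network size, and the use of scikit-learn's MLPClassifier are assumptions, not the authors' actual pipeline.

# Minimal sketch: train a small feed-forward network on tabular clinical
# data, report the study's metrics, then explain it with SHAP and LIME.
# The dataset, feature names, and model size are illustrative assumptions,
# not the authors' actual setup.
import numpy as np
import shap
from lime.lime_tabular import LimeTabularExplainer
from sklearn.datasets import make_classification
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

feature_names = ["glucose", "age", "blood_pressure", "cholesterol"]

# Synthetic stand-in for a clinical (tabular) dataset.
X, y = make_classification(n_samples=600, n_features=4, n_informative=4,
                           n_redundant=0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

model = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=1000,
                      random_state=0).fit(X_train, y_train)

# Performance metrics used in the study.
y_pred = model.predict(X_test)
y_prob = model.predict_proba(X_test)[:, 1]
print(f"accuracy={accuracy_score(y_test, y_pred):.3f} "
      f"precision={precision_score(y_test, y_pred):.3f} "
      f"recall={recall_score(y_test, y_pred):.3f} "
      f"f1={f1_score(y_test, y_pred):.3f} "
      f"auc={roc_auc_score(y_test, y_prob):.3f}")

def predict_pos(data):
    # Probability of the positive class, as a 1-D array for KernelExplainer.
    return model.predict_proba(data)[:, 1]

# SHAP (global): mean |SHAP value| per feature over a test subsample.
background = shap.sample(X_train, 50, random_state=0)
shap_values = shap.KernelExplainer(predict_pos, background).shap_values(X_test[:50])
for name, imp in zip(feature_names, np.abs(shap_values).mean(axis=0)):
    print(f"SHAP global importance {name}: {imp:.4f}")

# LIME (local): explanation for one individual prediction.
lime_explainer = LimeTabularExplainer(X_train, feature_names=feature_names,
                                      class_names=["negative", "positive"],
                                      mode="classification")
exp = lime_explainer.explain_instance(X_test[0], model.predict_proba,
                                      num_features=4)
print("LIME local explanation:", exp.as_list())

KernelExplainer is used here because it is model-agnostic and matches LIME's black-box access to the model; for larger networks a gradient-based explainer such as shap.DeepExplainer would typically be faster.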

Published

2026-01-30

How to Cite

Evander, A., & Quinn, L. A. (2026). Transparency Analysis of Deep Learning Models in Medical Data Using SHAP and LIME. Jurnal Teknik Informatika C.I.T Medicom, 17(6), 304–315. Retrieved from https://www.medikom.iocspublisher.org/index.php/JTI/article/view/1513

Section

OPTIMIZATION AND ARTIFICIAL INTELLIGENCE