Topic modeling using LDA and performance evaluation of classification algorithm: k-NN, SVM, NBC, and DT
DOI:
https://doi.org/10.35335/cit.Vol16.2024.846.pp143-157
Keywords:
Topic Modeling, Latent Dirichlet Allocation (LDA), Classification Algorithms, Machine Learning, Data Analysis Framework
Abstract
This research investigates the integration of Latent Dirichlet Allocation (LDA) for topic modeling with the performance evaluation of four classification algorithms, namely k-Nearest Neighbors (k-NN), Support Vector Machine (SVM), Naive Bayes Classifier (NBC), and Decision Tree (DT), within the Digital Content Reviews and Analysis Framework. The framework processes and analyzes digital content through data cleaning, extraction, evaluation, and visualization, with the aim of improving the interpretability and predictive accuracy of machine learning models. The study reports that combining LDA with these classification algorithms improves data interpretation and model performance, particularly on large-scale textual datasets. Notably, the Decision Tree algorithm achieved 98.86% accuracy after oversampling with SMOTE (Synthetic Minority Over-sampling Technique), and the Support Vector Machine reached a reported AUC of 1.000, which highlights the efficacy of these methods on imbalanced datasets. The findings offer guidance for model selection and for developing more robust and adaptive machine learning models across applications. The research contributes a comprehensive framework for complex data-driven problems and encourages further exploration of more flexible and scalable models for evolving data environments.
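The two headline figures in the abstract, accuracy after SMOTE and AUC, rest on two simple ideas that can be sketched in a few lines. The following is a minimal standard-library illustration, not the study's implementation: the function names `smote_like` and `auc`, the toy data, and the parameter `k=3` are all assumptions made for this example. It shows (1) SMOTE-style oversampling, where each synthetic minority sample is placed on the line segment between a minority sample and one of its nearest minority neighbours, and (2) AUC computed as the fraction of positive/negative pairs that the classifier scores in the correct order.

```python
import random


def smote_like(minority, n_new, k=3, seed=0):
    """Return n_new synthetic points (hypothetical helper, SMOTE-style).

    Each point is interpolated between a randomly chosen minority sample
    and one of its k nearest minority neighbours (squared Euclidean
    distance). Assumes at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


def auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy, made-up data: 4 minority samples versus 12 majority samples.
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
majority = [[float(i), float(i % 3)] for i in range(3, 15)]

# Oversample the minority class until both classes have the same size.
new_points = smote_like(minority, n_new=len(majority) - len(minority))
balanced_minority = minority + new_points

# Made-up classifier scores, used only to illustrate the AUC calculation.
example_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
perfect_auc = auc([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9])
```

In this sketch `perfect_auc` is 1.0 because every positive example scores above every negative one, which is what an AUC of 1.000 means. In practice, oversampling is normally applied to the training split only, so that synthetic samples do not leak into the evaluation data.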
Copyright (c) 2024 Yerik Afrianto Singgalen

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

