Sentiment Analysis of Mineral and Demineralized Water Using SVM on an Imbalanced Dataset
DOI:
https://doi.org/10.47134/jacis.v5i2.138
Keywords:
sentiment analysis, mineral water, support vector, class imbalance, small dataset
Abstract
This study analyzes the sentiment of opinions posted by users of the X platform about mineral water and demineralized water, using a quantitative experimental approach. The dataset consists of 140 tweets manually labeled into three categories: positive (38.57%), negative (13.57%), and neutral (47.86%). The severe class imbalance and the very small size of the dataset are the main challenges underlying the novelty of this study. The study tests the hypothesis that class weighting is more effective than the Synthetic Minority Over-sampling Technique (SMOTE) on small datasets. Classification uses a Support Vector Machine (SVM) with text preprocessing (case folding, data cleaning, Sastrawi stemming, stopword removal) and Term Frequency-Inverse Document Frequency (TF-IDF) feature extraction. Evaluation with 10-fold stratified cross-validation shows that SVM with class weighting achieves an accuracy of 57.14% and a macro F1-score of 42.36%, slightly outperforming SMOTE (53.57% accuracy, 40.98% macro F1). A statistical test (Cohen's d = 0.125) confirms that the difference is not significant, indicating that SMOTE brings no benefit on very small datasets. The minority class (negative) reaches an F1-score of only 8%, which shows how hard the severe imbalance is to overcome. The main contribution of this study is empirical validation that class weighting is more robust than SMOTE for Indonesian-language sentiment analysis when data are limited.
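The class shares and the effect size reported in the abstract can be made concrete with a short sketch. It recovers the per-class tweet counts from the stated percentages, computes inverse-frequency class weights with the common "balanced" heuristic n_samples / (n_classes * class_count), and defines Cohen's d with a pooled standard deviation. Several things here are assumptions rather than details confirmed by the abstract: that the study used this exact weighting formula, that Cohen's d was computed with the pooled (independent-samples) form, and the helper names `balanced_class_weights` and `cohens_d`. The per-fold scores at the end are hypothetical and serve only to show the call.

```python
from math import sqrt
from statistics import mean, stdev

# Class counts recovered from the reported shares of the 140 tweets.
# Rounding to whole tweets is an assumption made for this sketch.
TOTAL = 140
SHARES = {"positive": 0.3857, "negative": 0.1357, "neutral": 0.4786}
counts = {label: round(TOTAL * share) for label, share in SHARES.items()}


def balanced_class_weights(class_counts):
    """Weight each class by n_samples / (n_classes * class_count)."""
    n_samples = sum(class_counts.values())
    n_classes = len(class_counts)
    return {c: n_samples / (n_classes * k) for c, k in class_counts.items()}


def cohens_d(a, b):
    """Cohen's d for two samples, using the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var)


weights = balanced_class_weights(counts)
for label in counts:
    print(f"{label}: n={counts[label]}, weight={weights[label]:.3f}")

# Hypothetical per-fold macro-F1 scores, for illustration only
# (these are NOT the study's fold results).
f1_weighted = [0.40, 0.45, 0.38, 0.47, 0.42]
f1_smote = [0.39, 0.44, 0.40, 0.43, 0.41]
print(f"Cohen's d = {cohens_d(f1_weighted, f1_smote):.3f}")
```

Under this heuristic the negative class receives the largest weight because it has the fewest tweets, so misclassifying a negative tweet costs the classifier more than misclassifying a neutral one. Because both methods are scored on the same folds, a paired effect size would also be a reasonable choice; the abstract does not say which form was used.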
License
Copyright (c) 2025 Dipo Yudhis Rana, Nur Fadillah, Asyifa Raudha Syarifah, Dhea Aulia Afiandri, Harsih Rianto, Yamin Nuryamin, Susi Susilowati

This work is licensed under a Creative Commons Attribution 4.0 International License.





