COMPARISON OF THE CART AND NAÏVE BAYES METHODS FOR CUSTOMER CHURN CLASSIFICATION
DOI:
https://doi.org/10.31949/infotech.v9i2.5641
Abstract
Classification is the process of identifying and grouping objects into the same group or category. It can be applied to large datasets, and two commonly used classification methods are CART (Classification and Regression Tree) and Naïve Bayes. This study compares the CART and Naïve Bayes methods by measuring accuracy, precision, recall, and F1-score, all computed from a confusion matrix, under three training/testing split scenarios in which 70%, 80%, and 90% of the dataset is used for training. In the results, the highest average accuracy and F1-score for CART are 79.616% and 57.636%, respectively, while the highest average accuracy and F1-score for Naïve Bayes are 75.104% and 62.004%, respectively, so CART leads on accuracy and Naïve Bayes leads on F1-score.
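The four measures named in the abstract can all be derived from the four cells of a binary confusion matrix. The following is a minimal sketch of that calculation, not the authors' implementation: the function names `confusion_counts` and `metrics_from_counts`, the choice of `1` as the positive (churn) label, and the toy labels in the usage note are all assumptions introduced for illustration.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count (TP, FP, FN, TN) for a binary problem.

    `positive` is the label treated as the churn class; using 1 here
    is an assumption for this sketch, not something the paper states.
    """
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1  # predicted churn, actually churned
            else:
                fp += 1  # predicted churn, actually stayed
        else:
            if actual == positive:
                fn += 1  # predicted stay, actually churned
            else:
                tn += 1  # predicted stay, actually stayed
    return tp, fp, fn, tn


def metrics_from_counts(tp, fp, fn, tn):
    """Derive accuracy, precision, recall, and F1 from confusion-matrix cells.

    Each ratio falls back to 0.0 when its denominator is zero.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```

As a usage example on made-up labels, `metrics_from_counts(*confusion_counts([1, 1, 1, 0, 0, 0, 0, 0, 0, 0], [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]))` gives an accuracy of 0.8 but a precision, recall, and F1-score of 2/3 each. Accuracy counts the many correctly predicted non-churners, whereas F1 only reflects performance on the positive class, which is one reason two classifiers can rank differently on the two measures.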
Keywords:
Classification, CART, Naïve Bayes, Confusion Matrix
License
Copyright (c) 2023 Rahmat Ryan Adhitya, Wina Witanti, Rezki Yuniarti

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.