E-mail Spam Filtering using Machine Learning: Comparing Naive Bayes and Logistic Regression Algorithm
Abstract
We compare the performance of two machine learning classifiers, Naive Bayes and Logistic Regression, for
e-mail spam filtering. Before machine learning was applied to this task, spam filters relied on handcrafted rules
built around “spammy” keywords, which were unreliable and performed poorly. The corpus we used was the
Ling-Spam dataset, obtained from Kaggle. On this corpus, we found that the Naive Bayes algorithm classifies
spam better than the Logistic Regression classifier.
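To make the comparison concrete, the following is a minimal sketch of the two classifiers on a bag-of-words representation. It is an illustration under stated assumptions, not the experimental code behind the reported result: the six training messages and two test messages are a hypothetical toy corpus (not Ling-Spam), all function names are our own, Naive Bayes is assumed to be the multinomial variant with add-one smoothing, and Logistic Regression is fitted with plain batch gradient descent using an arbitrarily chosen learning rate and epoch count. On data this small the sketch says nothing about which classifier is better; it only shows how each one is trained and scored.

```python
import math
from collections import Counter

# Hypothetical toy corpus (NOT the Ling-Spam data): (text, label), 1 = spam, 0 = ham.
TRAIN = [
    ("win free money now", 1),
    ("free prize claim now", 1),
    ("cheap money offer free", 1),
    ("meeting agenda for monday", 0),
    ("linguistics conference call for papers", 0),
    ("please review the agenda", 0),
]
TEST = [("claim free money", 1), ("conference agenda for monday", 0)]

def tokenize(text):
    return text.lower().split()

# Vocabulary is built from the training messages only.
VOCAB = sorted({w for text, _ in TRAIN for w in tokenize(text)})

def train_naive_bayes(data):
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""
    class_docs = Counter(label for _, label in data)
    word_counts = {c: Counter() for c in class_docs}
    for text, label in data:
        word_counts[label].update(tokenize(text))
    log_prior = {c: math.log(n / len(data)) for c, n in class_docs.items()}
    log_like = {}
    for c in class_docs:
        total = sum(word_counts[c].values()) + len(VOCAB)
        log_like[c] = {w: math.log((word_counts[c][w] + 1) / total) for w in VOCAB}
    return log_prior, log_like

def predict_naive_bayes(model, text):
    log_prior, log_like = model
    # Words outside the training vocabulary are ignored.
    scores = {
        c: log_prior[c] + sum(log_like[c][w] for w in tokenize(text) if w in log_like[c])
        for c in log_prior
    }
    return max(scores, key=scores.get)

def featurize(text):
    counts = Counter(tokenize(text))
    return [counts[w] for w in VOCAB]

def train_logistic_regression(data, lr=0.5, epochs=200):
    """Logistic regression fitted by batch gradient descent (assumed settings)."""
    weights, bias = [0.0] * len(VOCAB), 0.0
    rows = [(featurize(text), label) for text, label in data]
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(VOCAB), 0.0
        for x, y in rows:
            z = bias + sum(w * xi for w, xi in zip(weights, x))
            err = 1.0 / (1.0 + math.exp(-z)) - y  # sigmoid(z) - label
            grad_b += err
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias

def predict_logistic_regression(model, text):
    weights, bias = model
    z = bias + sum(w * xi for w, xi in zip(weights, featurize(text)))
    return 1 if z > 0 else 0

def accuracy(predict, model, data):
    return sum(predict(model, text) == label for text, label in data) / len(data)

nb_model = train_naive_bayes(TRAIN)
lr_model = train_logistic_regression(TRAIN)
nb_acc = accuracy(predict_naive_bayes, nb_model, TEST)
lr_acc = accuracy(predict_logistic_regression, lr_model, TEST)
print(f"Naive Bayes accuracy: {nb_acc:.2f}")
print(f"Logistic Regression accuracy: {lr_acc:.2f}")
```

A meaningful comparison would replace the toy lists with a held-out split of the actual corpus and report precision and recall for the spam class alongside accuracy, since misclassifying legitimate mail is usually costlier than missing spam.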
All materials contained within this journal are protected by Intellectual Property Corporation of Malaysia, Copyright Act 1987 and may not be reproduced, distributed, transmitted, displayed, published, or
broadcast without the prior, express written permission of Centre for Graduate Studies, Universiti Selangor, Malaysia. You may not alter or remove any copyright or other notice from copies of this content.