E-mail Spam Filtering using Machine Learning: Comparing Naive Bayes and Logistic Regression Algorithm

  • Fachri Ardiansyah Universitas AMIKOM Yogyakarta
  • Takhamo Gori Universitas AMIKOM Yogyakarta
  • Zulkipli Universitas AMIKOM Yogyakarta
  • Ni’matur Rohim Universitas AMIKOM Yogyakarta
  • Muhammad Iqbal Universitas AMIKOM Yogyakarta
Keywords: Email Spamming, Logistic Regression, Naive Bayes, Text Mining

Abstract

We compare the performance of two machine learning classifiers for email spam filtering.
Before the application of machine learning, spam filters relied on handcrafted rule-based
algorithms built around “spammy” keywords, which were unreliable and performed poorly. The
classifiers compared are Naive Bayes and Logistic Regression. The corpus used is the Ling-Spam
dataset, obtained from Kaggle. On this dataset, we found that Naive Bayes classifies spam
better than Logistic Regression.
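
As an illustration of the kind of comparison described above, the following is a minimal sketch in plain Python, not the authors' implementation. The abstract does not state the features, preprocessing, or library used, so everything here is an assumption: the toy corpus (which is not the Ling-Spam data), whitespace tokenization, bag-of-words counts, add-one smoothing for Naive Bayes, and batch gradient descent for Logistic Regression. All names (TRAIN, TEST, tokenize, accuracy) are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy corpus, NOT the Ling-Spam data: (text, label), 1 = spam, 0 = ham.
TRAIN = [
    ("win free money now", 1),
    ("free prize claim now", 1),
    ("cheap money offer free", 1),
    ("meeting agenda for monday", 0),
    ("linguistics conference call for papers", 0),
    ("please review the paper draft", 0),
]
TEST = [("free money prize", 1), ("conference paper review", 0)]


def tokenize(text):
    # Assumed preprocessing: lowercase and split on whitespace.
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs):
        self.counts = {0: Counter(), 1: Counter()}  # per-class word counts
        self.n_docs = Counter()                     # per-class document counts
        for text, y in docs:
            self.counts[y].update(tokenize(text))
            self.n_docs[y] += 1
        self.vocab = set(self.counts[0]) | set(self.counts[1])
        return self

    def predict(self, text):
        scores = {}
        for y in (0, 1):
            total = sum(self.counts[y].values()) + len(self.vocab)
            score = math.log(self.n_docs[y] / sum(self.n_docs.values()))
            for tok in tokenize(text):
                if tok in self.vocab:  # skip words never seen in training
                    score += math.log((self.counts[y][tok] + 1) / total)
            scores[y] = score
        return max(scores, key=scores.get)


class LogisticRegression:
    """Binary logistic regression on word counts, trained by batch gradient descent."""

    def fit(self, docs, lr=0.5, epochs=200):
        self.w = Counter()  # missing words read as weight 0
        self.b = 0.0
        for _ in range(epochs):
            grad_w, grad_b = Counter(), 0.0
            for text, y in docs:
                err = self.prob(text) - y  # gradient of log loss w.r.t. the logit
                for tok, c in Counter(tokenize(text)).items():
                    grad_w[tok] += err * c
                grad_b += err
            for tok, g in grad_w.items():
                self.w[tok] -= lr * g / len(docs)
            self.b -= lr * grad_b / len(docs)
        return self

    def prob(self, text):
        z = self.b + sum(self.w[tok] for tok in tokenize(text))
        return 1.0 / (1.0 + math.exp(-z))

    def predict(self, text):
        return 1 if self.prob(text) >= 0.5 else 0


def accuracy(model, docs):
    return sum(model.predict(text) == y for text, y in docs) / len(docs)


nb = NaiveBayes().fit(TRAIN)
logreg = LogisticRegression().fit(TRAIN)
print("Naive Bayes accuracy:", accuracy(nb, TEST))
print("Logistic Regression accuracy:", accuracy(logreg, TEST))
```

On a corpus this small both classifiers separate the classes easily, so the sketch shows only the mechanics of the comparison and says nothing about which method performs better on real email data.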

Published
2023-12-31
How to Cite
Ardiansyah, F., Gori, T., Zulkipli, Rohim, N., & Iqbal, M. (2023). E-mail Spam Filtering using Machine Learning: Comparing Naive Bayes and Logistic Regression Algorithm. Selangor Science & Technology Review (SeSTeR), 7(2), PREPRINT. Retrieved from https://sester.journals.unisel.edu.my/ojs/index.php/sester/article/view/298