E-mail Spam Filtering using Machine Learning: Comparing Naive Bayes and Logistic Regression Algorithm

  • Fachri Ardiansyah, Universitas AMIKOM Yogyakarta
  • Takhamo Gori, Universitas AMIKOM Yogyakarta
  • Zulkipli, Universitas AMIKOM Yogyakarta
  • Ni’matur Rohim, Universitas AMIKOM Yogyakarta
  • Muhammad Iqbal, Universitas AMIKOM Yogyakarta
Keywords: Email Spamming, Logistic Regression, Naive Bayes, Text Mining


We compare the performance of two machine learning classifiers for e-mail spam filtering: Naive Bayes
and Logistic Regression. Before machine learning was applied to this task, spam filters relied on handcrafted
rules built around “spammy” keywords, which were unreliable and performed poorly. The corpus we used was the
Ling-Spam dataset from Kaggle. On this corpus, we found that Naive Bayes classifies spam better than
Logistic Regression.
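
The kind of comparison described above can be sketched in a few lines of Python. This is only an illustration and not the authors' implementation: the abstract does not state the preprocessing, features, or hyperparameters used, so the bag-of-words features, the add-one smoothing, the learning rate, and the six-message toy corpus below are all assumptions made for the example. The toy corpus is not Ling-Spam, and the numbers it produces say nothing about the reported result.

```python
import math
from collections import Counter

# Hypothetical toy corpus (NOT the Ling-Spam data): 1 = spam, 0 = legitimate.
TRAIN = [
    ("win free money now", 1),
    ("free prize click now", 1),
    ("cheap money offer free", 1),
    ("meeting agenda for monday", 0),
    ("conference paper deadline monday", 0),
    ("please review the paper draft", 0),
]

def tokenize(text):
    return text.lower().split()

VOCAB = sorted({w for text, _ in TRAIN for w in tokenize(text)})
INDEX = {w: i for i, w in enumerate(VOCAB)}

def vectorize(text):
    """Bag-of-words count vector over the training vocabulary."""
    vec = [0] * len(VOCAB)
    for w in tokenize(text):
        if w in INDEX:
            vec[INDEX[w]] += 1
    return vec

def train_naive_bayes(data, alpha=1.0):
    """Multinomial Naive Bayes with add-alpha (Laplace) smoothing."""
    class_docs = Counter(label for _, label in data)
    word_counts = {c: Counter() for c in class_docs}
    for text, label in data:
        word_counts[label].update(tokenize(text))
    model = {}
    for c in class_docs:
        total = sum(word_counts[c].values()) + alpha * len(VOCAB)
        log_prior = math.log(class_docs[c] / len(data))
        log_like = {w: math.log((word_counts[c][w] + alpha) / total) for w in VOCAB}
        model[c] = (log_prior, log_like)
    return model

def predict_naive_bayes(model, text):
    """Pick the class with the highest log posterior; unseen words are skipped."""
    scores = {}
    for c, (log_prior, log_like) in model.items():
        scores[c] = log_prior + sum(log_like[w] for w in tokenize(text) if w in log_like)
    return max(scores, key=scores.get)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic_regression(data, lr=0.5, epochs=200):
    """Binary logistic regression fitted by plain stochastic gradient descent."""
    weights = [0.0] * len(VOCAB)
    bias = 0.0
    rows = [(vectorize(text), label) for text, label in data]
    for _ in range(epochs):
        for x, y in rows:
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            err = p - y  # gradient of the log loss with respect to the logit
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias

def predict_logistic_regression(model, text):
    weights, bias = model
    x = vectorize(text)
    p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
    return 1 if p >= 0.5 else 0

def accuracy(predict, model, data):
    """Fraction of (text, label) pairs that the model labels correctly."""
    return sum(predict(model, text) == label for text, label in data) / len(data)

nb_model = train_naive_bayes(TRAIN)
lr_model = train_logistic_regression(TRAIN)

if __name__ == "__main__":
    print("Naive Bayes accuracy:", accuracy(predict_naive_bayes, nb_model, TRAIN))
    print("Logistic Regression accuracy:", accuracy(predict_logistic_regression, lr_model, TRAIN))
```

Accuracy here is measured on the training messages only to keep the sketch short. A real comparison would score both classifiers on held-out messages, and on an imbalanced corpus would likely report precision and recall for the spam class alongside accuracy.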


