This paper proposes a comparative spam email detection framework that combines traditional machine learning and deep learning models with a blockchain-supported auditability layer to improve the security, transparency, and trustworthiness of spam filtering. The study evaluates six representative classifiers: Naive Bayes, Support Vector Machine (SVM), Random Forest, Long Short-Term Memory (LSTM), Convolutional Neural Network (CNN), and Bidirectional Encoder Representations from Transformers (BERT). All models share a unified experimental workflow of data preprocessing, feature extraction, model training, and comparative evaluation using Accuracy, Precision, Recall, F1-score, and ROC-AUC. Beyond classification performance, the framework incorporates a lightweight blockchain-based verification mechanism that records classification metadata and cryptographic hashes so that the integrity and traceability of filtering results can be checked. Experimental findings indicate that the deep learning models, particularly BERT and LSTM, capture context better and achieve higher detection accuracy, while the traditional machine learning methods have lower computational cost and run faster, which makes them suitable for lightweight environments. The framework contributes a reproducible benchmarking methodology for spam detection and shows how blockchain-supported auditability can improve transparency and reliability in AI-driven cybersecurity systems.
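As an illustration of the verification mechanism described above, the following is a minimal sketch of a hash-chained audit log, not the paper's actual implementation. It assumes that each record stores a SHA-256 hash of the email text, the model name, the predicted label, and a score, and that each block commits to the hash of the previous block. The names `AuditLedger`, `record`, and `verify` are hypothetical, and timestamps, consensus, and networking are omitted to keep the example deterministic.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import List

GENESIS_HASH = "0" * 64  # assumed placeholder for the first block's predecessor


def sha256_hex(data: str) -> str:
    """Return the SHA-256 digest of a string as a hex string."""
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


@dataclass
class AuditLedger:
    """Hypothetical append-only ledger of classification records."""
    blocks: List[dict] = field(default_factory=list)

    @staticmethod
    def _digest(payload: dict) -> str:
        # Serialize with sorted keys so the hash does not depend on key order.
        return sha256_hex(json.dumps(payload, sort_keys=True))

    def record(self, email_text: str, model_name: str,
               label: str, score: float) -> dict:
        """Append one classification result, linked to the previous block."""
        prev_hash = self.blocks[-1]["block_hash"] if self.blocks else GENESIS_HASH
        payload = {
            "index": len(self.blocks),
            "email_hash": sha256_hex(email_text),  # store a hash, not the raw email
            "model": model_name,
            "label": label,
            "score": round(score, 4),
            "prev_hash": prev_hash,
        }
        block = dict(payload, block_hash=self._digest(payload))
        self.blocks.append(block)
        return block

    def verify(self) -> bool:
        """Recompute every hash and check that each block links to its predecessor."""
        prev_hash = GENESIS_HASH
        for block in self.blocks:
            payload = {k: v for k, v in block.items() if k != "block_hash"}
            if payload["prev_hash"] != prev_hash:
                return False
            if self._digest(payload) != block["block_hash"]:
                return False
            prev_hash = block["block_hash"]
        return True
```

Under these assumptions, changing a stored label or score after the fact alters the recomputed block hash, so `verify()` returns `False` for the tampered ledger. Storing only the email hash keeps message content out of the ledger while still allowing a given email to be matched to its recorded result.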