Proceedings of International Conference on Applied Innovation in IT
2025/07/26, Volume 13, Issue 3, pp. 199-206

Enhancing Machine Learning Model Accuracy with Effective Data Scaling Strategies


Wasan Khairallah Ali


Abstract: This study examines how data scaling techniques affect machine learning algorithms for predicting heart disease, and highlights the importance of preprocessing for model performance. Scaling is essential when a dataset's attributes span very different ranges, because it can strongly influence the effectiveness of many machine learning models. In this investigation, eleven widely used algorithms, including K-Nearest Neighbors (KNN) and Logistic Regression, were evaluated with three scaling methods: Min-Max scaling, Z-score standardization, and MaxAbs scaling. Performance was assessed with precision, recall, and F1 score across multiple experiments. The findings indicate that several algorithms performed better with MaxAbs scaling, particularly algorithms that are sensitive to the scale and distribution of the input data, such as KNN and Logistic Regression. This suggests that the choice of scaling technique is crucial for accurate and consistent predictions in machine learning applications for heart disease, and that scaling methods should be selected carefully to optimize the performance of machine learning models in medical diagnostics.
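All three scaling methods named in the abstract can be viewed as per-feature affine maps x' = (x - shift) / scale: Min-Max uses shift = min and scale = max - min, Z-score uses shift = mean and scale = standard deviation, and MaxAbs uses shift = 0 and scale = max |x|. The plain-Python sketch below illustrates these maps together with precision, recall, and F1; it is not the author's code. The function names (`fit_scaler`, `transform`, `precision_recall_f1`), the use of the population standard deviation, the guard for constant features, and treating label 1 as the positive (disease) class are all assumptions. Scaling parameters are fitted on training rows only and then reused on test rows, so that no test information leaks into preprocessing.

```python
from statistics import fmean, pstdev


def fit_scaler(rows, method):
    """Learn per-feature (shift, scale) so that x' = (x - shift) / scale."""
    params = []
    for col in zip(*rows):  # iterate over feature columns
        if method == "minmax":
            shift, scale = min(col), max(col) - min(col)
        elif method == "zscore":
            # Population standard deviation (ddof = 0), an assumption.
            shift, scale = fmean(col), pstdev(col)
        elif method == "maxabs":
            # No centring: zeros stay zero and signs are preserved.
            shift, scale = 0.0, max(abs(v) for v in col)
        else:
            raise ValueError(f"unknown method: {method}")
        # Assumed guard: constant or all-zero features would divide by zero.
        params.append((shift, scale if scale != 0 else 1.0))
    return params


def transform(rows, params):
    """Apply previously fitted (shift, scale) pairs to every row."""
    return [[(x - shift) / scale for x, (shift, scale) in zip(row, params)]
            for row in rows]


def precision_recall_f1(y_true, y_pred, positive=1):
    """Binary precision, recall and F1 for the chosen positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical toy features (e.g., age, cholesterol); not data from the study.
    train = [[40.0, 120.0], [60.0, 180.0], [50.0, 150.0]]
    test = [[70.0, 200.0]]
    for method in ("minmax", "zscore", "maxabs"):
        params = fit_scaler(train, method)  # fit on training rows only
        print(method, transform(train, params), transform(test, params))
    # Min-Max maps the unseen test age 70 to 1.5, outside the [0, 1] training range.
    print(precision_recall_f1([1, 0, 1, 1, 0], [1, 0, 0, 1, 1]))
```

MaxAbs only rescales, so values stay in [-1, 1] on the training data without being centred, whereas Z-score centres every feature at zero. Which transform suits a given classifier is an empirical question, and that question is what the study evaluates.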

Keywords: Data Scaling, Machine Learning Algorithms, Prediction of Heart Disease, Model Performance Metrics, Preprocessing Strategies.

DOI: Under Indexing




         Proceedings of the International Conference on Applied Innovations in IT by Anhalt University of Applied Sciences is licensed under CC BY-SA 4.0




           ISSN 2199-8876
           Publisher: Edition Hochschule Anhalt
           Location: Anhalt University of Applied Sciences
           Email: leiterin.hsb@hs-anhalt.de
           Phone: +49 (0) 3496 67 5611
           Address: Building 01 - Red Building, Top floor, Room 425, Bernburger Str. 55, D-06366 Köthen, Germany
