Peer-Reviewed

Evaluation of Machine Learning Techniques Towards Early Detection of Cardiovascular Diseases

Received: 20 January 2023    Accepted: 13 February 2023    Published: 15 April 2023
Abstract

The effectiveness of three Machine Learning (ML) algorithms, Support Vector Machine (SVM), Random Forest (RF) and K-Nearest Neighbour (KNN), for the early diagnosis of heart disease was evaluated. A heart disease dataset from the kaggle.com data repository, comprising 303 data points with 13 features and a target variable, was used. The data were preprocessed by shuffling and dimension reduction, with the new dimension chosen such that 85.03% of the original information was retained. The preprocessed dataset was partitioned into a training set (70%) and a testing set (30%). The ML algorithms were trained and tested for the diagnosis of cardiovascular diseases (CVD). Training performance was evaluated with k-fold cross-validation using 10 folds. The k-fold accuracies were 0.837662 for KNN, 0.834091 for RF, and 0.814935 for SVM. The test accuracies were 0.8 for KNN, 0.7889 for SVM, and 0.7667 for RF. KNN emerged as the best model in both training and test performance and is recommended for the early diagnosis of CVD.
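
The evaluation protocol described in the abstract (shuffle, 70/30 partition, 10-fold cross-validation on the training portion, then a held-out test) can be illustrated with a minimal sketch. This is not the author's implementation: it uses only the Python standard library, a hand-written KNN classifier with an assumed k = 5, and hypothetical synthetic data in place of the Kaggle heart disease dataset. The dimension-reduction step is omitted, and all function names and random seeds are illustrative assumptions.

```python
import math
import random
from collections import Counter


def make_synthetic_data(n=303, n_features=5, seed=0):
    """Hypothetical stand-in: two Gaussian blobs, NOT the heart disease data."""
    rng = random.Random(seed)
    rows = []
    for i in range(n):
        label = i % 2
        centre = 0.0 if label == 0 else 2.0
        features = [rng.gauss(centre, 1.0) for _ in range(n_features)]
        rows.append((features, label))
    return rows


def knn_predict(train, query, k=5):
    """Majority vote among the k training points nearest to `query` (Euclidean)."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train, test, k=5):
    """Fraction of `test` rows whose label KNN predicts correctly."""
    correct = sum(knn_predict(train, x, k) == y for x, y in test)
    return correct / len(test)


def k_fold_accuracy(rows, n_folds=10, k=5):
    """Mean validation accuracy over n_folds disjoint folds of `rows`."""
    folds = [rows[i::n_folds] for i in range(n_folds)]
    scores = []
    for i, held_out in enumerate(folds):
        # Fit on every fold except the one currently held out.
        fit = [row for j, fold in enumerate(folds) if j != i for row in fold]
        scores.append(accuracy(fit, held_out, k))
    return sum(scores) / n_folds


rows = make_synthetic_data()
random.Random(42).shuffle(rows)       # data shuffling (seed is an assumption)
split = int(0.7 * len(rows))          # 70% training, 30% testing
train_rows, test_rows = rows[:split], rows[split:]

cv_acc = k_fold_accuracy(train_rows, n_folds=10, k=5)
test_acc = accuracy(train_rows, test_rows, k=5)
print(f"train={len(train_rows)} test={len(test_rows)}")
print(f"10-fold CV accuracy: {cv_acc:.4f}")
print(f"held-out test accuracy: {test_acc:.4f}")
```

Cross-validation runs only on the training portion, so the 30% test partition stays unseen until the final accuracy is computed. The figures this sketch prints come from synthetic data and are not comparable to the accuracies reported above.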

Published in American Journal of Artificial Intelligence (Volume 7, Issue 1)
DOI 10.11648/j.ajai.20230701.12
Page(s) 6-16
Creative Commons

This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.

Copyright

Copyright © The Author(s), 2024. Published by Science Publishing Group

Keywords

Cardiovascular Disease, Prediction, K-nearest Neighbor, Machine Learning, Random Forest, Support Vector Machine, Classification, Diagnoses

Cite This Article
  • APA Style

    Anietie Ekong. (2023). Evaluation of Machine Learning Techniques Towards Early Detection of Cardiovascular Diseases. American Journal of Artificial Intelligence, 7(1), 6-16. https://doi.org/10.11648/j.ajai.20230701.12

    ACS Style

    Anietie Ekong. Evaluation of Machine Learning Techniques Towards Early Detection of Cardiovascular Diseases. Am. J. Artif. Intell. 2023, 7(1), 6-16. doi: 10.11648/j.ajai.20230701.12

    AMA Style

    Anietie Ekong. Evaluation of Machine Learning Techniques Towards Early Detection of Cardiovascular Diseases. Am J Artif Intell. 2023;7(1):6-16. doi: 10.11648/j.ajai.20230701.12

  • BibTeX

    @article{10.11648/j.ajai.20230701.12,
      author = {Anietie Ekong},
      title = {Evaluation of Machine Learning Techniques Towards Early Detection of Cardiovascular Diseases},
      journal = {American Journal of Artificial Intelligence},
      volume = {7},
      number = {1},
      pages = {6-16},
      doi = {10.11648/j.ajai.20230701.12},
      url = {https://doi.org/10.11648/j.ajai.20230701.12},
      eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ajai.20230701.12},
      year = {2023}
    }
    

  • RIS

    TY  - JOUR
    T1  - Evaluation of Machine Learning Techniques Towards Early Detection of Cardiovascular Diseases
    AU  - Anietie Ekong
    Y1  - 2023/04/15
    PY  - 2023
    N1  - https://doi.org/10.11648/j.ajai.20230701.12
    DO  - 10.11648/j.ajai.20230701.12
    T2  - American Journal of Artificial Intelligence
    JF  - American Journal of Artificial Intelligence
    JO  - American Journal of Artificial Intelligence
    SP  - 6
    EP  - 16
    PB  - Science Publishing Group
    SN  - 2639-9733
    UR  - https://doi.org/10.11648/j.ajai.20230701.12
    VL  - 7
    IS  - 1
    ER  - 

Author Information
  • Department of Computer Science, Akwa Ibom State University, Ikot Akpaden, Nigeria
