Application of Ensemble Learning Techniques for Early Detection of Heart Disease Using the Voting Classifier and Stacking Classifier Methods

Mutiara Romana Kusuma
Universitas Nusa Megarkencana
Antonius Angga Kurniawan
Universitas Gunadarma
Indonesia

Abstract

Heart disease can be detected early by identifying the risk factors that contribute to its development. The Framingham Heart Study has investigated these risk factors, and machine learning models can use its data to automate early detection. In this study, the data pass through several pre-processing stages before modeling. Models are then built with the Random Forest, Logistic Regression, and K-Nearest Neighbor algorithms. The individual models perform reasonably well: Random Forest achieves the highest accuracy, 0.91, and Logistic Regression the lowest, 0.67. Two ensemble learning techniques, the Voting Classifier and the Stacking Classifier, are then applied to improve accuracy. Stacking raises accuracy to 0.92, whereas voting does not outperform the Random Forest model. This is because voting is better suited to combining algorithms with comparable performance, and in this study the Random Forest and Logistic Regression models differ substantially in performance.
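The explanation for the voting result can be illustrated with a minimal sketch of hard (majority) voting. The labels and the three prediction lists below are invented toy values, not data or results from the study, and the helper names `majority_vote` and `accuracy` are hypothetical. The sketch only shows how two weaker members that err on the same samples can outvote a stronger one.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the most common label per sample across all models.

    `predictions` is a list of per-model label lists of equal length.
    """
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


def accuracy(y_true, y_pred):
    """Fraction of samples where the prediction matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy ground truth for ten samples (assumption, for illustration only).
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]

# One strong model (wrong on sample 0 only) and two weaker models
# whose errors overlap on samples 2 and 3.
strong = [0, 0, 1, 1, 0, 0, 1, 0, 1, 0]
weak_a = [1, 1, 0, 0, 0, 0, 1, 0, 1, 0]
weak_b = [1, 0, 0, 0, 1, 1, 1, 0, 1, 0]

voted = majority_vote([strong, weak_a, weak_b])

print("strong:", accuracy(y_true, strong))  # 0.9
print("weak_a:", accuracy(y_true, weak_a))  # 0.7
print("weak_b:", accuracy(y_true, weak_b))  # 0.6
print("voted: ", accuracy(y_true, voted))   # 0.8
```

In this toy case the vote fixes the strong model's single error but inherits the two errors the weaker models share, so the ensemble ends at 0.8, below the best individual model. A stacking approach instead trains a meta-model on the base models' outputs, which allows it to learn how much to rely on each member.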

Keywords
Ensemble Learning, Heart Disease, Stacking Classifier, Voting Classifier