Bayesian Hyperparameter Optimization with Optuna for Spectral Clustering - K-Means: A Case Study on the CuMiDa Leukemia Dataset

DOI:

https://doi.org/10.35760/tr.2025.v30i3.16

Keywords:

Bayesian Optimization, gene expression, K-Means, Optuna, Spectral Clustering

Abstract

Leukemia is among the cancers with the highest mortality rates worldwide, so identifying its subtypes is crucial for accurate diagnosis and effective treatment. Analysis of high-dimensional gene expression data, such as the CuMiDa dataset, remains challenging because of overlapping expression patterns and limited sample sizes. This study applies Bayesian Optimization with Optuna to tune the hyperparameters of the Spectral Clustering - K-Means method and thereby improve the clustering of leukemia subtypes. Four key parameters (n_components, affinity method, n_neighbors, and gamma) were optimized over 1,000 iterations. The best configuration used n_components = 5 with the Nearest Neighbors affinity and n_neighbors = 6. The resulting Spectral Embedding matrix was then clustered with K-Means. This approach achieved a clustering accuracy of 92.19%, outperforming both K-Means and Hierarchical Clustering applied on their own. Heatmap visualization showed that the optimized method grouped samples with similar gene expression patterns. These results indicate that combining Spectral Clustering - K-Means with Bayesian Optimization using Optuna can improve the clustering quality of complex gene expression data and may extend to other bioinformatics studies.

Published

2025-12-31

Section

Articles

How to Cite

Optimasi Hyperparameter Berbasis Bayesian dengan Optuna untuk Spectral Clustering - K-Means: Studi Kasus pada Dataset Leukemia CuMiDa. (2025). Jurnal Ilmiah Teknologi Dan Rekayasa, 30(3), 215-228. https://doi.org/10.35760/tr.2025.v30i3.16
