IMPLEMENTATION OF TEXT CLASSIFICATION ON USER REVIEWS IN DANA APPLICATION USING SUPPORT VECTOR MACHINE (SVM) AND GAUSSIAN NAÏVE BAYES (GNB)

Alfatha Fitrah Insan; Detty Purnamasari

Authors

Alfatha Fitrah Insan
Detty Purnamasari

Abstract

Conventional methods and devices cannot efficiently process the large volumes and diverse categories of information collectively referred to as big data. Text mining is a commonly used technique for analyzing big data. This study evaluates the effectiveness of Support Vector Machine (SVM) and Gaussian Naïve Bayes (GNB) in classifying user reviews of the DANA application obtained from the Google Play Store. The investigation has four fundamental phases: data collection, data preparation, data modelling, and evaluation. The study used a dataset of 15,451 user reviews, divided into three subsets of varying size, each evaluated with varying training-to-testing ratios. The evaluation computes five measures: accuracy, precision, recall, F1-score, and the ROC curve. The results show that SVM and GNB achieved accuracy rates of at least 75%. SVM achieves average accuracies of 84%, 88%, and 91%, while GNB achieves average accuracies of 71%, 81%, and 85%. Based on the implementation results, sentiment analysis is more effective when performed with SVM than with GNB.
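
As an illustration of the evaluation measures named in the abstract, the following is a minimal sketch of how accuracy, precision, recall, and F1-score can be derived from predicted and true labels. It is not the authors' implementation. The function name `binary_metrics`, the two-class "positive"/"negative" labelling, and the toy label lists are assumptions made for the example and do not come from the DANA dataset. The ROC curve is left out because it requires classifier scores rather than hard labels.

```python
from collections import Counter


def binary_metrics(y_true, y_pred, positive="positive"):
    """Return accuracy, precision, recall, and F1 for one positive class."""
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            # Predicted positive: correct (tp) or a false alarm (fp).
            counts["tp" if truth == positive else "fp"] += 1
        else:
            # Predicted negative: a miss (fn) or correct (tn).
            counts["fn" if truth == positive else "tn"] += 1

    tp, fp, fn, tn = (counts[k] for k in ("tp", "fp", "fn", "tn"))
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels invented for illustration; not taken from the DANA dataset.
y_true = ["positive", "positive", "negative", "negative", "positive", "negative"]
y_pred = ["positive", "negative", "negative", "positive", "positive", "negative"]
scores = binary_metrics(y_true, y_pred)
```

In a comparison such as the one described here, the same function would be applied to the test-set predictions of each classifier, once per subset and training-to-testing ratio, so that the resulting scores can be averaged and compared.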

