Agung Riyadi
Universitas Nasional
Fauziah Fauziah
Universitas Nasional


High-dimensional data is difficult to cluster: data volume grows exponentially across data formats, and the number of values in each dimension is too large to compute exhaustively. To improve the efficiency and accuracy of processing high-dimensional data, data cleaning and data reduction were performed before testing, using Principal Component Analysis (PCA). Comparing cluster quality is challenging because different membership values lead to different final clusters, a consequence of noise and outliers in the processed data. PCA is therefore used to reduce the data set from high to low dimensionality and to eliminate noise and outliers. The Fuzzy C-Means (FCM) method, run with different initializations, then groups similar data so that related data points are placed in the same cluster. We compare the results obtained with and without PCA; PCA+FCM with multivariate Gaussian initialization yielded higher accuracy, reaching 87.07%.
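The abstract does not describe the implementation, so the following is only a minimal sketch of a PCA+FCM pipeline under stated assumptions: it uses NumPy, computes PCA through an SVD of the centered data, applies the standard FCM membership and center updates with fuzzifier `m = 2`, and interprets "multivariate Gaussian initialization" as sampling the initial cluster centers from a Gaussian with the data's mean and covariance. The function names `pca_reduce` and `fcm` and all parameter values are hypothetical, and the noise/outlier cleaning step is not shown.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project X (n_samples x n_features) onto its top principal components."""
    Xc = X - X.mean(axis=0)  # center each feature
    # Rows of Vt are the principal axes, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def fcm(X, c, m=2.0, max_iter=300, tol=1e-6, seed=0):
    """Fuzzy C-Means; initial centers drawn from N(mean(X), cov(X)).

    The Gaussian initialization is an assumed reading of the abstract.
    Returns the final centers and the membership matrix U (n_samples x c).
    """
    rng = np.random.default_rng(seed)
    cov = np.atleast_2d(np.cov(X, rowvar=False))
    centers = rng.multivariate_normal(X.mean(axis=0), cov, size=c)
    for _ in range(max_iter):
        # Euclidean distance from every point to every center, shape (n, c).
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)  # avoid division by zero
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1)).
        ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))
        U = 1.0 / ratio.sum(axis=2)
        # Center update: mean of the points weighted by u_ij^m.
        W = U ** m
        new_centers = (W.T @ X) / W.sum(axis=0)[:, None]
        shift = np.linalg.norm(new_centers - centers)
        centers = new_centers
        if shift < tol:
            break
    return centers, U

# Illustrative synthetic data: three Gaussian blobs in 10 dimensions.
rng = np.random.default_rng(42)
means = rng.normal(scale=10.0, size=(3, 10))
X = np.vstack([rng.normal(loc=mu, size=(50, 10)) for mu in means])

Z = pca_reduce(X, n_components=2)   # high-dimensional -> 2-D
centers, U = fcm(Z, c=3)            # fuzzy memberships on the reduced data
labels = U.argmax(axis=1)           # hard labels, e.g. for computing accuracy
```

Each row of `U` holds one point's degrees of membership across the clusters and sums to 1; taking the argmax gives a hard clustering that can be scored against ground-truth labels. Running FCM on the raw `X` instead of `Z` gives the "without PCA" baseline.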

PCA, FCM, clustering, multivariate Gaussian distribution

