This study aims to detect types of cancer of unknown primary in big data using the MapReduce model. We integrate two computational methods by running the K-means clustering algorithm in a parallel programming (MapReduce) context, so that the difference between the actual number of classes in a dataset and the number of recovered clusters is kept acceptably small. The Rand index serves as the validation criterion, and two similarity criteria are used: standard Euclidean similarity (EZ0) and cosine similarity (C). The algorithm was run on 18 datasets; for each one, the number of samples in each cluster was analyzed and compared with the real classes in that dataset, which yielded 18 Rand indices. The outcome was evaluated under the Euclidean similarity criterion. The reported result is the average of these 18 indices, comparing the FMG and K-means methods against the output.
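To make the pipeline described above concrete, the following is a minimal single-process sketch of one way K-means can be phrased as map and reduce steps, with the Rand index used to validate the resulting clusters. It is an illustration under stated assumptions, not the implementation used in this study: all function names (`map_phase`, `reduce_phase`, `kmeans`, `rand_index`) are hypothetical, the data are kept in memory rather than on a distributed file system, and cosine similarity is converted to a distance as one minus the similarity.

```python
from collections import defaultdict
from itertools import combinations
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """1 - cosine similarity (assumed conversion; zero vectors map to 1.0)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0


def map_phase(points, centroids, dist):
    """Map: emit (index of nearest centroid, point) for every point."""
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
        yield nearest, p


def reduce_phase(pairs, centroids):
    """Reduce: average the points assigned to each centroid index."""
    groups = defaultdict(list)
    for idx, p in pairs:
        groups[idx].append(p)
    updated = list(centroids)  # empty clusters keep their old centroid
    for idx, members in groups.items():
        updated[idx] = tuple(sum(c) / len(members) for c in zip(*members))
    return updated


def kmeans(points, centroids, dist=euclidean, iterations=10):
    """Repeat map and reduce for a fixed number of iterations."""
    for _ in range(iterations):
        centroids = reduce_phase(map_phase(points, centroids, dist), centroids)
    labels = [idx for idx, _ in map_phase(points, centroids, dist)]
    return labels, centroids


def rand_index(true_labels, pred_labels):
    """Fraction of point pairs on which the two labelings agree
    (both place the pair together, or both place it apart)."""
    pairs = list(combinations(range(len(true_labels)), 2))
    agree = sum(
        (true_labels[i] == true_labels[j]) == (pred_labels[i] == pred_labels[j])
        for i, j in pairs
    )
    return agree / len(pairs)


if __name__ == "__main__":
    # Toy data with two well-separated groups (purely illustrative).
    points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
    labels, _ = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
    print(labels, rand_index([1, 1, 0, 0], labels))
```

Because the Rand index counts pairwise agreement, it does not depend on how cluster identifiers are numbered, which is why the toy example can compare `[1, 1, 0, 0]` against cluster labels that use the opposite numbering. Averaging such an index over several datasets, as the study does over 18, would then be a plain arithmetic mean of the per-dataset values.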