Coordinate Rotation-Based Low ComplexityK-Means Clustering Architecture

Abstract:

In this brief, we propose a low-complexity architectural implementation of the K-means-based clustering algorithm used widely in mobile health monitoring applications for unsupervised and supervised learning. The iterative nature of the algorithm computing the distance of each data point from a respective centroid for a successful cluster formation until convergence presents a significant challenge to map it onto a low-power architecture. This has been addressed by the use of a 2-D Coordinate Rotation Digital Computer-based low-complexity engine for computing the n-dimensional Euclidean distance involved during clustering. The proposed clustering engine was synthesized using the TSMC 130-nm technology library, anda place and route was performed following which the core area and power were estimated as 0.36 mm2 and 9.21 mW at 100 MHz, respectively, making the design applicable for low-power real-time operations within a sensor node. The proposed architecture of this paper analysis the logic size, area and power consumption using Xilinx 14.2.

Existing System:

The fundamental concept of the cluster analysis is to form groupsof similar objects as a means of distinguishing them from eachother. Clustering techniques have been successfully used indiverse fields, such as medicine (EEG and activity recognition)and marketing, involving multivariate data and can be convenientlydeployed with limited resources (memory and CPU). TheK-means clustering algorithm owing to its computational simplicityand efficiency has been an attractive choice for a wide variety ofsignal processing applications. It is a well-perceived fact in theresearch community that the cluster analysis is primarily used forunsupervised learning, where the class labels for the training dataare not available. However, the K-means algorithm can also beused for supervised learning, where the class labels of the trainingdata are known apriori. Apart from using it as a learningalgorithm, K-means has also been utilized for signal preprocessing,feature reduction, and time-domain signal analysis. Hence, usingK-means for real-time cluster analysis requiring computation inresource-constrained sensor nodes for remote health care monitoringsystems, where online multimodal data acquisition and analysis isthe key (e.g., cardiovascular disease prognosis), requires an effective implementation strategy. The fundamental requirement for suchapplications is to cut down on continuous transmission and has alow-power operation to prolong their battery life. Hence, an effectivealgorithm-to-architecture holistic mapping is required to fulfill thenotion of low-power operation aimed for long durations.

The K-means algorithm exhibits an iterative nature, where itcomputes the distance of each data sample from the centroids untilconvergence. This is generally achieved by the use of power hungrymultipliers, square rooters (for Euclidean distance computation),and multiplexers, thereby rendering direct mapping of thisalgorithm to architecture infeasible for implementation on resourceconstrained platforms. An attempt was made to replace the Euclideandistance by a combination of Manhattan and Max distance butby trading off accuracy for power consumption. Therefore,an optimization is necessary to achieve a tradeoff between algorithm efficiency and architectural complexity. Coordinate RotationDigital Computer (CORDIC)-based architectures exploring its different transcendental functions to compute complex arithmetic operations have been widely used for computationally intensivesignal processing algorithms, which apply the K-meansclustering algorithm. Hence, in this brief, we investigate the useof a CORDIC-based low-complexity engine to implementK-meansclustering algorithm.

Disadvantages:

  • Complex design

Proposed System:

TheK-means algorithm iterates to minimize the squared errorbetween the empirical mean of a cluster and the individual datapoints, defined as the cost function. Initially,kcentroids are definedand data vectors are assigned to a cluster label depending on howclose they are to each centroid. The k centroids are recalculatedfrom the newly defined clusters, and the process of reassignmentof each data vector to each new centroid is repeated. The algorithmiterates over this loop until the data vectors form clusters and the cost function is minimized. CORDIC is an efficient implementationtechnique for vector rotation and arctangent computation using shiftadd operations. In this brief, we use CORDIC in vectoring mode forour implementation.The architecture of the proposed CORDIC-basedK-means clustering engine is shown in Fig. 1. The input data are stored inmemory and transmitted to different blocks via control unit (CU). TheEuclidean distance is calculated in distance unit using low complexityCORDIC vectoring module. The distances from each point to eachof the centroids are sent to a comparator block to identify the clusterto which it belongs. Once the clustering is done, centroid calculationblock will be activated to compute the new centroids. If these newcentroids differ significantly from the previous values of the iteration,clustering is repeated, else clustered data are sent to the output. TheCU governs the data flow a all the modules. The proposedengine utilizes CORDIC to compute Euclidean distance betweentwo points, which is a metric to compute the clusters and has beenexplained through an illustrative example.

Figure 1: Complete architecture of the proposed system

In this brief, our focus is to propose a methodology for utilizing CORDIC to compute Euclidean distance between two points,which is a metric to compute the clusters. In 2-D signal space,if(x1,x2)and(y1,y2)are two points, the Euclidean distance betweenthese two points will be

One square rooter, two square, one adder, and two subtractionoperations are involved in this computation. If we giveaandbas thex-andy-inputs to the vectoring mode CORDIC, the x output willbe the magnitude of vector (a,b),whichis(a2+b2)1/2. So, with(x1 −y1) and (x2 −y2) as the x-andy-inputs, respectively,the vectoring mode generates the distance between two points. Architecture of the 2-D distance measurement unit using vectoring mode CORDIC is shown in Fig. 2(a). We can extend this methodology ton-dimensional (nD) signal space to formulate distance betweentwo nD vectors.

Figure 2: CORDIC-based distance measurement. (a) 2-D vectoring. (b) 3-D vectoring. (c) 3-D multiplexed architecture. (d) nD multiplexed architecture

Advantages:

  • Design complexity is less

Software implementation:

  • Modelsim
  • Xilinx ISE