This class provides an implementation of the CLARA (Clustering for Large Applications) algorithm.
An obvious way of clustering larger datasets is to try to extend existing methods so
that they can cope with a larger number of objects. The focus is on clustering large numbers
of objects rather than a small number of objects in high dimensions.
Kaufman and Rousseeuw (1990) suggested the CLARA (Clustering for Large Applications)
algorithm for tackling large applications. CLARA extends their k-medoids approach
to a large number of objects. It works by clustering a sample from the dataset and
then assigning all objects in the dataset to these clusters.
CLARA relies on a sampling approach to handle large data sets.
Instead of finding medoids for the entire data set, CLARA draws a small sample from the data set
and applies the PAM (Partitioning Around Medoids) algorithm to generate an optimal set of medoids for the sample.
The quality of resulting medoids is measured by the average dissimilarity between every object
in the entire data set D and the medoid of its cluster.
To alleviate sampling bias, CLARA repeats the sampling and clustering process a pre-defined
number of times and subsequently selects as the final clustering result the set of medoids
with the minimal cost.
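
The procedure described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the code of this class: the names clara, pam, average_cost and dist, the Euclidean distance, the simplified swap-based PAM, and the default values for num_samples and sample_size are all assumptions made for the example.

```python
import random


def dist(a, b):
    """Euclidean distance between two points given as tuples (assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def average_cost(points, medoids):
    """Average dissimilarity between each point and its nearest medoid."""
    return sum(min(dist(p, m) for m in medoids) for p in points) / len(points)


def pam(points, k, rng):
    """Simplified PAM: start from random medoids, then keep applying
    cost-reducing swaps until no swap improves the cost."""
    medoids = rng.sample(points, k)
    best = average_cost(points, medoids)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for candidate in points:
                if candidate in medoids:
                    continue
                trial = medoids[:i] + [candidate] + medoids[i + 1:]
                cost = average_cost(points, trial)
                if cost < best:
                    medoids, best, improved = trial, cost, True
    return medoids


def clara(data, k, num_samples=5, sample_size=None, seed=0):
    """Hypothetical CLARA driver: cluster several samples with PAM and keep
    the medoid set with the lowest average cost over the ENTIRE data set."""
    rng = random.Random(seed)
    if sample_size is None:
        # Assumed default for this sketch; never larger than the data set.
        sample_size = min(len(data), 40 + 2 * k)
    best_medoids, best_cost = None, float("inf")
    for _ in range(num_samples):
        sample = rng.sample(data, sample_size)
        medoids = pam(sample, k, rng)
        cost = average_cost(data, medoids)  # evaluated on all objects, not the sample
        if cost < best_cost:
            best_medoids, best_cost = medoids, cost
    # Assign every object in the data set to its nearest final medoid.
    labels = [min(range(k), key=lambda j: dist(p, best_medoids[j])) for p in data]
    return best_medoids, labels, best_cost
```

Calling clara(data, k) on a list of coordinate tuples returns the selected medoids, one cluster index per object, and the average cost of the selected medoids. Because PAM only ever sees a sample, the result depends on the samples drawn, which is why the cost is always evaluated against the full data set.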
property CacheCost as %Integer [ InitialExpression = -1 ];