The distance metric imposed on data elements
The number of clusters to use. If k is zero, the clustering attempts to identify a number of clusters that is "good" with respect to Minimum Description Length.
The maximum number of model refinement iterations to run
The epsilon threshold to use. Must be >= 0. If c1 is the cost of the current clustering model and c0 is the cost of the previous model, refinement halts when (c0 - c1) <= epsilon (lower cost is better).
The fractionEpsilon threshold to use. Must be >= 0. If c1 is the cost of the current clustering model and c0 is the cost of the previous model, refinement halts when (c0 - c1) / c0 <= fractionEpsilon (lower cost is better).
The target size of the random sample. Must be > 0.
The number of threads to use while clustering
The random seed to use for the RNG. Training runs that start from the same seed produce the same result. By default, training runs vary randomly.
The epsilon threshold to use. Must be >= 0. If c1 is the cost of the current clustering model and c0 is the cost of the previous model, refinement halts when (c0 - c1) <= epsilon (lower cost is better).
The fractionEpsilon threshold to use. Must be >= 0. If c1 is the cost of the current clustering model and c0 is the cost of the previous model, refinement halts when (c0 - c1) / c0 <= fractionEpsilon (lower cost is better).
The number of clusters to use. If k is zero, the clustering attempts to identify a number of clusters that is "good" with respect to Minimum Description Length.
The maximum number of model refinement iterations to run
The distance metric imposed on data elements
The number of threads to use while clustering
Perform a K-Medoid clustering model training run on some input data
The input data to train the clustering model on.
A KMedoidsModel object representing the clustering model.
Perform a K-Medoid clustering model training run on some input data
The input data to train the clustering model on.
A KMedoidsModel object representing the clustering model.
The target size of the random sample. Must be > 0.
The random seed to use for the RNG. Training runs that start from the same seed produce the same result. By default, training runs vary randomly.
Set epsilon halting threshold for clustering cost improvement between refinements.
If c1 is the cost of the current clustering model and c0 is the cost of the previous model, refinement halts when (c0 - c1) <= epsilon (lower cost is better).
The epsilon threshold to use. Must be >= 0.
Copy of this instance, with updated value of epsilon
Set fractionEpsilon threshold for clustering cost improvement between refinements.
If c1 is the cost of the current clustering model and c0 is the cost of the previous model, refinement halts when (c0 - c1) / c0 <= fractionEpsilon (lower cost is better).
The fractionEpsilon threshold to use. Must be >= 0.
Copy of this instance, with updated fractionEpsilon setting
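The two halting thresholds described above can be sketched as a single predicate. This is only an illustration of the stated inequalities, not the library's code: the object name `HaltingSketch`, the function `shouldHalt`, the choice to halt when either test passes, and the guard for a zero previous cost are all assumptions of this sketch.

```scala
object HaltingSketch {
  /** Returns true when the improvement from the previous cost c0 to the
    * current cost c1 is small enough to stop refining (lower cost is better).
    */
  def shouldHalt(c0: Double, c1: Double, epsilon: Double, fractionEpsilon: Double): Boolean = {
    require(epsilon >= 0.0, "epsilon must be >= 0")
    require(fractionEpsilon >= 0.0, "fractionEpsilon must be >= 0")
    val delta = c0 - c1
    // Absolute test: (c0 - c1) <= epsilon
    val absoluteHalt = delta <= epsilon
    // Relative test: (c0 - c1) / c0 <= fractionEpsilon.
    // Skipping the test when c0 is not positive is an assumption of this sketch.
    val fractionHalt = c0 > 0.0 && (delta / c0) <= fractionEpsilon
    // Combining the two tests with "or" is also an assumption.
    absoluteHalt || fractionHalt
  }
}
```

With a previous cost of 100, an epsilon of 1.0 and a fractionEpsilon of 0.05, a drop to 90 keeps refining, while a drop to 96 halts on the relative test.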
Set the number of clusters to train
The number of clusters. Must be >= 0. If k is zero, the clustering attempts to identify a number of clusters that is "good" with respect to Minimum Description Length.
Copy of this instance with new value for k
Set the maximum number of iterations to allow before halting cluster refinement.
The maximum number of refinement iterations. Must be > 0.
Copy of this instance, with updated value for maxIterations
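Each setter here returns a copy of the instance rather than mutating it. A minimal sketch of that copy-on-set pattern follows, using a hypothetical stand-in class. The name `TrainerConfig`, the methods `withK` and `withMaxIterations`, and the default values are illustrative assumptions and are not taken from the documented API.

```scala
// Hypothetical stand-in for a trainer configuration; not the documented class.
final case class TrainerConfig(k: Int = 2, maxIterations: Int = 25) {
  /** Returns a copy with a new number of clusters. Must be >= 0. */
  def withK(newK: Int): TrainerConfig = {
    require(newK >= 0, "k must be >= 0")
    copy(k = newK)
  }

  /** Returns a copy with a new iteration limit. Must be > 0. */
  def withMaxIterations(n: Int): TrainerConfig = {
    require(n > 0, "maxIterations must be > 0")
    copy(maxIterations = n)
  }
}

object TrainerConfigDemo {
  val base: TrainerConfig = TrainerConfig()
  // Calls chain because each one returns a new instance; `base` is unchanged.
  val tuned: TrainerConfig = base.withK(5).withMaxIterations(10)
}
```

Because the original instance is never modified, one base configuration can be shared safely and specialized per training run.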
Set the distance metric to use over data elements
The distance metric
Copy of this instance with new metric
Set the number of threads to use for clustering runs
The number of threads to use while clustering. Must be > 0.
Copy of this instance with updated value of numThreads
Set the size of the random sample to take from input data to use for clustering.
The target size of the random sample. Must be > 0.
Copy of this instance, with updated value of sampleSize
Set the random number generation (RNG) seed.
Training runs that start from the same seed produce the same result. By default, training runs vary randomly.
The random seed to use for RNG
Copy of this instance, with updated random seed
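How a sample size and a seed interact can be sketched with the standard library alone. The object `SampleSketch` and its `sample` function are hypothetical; the way the library actually draws its sample is not described here, so shuffling and truncating is an assumption chosen for brevity.

```scala
import scala.util.Random

object SampleSketch {
  /** Draws up to `sampleSize` elements from `data` using a seeded RNG.
    * The result is smaller than `sampleSize` only when `data` is.
    */
  def sample[T](data: Seq[T], sampleSize: Int, seed: Long): Seq[T] = {
    require(sampleSize > 0, "sampleSize must be > 0")
    val rng = new Random(seed)
    // Shuffle-then-take is an assumption of this sketch, not the library's method.
    rng.shuffle(data).take(sampleSize)
  }
}
```

Calling `SampleSketch.sample` twice with the same data and seed returns the same elements in the same order, which is what makes a seeded run repeatable.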
An object for training a K-Medoid clustering model on Seq or RDD data.
The data must have a metric function defined on it but, unlike K-Means clustering, it does not require an algebra over data elements.
The distance metric imposed on data elements
The number of clusters to use. If k is zero, the clustering attempts to identify a number of clusters that is "good" with respect to Minimum Description Length.
The maximum number of model refinement iterations to run
The epsilon threshold to use. Must be >= 0. If c1 is the cost of the current clustering model and c0 is the cost of the previous model, refinement halts when (c0 - c1) <= epsilon (lower cost is better).
The fractionEpsilon threshold to use. Must be >= 0. If c1 is the cost of the current clustering model and c0 is the cost of the previous model, refinement halts when (c0 - c1) / c0 <= fractionEpsilon (lower cost is better).
The target size of the random sample. Must be > 0.
The number of threads to use while clustering
The random seed to use for the RNG. Training runs that start from the same seed produce the same result. By default, training runs vary randomly.
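To show why only a metric is needed, here is a minimal k-medoids sketch over a Seq. It is not the library's implementation: `KMedoidsSketch`, `cost`, `medoidOf` and `train` are hypothetical names, the alternating assign-and-reselect scheme is one common variant, and sampling, threading, fractionEpsilon and the k = 0 case are left out. Every medoid is an element of the input, so no averaging of elements is ever performed.

```scala
import scala.util.Random

object KMedoidsSketch {
  /** Cost of a model: sum of distances from each element to its nearest medoid. */
  def cost[T](data: Seq[T], medoids: Seq[T], metric: (T, T) => Double): Double =
    data.map(x => medoids.map(m => metric(x, m)).min).sum

  /** The member of `cluster` with the smallest summed distance to all members. */
  def medoidOf[T](cluster: Seq[T], metric: (T, T) => Double): T =
    cluster.minBy(c => cluster.map(x => metric(c, x)).sum)

  /** Returns k medoids chosen from `data` (hypothetical signature). */
  def train[T](data: Seq[T], k: Int, maxIterations: Int, epsilon: Double, seed: Long)(
      metric: (T, T) => Double): Seq[T] = {
    val candidates = data.distinct
    require(k > 0 && k <= candidates.size, "k must be in [1, number of distinct elements]")
    require(maxIterations > 0, "maxIterations must be > 0")
    require(epsilon >= 0.0, "epsilon must be >= 0")

    // Seeded random initialization, so equal seeds give equal starting medoids.
    var medoids: Seq[T] = new Random(seed).shuffle(candidates).take(k)
    var prevCost = cost(data, medoids, metric)
    var iteration = 0
    var halted = false
    while (iteration < maxIterations && !halted) {
      // Assign every element to its nearest medoid.
      val clusters = data.groupBy(x => medoids.minBy(m => metric(x, m)))
      // Re-select the medoid of each cluster using only the metric.
      val refined = clusters.values.map(c => medoidOf(c, metric)).toSeq
      val newCost = cost(data, refined, metric)
      // Halt when (c0 - c1) <= epsilon, as in the parameter description.
      halted = (prevCost - newCost) <= epsilon
      if (newCost < prevCost) { medoids = refined; prevCost = newCost }
      iteration += 1
    }
    medoids
  }
}
```

A usage example on one-dimensional data with absolute difference as the metric: `KMedoidsSketch.train(Seq(1.0, 2.0, 3.0, 10.0, 11.0, 12.0), 2, 20, 0.0, 7L)((a, b) => math.abs(a - b))`. The same function works for any element type for which a distance function can be written, such as strings under an edit distance.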