cluster

Type Members

class ClusteringRandomForestModel extends Serializable

Enhance Spark RandomForestModel objects with methods for Random Forest Clustering
class ClusteringTreeModel extends Serializable

Enhance a Spark DecisionTreeModel object with methods for Random Forest clustering
case class KMedoids[T](metric: (T, T) ⇒ Double, k: Int, maxIterations: Int, epsilon: Double, fractionEpsilon: Double, sampleSize: Int, numThreads: Int, seed: Long) extends Serializable with Logging with Product

An object for training a K-Medoid clustering model on Seq or RDD data.
Data is required to have a metric function defined on it, but it does not require an algebra over data elements, as K-Means clustering does.
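The note above can be illustrated with a small, self-contained sketch. It does not use this package's API: `MedoidSketch`, `hamming`, `medoid` and `cost` are hypothetical names introduced only for illustration. It shows that a medoid can be chosen using nothing but a metric, even for data (here, strings) that cannot be averaged.

```scala
object MedoidSketch {
  // Hypothetical metric on strings: number of differing positions plus the
  // length difference. No addition or scaling of strings is ever needed.
  val hamming: (String, String) => Double = (a, b) =>
    (a.zip(b).count { case (x, y) => x != y } + math.abs(a.length - b.length)).toDouble

  // A medoid is the data element with the smallest total distance to all elements.
  def medoid[T](data: Seq[T], metric: (T, T) => Double): T =
    data.minBy(candidate => data.map(e => metric(candidate, e)).sum)

  // Model cost: each element contributes its distance to the nearest medoid.
  def cost[T](data: Seq[T], medoids: Seq[T], metric: (T, T) => Double): Double =
    data.map(e => medoids.map(m => metric(e, m)).min).sum
}
```

For example, `MedoidSketch.medoid(Seq("cat", "cot", "cut", "dog"), MedoidSketch.hamming)` selects an actual element of the input, whereas a K-Means centroid would require computing a mean of the elements.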
metric
The distance metric imposed on data elements
k
The number of clusters to use. If k is zero, the clustering will attempt to identify a number of clusters that is "good" w.r.t. Minimum Description Length.
maxIterations
The maximum number of model refinement iterations to run
epsilon
The epsilon threshold to use. Must be >= 0. If c1 is the current clustering model cost, and c0 is the cost of the previous model, then refinement halts when (c0 - c1) <= epsilon (Lower cost is better).
fractionEpsilon
The fractionEpsilon threshold to use. Must be >= 0. If c1 is the current clustering model cost, and c0 is the cost of the previous model, then refinement halts when (c0 - c1) / c0 <= fractionEpsilon (Lower cost is better).
sampleSize
The target size of the random sample. Must be > 0.
numThreads
The number of threads to use while clustering
seed
The random seed to use for RNG. Training runs that start from the same seed produce the same result. By default, training runs will vary randomly.
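The two halting thresholds described under epsilon and fractionEpsilon can be sketched as a single predicate. This is a minimal illustration, not the library's implementation: `HaltingSketch` and `shouldHalt` are hypothetical names, and combining the two tests with a logical OR, as well as the guard for a zero previous cost, are assumptions.

```scala
object HaltingSketch {
  // c0: cost of the previous model, c1: cost of the current model (lower is better).
  def shouldHalt(c0: Double, c1: Double, epsilon: Double, fractionEpsilon: Double): Boolean = {
    require(epsilon >= 0.0, "epsilon must be >= 0")
    require(fractionEpsilon >= 0.0, "fractionEpsilon must be >= 0")
    val improvement = c0 - c1
    // Absolute test: (c0 - c1) <= epsilon
    val absoluteHalt = improvement <= epsilon
    // Relative test: (c0 - c1) / c0 <= fractionEpsilon.
    // Assumption: skip the relative test when c0 is zero to avoid dividing by zero.
    val fractionalHalt = c0 > 0.0 && (improvement / c0) <= fractionEpsilon
    // Assumption: refinement stops as soon as either threshold is met.
    absoluteHalt || fractionalHalt
  }
}
```

With epsilon set to 0 and fractionEpsilon set to 0.0001, a drop in cost from 100 to 90 continues refinement, while a drop from 100 to 99.995 halts it.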
class KMedoidsModel[T] extends Serializable

Represents a K-Medoids clustering model
case class RandomForestCluster[T](extractor: (T) ⇒ Seq[Double], categoryInfo: Map[Int, Int], syntheticSS: Int, rfNumTrees: Int, rfMaxDepth: Int, rfMaxBins: Int, clusterK: Int, clusterMaxIter: Int, clusterEps: Double, clusterFractionEps: Double, clusterSS: Int, clusterThreads: Int, seed: Long) extends Serializable with Logging with Product

An object for training a Random Forest clustering model on unsupervised data.
Data is required to have a mapping into a feature space of type Seq[Double].
extractor
A feature extraction function for data objects
categoryInfo
A map from feature indexes into numbers of categories. Feature indexes that do not have an entry in the map are assumed to be numeric, not categorical. Defaults to category-info from Extractor, if the feature extraction function is of this type. Otherwise defaults to empty, i.e. all numeric features.
syntheticSS
The size of synthetic (margin-sampled) data to be constructed. Defaults to the size of the input data.
rfNumTrees
The number of decision trees to train in the Random Forest. Defaults to 10.
rfMaxDepth
Maximum decision tree depth. Defaults to 5.
rfMaxBins
Maximum histogramming bins to use for numeric data. Defaults to 5.
clusterK
The number of clusters to use when clustering leaf-id vectors. Defaults to an automatic estimation of a "good" number of clusters.
clusterMaxIter
Maximum clustering refinement iterations to compute. Defaults to 25.
clusterEps
Halt clustering if clustering metric-cost changes by less than this value. Defaults to 0.
clusterFractionEps
Halt clustering if clustering metric-cost changes by no more than this fraction of the previous iteration's cost. Defaults to 0.0001.
clusterSS
If the data is larger than this value, cluster on a random sample of this size. Defaults to 1000.
clusterThreads
Use this number of threads to accelerate clustering. Defaults to 1.
seed
A seed to use for RNG. Defaults to using a randomized seed value.
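The extractor, categoryInfo and syntheticSS parameters can be pictured with a self-contained sketch that does not call this package. The `Customer` type, the `RfClusterInputsSketch` object and the `marginSample` helper are hypothetical. The sampling shown here, drawing each feature independently from its observed column, is one common reading of "margin-sampled" and is an assumption about what the library does.

```scala
import scala.util.Random

object RfClusterInputsSketch {
  // Hypothetical data type, for illustration only.
  case class Customer(age: Double, income: Double, region: Int)

  // A feature extraction function of the shape (T) => Seq[Double].
  val extractor: Customer => Seq[Double] =
    c => Seq(c.age, c.income, c.region.toDouble)

  // Feature index 2 (region) is categorical with 4 categories.
  // Indexes 0 and 1 have no entry, so they are treated as numeric.
  val categoryInfo: Map[Int, Int] = Map(2 -> 4)

  // Assumed margin sampling: each synthetic row takes every feature value
  // from a randomly chosen input row, independently per feature.
  def marginSample(features: Seq[Seq[Double]], n: Int, seed: Long): Seq[Seq[Double]] = {
    val rng = new Random(seed)
    val columns = features.transpose.map(_.toVector)
    Seq.fill(n)(columns.map(col => col(rng.nextInt(col.length))))
  }
}
```

Sampling each column independently preserves the per-feature distributions while breaking the dependencies between features, which gives a forest something to discriminate real rows from. Passing the same seed twice yields the same synthetic rows.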
class RandomForestClusterModel[T] extends Serializable

Represents a Random Forest clustering model of some data objects

Value Members

object ClusteringRandomForestModel extends Serializable
object ClusteringTreeModel extends Serializable

Class definitions for ClusteringTreeModel methods
object KMedoids extends Logging with Serializable

Utilities used by K-Medoids clustering
object KMedoidsModel extends Serializable

Utility functions for KMedoidsModel
object RandomForestCluster extends Serializable

Factory functions and implicits for RandomForestCluster
object RandomForestClusterModel extends Serializable

Factory functions and implicits for RandomForestClusterModel
package infra
