Package io.radanalytics.silex.text

Type Members

  1. case class ApproximateWhitelist(filter: BitSet) extends Product with Serializable

    An ApproximateWhitelist is a basic Bloom filter intended for holding natural-language vocabularies. It deals with String values natively and can be trained from a sequence or from an RDD of any element type T, as long as there is an implicit conversion in scope from T to String.

    Known limitation: although this filter uses several hashes, some of them exhibit unusually high collision rates on strings that are permutations of one another. If the filter performs poorly on a given vocabulary (for example, with an unexpectedly high false-positive rate), these collisions are worth investigating. The choice of hash functions is subject to change in a future release.

  2. trait LogTokenizing extends AnyRef

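The ApproximateWhitelist entry describes a basic Bloom filter over strings that can be trained from any element type T with an implicit conversion to String. The following is a minimal Scala sketch of that idea, assuming a mutable BitSet and k bit positions derived by double hashing from two differently seeded MurmurHash3 string hashes. The names SketchWhitelist, train, add, maybeContains, numBits and numHashes are hypothetical; this is not the silex API or its choice of hash functions, and training from an RDD (which would require merging per-partition bit sets) is not shown.

```scala
import scala.collection.mutable
import scala.util.hashing.MurmurHash3

// Hypothetical sketch of a basic Bloom filter over strings.
// Not the silex ApproximateWhitelist API; the hash scheme is an assumption.
final class SketchWhitelist(numBits: Int, numHashes: Int) extends Serializable {
  require(numBits > 0 && numHashes > 0, "numBits and numHashes must be positive")

  private val bits = new mutable.BitSet(numBits)

  // Double hashing: position i is (h1 + i * h2) mod numBits,
  // using two differently seeded MurmurHash3 string hashes.
  private def positions(word: String): Seq[Int] = {
    val h1 = MurmurHash3.stringHash(word)
    val h2 = MurmurHash3.stringHash(word, 0x3c074a61)
    (0 until numHashes).map(i => java.lang.Math.floorMod(h1 + i * h2, numBits))
  }

  // Set every bit position derived from the word.
  def add(word: String): Unit = positions(word).foreach(p => bits += p)

  // False means definitely absent; true may be a false positive.
  def maybeContains(word: String): Boolean =
    positions(word).forall(p => bits.contains(p))
}

object SketchWhitelist {
  // Train from any sequence whose elements convert implicitly to String.
  def train[T](words: Seq[T], numBits: Int, numHashes: Int)(
      implicit asString: T => String): SketchWhitelist = {
    val whitelist = new SketchWhitelist(numBits, numHashes)
    words.foreach(w => whitelist.add(asString(w)))
    whitelist
  }
}
```

For example, SketchWhitelist.train(Seq("spark", "scala"), numBits = 1024, numHashes = 3) yields a filter for which maybeContains("spark") is always true. A false answer is definitive, while a true answer for an untrained word is a false positive, whose likelihood grows as more words are added relative to numBits.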
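The known limitation noted for ApproximateWhitelist concerns hashes that collide on strings that are permutations of one another. As an illustration only (this makes no claim about which hashes silex actually uses), a hash that combines characters in an order-independent way, such as summing their code points, assigns every anagram the same value, so a Bloom filter position derived from it cannot tell "listen" from "silent". An order-sensitive polynomial hash, using the same recurrence as java.lang.String.hashCode, does not have this weakness. The object PermutationHashes and its functions are hypothetical.

```scala
// Hypothetical hash functions for illustration; not taken from silex.
object PermutationHashes {
  // Order-insensitive: summing character codes ignores position,
  // so every permutation of a string gets the same value.
  def additiveHash(s: String): Int =
    s.foldLeft(0)((acc, c) => acc + c.toInt)

  // Order-sensitive polynomial hash (the same recurrence as String.hashCode).
  def polynomialHash(s: String): Int =
    s.foldLeft(0)((acc, c) => 31 * acc + c.toInt)
}
```

Here additiveHash("listen") and additiveHash("silent") are both 655, while their polynomial hashes differ.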

Value Members

  1. object ApproximateWhitelist extends Serializable
  2. object LogTokenizer extends LogTokenizing