Class PartitionEvaluator

java.lang.Object
com.apple.foundationdb.kmeans.PartitionEvaluator

public class PartitionEvaluator extends Object
Evaluates whether a candidate partitioning should replace a current partitioning. Works symmetrically across splits (more clusters in the candidate), merges (fewer clusters in the candidate), and same-k re-partitionings. The k == 1 case on either side is handled in two ways: per-cluster statistics that are undefined for a single cluster (separation, low-margin rate) contribute a neutral zero to the composite score, and the separation/margin hard rejects are skipped when the candidate has fewer than two clusters.
  • Constructor Details

    • PartitionEvaluator

      public PartitionEvaluator()
  • Method Details

    • evaluate

      @Nonnull public static <V> PartitionEvaluator.EvaluationResult evaluate(@Nonnull List<V> currentVectors, @Nonnull PartitionEvaluator.Partition<?> current, @Nonnull List<V> candidateVectors, @Nonnull PartitionEvaluator.Partition<?> candidate, @Nonnull Lens<V,RealVector> vectorLens, @Nonnull PartitionEvaluator.Parameters parameters)
      Evaluates a candidate partitioning against the current one and returns whether to accept, keep, or reject it.

      The decision is made by computing a panel of quality statistics for both partitionings (see PartitionEvaluator.PartitionStats) and combining them into a composite score:

      
       scoreGain = alphaSseGain * relativeSseGain
                 + betaSeparationGain * separationGain
                 - gammaImbalancePenalty * imbalancePenalty
                 - deltaLowMarginPenalty * lowMarginPenalty
       
      Hard rejects are applied first (minSmallestFrac, maxLargestFrac, candidate separation/low-margin thresholds when candidate.k() >= 2, and the absolute minRelativeSseGain/minScoreGain floors); only if all of those pass is the candidate accepted.
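
      The score formula and the two absolute floors can be sketched as plain arithmetic. This is a minimal illustration, not the library's implementation: the class ScoreGainSketch, its method names, and the use of >= for the floor comparisons are assumptions; only the weight and threshold names come from the description above.

```java
/** Hypothetical sketch of the composite score; not part of PartitionEvaluator. */
final class ScoreGainSketch {
    private ScoreGainSketch() {
    }

    /**
     * Combines the four statistics with their weights, following the
     * formula in the method description: gains add, penalties subtract.
     */
    static double scoreGain(double alphaSseGain, double betaSeparationGain,
                            double gammaImbalancePenalty, double deltaLowMarginPenalty,
                            double relativeSseGain, double separationGain,
                            double imbalancePenalty, double lowMarginPenalty) {
        return alphaSseGain * relativeSseGain
                + betaSeparationGain * separationGain
                - gammaImbalancePenalty * imbalancePenalty
                - deltaLowMarginPenalty * lowMarginPenalty;
    }

    /**
     * Checks the two absolute floors. Assumption: a value equal to the
     * floor passes (>=); the real comparison may be strict.
     */
    static boolean passesFloors(double relativeSseGain, double scoreGain,
                                double minRelativeSseGain, double minScoreGain) {
        return relativeSseGain >= minRelativeSseGain && scoreGain >= minScoreGain;
    }
}
```

      With weights (2.0, 1.0, 0.5, 4.0) and statistics (0.5, 0.25, 0.5, 0.125), the four terms are 1.0, 0.25, 0.25, and 0.5, so the sketch yields a scoreGain of 0.5.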

      Symmetric handling: separation and low-margin rate are undefined when k < 2; this method treats them as 0 in the score formula and skips the corresponding hard rejects when candidate.k() < 2, so the same logic correctly handles splits, merges (including merges to a single cluster), and same-k re-partitionings.
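
      One possible reading of the k < 2 handling is sketched below. Everything here is hypothetical: the class NeutralStatsSketch, the use of OptionalDouble to model an undefined statistic, the definition of separationGain as candidate minus current, and the threshold name minSeparation are assumptions made for illustration, not the library's API.

```java
import java.util.OptionalDouble;

/** Hypothetical sketch of neutral handling for undefined statistics. */
final class NeutralStatsSketch {
    private NeutralStatsSketch() {
    }

    /** An undefined statistic (k < 2) is treated as 0 in the score. */
    static double orNeutral(OptionalDouble stat) {
        return stat.orElse(0.0);
    }

    /**
     * Assumed definition: gain is candidate minus current, with any
     * undefined side replaced by 0.
     */
    static double separationGain(OptionalDouble current, OptionalDouble candidate) {
        return orNeutral(candidate) - orNeutral(current);
    }

    /**
     * The separation hard reject only applies when the candidate has at
     * least two clusters; a single-cluster candidate is never rejected here.
     */
    static boolean rejectsOnSeparation(int candidateK, OptionalDouble candidateSeparation,
                                       double minSeparation) {
        if (candidateK < 2) {
            return false; // statistic undefined, so the check is skipped
        }
        return orNeutral(candidateSeparation) < minSeparation;
    }
}
```

      Under these assumptions, a merge to a single cluster skips the separation check entirely, while a split from k == 1 is scored as if the current separation were 0.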

      Type Parameters:
      V - caller's input vector representation
      Parameters:
      currentVectors - the vectors belonging to the current partitioning. Must be non-empty
      current - the current partitioning
      candidateVectors - the vectors belonging to the candidate partitioning. Often the same list as currentVectors but may differ when the caller is re-clustering a different point set
      candidate - the candidate partitioning to evaluate
      vectorLens - lens that extracts a RealVector from each element of currentVectors and candidateVectors
      parameters - tuning parameters that control thresholds and score weights
      Returns:
      a PartitionEvaluator.EvaluationResult carrying the decision, the statistics for both partitionings, and the metrics that led to the decision