Class Config

java.lang.Object
com.apple.foundationdb.async.hnsw.Config

public final class Config extends Object
Configuration settings for a HNSW.
  • Field Details

    • DEFAULT_DETERMINISTIC_SEEDING

      public static final boolean DEFAULT_DETERMINISTIC_SEEDING
      See Also:
    • DEFAULT_METRIC

      @Nonnull public static final Metric DEFAULT_METRIC
    • DEFAULT_USE_INLINING

      public static final boolean DEFAULT_USE_INLINING
      See Also:
    • DEFAULT_M

      public static final int DEFAULT_M
      See Also:
    • DEFAULT_M_MAX_0

      public static final int DEFAULT_M_MAX_0
      See Also:
    • DEFAULT_M_MAX

      public static final int DEFAULT_M_MAX
      See Also:
    • DEFAULT_EF_CONSTRUCTION

      public static final int DEFAULT_EF_CONSTRUCTION
      See Also:
    • DEFAULT_EXTEND_CANDIDATES

      public static final boolean DEFAULT_EXTEND_CANDIDATES
      See Also:
    • DEFAULT_KEEP_PRUNED_CONNECTIONS

      public static final boolean DEFAULT_KEEP_PRUNED_CONNECTIONS
      See Also:
    • DEFAULT_SAMPLE_VECTOR_STATS_PROBABILITY

      public static final double DEFAULT_SAMPLE_VECTOR_STATS_PROBABILITY
      See Also:
    • DEFAULT_MAINTAIN_STATS_PROBABILITY

      public static final double DEFAULT_MAINTAIN_STATS_PROBABILITY
      See Also:
    • DEFAULT_STATS_THRESHOLD

      public static final int DEFAULT_STATS_THRESHOLD
      See Also:
    • DEFAULT_USE_RABITQ

      public static final boolean DEFAULT_USE_RABITQ
      See Also:
    • DEFAULT_RABITQ_NUM_EX_BITS

      public static final int DEFAULT_RABITQ_NUM_EX_BITS
      See Also:
    • DEFAULT_MAX_NUM_CONCURRENT_NODE_FETCHES

      public static final int DEFAULT_MAX_NUM_CONCURRENT_NODE_FETCHES
      See Also:
    • DEFAULT_MAX_NUM_CONCURRENT_NEIGHBOR_FETCHES

      public static final int DEFAULT_MAX_NUM_CONCURRENT_NEIGHBOR_FETCHES
      See Also:
  • Method Details

    • isDeterministicSeeding

      public boolean isDeterministicSeeding()
      Indicator that if true causes the insert logic of the HNSW to be seeded using a hash of the primary key of the record that is inserted. That can be useful for testing. If isDeterministicSeeding is false, we use System.nanoTime() for seeding.
    • getMetric

      @Nonnull public Metric getMetric()
      The metric that is used to determine distances between vectors.
    • getNumDimensions

      public int getNumDimensions()
      The number of dimensions used. All vectors must have exactly this number of dimensions.
    • isUseInlining

      public boolean isUseInlining()
      Indicator if all layers except layer 0 use inlining. One entire layer is fully managed using either the compact or the inlining layout. If inlining is used, each node is persisted as a key/value pair per neighbor which includes the vectors of the neighbors but not the vector for itself. If inlining is not used, and therefore the compact layout is used instead, each node is persisted as exactly one key/value pair per node which stores its own vector but specifically excludes the vectors of the neighbors.

      If a layer uses the compact storage layout, each vector is stored with the node and therefore is stored exactly once. During a nearest neighbor search, a fetch of the neighborhood of a node incurs a fetch (get) for each of the neighbors of that node.

      If a layer uses the inlining storage layout, a vector of a node is stored with the neighboring information of an adjacent node pointing to this node and is therefore is stored multiple times (once per neighbor). During a nearest neighbor search, however, the neighboring vectors of a vector can all be fetched in one range scan.

      Choosing which storage format is right for the use case depends on some factors:

      • Inlining leaves uses more storage in the database as vectors are stored multiple times. However, since inlining is only supported for layers greater than 0, the overhead of storing more data should be acceptable in most use cases.
      • Inlining should outperform the compact storage model during search. Tests have shown a latency improvement of searches of roughly 5% - 10%. Insert performance is slightly decreased due to higher number of bytes that need to be written. Note that this overhead can be mitigated by using isUseRaBitQ() to drastically reduce the sizes of vectors on disk.
    • getM

      public int getM()
      This attribute (named M by the HNSW paper) is the connectivity value for all nodes stored on any layer. While by no means enforced or even enforceable, we strive to create and maintain exactly m neighbors for a node. Due to insert/delete operations it is possible that the actual number of neighbors a node references is not exactly m at any given time.
    • getMMax

      public int getMMax()
      This attribute (named M_max by the HNSW paper) is the maximum connectivity value for nodes stored on a layer greater than 0. We will never create more that mMax neighbors for a node. That means that we even prune the neighbors of a node if the actual number of neighbors would otherwise exceed mMax. Note that this attribute must be greater than or equal to m.
    • getMMax0

      public int getMMax0()
      This attribute (named M_max0 by the HNSW paper) is the maximum connectivity value for nodes stored on layer 0. We will never create more that mMax0 neighbors for a node that is stored on that layer. That means that we even prune the neighbors of a node if the actual number of neighbors would otherwise exceed mMax0. Note that this attribute must be greater than or equal to mMax.
    • getEfConstruction

      public int getEfConstruction()
      Maximum size of the search queues (one independent queue per layer) that are used during the insertion of a new node. If efConstruction is set to 1, the search naturally follows a greedy approach (monotonous descent), whereas a high number for efConstruction allows for a more nuanced search that can tolerate (false) local minima.
    • isExtendCandidates

      public boolean isExtendCandidates()
      Indicator to signal if, during the insertion of a node, the set of nearest neighbors of that node is to be extended by the actual neighbors of those neighbors to form a set of candidates that the new node may be connected to during the insert operation.
    • isKeepPrunedConnections

      public boolean isKeepPrunedConnections()
      Indicator to signal if, during the insertion of a node, candidates that have been discarded due to not satisfying the select-neighbor heuristic may get added back in to pad the set of neighbors if the new node would otherwise have too few neighbors (see m).
    • getSampleVectorStatsProbability

      public double getSampleVectorStatsProbability()
      If sampling is necessary (currently iff isUseRaBitQ() is true), this attribute represents the probability of a vector being inserted to also be written into the StorageAdapter.SUBSPACE_PREFIX_SAMPLES subspace. The vectors in that subspace are continuously aggregated until a total statsThreshold has been reached.
    • getMaintainStatsProbability

      public double getMaintainStatsProbability()
      If sampling is necessary (currently iff isUseRaBitQ() is true), this attribute represents the probability of the StorageAdapter.SUBSPACE_PREFIX_SAMPLES subspace to be further aggregated (rolled-up) when a new vector is inserted. The vectors in that subspace are continuously aggregated until a total statsThreshold has been reached.
    • getStatsThreshold

      public int getStatsThreshold()
      If sampling is necessary (currently iff isUseRaBitQ() is true), this attribute represents the threshold (being a number of vectors) that when reached causes the stats maintenance logic to compute the actual statistics (currently the centroid of the vectors that have been inserted to far).
    • isUseRaBitQ

      public boolean isUseRaBitQ()
      Indicator if we should RaBitQ quantization. See RaBitQuantizer for more details.
    • getRaBitQNumExBits

      public int getRaBitQNumExBits()
      Number of bits per dimensions iff isUseRaBitQ() is set to true, ignored otherwise. If RaBitQ encoding is used, a vector is stored using roughly 25 + numDimensions * (numExBits + 1) / 8 bytes.
    • getMaxNumConcurrentNodeFetches

      public int getMaxNumConcurrentNodeFetches()
      Maximum number of concurrent node fetches during search and modification operations.
    • getMaxNumConcurrentNeighborhoodFetches

      public int getMaxNumConcurrentNeighborhoodFetches()
      Maximum number of concurrent neighborhood fetches during modification operations when the neighbors are pruned.
    • toBuilder

      @Nonnull public Config.ConfigBuilder toBuilder()
    • equals

      public boolean equals(Object o)
      Overrides:
      equals in class Object
    • hashCode

      public int hashCode()
      Overrides:
      hashCode in class Object
    • toString

      @Nonnull public String toString()
      Overrides:
      toString in class Object