Class HNSW

java.lang.Object
com.apple.foundationdb.async.hnsw.HNSW

@API(EXPERIMENTAL) public class HNSW extends Object
An implementation of the Hierarchical Navigable Small World (HNSW) algorithm for efficient approximate nearest neighbor (ANN) search.

HNSW constructs a multi-layer graph, where each layer is a subset of the one below it. The top layers serve as fast entry points to navigate the graph, while the bottom layer contains all the data points. This structure allows for logarithmic-time complexity for search operations, making it suitable for large-scale, high-dimensional datasets.

This class provides methods for building the graph (insert(Transaction, Tuple, RealVector)) and performing k-NN searches (kNearestNeighborsSearch(ReadTransaction, int, int, boolean, RealVector)). It is designed to be used with a transactional storage backend, managed via a Subspace.

This class functions as the entry point for any interactions with the HNSW data structure. It delegates to the respective operations classes which implement the actual algorithms to maintain and to search the structure.

See Also:
  • Constructor Details

    • HNSW

      public HNSW(@Nonnull Subspace subspace, @Nonnull Executor executor, @Nonnull Config config, @Nonnull OnWriteListener onWriteListener, @Nonnull OnReadListener onReadListener)
      Constructs a new HNSW graph instance.

      This constructor initializes the HNSW graph with the necessary components for storage, execution, configuration, and event handling. All parameters are mandatory and must not be null.

      Parameters:
      subspace - the Subspace where the graph data is stored.
      executor - the Executor service to use for concurrent operations.
      config - the Config object containing HNSW algorithm parameters.
      onWriteListener - a listener to be notified of write events on the graph.
      onReadListener - a listener to be notified of read events on the graph.
      Throws:
      NullPointerException - if any of the parameters are null.
  • Method Details

    • newConfigBuilder

      public static Config.ConfigBuilder newConfigBuilder()
      Start building a Config.
      Returns:
      a new Config that can be altered and then built for use with a HNSW
      See Also:
    • defaultConfig

      @Nonnull public static Config defaultConfig(int numDimensions)
      Returns a default Config.
      Parameters:
      numDimensions - number of dimensions
      Returns:
      a new default Config.
      See Also:
    • getLocator

      @Nonnull public Locator getLocator()
    • getSubspace

      @Nonnull public Subspace getSubspace()
      Gets the subspace associated with this object.
      Returns:
      the non-null subspace
    • getExecutor

      @Nonnull public Executor getExecutor()
      Get the executor used by this hnsw.
      Returns:
      executor used when running asynchronous tasks
    • getConfig

      @Nonnull public Config getConfig()
      Get this hnsw's configuration.
      Returns:
      hnsw configuration
    • getOnWriteListener

      @Nonnull public OnWriteListener getOnWriteListener()
      Get the on-write listener.
      Returns:
      the on-write listener
    • getOnReadListener

      @Nonnull public OnReadListener getOnReadListener()
      Get the on-read listener.
      Returns:
      the on-read listener
    • kNearestNeighborsSearch

      @Nonnull public CompletableFuture<List<? extends ResultEntry>> kNearestNeighborsSearch(@Nonnull ReadTransaction readTransaction, int k, int efSearch, boolean includeVectors, @Nonnull RealVector queryVector)
      Performs a search for the k-nearest neighbors for a given query vector.
      Parameters:
      readTransaction - the transaction to use for reading from the database
      k - the number of nearest neighbors to return
      efSearch - the size of the dynamic candidate list for the search. A larger value increases accuracy at the cost of performance.
      includeVectors - indicator if the caller would like the search to also include vectors in the result set
      queryVector - the vector to find the nearest neighbors for
      Returns:
      a CompletableFuture that will complete with a list of the k nearest neighbors, sorted by distance in ascending order.
    • kNearestNeighborsRingSearch

      @Nonnull public CompletableFuture<List<? extends ResultEntry>> kNearestNeighborsRingSearch(@Nonnull ReadTransaction readTransaction, int k, int efSearch, boolean includeVectors, @Nonnull RealVector queryVector, double radius)
      Performs a search for the k-nearest neighbors of a ring around a given query vector at a given radius.
      Parameters:
      readTransaction - the transaction to use for reading from the database
      k - the number of nearest neighbors to return
      efSearch - the size of the dynamic candidate list for the search. A larger value increases accuracy at the cost of performance.
      includeVectors - indicator if the caller would like the search to also include vectors in the result set
      queryVector - the vector to find the nearest neighbors for
      Returns:
      a CompletableFuture that will complete with a list of the k nearest neighbors, sorted by distance in ascending order.
    • insert

      @Nonnull public CompletableFuture<Void> insert(@Nonnull Transaction transaction, @Nonnull Tuple newPrimaryKey, @Nonnull RealVector newVector)
      Inserts a new vector with its associated primary key into the HNSW graph.

      The method first determines a layer for the new node, called the top layer. It then traverses the graph from the entry point downwards, greedily searching for the nearest neighbors to the newVector at each layer. This search identifies the optimal connection points for the new node.

      Once the nearest neighbors are found, the new node is linked into the graph structure at all layers up to its top layer. Special handling is included for inserting the first-ever node into the graph or when a new node's layer is higher than any existing node, which updates the graph's entry point. All operations are performed asynchronously.

      Parameters:
      transaction - the Transaction context for all database operations
      newPrimaryKey - the unique Tuple primary key for the new node being inserted
      newVector - the RealVector data to be inserted into the graph
      Returns:
      a CompletableFuture that completes when the insertion operation is finished
    • delete

      @Nonnull public CompletableFuture<Void> delete(@Nonnull Transaction transaction, @Nonnull Tuple primaryKey)
      Deletes a record using its associated primary key from the HNSW graph.

      This method implements a multi-layer deletion algorithm that maintains the structural integrity of the HNSW graph. The deletion process consists of several key phases:

      • Layer Determination: First determines the top layer for the node using the same deterministic algorithm used during insertion, ensuring consistent layer assignment across operations.
      • Existence Verification: Checks whether the node actually exists in the graph before attempting deletion. If the node doesn't exist, the operation completes immediately without error.
      • Multi-Layer Deletion: Removes the node from all layers spanning from layer 0 (base layer containing all nodes) up to and including the node's top layer. The deletion is performed in parallel across all layers for optimal performance.
      • Graph Repair: For each layer where the node is deleted, the algorithm repairs the local graph structure by identifying the deleted node's neighbors and reconnecting them appropriately. This process:
        • Finds candidate replacement connections among the neighbors of neighbors
        • Selects optimal new connections using the HNSW distance heuristics
        • Updates neighbor lists to maintain graph connectivity and search performance
        • Applies connection limits (M, MMax) and prunes excess connections if necessary
      • Entry Point Management: If the deleted node was serving as the graph's entry point (the starting node for search operations), the method automatically selects a new entry point from the remaining nodes at the highest available layer. If no nodes remain after deletion, the access information is cleared, effectively resetting the graph to an empty state.
      All operations are performed transactionally and asynchronously, ensuring consistency and enabling non-blocking execution in concurrent environments.
      Parameters:
      transaction - the Transaction context for all database operations, ensuring atomicity and consistency of the deletion and repair operations
      primaryKey - the unique Tuple primary key identifying the node to be deleted from the graph
      Returns:
      a CompletableFuture that completes when the deletion operation is fully finished, including all graph repairs and entry point updates. The future completes with null on successful deletion.
    • orderByDistance

      @Nonnull public AsyncIterator<ResultEntry> orderByDistance(@Nonnull ReadTransaction readTransaction, int efRingSearch, int efOutwardSearch, boolean includeVectors, @Nonnull RealVector centerVector, double minimumRadius, @Nullable Tuple minimumPrimaryKey)
      Returns an async iterator that returns results ordered by their distance from a given center vector.

      This method initiates an outward traversal from the centerVector, effectively performing a k-NN (k-Nearest Neighbor) or beam search. The results are returned as an AsyncIterator, with items yielded in increasing order of their distance from the center. The search can be started or resumed from a specific point defined by minimumRadius and minimumPrimaryKey, allowing for pagination.

      Parameters:
      readTransaction - the transaction to use for reading data
      efRingSearch - the exploration factor for the initial ring search phase of the HNSW algorithm
      efOutwardSearch - the exploration factor for the main outward search phase, determining the size of the candidate queue
      includeVectors - a boolean flag indicating whether the full vectors should be reconstructed and included in the results. If false, the vector in each ResultEntry will be null.
      centerVector - the vector to search around. Results will be ordered by their distance to this vector
      minimumRadius - the minimum distance from the centerVector. Only results with a distance greater than will be returned.
      minimumPrimaryKey - the primary key of the last item from a previous scan, used for pagination. If provided along with minimumRadius, the scan will resume after the item with this key at that radius. Can be null to start from the beginning.
      Returns:
      an AsyncIterator of ResultEntry objects, ordered by increasing distance from the centerVector