Class HNSW

java.lang.Object
com.apple.foundationdb.async.hnsw.HNSW

@API(EXPERIMENTAL) public class HNSW extends Object
An implementation of the Hierarchical Navigable Small World (HNSW) algorithm for efficient approximate nearest neighbor (ANN) search.

HNSW constructs a multi-layer graph, where each layer is a subset of the one below it. The top layers serve as fast entry points to navigate the graph, while the bottom layer contains all the data points. This structure allows for logarithmic-time complexity for search operations, making it suitable for large-scale, high-dimensional datasets.

This class provides methods for building the graph (insert(Transaction, Tuple, RealVector)) and performing k-NN searches (kNearestNeighborsSearch(ReadTransaction, int, int, boolean, RealVector)). It is designed to be used with a transactional storage backend, managed via a Subspace.

See Also:
  • Constructor Details

    • HNSW

      public HNSW(@Nonnull Subspace subspace, @Nonnull Executor executor, @Nonnull Config config, @Nonnull OnWriteListener onWriteListener, @Nonnull OnReadListener onReadListener)
      Constructs a new HNSW graph instance.

      This constructor initializes the HNSW graph with the necessary components for storage, execution, configuration, and event handling. All parameters are mandatory and must not be null.

      Parameters:
      subspace - the Subspace where the graph data is stored.
      executor - the Executor service to use for concurrent operations.
      config - the Config object containing HNSW algorithm parameters.
      onWriteListener - a listener to be notified of write events on the graph.
      onReadListener - a listener to be notified of read events on the graph.
      Throws:
      NullPointerException - if any of the parameters are null.
  • Method Details

    • newConfigBuilder

      public static Config.ConfigBuilder newConfigBuilder()
      Start building a Config.
      Returns:
      a new Config that can be altered and then built for use with a HNSW
      See Also:
    • defaultConfig

      @Nonnull public static Config defaultConfig(int numDimensions)
      Returns a default Config.
      Parameters:
      numDimensions - number of dimensions
      Returns:
      a new default Config.
      See Also:
    • getSubspace

      @Nonnull public Subspace getSubspace()
      Gets the subspace associated with this object.
      Returns:
      the non-null subspace
    • getExecutor

      @Nonnull public Executor getExecutor()
      Get the executor used by this hnsw.
      Returns:
      executor used when running asynchronous tasks
    • getConfig

      @Nonnull public Config getConfig()
      Get this hnsw's configuration.
      Returns:
      hnsw configuration
    • getOnWriteListener

      @Nonnull public OnWriteListener getOnWriteListener()
      Get the on-write listener.
      Returns:
      the on-write listener
    • getOnReadListener

      @Nonnull public OnReadListener getOnReadListener()
      Get the on-read listener.
      Returns:
      the on-read listener
    • kNearestNeighborsSearch

      @Nonnull public CompletableFuture<? extends List<? extends ResultEntry>> kNearestNeighborsSearch(@Nonnull ReadTransaction readTransaction, int k, int efSearch, boolean includeVectors, @Nonnull RealVector queryVector)
      Performs a k-nearest neighbors (k-NN) search for a given query vector.

      This method implements the search algorithm for an HNSW graph. The search begins at an entry point in the highest layer and greedily traverses down through the layers. In each layer, it finds the node closest to the queryVector. This node then serves as the entry point for the search in the layer below.

      Once the search reaches the base layer (layer 0), it performs a more exhaustive search starting from the determined entry point. It explores the graph, maintaining a dynamic list of the best candidates found so far. The size of this candidate list is controlled by the efSearch parameter. Finally, the method selects the top k nodes from the search results, sorted by their distance to the query vector.

      Parameters:
      readTransaction - the transaction to use for reading from the database
      k - the number of nearest neighbors to return
      efSearch - the size of the dynamic candidate list for the search. A larger value increases accuracy at the cost of performance.
      includeVectors - indicator if the caller would like the search to also include vectors in the result set
      queryVector - the vector to find the nearest neighbors for
      Returns:
      a CompletableFuture that will complete with a list of the k nearest neighbors, sorted by distance in ascending order.
    • insert

      @Nonnull public CompletableFuture<Void> insert(@Nonnull Transaction transaction, @Nonnull Tuple newPrimaryKey, @Nonnull RealVector newVector)
      Inserts a new vector with its associated primary key into the HNSW graph.

      The method first determines a random layer for the new node, called the insertionLayer. It then traverses the graph from the entry point downwards, greedily searching for the nearest neighbors to the newVector at each layer. This search identifies the optimal connection points for the new node.

      Once the nearest neighbors are found, the new node is linked into the graph structure at all layers up to its insertionLayer. Special handling is included for inserting the first-ever node into the graph or when a new node's layer is higher than any existing node, which updates the graph's entry point. All operations are performed asynchronously.

      Parameters:
      transaction - the Transaction context for all database operations
      newPrimaryKey - the unique Tuple primary key for the new node being inserted
      newVector - the RealVector data to be inserted into the graph
      Returns:
      a CompletableFuture that completes when the insertion operation is finished