Class HNSW
HNSW constructs a multi-layer graph, where each layer is a subset of the one below it. The top layers serve as fast entry points to navigate the graph, while the bottom layer contains all the data points. This structure allows for logarithmic-time complexity for search operations, making it suitable for large-scale, high-dimensional datasets.
This class provides methods for building the graph (insert(Transaction, Tuple, RealVector))
and performing k-NN searches (kNearestNeighborsSearch(ReadTransaction, int, int, boolean, RealVector)).
It is designed to be used with a transactional storage backend, managed via a Subspace.
This class functions as the entry point for any interactions with the HNSW data structure. It delegates to the respective operations classes which implement the actual algorithms to maintain and to search the structure.
-
Constructor Summary
ConstructorsConstructorDescriptionHNSW(Subspace subspace, Executor executor, Config config, OnWriteListener onWriteListener, OnReadListener onReadListener) Constructs a new HNSW graph instance. -
Method Summary
Modifier and TypeMethodDescriptionstatic ConfigdefaultConfig(int numDimensions) Returns a defaultConfig.delete(Transaction transaction, Tuple primaryKey) Deletes a record using its associated primary key from the HNSW graph.Get this hnsw's configuration.Get the executor used by this hnsw.Get the on-read listener.Get the on-write listener.Gets the subspace associated with this object.insert(Transaction transaction, Tuple newPrimaryKey, RealVector newVector) Inserts a new vector with its associated primary key into the HNSW graph.CompletableFuture<List<? extends ResultEntry>>kNearestNeighborsRingSearch(ReadTransaction readTransaction, int k, int efSearch, boolean includeVectors, RealVector queryVector, double radius) Performs a search for the k-nearest neighbors of a ring around a given query vector at a given radius.CompletableFuture<List<? extends ResultEntry>>kNearestNeighborsSearch(ReadTransaction readTransaction, int k, int efSearch, boolean includeVectors, RealVector queryVector) Performs a search for the k-nearest neighbors for a given query vector.static Config.ConfigBuilderStart building aConfig.orderByDistance(ReadTransaction readTransaction, int efRingSearch, int efOutwardSearch, boolean includeVectors, RealVector centerVector, double minimumRadius, Tuple minimumPrimaryKey) Returns an async iterator that returns results ordered by their distance from a given center vector.
-
Constructor Details
-
HNSW
public HNSW(@Nonnull Subspace subspace, @Nonnull Executor executor, @Nonnull Config config, @Nonnull OnWriteListener onWriteListener, @Nonnull OnReadListener onReadListener) Constructs a new HNSW graph instance.This constructor initializes the HNSW graph with the necessary components for storage, execution, configuration, and event handling. All parameters are mandatory and must not be null.
- Parameters:
subspace- theSubspacewhere the graph data is stored.executor- theExecutorservice to use for concurrent operations.config- theConfigobject containing HNSW algorithm parameters.onWriteListener- a listener to be notified of write events on the graph.onReadListener- a listener to be notified of read events on the graph.- Throws:
NullPointerException- if any of the parameters arenull.
-
-
Method Details
-
newConfigBuilder
Start building aConfig.- Returns:
- a new
Configthat can be altered and then built for use with aHNSW - See Also:
-
defaultConfig
Returns a defaultConfig.- Parameters:
numDimensions- number of dimensions- Returns:
- a new default
Config. - See Also:
-
getLocator
-
getSubspace
Gets the subspace associated with this object.- Returns:
- the non-null subspace
-
getExecutor
Get the executor used by this hnsw.- Returns:
- executor used when running asynchronous tasks
-
getConfig
Get this hnsw's configuration.- Returns:
- hnsw configuration
-
getOnWriteListener
Get the on-write listener.- Returns:
- the on-write listener
-
getOnReadListener
Get the on-read listener.- Returns:
- the on-read listener
-
kNearestNeighborsSearch
@Nonnull public CompletableFuture<List<? extends ResultEntry>> kNearestNeighborsSearch(@Nonnull ReadTransaction readTransaction, int k, int efSearch, boolean includeVectors, @Nonnull RealVector queryVector) Performs a search for the k-nearest neighbors for a given query vector.- Parameters:
readTransaction- the transaction to use for reading from the databasek- the number of nearest neighbors to returnefSearch- the size of the dynamic candidate list for the search. A larger value increases accuracy at the cost of performance.includeVectors- indicator if the caller would like the search to also include vectors in the result setqueryVector- the vector to find the nearest neighbors for- Returns:
- a
CompletableFuturethat will complete with a list of theknearest neighbors, sorted by distance in ascending order.
-
kNearestNeighborsRingSearch
@Nonnull public CompletableFuture<List<? extends ResultEntry>> kNearestNeighborsRingSearch(@Nonnull ReadTransaction readTransaction, int k, int efSearch, boolean includeVectors, @Nonnull RealVector queryVector, double radius) Performs a search for the k-nearest neighbors of a ring around a given query vector at a given radius.- Parameters:
readTransaction- the transaction to use for reading from the databasek- the number of nearest neighbors to returnefSearch- the size of the dynamic candidate list for the search. A larger value increases accuracy at the cost of performance.includeVectors- indicator if the caller would like the search to also include vectors in the result setqueryVector- the vector to find the nearest neighbors for- Returns:
- a
CompletableFuturethat will complete with a list of theknearest neighbors, sorted by distance in ascending order.
-
insert
@Nonnull public CompletableFuture<Void> insert(@Nonnull Transaction transaction, @Nonnull Tuple newPrimaryKey, @Nonnull RealVector newVector) Inserts a new vector with its associated primary key into the HNSW graph.The method first determines a layer for the new node, called the
top layer. It then traverses the graph from the entry point downwards, greedily searching for the nearest neighbors to thenewVectorat each layer. This search identifies the optimal connection points for the new node.Once the nearest neighbors are found, the new node is linked into the graph structure at all layers up to its
top layer. Special handling is included for inserting the first-ever node into the graph or when a new node's layer is higher than any existing node, which updates the graph's entry point. All operations are performed asynchronously.- Parameters:
transaction- theTransactioncontext for all database operationsnewPrimaryKey- the uniqueTupleprimary key for the new node being insertednewVector- theRealVectordata to be inserted into the graph- Returns:
- a
CompletableFuturethat completes when the insertion operation is finished
-
delete
@Nonnull public CompletableFuture<Void> delete(@Nonnull Transaction transaction, @Nonnull Tuple primaryKey) Deletes a record using its associated primary key from the HNSW graph.This method implements a multi-layer deletion algorithm that maintains the structural integrity of the HNSW graph. The deletion process consists of several key phases:
- Layer Determination: First determines the top layer for the node using the same deterministic algorithm used during insertion, ensuring consistent layer assignment across operations.
- Existence Verification: Checks whether the node actually exists in the graph before attempting deletion. If the node doesn't exist, the operation completes immediately without error.
- Multi-Layer Deletion: Removes the node from all layers spanning from layer 0 (base layer containing all nodes) up to and including the node's top layer. The deletion is performed in parallel across all layers for optimal performance.
- Graph Repair: For each layer where the node is deleted, the algorithm repairs the local graph
structure by identifying the deleted node's neighbors and reconnecting them appropriately. This process:
- Finds candidate replacement connections among the neighbors of neighbors
- Selects optimal new connections using the HNSW distance heuristics
- Updates neighbor lists to maintain graph connectivity and search performance
- Applies connection limits (M, MMax) and prunes excess connections if necessary
- Entry Point Management: If the deleted node was serving as the graph's entry point (the starting node for search operations), the method automatically selects a new entry point from the remaining nodes at the highest available layer. If no nodes remain after deletion, the access information is cleared, effectively resetting the graph to an empty state.
- Parameters:
transaction- theTransactioncontext for all database operations, ensuring atomicity and consistency of the deletion and repair operationsprimaryKey- the uniqueTupleprimary key identifying the node to be deleted from the graph- Returns:
- a
CompletableFuturethat completes when the deletion operation is fully finished, including all graph repairs and entry point updates. The future completes withnullon successful deletion.
-
orderByDistance
@Nonnull public AsyncIterator<ResultEntry> orderByDistance(@Nonnull ReadTransaction readTransaction, int efRingSearch, int efOutwardSearch, boolean includeVectors, @Nonnull RealVector centerVector, double minimumRadius, @Nullable Tuple minimumPrimaryKey) Returns an async iterator that returns results ordered by their distance from a given center vector.This method initiates an outward traversal from the
centerVector, effectively performing a k-NN (k-Nearest Neighbor) or beam search. The results are returned as anAsyncIterator, with items yielded in increasing order of their distance from the center. The search can be started or resumed from a specific point defined byminimumRadiusandminimumPrimaryKey, allowing for pagination.- Parameters:
readTransaction- the transaction to use for reading dataefRingSearch- the exploration factor for the initial ring search phase of the HNSW algorithmefOutwardSearch- the exploration factor for the main outward search phase, determining the size of the candidate queueincludeVectors- a boolean flag indicating whether the full vectors should be reconstructed and included in the results. Iffalse, the vector in eachResultEntrywill benull.centerVector- the vector to search around. Results will be ordered by their distance to this vectorminimumRadius- the minimum distance from thecenterVector. Only results with a distance greater than will be returned.minimumPrimaryKey- the primary key of the last item from a previous scan, used for pagination. If provided along withminimumRadius, the scan will resume after the item with this key at that radius. Can benullto start from the beginning.- Returns:
- an
AsyncIteratorofResultEntryobjects, ordered by increasing distance from thecenterVector
-