Class HNSW
HNSW constructs a multi-layer graph, where each layer is a subset of the one below it. The top layers serve as fast entry points to navigate the graph, while the bottom layer contains all the data points. This structure allows for logarithmic-time complexity for search operations, making it suitable for large-scale, high-dimensional datasets.
This class provides methods for building the graph (insert(Transaction, Tuple, RealVector))
and performing k-NN searches (kNearestNeighborsSearch(ReadTransaction, int, int, boolean, RealVector)).
It is designed to be used with a transactional storage backend, managed via a Subspace.
-
Constructor Summary
ConstructorsConstructorDescriptionHNSW(Subspace subspace, Executor executor, Config config, OnWriteListener onWriteListener, OnReadListener onReadListener) Constructs a new HNSW graph instance. -
Method Summary
Modifier and TypeMethodDescriptionstatic ConfigdefaultConfig(int numDimensions) Returns a defaultConfig.Get this hnsw's configuration.Get the executor used by this hnsw.Get the on-read listener.Get the on-write listener.Gets the subspace associated with this object.insert(Transaction transaction, Tuple newPrimaryKey, RealVector newVector) Inserts a new vector with its associated primary key into the HNSW graph.CompletableFuture<? extends List<? extends ResultEntry>>kNearestNeighborsSearch(ReadTransaction readTransaction, int k, int efSearch, boolean includeVectors, RealVector queryVector) Performs a k-nearest neighbors (k-NN) search for a given query vector.static Config.ConfigBuilderStart building aConfig.
-
Constructor Details
-
HNSW
public HNSW(@Nonnull Subspace subspace, @Nonnull Executor executor, @Nonnull Config config, @Nonnull OnWriteListener onWriteListener, @Nonnull OnReadListener onReadListener) Constructs a new HNSW graph instance.This constructor initializes the HNSW graph with the necessary components for storage, execution, configuration, and event handling. All parameters are mandatory and must not be null.
- Parameters:
subspace- theSubspacewhere the graph data is stored.executor- theExecutorservice to use for concurrent operations.config- theConfigobject containing HNSW algorithm parameters.onWriteListener- a listener to be notified of write events on the graph.onReadListener- a listener to be notified of read events on the graph.- Throws:
NullPointerException- if any of the parameters arenull.
-
-
Method Details
-
newConfigBuilder
Start building aConfig.- Returns:
- a new
Configthat can be altered and then built for use with aHNSW - See Also:
-
defaultConfig
Returns a defaultConfig.- Parameters:
numDimensions- number of dimensions- Returns:
- a new default
Config. - See Also:
-
getSubspace
Gets the subspace associated with this object.- Returns:
- the non-null subspace
-
getExecutor
Get the executor used by this hnsw.- Returns:
- executor used when running asynchronous tasks
-
getConfig
Get this hnsw's configuration.- Returns:
- hnsw configuration
-
getOnWriteListener
Get the on-write listener.- Returns:
- the on-write listener
-
getOnReadListener
Get the on-read listener.- Returns:
- the on-read listener
-
kNearestNeighborsSearch
@Nonnull public CompletableFuture<? extends List<? extends ResultEntry>> kNearestNeighborsSearch(@Nonnull ReadTransaction readTransaction, int k, int efSearch, boolean includeVectors, @Nonnull RealVector queryVector) Performs a k-nearest neighbors (k-NN) search for a given query vector.This method implements the search algorithm for an HNSW graph. The search begins at an entry point in the highest layer and greedily traverses down through the layers. In each layer, it finds the node closest to the
queryVector. This node then serves as the entry point for the search in the layer below.Once the search reaches the base layer (layer 0), it performs a more exhaustive search starting from the determined entry point. It explores the graph, maintaining a dynamic list of the best candidates found so far. The size of this candidate list is controlled by the
efSearchparameter. Finally, the method selects the topknodes from the search results, sorted by their distance to the query vector.- Parameters:
readTransaction- the transaction to use for reading from the databasek- the number of nearest neighbors to returnefSearch- the size of the dynamic candidate list for the search. A larger value increases accuracy at the cost of performance.includeVectors- indicator if the caller would like the search to also include vectors in the result setqueryVector- the vector to find the nearest neighbors for- Returns:
- a
CompletableFuturethat will complete with a list of theknearest neighbors, sorted by distance in ascending order.
-
insert
@Nonnull public CompletableFuture<Void> insert(@Nonnull Transaction transaction, @Nonnull Tuple newPrimaryKey, @Nonnull RealVector newVector) Inserts a new vector with its associated primary key into the HNSW graph.The method first determines a random layer for the new node, called the
insertionLayer. It then traverses the graph from the entry point downwards, greedily searching for the nearest neighbors to thenewVectorat each layer. This search identifies the optimal connection points for the new node.Once the nearest neighbors are found, the new node is linked into the graph structure at all layers up to its
insertionLayer. Special handling is included for inserting the first-ever node into the graph or when a new node's layer is higher than any existing node, which updates the graph's entry point. All operations are performed asynchronously.- Parameters:
transaction- theTransactioncontext for all database operationsnewPrimaryKey- the uniqueTupleprimary key for the new node being insertednewVector- theRealVectordata to be inserted into the graph- Returns:
- a
CompletableFuturethat completes when the insertion operation is finished
-