Class RTree

java.lang.Object
com.apple.foundationdb.async.rtree.RTree

@API(EXPERIMENTAL) public class RTree extends Object
An implementation of an R-tree. See this} link for a general introduction to R-trees.
The main use-case for R-trees, while they are normally used for spatial querying of actual objects in N-dimensional space, is to function as a natural extension of regular B+-tree-based indexes in FDB, but spanning into multiple dimensions. That allows to answer queries using multiple inequalities which is not possible with 1-D indexes.
Here is a short introduction copied from the explanation at the linked wikipedia page. The key idea of the data structure is to group nearby objects and represent them with their minimum bounding rectangle in the next higher level of the tree; the "R" in R-tree is for rectangle. Since all objects lie within this bounding rectangle, a query that does not intersect the bounding rectangle also cannot intersect any of the contained objects. At the leaf level, each rectangle describes a single object; at higher levels the aggregation includes an increasing number of objects. This can also be seen as an increasingly coarse approximation of the data set.
Similar to the B-tree, the R-tree is also a balanced search tree (so all leaf nodes are at the same depth), organizes the data in pages/nodes, and is designed for storage on disk. Each node can contain a maximum number of entries, often denoted as M. It also guarantees a minimum fill (except for the root node).
One of the key properties of an R-tree is that the minimum bounding rectangles (MBR) of the children of a node in the tree may overlap which may cause multiple children to intersect with a query even if that query's mbr is just a single point. An object is only stored in exactly one leaf node of the tree, however, during a search of the tree multiple paths may have to exhaustively followed in order go find all matching objects of that query.
The search performance of the tree is strongly linked to the size of the area being covered by a child (as indicated by a child's mbr) as well as the overlap among children's mbrs at each level of the tree. The key difficulty in constructing a search-efficient tree is to minimize both covered area and the overlap while keeping the tree balanced. Variants of the R-tree such as R+-trees and R*-trees employ different techniques to improve on the basic R-tree ideas and are provably superior with respect to packing of the data structure as well as search performance. These improvements are accomplished by a more complex write path; R+-trees strive to eliminate overlap altogether which becomes more problematic with higher dimensionality while R*-trees attempt to minimize both the covered area by a node and the sibling overlap by approximations as well as re-insertions in order to avoid node-splits. For more information about R+-trees see R+-tree, for more information about R*-trees see R*-tree.
All variants of R-tree mentioned so far have a fatal flaw when considered in context with FDB and specifically the record layer. None of the R-tree variants can return their elements in a stable order that is not sensitive to the physical layout of the tree at query time. That proves to be problematic for queries that are continued at a later time as the physical structure of the tree may have changed due to re-balancing. Thus, it would become necessary to encode all already returned items into the continuation which is simply not feasible.
Another variant (the one we implement here) is a Hilbert R-tree. See Hilbert R-tree for details. In short, the Hilbert R-tree, in addition to being a regular R-tree, also utilizes the Hilbert value (see RTreeHilbertCurveHelpers) of the center of an mbr of an object (or the point itself if the object is a point) to establish an ordering among objects and nodes stored in the tree. All traversals of the tree return objects in Hilbert Value order. The Hilbert value usually is a BigInteger that can be encoded into the continuation of a query thus overcoming the fundamental problems plaguing other variants of the R-trees as mentioned above. In addition to a stable and logical traversal order, the Hilbert value is used to naturally cluster the tree as similar values in Hilbert space map to nearby points in N-dimensional Euclidean space. Lastly, the Hilbert value is also used to avoid eager node-splitting during insertions as well as eager node-fusing during deletions as it defines a natural order between siblings. A node can transfer empty slots from their siblings (for insertions) or children (for deletions). In this way the tree is packed more tightly and costly re-balancing can be avoided while we still do not have to resort to re-insertions of overflowing children.
Clustering based on the Hilbert value has been proven to be superior compared to R-trees, R+-trees, and R*-trees. A disadvantage of a Hilbert R-tree is the definition of the canvas the Hilbert curve is defined over. While there are ways to define a Hilbert curve for floating point coordinates, we cannot support variable length dimensions such as strings. In fact, we only support INT32 and INT64 dimensions.
  • Field Details

    • MAX_CONCURRENT_READS

      public static final int MAX_CONCURRENT_READS
      See Also:
    • DEFAULT_USE_NODE_SLOT_INDEX

      public static final boolean DEFAULT_USE_NODE_SLOT_INDEX
      Indicator if we should maintain a secondary node index consisting of hilbet value and key to speed up update/deletes.
      See Also:
    • DEFAULT_MIN_M

      public static final int DEFAULT_MIN_M
      The minimum number of slots a node has (if not the root node). M should be chosen in a way that the minimum is half of the maximum. That in turn guarantees that overflow/underflow handling can be performed without causing further underflow/overflow.
      See Also:
    • DEFAULT_MAX_M

      public static final int DEFAULT_MAX_M
      The maximum number of slots a node has. This value is derived from DEFAULT_MIN_M.
      See Also:
    • DEFAULT_S

      public static final int DEFAULT_S
      The magic split number. We split S to S + 1 nodes while inserting data and fuse S + 1 to S nodes while deleting data. Academically, 2-to-3 splits and 3-to-2 fuses seem to yield the best results. Please be aware of the following constraints:
      1. When splitting S to S + 1 nodes, we re-distribute the children of S nodes into S + 1 nodes which may cause an underflow if S and M are not set carefully with respect to each other. Example: MIN_M = 25, MAX_M = 32, S = 2, two nodes at already at maximum capacity containing a combined total of 64 children when a new child is inserted. We split the two nodes into three as indicated by S = 2. We have 65 children but there is no way of distributing them among three nodes such that none of them underflows. This constraint can be formulated as S * MAX_M / (S + 1) >= MIN_M.
      2. When fusing S + 1 to S nodes, we re-distribute the children of S + 1 nodes into S + 1 nodes which may cause an overflow if S and M are not set carefully with respect to each other. Example: MIN_M = 25, MAX_M = 32, S = 2, three nodes at already at minimum capacity containing a combined total of 75 children when a child is deleted. We fuse the three nodes into two as indicated by S = 2. We have 75 children but there is no way of distributing them among two nodes such that none of them overflows. This constraint can be formulated as (S + 1) * MIN_M / S <= MAX_M.
      Both constraints are in fact the same constraint and can be written as MAX_M / MIN_M >= (S + 1) / S.
      See Also:
    • DEFAULT_STORAGE

      @Nonnull public static final RTree.Storage DEFAULT_STORAGE
      Default storage layout. Can be either BY_SLOT or BY_NODE. BY_SLOT encodes all information pertaining to a NodeSlot as one key/value pair in the database; BY_NODE encodes all information pertaining to a Node as one key/value pair in the database. While BY_SLOT avoids conflicts as most inserts/updates only need to update one slot, it is by far less compact as some information is stored in a normalized fashion and therefore repeated multiple times (i.e. node identifiers, etc.). BY_NODE inlines slot information into the node leading to a more size-efficient layout of the data. That advantage is offset by a higher likelihood of conflicts.
    • DEFAULT_STORE_HILBERT_VALUES

      public static final boolean DEFAULT_STORE_HILBERT_VALUES
      Indicator if Hilbert values should be stored or not with the data (in leaf nodes). A Hilbert value can always be recomputed from the point.
      See Also:
    • DEFAULT_CONFIG

      @Nonnull public static final RTree.Config DEFAULT_CONFIG
  • Constructor Details

    • RTree

      public RTree(@Nonnull Subspace subspace, @Nonnull Subspace secondarySubspace, @Nonnull Executor executor, @Nonnull Function<RTree.Point,BigInteger> hilbertValueFunction)
      Initialize a new R-tree with the default configuration.
      Parameters:
      subspace - the subspace where the r-tree is stored
      secondarySubspace - the subspace where the node index (if used is stored)
      executor - an executor to use when running asynchronous tasks
      hilbertValueFunction - function to compute the Hilbert value from a RTree.Point
    • RTree

      public RTree(@Nonnull Subspace subspace, @Nonnull Subspace nodeSlotIndexSubspace, @Nonnull Executor executor, @Nonnull RTree.Config config, @Nonnull Function<RTree.Point,BigInteger> hilbertValueFunction, @Nonnull Supplier<byte[]> nodeIdSupplier, @Nonnull OnWriteListener onWriteListener, @Nonnull OnReadListener onReadListener)
      Initialize a new R-tree.
      Parameters:
      subspace - the subspace where the r-tree is stored
      nodeSlotIndexSubspace - the subspace where the node index (if used is stored)
      executor - an executor to use when running asynchronous tasks
      config - configuration to use
      hilbertValueFunction - function to compute the Hilbert value for a RTree.Point
      nodeIdSupplier - supplier to be invoked when new nodes are created
      onWriteListener - an on-write listener to be called after writes take place
      onReadListener - an on-read listener to be called after reads take place
  • Method Details

    • newConfigBuilder

      public static RTree.ConfigBuilder newConfigBuilder()
      Start building a RTree.Config.
      Returns:
      a new Config that can be altered and then built for use with a RTree
      See Also:
    • getExecutor

      @Nonnull public Executor getExecutor()
      Get the executer used by this r-tree.
      Returns:
      executor used when running asynchronous tasks
    • getConfig

      @Nonnull public RTree.Config getConfig()
      Get this r-tree's configuration.
      Returns:
      r-tree configuration
    • getOnWriteListener

      @Nonnull public OnWriteListener getOnWriteListener()
      Get the on-write listener.
      Returns:
      the on-write listener
    • getOnReadListener

      @Nonnull public OnReadListener getOnReadListener()
      Get the on-read listener.
      Returns:
      the on-read listener
    • scan

      @Nonnull public AsyncIterator<ItemSlot> scan(@Nonnull ReadTransaction readTransaction, @Nonnull Predicate<RTree.Rectangle> mbrPredicate, @Nonnull BiPredicate<Tuple,Tuple> suffixKeyPredicate)
      Perform a scan over the tree within the transaction passed in using a predicate that is also passed in to eliminate subtrees from the scan. This predicate may be stateful which allows for dynamic adjustments of the queried area while the scan is active.
      A scan of the tree offers all items that pass the mbrPredicate test in Hilbert Value order using an AsyncIterator. The predicate that is passed in is applied to intermediate nodes as well as leaf nodes, but not to elements contained by a leaf node. The caller should filter out items in a downstream operation. A scan of the tree will not prefetch the next node before the items of the current node have been consumed. This guarantees that the semantics of the mbr predicate can be adapted in response to the items being consumed. (this allows for efficient scans for ORDER BY x, y LIMIT n queries).
      Parameters:
      readTransaction - the transaction to use
      mbrPredicate - a predicate on an mbr RTree.Rectangle
      suffixKeyPredicate - a predicate on the suffix key
      Returns:
      an AsyncIterator of ItemSlots.
    • scan

      @Nonnull public AsyncIterator<ItemSlot> scan(@Nonnull ReadTransaction readTransaction, @Nullable BigInteger lastHilbertValue, @Nullable Tuple lastKey, @Nonnull Predicate<RTree.Rectangle> mbrPredicate, @Nonnull BiPredicate<Tuple,Tuple> suffixKeyPredicate)
      Perform a scan over the tree within the transaction passed in using a predicate that is also passed in to eliminate subtrees from the scan. This predicate may be stateful which allows for dynamic adjustments of the queried area while the scan is active.
      A scan of the tree offers all items that pass the mbrPredicate test in Hilbert Value order using an AsyncIterator. The predicate that is passed in is applied to intermediate nodes as well as leaf nodes, but not to elements contained in a leaf node. The caller should filter out items in a downstream operation. A scan of the tree will not prefetch the next node before the items of the current node have been consumed. This guarantees that the semantics of the mbr predicate can be adapted in response to the items being consumed. (this allows for efficient scans for ORDER BY x, y LIMIT n queries).
      Parameters:
      readTransaction - the transaction to use
      lastHilbertValue - the last Hilbert value that was returned by a previous call to this method
      lastKey - the last key that was returned by a previous call to this method
      mbrPredicate - a predicate on an mbr RTree.Rectangle
      suffixKeyPredicate - a predicate on the suffix key
      Returns:
      an AsyncIterator of ItemSlots.
    • insertOrUpdate

      @Nonnull public CompletableFuture<Void> insertOrUpdate(@Nonnull TransactionContext tc, @Nonnull RTree.Point point, @Nonnull Tuple keySuffix, @Nonnull Tuple value)
      Method to insert an object/item into the R-tree. The item is treated unique per its point in space as well as its additional key that is also passed in. The Hilbert value of the point is passed in as to allow the caller to compute Hilbert values themselves. Note that there is a bijective mapping between point and Hilbert value which allows us to recompute point from Hilbert value as well as Hilbert value from point. We currently treat point and Hilbert value independent, however, they are redundant and not independent at all. The implication is that we do not have to store both point and Hilbert value (but we currently do).
      Parameters:
      tc - transaction context
      point - the point to be used in space
      keySuffix - the additional key to be stored with the item
      value - the additional value to be stored with the item
      Returns:
      a completable future that completes when the insert is completed
    • delete

      @Nonnull public CompletableFuture<Void> delete(@Nonnull TransactionContext tc, @Nonnull RTree.Point point, @Nonnull Tuple keySuffix)
      Method to delete from the R-tree. The item is treated unique per its point in space as well as its additional key that is passed in.
      Parameters:
      tc - transaction context
      point - the point
      keySuffix - the additional key to be stored with the item
      Returns:
      a completable future that completes when the delete operation is completed
    • depth

      public int depth(@Nonnull TransactionContext transactionContext)
      Method to compute the depth of this R-tree.
      Parameters:
      transactionContext - transaction context to be used
      Returns:
      the depth of the R-tree
    • validate

      public void validate(@Nonnull Database db)
      Method to validate the Hilbert R-tree.
      Parameters:
      db - the database to use
    • validate

      public void validate(@Nonnull Database db, int maxNumNodesToBeValidated)
      Method to validate the Hilbert R-tree.
      Parameters:
      db - the database to use
      maxNumNodesToBeValidated - a maximum number of nodes this call should attempt to validate