Class BunchedMap<K,V>
- Type Parameters:
K
- type of keys in the mapV
- type of values in the map
This structure "bunches" adjacent keys together so that one key in the database is responsible for storing multiple entries in the map, which effectively amortizes the cost of this subspace prefix across multiple map entries. In particular, the map will choose "signpost" keys in the map. For each signpost, a key in the database is constructed that is the subspace prefix concatenated with the serialized key. This key is then responsible for storing every entry in the map for which the key is greater than or equal to the signpost key but less than the next signpost key. The signposts are chosen dynamically as keys are added and removed from the map. In particular, there is a target "bunch size" that is a parameter to the map, and upon inserting, the map will see if there is a bunch that the given key can be placed in without exceeding the bunch size. If not, it will create one by adding a new signpost key.
The cost for bunching entries this way is that it requires that the client perform additional database reads while inserting, so mutations have a higher latency than under the simpler scheme, and two clients attempting to modify the same map are likely to experience contention. It is also more expensive to read a single key from the map (as the read now will also read the data for keys in the same bunch as the desired key). A full scan of the map requires less data be transferred over the wire as the subspace prefix can be sent fewer times, so scan-heavy use-cases might not experience much of an overhead at all.
Most methods of this class take a subspace. For the most part, these methods assume that there is one
durable instance of a BunchedMap
within the bounds of the subspace provided. The exception
to this is the
scanMulti(ReadTransaction, Subspace, SubspaceSplitter, byte[], byte[], byte[], int, boolean)
scanMulti()}
family of methods. See the documentation on those methods for more information.
This class is not thread-safe in the general case. Assuming that the serializer and key-comparator are both thread-safe, this class is safe to use from multiple transactions at once or with multiple subspaces concurrently within a single transaction. However, it is unsafe to modify two keys within the same subspace in the same transaction from multiple threads.
-
Constructor Summary
ConstructorsModifierConstructorDescriptionprotected
BunchedMap
(BunchedMap<K, V> model) Copy constructor forBunchedMap
s.BunchedMap
(BunchedSerializer<K, V> serializer, Comparator<K> keyComparator, int bunchSize) Create a bunched map with the given serializer, key comparator, and bunch size. -
Method Summary
Modifier and TypeMethodDescriptionprotected CompletableFuture<byte[]>
compact
(TransactionContext tcx, Subspace subspace, int keyLimit, byte[] continuation) Compact the values within the map into as few keys as possible.containsKey
(TransactionContext tcx, Subspace subspace, K key) Determines whether a key is contained within the map.get
(TransactionContext tcx, Subspace subspace, K key) Retrieves the value associated with a key from the map.int
Get the maximum number of map entries to encode in a single database key.Get the comparator used to order keys of the map.Get the serializer used to encode keys and values.protected void
instrumentDelete
(byte[] key, byte[] oldValue) Instrument a delete.protected CompletableFuture<List<KeyValue>>
instrumentRangeRead
(CompletableFuture<List<KeyValue>> readFuture) Instrument a range read.protected void
instrumentWrite
(byte[] key, byte[] value, byte[] oldValue) Instrument a write.put
(TransactionContext tcx, Subspace subspace, K key, V value) Inserts or updates a key into a map with a new value.remove
(TransactionContext tcx, Subspace subspace, K key) Removes a key from the map.scan
(ReadTransaction tr, Subspace subspace) Scans the map and returns an iterator over all entries.scan
(ReadTransaction tr, Subspace subspace, byte[] continuation) Scans the maps and returns an iterator over all entries.scan
(ReadTransaction tr, Subspace subspace, byte[] continuation, int limit, boolean reverse) Scans the map and returns an iterator over all entries.<T> BunchedMapMultiIterator<K,
V, T> scanMulti
(ReadTransaction tr, Subspace subspace, SubspaceSplitter<T> splitter) Scans multiple maps and returns an iterator over all of them.<T> BunchedMapMultiIterator<K,
V, T> scanMulti
(ReadTransaction tr, Subspace subspace, SubspaceSplitter<T> splitter, byte[] subspaceStart, byte[] subspaceEnd, byte[] continuation, int limit, boolean reverse) Scans multiple maps and returns an iterator over all of them.<T> BunchedMapMultiIterator<K,
V, T> scanMulti
(ReadTransaction tr, Subspace subspace, SubspaceSplitter<T> splitter, byte[] subspaceStart, byte[] subspaceEnd, byte[] continuation, int limit, Consumer<KeyValue> postReadCallback, boolean reverse) Overload ofscanMulti()
that provides a callback to run after each key is read.<T> BunchedMapMultiIterator<K,
V, T> scanMulti
(ReadTransaction tr, Subspace subspace, SubspaceSplitter<T> splitter, byte[] continuation, int limit, boolean reverse) Scans multiple maps and returns an iterator over all of them.verifyIntegrity
(TransactionContext tcx, Subspace subspace) Verify the integrity of the bunched map.
-
Constructor Details
-
BunchedMap
public BunchedMap(@Nonnull BunchedSerializer<K, V> serializer, @Nonnull Comparator<K> keyComparator, int bunchSize) Create a bunched map with the given serializer, key comparator, and bunch size. The provided serializer is used to serialize keys and values when writing to the database and to deserialize them when reading. The comparator is used to maintain keys in a sorted order. The sorted order of keys, however, should be consistent with the byte order of serialized keys (when using unsigned lexicographic comparison), as that comparison method is used by the map when it is more efficient. The bunch size is the maximum number of map keys within any bunch of keys within the database. This value is not stored durably in the database, and it is safe to change this value over time (and to have different writers using different values for the bunch size concurrently), though one writer might undo the work of another writer or make different decisions when splitting up values or adding to bunches.- Parameters:
serializer
- serialize to use when reading or writing datakeyComparator
- comparator used to order keysbunchSize
- maximum size of bunch within the database
-
BunchedMap
Copy constructor forBunchedMap
s. This will create a newBunchedMap
with the same internal state as the original one. Because theBunchedMap
has entirely immutable state, the primary use case for this is for extending subclasses to ensure that the class gets initialized correctly, which is why this is not apublic
method.- Parameters:
model
- originalBunchedMap
to base the new one on
-
-
Method Details
-
instrumentRangeRead
@Nonnull protected CompletableFuture<List<KeyValue>> instrumentRangeRead(@Nonnull CompletableFuture<List<KeyValue>> readFuture) Instrument a range read. The base implementation only returns the original future, but extenders are encouraged to override this method with their own implementations that, for example, records the total numbers of keys read and their sizes.- Parameters:
readFuture
- a future that will complete to a list of keys and values- Returns:
- an instrumented future that returns the same values as the original future
-
instrumentWrite
protected void instrumentWrite(@Nonnull byte[] key, @Nonnull byte[] value, @Nullable byte[] oldValue) Instrument a write. The default implementation does nothing, but extenders are encouraged to override this method with their own implementations that, for example, counts how many bytes have been added or removed from the database. Implementations should not modify the values in any of the arrays, and they should also be aware that theoldValue
parameter may benull
.- Parameters:
key
- the key being writtenvalue
- the new value being written to the keyoldValue
- the previous value being overwritten ornull
if the key is new or the previous value unknown
-
instrumentDelete
protected void instrumentDelete(@Nonnull byte[] key, @Nullable byte[] oldValue) Instrument a delete. The default implementation does nothing, but extenders are encouraged to override this method with their own implementation that, for example, counts how many bytes have been removed from the database. Implementations should not modify the values in any of the arrays, and they should also be aware that theoldValue
parameter may benull
.- Parameters:
key
- the key being deletedoldValue
- the previous value being delete ornull
if the previous value is unknown
-
put
@Nonnull public CompletableFuture<Optional<V>> put(@Nonnull TransactionContext tcx, @Nonnull Subspace subspace, @Nonnull K key, @Nonnull V value) Inserts or updates a key into a map with a new value. This will find an appropriate bunch to insert the key into (or create one if one doesn't exist or if all of the candidates are full). It will do work to make sure that the placement is locally optimal (that is, it will choose between the one or two bunches closest to the key when performing its bunching). It makes no attempt to fix suboptimal bunches elsewhere within the map. If the map already containskey
, it will overwrite the existing key with the new value. This will return the old value if one is present.Note that this method is not thread-safe if multiple threads call it with the same transaction and subspace. (Multiple calls with different transactions or subspaces are safe.)
Note that this call is asynchronous. It will return a
CompletableFuture
that will be completed when this task has completed.- Parameters:
tcx
- database or transaction to use when performing the insertionsubspace
- subspace within which the map's data are locatedkey
- key of the map entry to insertvalue
- value of the map entry to insert- Returns:
- a future that will complete with an optional that will either contain the previous value associated with the key or be empty if there was not a previous value
-
containsKey
@Nonnull public CompletableFuture<Boolean> containsKey(@Nonnull TransactionContext tcx, @Nonnull Subspace subspace, @Nonnull K key) Determines whether a key is contained within the map. This method is safe to run concurrently with other map operations in other threads. However, if there are concurrentput
orremove
calls, then there are no guarantees as to whether this will returntrue
orfalse
.- Parameters:
tcx
- database or transaction to use when performing readssubspace
- subspace within which the map's data are locatedkey
- the key to check for membership within the map- Returns:
- a future that will be completed to
true
if the map containskey
andfalse
otherwise
-
get
@Nonnull public CompletableFuture<Optional<V>> get(@Nonnull TransactionContext tcx, @Nonnull Subspace subspace, @Nonnull K key) Retrieves the value associated with a key from the map. This method is safe to run concurrently with other map operations. However, if there are concurrentput
orremove
operations, then there are no guarantees as to whether this operation will see the result of the concurrent operation or not.- Parameters:
tcx
- database or transaction to use when performing readssubspace
- subspace within which the map's data are locatedkey
- the key within the map to retrieve the value of- Returns:
- a future that will be completed with an optional that will be present with the value associated with the key in the database or empty if the key is not contained within the map
-
remove
@Nonnull public CompletableFuture<Optional<V>> remove(@Nonnull TransactionContext tcx, @Nonnull Subspace subspace, @Nonnull K key) Removes a key from the map. This returns a future that will contain an optional with the old value associated with the key within the map (prior to deletion) if one is present or will be empty if the key was not contained within the map.Note that this method is not thread-safe if multiple threads call it with the same transaction and subspace. (Multiple calls with different transactions or subspaces are safe.)
Note that this call is asynchronous. It will return a
CompletableFuture
that will be completed when this task has completed.- Parameters:
tcx
- database or transaction to use when removing the keysubspace
- subspace within which the map's data are locatedkey
- the key to remove from the map- Returns:
- a future that will be completed with an optional that will be present with the value associated with the key in the database (prior to removal) or will be empty if the key was not present
-
verifyIntegrity
@Nonnull public CompletableFuture<Void> verifyIntegrity(@Nonnull TransactionContext tcx, @Nonnull Subspace subspace) Verify the integrity of the bunched map. This will read through all of the database keys associated with the map and verify that all of the keys are in order. If it encounters an error, it will complete with an exception. Otherwise, the returned future will complete normally.- Parameters:
tcx
- database or transaction to use when removing the keysubspace
- subspace within which the map's data are located- Returns:
- a future that will complete when the integrity check has finished
-
compact
@Nonnull protected CompletableFuture<byte[]> compact(@Nonnull TransactionContext tcx, @Nonnull Subspace subspace, int keyLimit, @Nullable byte[] continuation) Compact the values within the map into as few keys as possible. This will scan through and re-write the keys to be optimal. This feature is experimental at the moment, but it should be used to better pack entries if needed.- Parameters:
tcx
- database or transaction to use when compacting datasubspace
- subspace within which the map's data are locatedkeyLimit
- maximum number of database keys to read in a single transactioncontinuation
- the continuation returned from a previous call ornull
to start from the beginning of the subspace- Returns:
- future that will complete with a continuation that can be used to complete
the compaction across multiple transactions (
null
if finished)
-
getSerializer
Get the serializer used to encode keys and values.- Returns:
- this map's serializer
- See Also:
-
getKeyComparator
Get the comparator used to order keys of the map.- Returns:
- this map's key comparator
-
getBunchSize
public int getBunchSize()Get the maximum number of map entries to encode in a single database key. This property controls how "bunched" the map is. A higher value will use less space at the cost of additional I/O at update time (and at read time in the case ofget(TransactionContext, Subspace, Object)
operations).- Returns:
- the maximum number of map entries to encode in a single database key
-
scan
@Nonnull public BunchedMapIterator<K,V> scan(@Nonnull ReadTransaction tr, @Nonnull Subspace subspace) Scans the map and returns an iterator over all entries. This has the same behavior as thescan()
method that takes more parameters, but it will return an iterator that has no limit and always returns entries in ascending order by key.- Parameters:
tr
- transaction to use when scanning the mapsubspace
- subspace in which the map's data are located- Returns:
- an iterator over the entries in the map
-
scan
@Nonnull public BunchedMapIterator<K,V> scan(@Nonnull ReadTransaction tr, @Nonnull Subspace subspace, @Nullable byte[] continuation) Scans the maps and returns an iterator over all entries. This has the same behavior as thescan()
method that takes more parameters, but it will return an iterator that has no limit and always returns entries in ascending order by key.- Parameters:
tr
- transaction to use when scanning the mapsubspace
- subspace in which the map's data are locatedcontinuation
- continuation from a previous scan (ornull
to start from the beginning)- Returns:
- an iterator over the entries in the map
-
scan
@Nonnull public BunchedMapIterator<K,V> scan(@Nonnull ReadTransaction tr, @Nonnull Subspace subspace, @Nullable byte[] continuation, int limit, boolean reverse) Scans the map and returns an iterator over all entries. All entries will be returned sorted by key. Ifreverse
istrue
, it will return keys in descending order instead of in ascending order. It will return at mostlimit
keys from the map. Note that because of bunching, this will probably require reading fewer keys from the database. Scans that require multiple transactions can provide acontinuation
from a previous scan. The returned iterator has agetContinuation()
method that can be used to get an appropriate value for that parameter.- Parameters:
tr
- transaction to use when scanning the mapsubspace
- subspace in which the map's data are locatedcontinuation
- continuation from a previous scan (ornull
to start from the beginning)limit
- maximum number of keys to return orReadTransaction.ROW_LIMIT_UNLIMITED
if no limitreverse
-true
if keys are wanted in descending instead of ascending order- Returns:
- an iterator over the entries in the map
-
scanMulti
@Nonnull public <T> BunchedMapMultiIterator<K,V, scanMultiT> (@Nonnull ReadTransaction tr, @Nonnull Subspace subspace, @Nonnull SubspaceSplitter<T> splitter) Scans multiple maps and returns an iterator over all of them. This behaves the same was as thescanMulti()
method that takes additionalsubspaceStart
andsubspaceEnd
parameters, but this will always scan from the beginning ofsubspace
to the end, assumes that there is no limit to the number of entries to return, and always returns items in ascending order.- Type Parameters:
T
- type of the tag of each map subspace- Parameters:
tr
- transaction to use when scanning the mapssubspace
- subspace containing one or more mapssplitter
- object to determine which map a given key is in- Returns:
- an iterator over the entries in multiple maps
-
scanMulti
@Nonnull public <T> BunchedMapMultiIterator<K,V, scanMultiT> (@Nonnull ReadTransaction tr, @Nonnull Subspace subspace, @Nonnull SubspaceSplitter<T> splitter, @Nullable byte[] continuation, int limit, boolean reverse) Scans multiple maps and returns an iterator over all of them. This behaves the same was as thescanMulti()
method that takes additionalsubspaceStart
andsubspaceEnd
parameters, but this will always scan from the beginning ofsubspace
to the end.- Type Parameters:
T
- type of the tag of each map subspace- Parameters:
tr
- transaction to use when scanning the mapssubspace
- subspace containing one or more mapssplitter
- object to determine which map a given key is incontinuation
- continuation from previous scan (ornull
to start from the beginning)limit
- maximum number of entries to returnreverse
-true
if the entries should be returned in descending order orfalse
otherwise- Returns:
- an iterator over the entries in multiple maps
-
scanMulti
@Nonnull public <T> BunchedMapMultiIterator<K,V, scanMultiT> (@Nonnull ReadTransaction tr, @Nonnull Subspace subspace, @Nonnull SubspaceSplitter<T> splitter, @Nullable byte[] subspaceStart, @Nullable byte[] subspaceEnd, @Nullable byte[] continuation, int limit, boolean reverse) Scans multiple maps and returns an iterator over all of them. The returned iterator will produce one item for each entry in each map. To do so, the maps must be backed by contiguous subspaces withinsubspace
. The scan will start at the first subspace greater than or equal to the subspace formed by concatenating the raw prefix ofsubspace
withsubspaceStart
(or with nothing ifsubspaceStart
isnull
) and will end with the last subspace less than the subspace formed by concatenating the raw prefix ofsubspace
withsubspaceEnd
(or with nothing ifsubspaceEnd
isnull
). The providedSubspaceSplitter
should be able to determine the subspace of the map that a given key contains and (optionally) provide some tag that can be used to associate that map with some value that might be used to distinguish one map from the other (if the application needs it).For example, suppose there are ten
BunchedMaps
that all begin with a raw prefixprefix
followed by aTuple
-encoded integer (0 through 10). If one wanted to scan the three maps starting with map 3 and ending with map 6, one would supply the following parameters:subspace
should be set tonew Subspace(prefix)
-
splitter
should be implemented so that if givenkey
within mapn
,subspaceOf(key)
returnssubspace.subspace(Tuple.from(n))
andsubspaceTag(subspaceOf(key))
returnsn
. subspaceStart
should be set toTuple.pack(3)
subspaceEnd
should be set toTuple.pack(7)
(orTuple.pack(6)
concatenated with a 0-byte)
Furthermore, this method can be used across multiple calls or transactions by setting the
continuation
parameter to the result ofgetContinuation()
from a previous scan. (One should passnull
to restart the scan from the beginning.) The iterator will return at mostlimit
entries. The entries will ordered first by subspace and within a subspace by key. Ifreverse
istrue
, the items will be returned in descending order. Otherwise, they will be returned in ascending order.- Type Parameters:
T
- type of the tag of each map subspace- Parameters:
tr
- transaction to use when scanning the mapssubspace
- subspace containing one or more mapssplitter
- object to determine which map a given key is insubspaceStart
- inclusive starting suffix ofsubspace
to start the scansubspaceEnd
- exclusive ending suffix ofsubspace
to end the scancontinuation
- continuation from previous scan (ornull
to start from the beginning)limit
- maximum number of entries to returnreverse
-true
if the entries should be returned in descending order orfalse
otherwise- Returns:
- an iterator over the entries in multiple maps
-
scanMulti
@Nonnull public <T> BunchedMapMultiIterator<K,V, scanMultiT> (@Nonnull ReadTransaction tr, @Nonnull Subspace subspace, @Nonnull SubspaceSplitter<T> splitter, @Nullable byte[] subspaceStart, @Nullable byte[] subspaceEnd, @Nullable byte[] continuation, int limit, @Nullable Consumer<KeyValue> postReadCallback, boolean reverse) Overload ofscanMulti()
that provides a callback to run after each key is read. This hook can be used to interface with metrics collection.- Type Parameters:
T
- type of the tag of each map subspace- Parameters:
tr
- transaction to use when scanning the mapssubspace
- subspace containing one or more mapssplitter
- object to determine which map a given key is insubspaceStart
- inclusive starting suffix ofsubspace
to start the scansubspaceEnd
- exclusive ending suffix ofsubspace
to end the scancontinuation
- continuation from previous scan (ornull
to start from the beginning)postReadCallback
- callback to execute after reading each keylimit
- maximum number of entries to returnreverse
-true
if the entries should be returned in descending order orfalse
otherwise- Returns:
- an iterator over the entries in multiple maps
- See Also:
-