Class EncodedRealVector
- All Implemented Interfaces:
RealVector
Each component is encoded into numExBits + 1 bits (one sign bit plus
numExBits magnitude "extra bits", following the RaBitQ paper's terminology), tightly
bit-packed for storage. Three per-vector calibration constants — fAddEx,
fRescaleEx, fErrorEx — accompany the integer codes; they are produced by
RaBitQuantizer during encoding and consumed by RaBitDistanceEstimator during
distance evaluation. The encoded form is dramatically smaller than the equivalent
DoubleRealVector (roughly (numExBits + 1) / 64 of the bytes) while still
supporting fast distance estimation against other RaBitQ-encoded queries.
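To make the size claim concrete, here is a minimal arithmetic sketch. The class and method names (EncodedSizeSketch, packedCodeBytes, denseBytes) are hypothetical and not part of the library. The ratio assumes 8-byte double components and leaves out the type byte and the three calibration doubles.

```java
/** Hypothetical helper, not part of the library: back-of-the-envelope size arithmetic. */
final class EncodedSizeSketch {
    /** Bytes needed for the bit-packed codes: ceil(numDimensions * (numExBits + 1) / 8). */
    static int packedCodeBytes(int numDimensions, int numExBits) {
        long bits = (long) numDimensions * (numExBits + 1);
        return (int) ((bits + 7) / 8);
    }

    /** Bytes needed for the same components stored as 8-byte doubles. */
    static int denseBytes(int numDimensions) {
        return numDimensions * Double.BYTES;
    }

    public static void main(String[] args) {
        int numDimensions = 128;
        int numExBits = 3; // 1 sign bit + 3 magnitude bits = 4 bits per component
        int packed = packedCodeBytes(numDimensions, numExBits);
        int dense = denseBytes(numDimensions);
        System.out.println(packed + " packed bytes vs " + dense + " dense bytes");
    }
}
```

For 128 dimensions with numExBits = 3 this gives 64 packed bytes against 1024 dense bytes, which is 4/64 of the size before the fixed per-vector overhead is added.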
This class implements RealVector so it composes with the rest of the linear-algebra
surface, but it is fundamentally an opaque encoded blob, not a dense numeric vector. Two
consequences are worth knowing:
- getData() lazily reconstructs an approximate dense double[] from the codes plus calibration constants (see computeData()). This is a best-effort dequantization, not the original vector; round-tripping through the encoded form is lossy by design.
- withData(double[]) returns a fresh DoubleRealVector, not another EncodedRealVector, because re-encoding requires the quantizer's full per-call state (rotation seed, calibration sweep). As a result, the arithmetic methods inherited from RealVector (add, subtract, multiply, normalize) produce ordinary double vectors and discard the encoding.
The wire format produced by getRawData() is a leading VectorType.RABITQ
type byte, the three calibration doubles in big-endian order, then the bit-packed integer
codes. fromBytes(byte[], int, int) parses the format back, given the dimensionality and
numExBits (which are not stored in the byte array — the caller is expected to know
them from surrounding context, typically the quantizer's configuration).
-
Field Summary
Fields inherited from interface com.apple.foundationdb.linear.RealVector
EPS, VECTOR_TYPES -
Constructor Summary
Constructors
Constructor / Description
- EncodedRealVector(int numExBits, int[] encoded, double fAddEx, double fRescaleEx, double fErrorEx): Constructs an encoded vector from the raw outputs of RaBitQuantizer.
-
Method Summary
Modifier and Type / Method / Description
- double[] computeData(): Reconstructs an approximate dense representation of the original (rotated, residual-form) vector from the integer codes plus the three calibration constants.
- int computeHashCode(): Computes the hash from scratch, backing the memoizing hashCodeSupplier.
- protected byte[] computeRawData(): Serializes this encoded vector into a byte array in the RaBitQ wire format.
- final boolean equals(Object): Two encoded vectors compare equal iff they have identical code arrays and identical calibration constants (fAddEx, fRescaleEx, fErrorEx).
- static EncodedRealVector fromBytes(byte[] vectorBytes, int numDimensions, int numExBits): Deserializes an encoded vector from the wire format produced by computeRawData().
- double getAddEx(): Returns the per-vector additive term used during distance estimation.
- double getComponent(int dimension): Gets the component of this object at the specified dimension.
- double[] getData(): Returns the lazily reconstructed dense form of this encoded vector.
- int getEncodedComponent(int dimension): Returns the raw integer code for the component at the given dimension.
- int[] getEncodedData(): Returns the underlying integer-code array (no copy).
- double getErrorEx(): Returns the per-vector error bound used during distance estimation and dequantization.
- int getNumDimensions(): Returns the number of elements in the vector, i.e. the number of dimensions.
- byte[] getRawData(): Returns the memoized wire-format serialization of this vector (see computeRawData() for the format).
- double getRescaleEx(): Returns the per-vector multiplicative rescale used during distance estimation.
- int hashCode(): Returns the memoized hash code, computed from the code array and the three calibration doubles, consistent with equals(Object).
- double l2SquaredNorm(): Returns the squared L2 norm Σ_i this[i]^2.
- toDoubleRealVector(): Converts this vector into a DoubleRealVector.
- toFloatRealVector(): Converts this object into a RealVector of single precision floating-point numbers.
- toHalfRealVector(): Converts this object into a RealVector of Half precision floating-point numbers.
- toImmutable(): Returns this; instances of this class are already immutable.
- withData(double[] data): Returns a new vector of the same precision and length as the receiver but with the given component data.
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface com.apple.foundationdb.linear.RealVector
add, add, dot, isNearlyZeroNorm, l2Norm, l2SquaredDistance, multiply, normalize, subtract, subtract, toMutable
-
Constructor Details
-
EncodedRealVector
public EncodedRealVector(int numExBits, @Nonnull int[] encoded, double fAddEx, double fRescaleEx, double fErrorEx)
Constructs an encoded vector from the raw outputs of RaBitQuantizer. The encoded array is stored by reference (no defensive copy); callers must not mutate it after handing it over.
- Parameters:
numExBits - number of magnitude bits per component (not counting the sign bit)
encoded - per-dimension integer codes; ownership transfers to this vector
fAddEx - the additive term (getAddEx())
fRescaleEx - the multiplicative rescale (getRescaleEx())
fErrorEx - the per-vector error bound (getErrorEx())
-
-
Method Details
-
getEncodedData
@Nonnull public int[] getEncodedData()
Returns the underlying integer-code array (no copy). Each entry is the signed code for one component, packed into numExBits + 1 bits when serialized. Callers must not mutate the returned array.
- Returns:
- the per-dimension code array
-
getAddEx
public double getAddEx()
Returns the per-vector additive term used during distance estimation. See fAddEx.
-
getRescaleEx
public double getRescaleEx()
Returns the per-vector multiplicative rescale used during distance estimation. See fRescaleEx.
-
getErrorEx
public double getErrorEx()
Returns the per-vector error bound used during distance estimation and dequantization. See fErrorEx.
-
equals
Two encoded vectors compare equal iff they have identical code arrays and identical calibration constants (fAddEx, fRescaleEx, fErrorEx). The dequantized representation is intentionally not consulted: equality is on the encoding, not on the (lossy) reconstruction.
-
hashCode
public int hashCode()
Returns the memoized hash code, computed from the code array and the three calibration doubles, consistent with equals(Object).
-
computeHashCode
public int computeHashCode()
Computes the hash from scratch, backing the memoizing hashCodeSupplier.
- Returns:
- the hash code
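The equality and hashing contract described above can be illustrated with a small stand-in class. This is a sketch under stated assumptions, not the library's code: EncodingKey is a hypothetical name, doubles are compared with Double.compare (the library may compare them differently), and the hash is cached in a plain int field instead of the memoizing hashCodeSupplier.

```java
import java.util.Arrays;
import java.util.Objects;

/** Hypothetical stand-in for the encoded state; not the library's class. */
final class EncodingKey {
    private final int[] encoded;
    private final double fAddEx;
    private final double fRescaleEx;
    private final double fErrorEx;
    private int hash; // 0 means "not computed yet" (same idiom as java.lang.String)

    EncodingKey(int[] encoded, double fAddEx, double fRescaleEx, double fErrorEx) {
        this.encoded = encoded;
        this.fAddEx = fAddEx;
        this.fRescaleEx = fRescaleEx;
        this.fErrorEx = fErrorEx;
    }

    /** Equality looks only at the codes and the three constants, never at a reconstruction. */
    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof EncodingKey)) {
            return false;
        }
        EncodingKey other = (EncodingKey) o;
        return Arrays.equals(encoded, other.encoded)
                && Double.compare(fAddEx, other.fAddEx) == 0
                && Double.compare(fRescaleEx, other.fRescaleEx) == 0
                && Double.compare(fErrorEx, other.fErrorEx) == 0;
    }

    /** Computes the hash from scratch over the same fields that equals() inspects. */
    int computeHashCode() {
        return Objects.hash(Arrays.hashCode(encoded), fAddEx, fRescaleEx, fErrorEx);
    }

    /** Returns the cached hash, computing it on first use. */
    @Override
    public int hashCode() {
        int h = hash;
        if (h == 0) {
            h = computeHashCode();
            hash = h;
        }
        return h;
    }
}
```

Two instances built from equal code arrays and equal constants compare equal and share a hash code, while changing any one constant breaks equality.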
-
getNumDimensions
public int getNumDimensions()
Description copied from interface: RealVector
Returns the number of elements in the vector, i.e. the number of dimensions.
- Specified by:
getNumDimensions in interface RealVector
- Returns:
- the number of dimensions
-
getEncodedComponent
public int getEncodedComponent(int dimension)
Returns the raw integer code for the component at the given dimension. Unlike getComponent(int), this does not trigger dequantization, which is useful for distance estimators that operate directly on the encoded representation.
- Parameters:
dimension - the zero-based dimension index
- Returns:
- the integer code for that dimension
- Throws:
IndexOutOfBoundsException - if dimension is out of range
-
getComponent
public double getComponent(int dimension)
Gets the component of this object at the specified dimension.
The dimension is a zero-based index. For a 3D vector, for example, dimension 0 might correspond to the x-component, 1 to the y-component, and 2 to the z-component. This method provides direct access to the underlying data element.
Reads from the lazily reconstructed dense form (see getData()); the first call materializes the full double[] via computeData(). If you only need the raw integer code, prefer getEncodedComponent(int), which skips the reconstruction.
- Specified by:
getComponent in interface RealVector
- Parameters:
dimension - the zero-based index of the component to retrieve.
- Returns:
- the component at the specified dimension.
-
getData
@Nonnull public double[] getData()
Returns the lazily reconstructed dense form of this encoded vector. The reconstruction is approximate (see computeData() for details) and memoized so repeated calls are cheap.
- Specified by:
getData in interface RealVector
- Returns:
- the data array of type double[], never null.
-
withData
Returns a new vector of the same precision and length as the receiver but with the given component data. Implementations decide whether the returned vector aliases data (immutable subtypes typically do; mutable subtypes copy through their existing storage).
Returns a fresh DoubleRealVector carrying data, not a re-encoded EncodedRealVector, because re-encoding requires the quantizer's per-call state (rotation seed, calibration sweep). This means the inherited arithmetic methods (add, subtract, multiply, normalize) all drop back to ordinary double-precision results.
- Specified by:
withData in interface RealVector
- Parameters:
data - the components for the new vector; length must match this vector's dimensionality
- Returns:
- a non-null vector with the given data
-
computeData
@Nonnull public double[] computeData()
Reconstructs an approximate dense representation of the original (rotated, residual-form) vector from the integer codes plus the three calibration constants. Backs the memoizing supplier behind getData().
The math, in summary:
- Un-shift the codes by cB = (1 << numExBits) - 0.5 to recover a symmetric-range vector z.
- Estimate the per-vector confidence weight ρ from the ratio of fErrorEx to (ε₀-scaled) ||z|| * fRescaleEx, clamped to [0, 1].
- Return z scaled by -0.5 * fRescaleEx * ρ.
- Returns:
- the reconstructed dense components
- Throws:
com.google.common.base.VerifyException - if the denominator that scales the error estimate is zero (a degenerate parameter combination that should never occur for a well-formed encoding)
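The first and last steps of the summary above can be sketched as follows. This is a minimal illustration, not the library's computeData(): the class and method names are hypothetical, ρ is taken as an already-computed input because its exact formula is not spelled out here, and the codes are assumed to be non-negative integers in [0, 2^(numExBits + 1) - 1], so that "un-shifting" is a plain subtraction of cB.

```java
/** Hypothetical sketch of the un-shift and rescale steps; not the library's computeData(). */
final class DequantizeSketch {
    /**
     * @param numExBits  number of magnitude bits per component
     * @param encoded    per-dimension integer codes (assumed non-negative)
     * @param fRescaleEx the per-vector multiplicative rescale
     * @param rho        confidence weight, assumed to be already clamped to [0, 1]
     */
    static double[] dequantize(int numExBits, int[] encoded, double fRescaleEx, double rho) {
        if (!(rho >= 0.0 && rho <= 1.0)) {
            throw new IllegalArgumentException("rho must lie in [0, 1]");
        }
        double cB = (1 << numExBits) - 0.5;       // shift that centers the code range on zero
        double scale = -0.5 * fRescaleEx * rho;   // common factor applied to every component
        double[] out = new double[encoded.length];
        for (int i = 0; i < encoded.length; i++) {
            double z = encoded[i] - cB;           // un-shift: symmetric-range component
            out[i] = z * scale;                   // rescale
        }
        return out;
    }
}
```

With numExBits = 1, codes {0, 1, 2, 3}, fRescaleEx = 2.0, and ρ = 0.5, cB is 1.5 and the common factor is -0.5, so the result is {0.75, 0.25, -0.25, -0.75}.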
-
getRawData
@Nonnull public byte[] getRawData()
Returns the memoized wire-format serialization of this vector (see computeRawData() for the format).
- Specified by:
getRawData in interface RealVector
- Returns:
- a non-null byte array containing the raw data.
-
computeRawData
@Nonnull protected byte[] computeRawData()
Serializes this encoded vector into a byte array in the RaBitQ wire format. Layout (all multi-byte values big-endian):
- 1 byte: VectorType.RABITQ ordinal as a type tag.
- 8 bytes: fAddEx.
- 8 bytes: fRescaleEx.
- 8 bytes: fErrorEx.
- ceil(numDimensions * (numExBits + 1) / 8) bytes: the per-dimension integer codes, tightly bit-packed by packEncodedComponents(int, ByteBuffer).
The dimensionality and numExBits are not stored in the byte stream; fromBytes(byte[], int, int) requires both as explicit arguments.
- Returns:
- the serialized form; never null
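The layout above can be sketched with java.nio.ByteBuffer, which is big-endian by default. This is an illustration under assumptions, not the library's computeRawData(): WireFormatSketch and TYPE_TAG are hypothetical (the real tag is the VectorType.RABITQ ordinal, whose value is not given here), and the bit order inside the packed region (most significant bit first, codes back to back) is assumed and may differ from packEncodedComponents.

```java
import java.nio.ByteBuffer;

/** Hypothetical sketch of the wire layout; not the library's computeRawData(). */
final class WireFormatSketch {
    /** Placeholder tag value; the real one is the VectorType.RABITQ ordinal. */
    static final byte TYPE_TAG = 3;

    static byte[] serialize(int numExBits, int[] encoded,
                            double fAddEx, double fRescaleEx, double fErrorEx) {
        int bitsPerCode = numExBits + 1;
        int packedBytes = (encoded.length * bitsPerCode + 7) / 8; // ceil(bits / 8)
        ByteBuffer buffer = ByteBuffer.allocate(1 + 3 * Double.BYTES + packedBytes);
        buffer.put(TYPE_TAG);          // 1 byte: type tag
        buffer.putDouble(fAddEx);      // 8 bytes, big-endian
        buffer.putDouble(fRescaleEx);  // 8 bytes, big-endian
        buffer.putDouble(fErrorEx);    // 8 bytes, big-endian

        // Assumed packing: each code contributes bitsPerCode bits, MSB first.
        byte[] packed = new byte[packedBytes];
        int bitPos = 0;
        for (int code : encoded) {
            for (int b = bitsPerCode - 1; b >= 0; b--, bitPos++) {
                if (((code >>> b) & 1) != 0) {
                    packed[bitPos >>> 3] |= (byte) (0x80 >>> (bitPos & 7));
                }
            }
        }
        buffer.put(packed);
        return buffer.array();
    }
}
```

Under these assumptions, three 4-bit codes 0xA, 0x5, 0xF pack into the two bytes 0xA5 0xF0, giving 27 bytes in total: 1 tag byte, 24 bytes of constants, and 2 bytes of codes.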
-
toHalfRealVector
Converts this object into a RealVector of Half precision floating-point numbers.
As this is an abstract method, implementing classes are responsible for defining the specific conversion logic from their internal representation to a RealVector using Half objects to serialize and deserialize the vector. If this object already is a HalfRealVector this method should return this.
Computed from the lazily reconstructed dense form (see getData()); the resulting HalfRealVector is memoized.
- Specified by:
toHalfRealVector in interface RealVector
- Returns:
- a non-null HalfRealVector containing the Half precision floating-point representation of this object.
-
toFloatRealVector
Converts this object into a RealVector of single precision floating-point numbers.
As this is an abstract method, implementing classes are responsible for defining the specific conversion logic from their internal representation to a RealVector using floating point numbers to serialize and deserialize the vector. If this object already is a FloatRealVector this method should return this.
Computed from the lazily reconstructed dense form (see getData()); the resulting FloatRealVector is memoized.
- Specified by:
toFloatRealVector in interface RealVector
- Returns:
- a non-null FloatRealVector containing the single precision floating-point representation of this object.
-
toDoubleRealVector
Converts this vector into a DoubleRealVector.
This method provides a way to obtain a double-precision floating-point representation of the vector. If the vector is already an instance of DoubleRealVector, this method may return the instance itself. Otherwise, it will create a new DoubleRealVector containing the same elements, which may involve a conversion of the underlying data type.
Returns a fresh DoubleRealVector carrying the reconstructed dense components. Unlike the half/float conversions, this one is not memoized: the underlying double[] reconstruction is already memoized by getData(), so wrapping it again is cheap.
- Specified by:
toDoubleRealVector in interface RealVector
- Returns:
- a non-null DoubleRealVector representation of this vector.
-
toImmutable
Returns this; instances of this class are already immutable.
- Specified by:
toImmutable in interface RealVector
- Returns:
- a non-null immutable vector with the same components as this vector
-
l2SquaredNorm
public double l2SquaredNorm()
Returns the squared L2 norm Σ_i this[i]^2. Implementations typically memoize this since the value is reused by RealVector.l2Norm() and several distance helpers.
Computed from the reconstructed dense form (it's dot(this) on the dequantized components) and memoized.
- Specified by:
l2SquaredNorm in interface RealVector
- Returns:
- the squared L2 norm of this vector
-
fromBytes
@Nonnull public static EncodedRealVector fromBytes(@Nonnull byte[] vectorBytes, int numDimensions, int numExBits)
Deserializes an encoded vector from the wire format produced by computeRawData(). The dimensionality and numExBits are not stored in the byte stream and must be supplied by the caller (typically from the encoder's configuration).
- Parameters:
vectorBytes - the serialized form; must start with the VectorType.RABITQ type tag
numDimensions - number of components in the encoded vector
numExBits - number of magnitude bits per component (sign bit implicit)
- Returns:
- a freshly allocated encoded vector
- Throws:
com.google.common.base.VerifyException - if the leading type tag is not VectorType.RABITQ
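A sketch of the parsing side shows why numDimensions and numExBits have to be passed in: without them the packed region cannot be split into codes. The names ParseSketch, Parsed, and TYPE_TAG are hypothetical, the tag value is a placeholder, the bit order (most significant bit first) is an assumption, and the sketch throws IllegalArgumentException where the real method throws VerifyException.

```java
import java.nio.ByteBuffer;

/** Hypothetical sketch of parsing the wire format; not the library's fromBytes(). */
final class ParseSketch {
    /** Placeholder tag value; the real one is the VectorType.RABITQ ordinal. */
    static final byte TYPE_TAG = 3;

    /** Plain holder for the parsed fields. */
    static final class Parsed {
        final int[] encoded;
        final double fAddEx;
        final double fRescaleEx;
        final double fErrorEx;

        Parsed(int[] encoded, double fAddEx, double fRescaleEx, double fErrorEx) {
            this.encoded = encoded;
            this.fAddEx = fAddEx;
            this.fRescaleEx = fRescaleEx;
            this.fErrorEx = fErrorEx;
        }
    }

    static Parsed parse(byte[] vectorBytes, int numDimensions, int numExBits) {
        ByteBuffer buffer = ByteBuffer.wrap(vectorBytes); // big-endian by default
        if (buffer.get() != TYPE_TAG) {
            throw new IllegalArgumentException("unexpected type tag");
        }
        double fAddEx = buffer.getDouble();
        double fRescaleEx = buffer.getDouble();
        double fErrorEx = buffer.getDouble();

        int bitsPerCode = numExBits + 1;
        int base = buffer.position(); // first byte of the packed codes
        int[] encoded = new int[numDimensions];
        int bitPos = 0;
        for (int i = 0; i < numDimensions; i++) {
            int code = 0;
            for (int b = 0; b < bitsPerCode; b++, bitPos++) {
                // Assumed order: bits are read MSB first within each byte.
                int bit = (vectorBytes[base + (bitPos >>> 3)] >>> (7 - (bitPos & 7))) & 1;
                code = (code << 1) | bit;
            }
            encoded[i] = code;
        }
        return new Parsed(encoded, fAddEx, fRescaleEx, fErrorEx);
    }
}
```

Parsing the same bytes with a different numExBits yields a different, equally plausible-looking code array, which is why the caller must take both values from the quantizer's configuration.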
-