Understanding How Bitmaps Identify Distinct Values

A bitmap is effectively a byte array in which each bit (zero-based) represents existence (bit = 1) or absence (bit = 0) of an integer in a table. For example,

SELECT bitmap_construct_agg(val) AS bitmap FROM bitmap_test_values;

Returns a byte array bitmap where if bitmap[i][j] = 1, it means that value = 8 * i + j exists in the table.

The data can be sparse and very big. To be more efficient, we group the data into “buckets”. A distinct value is represented by the combination of a “bucket number” and a bit that is set in that bitmap. To identify the bucket and the bit that represents a specific value, use the following functions:

In the FDB Relational Layer, the fixed bitmap bucket size is 10,000 bits (1250 bytes). For example:

value

BITMAP_BUCKET_NUMBER

BITMAP_BIT_POSITION

1

0

1

9999

0

9999

10000

10000

0

If we create and populate the table bitmap_test_values:

CREATE TABLE bitmap_test_values (val INT);
insert into bitmap_test_values values (1), (20003);

Then the following query:

select bitmap_bucket_number(val) as offset,
    bitmap_construct_agg(bitmap_bit_position(val)) as bitmap
    from bitmap_test_values
    group by offset;

Returns:

OFFSET

BITMAP

0

[b'00000010, 0, ...0]

2

[b'00001000, 0, ...0]