CREATE INDEX

Clause in a schema template definition to create an index. The Record Layer supports two different syntaxes for creating indexes:

  1. INDEX AS SELECT - A query-based syntax inspired by materialized views

  2. INDEX ON - A traditional syntax that creates indexes on views or tables

Both syntaxes create indexes that are maintained incrementally. For any given record insert, update, or delete, the system constructs the difference that needs to be applied to each index in order to update it without needing to completely rebuild it. This means there are limitations to what kinds of indexes can be created. See Defining indexes for more details.

Syntax

CREATE UNIQUE VECTOR INDEX indexName AS query USING HNSW ON source ( keyColumn , ) INCLUDE ( valueColumn , ) PARTITION BY ( partitionColumn , ) OPTIONS ( optionName , optionName = optionValue , )

INDEX AS SELECT Syntax

The INDEX AS SELECT syntax uses a query to define the index structure. This syntax is inspired by materialized views and allows you to define indexes using familiar SQL SELECT queries with GROUP BY for aggregate indexes.

Basic Form

CREATE INDEX indexName AS query

The query must be a SELECT statement that returns the columns to be indexed. The index structure is derived from the query’s SELECT list and ORDER BY clause.

Examples

Simple Value Index

CREATE INDEX idx_price AS
    SELECT price
    FROM products
    ORDER BY price

Covering Value Index

CREATE INDEX idx_category_price AS
    SELECT category, price, name
    FROM products
    ORDER BY category, price

This creates an index with category and price in the key, and name as a covered column.

Aggregate Index (SUM)

CREATE INDEX idx_sales_by_category AS
    SELECT category, SUM(amount)
    FROM sales
    GROUP BY category

Aggregate Index (COUNT)

COUNT(*) counts all records in each group:

CREATE INDEX idx_count_by_region AS
    SELECT region, COUNT(*)
    FROM sales
    GROUP BY region

COUNT(column) counts non-NULL values:

CREATE INDEX idx_count_non_null AS
    SELECT category, COUNT(quantity)
    FROM sales
    GROUP BY category

Aggregate Index (MIN/MAX)

MIN and MAX support permuted ordering, where the aggregate value can appear at different positions in the key:

CREATE INDEX idx_max_by_category AS
    SELECT category, MAX(amount)
    FROM sales
    GROUP BY category
    ORDER BY category, MAX(amount)

The aggregate can also appear in the middle of the key:

CREATE INDEX idx_cat_max_region AS
    SELECT category, MAX(amount), region
    FROM sales
    GROUP BY category, region
    ORDER BY category, MAX(amount), region

Aggregate Index (MIN_EVER/MAX_EVER)

MIN_EVER and MAX_EVER track the minimum/maximum value ever seen, even after deletions. These are useful for maintaining historical extrema:

CREATE INDEX idx_min_ever_by_category AS
    SELECT category, MIN_EVER(price)
    FROM products
    GROUP BY category

CREATE INDEX idx_max_ever_by_region AS
    SELECT region, MAX_EVER(sales_amount)
    FROM transactions
    GROUP BY region

For legacy compatibility with older versions, you can use the LEGACY_EXTREMUM_EVER attribute:

CREATE INDEX idx_min_ever_legacy AS
    SELECT category, MIN_EVER(price)
    FROM products
    GROUP BY category
    WITH ATTRIBUTES LEGACY_EXTREMUM_EVER

Bitmap Aggregate Index

Bitmap indexes use specialized functions for efficient set operations and are particularly useful for filtering on high-cardinality columns:

CREATE INDEX idx_bitmap_by_category AS
    SELECT bitmap_construct_agg(bitmap_bit_position(id)) AS bitmap,
           category,
           bitmap_bucket_offset(id) AS offset
    FROM products
    GROUP BY category, bitmap_bucket_offset(id)

For ungrouped bitmap indexes:

CREATE INDEX idx_bitmap_all AS
    SELECT bitmap_construct_agg(bitmap_bit_position(id)) AS bitmap,
           bitmap_bucket_offset(id) AS offset
    FROM products
    GROUP BY bitmap_bucket_offset(id)

Aggregate with Expressions in GROUP BY

You can use expressions in GROUP BY clauses:

CREATE INDEX idx_sum_by_expression AS
    SELECT amount + 100, MAX(quantity)
    FROM sales
    GROUP BY amount + 100

CREATE INDEX idx_multi_expression AS
    SELECT category_id + region_id, status + 10, MIN(price)
    FROM products
    GROUP BY category_id + region_id, status + 10

Filtered Index

CREATE INDEX idx_expensive_products AS
    SELECT name, price
    FROM products
    WHERE price > 100
    ORDER BY price

Multiple Grouping Columns

CREATE INDEX idx_sales_by_category_region AS
    SELECT category, region, SUM(amount)
    FROM sales
    GROUP BY category, region

Descending Order

CREATE INDEX idx_price_desc AS
    SELECT price
    FROM products
    ORDER BY price DESC

Mixed Ordering

CREATE INDEX idx_category_desc_price_asc AS
    SELECT category, price
    FROM products
    ORDER BY category DESC, price ASC

NULL Ordering

CREATE INDEX idx_rating_nulls_last AS
    SELECT rating
    FROM products
    ORDER BY rating ASC NULLS LAST

CREATE INDEX idx_supplier_desc_nulls_first AS
    SELECT supplier
    FROM products
    ORDER BY supplier DESC NULLS FIRST

Aggregate Index Capabilities

Aggregate indexes support the following aggregate functions:

Supported Aggregate Functions

  • SUM(column) - Sum of values

  • COUNT(*) - Count of all rows

  • COUNT(column) - Count of non-NULL values

  • MIN(column) - Minimum value (supports permuted ordering)

  • MAX(column) - Maximum value (supports permuted ordering)

  • MIN_EVER(column) - Historical minimum (persists across deletions)

  • MAX_EVER(column) - Historical maximum (persists across deletions)

  • BITMAP_CONSTRUCT_AGG(bitmap_bit_position(column)) - Bitmap construction for set operations

Permuted Ordering

MIN and MAX indexes support permuted ordering, meaning the aggregate value can appear at any position in the ORDER BY clause:

-- Aggregate at the end (standard)
CREATE INDEX idx1 AS
    SELECT category, MAX(amount)
    FROM sales
    GROUP BY category
    ORDER BY category, MAX(amount)

-- Aggregate in the middle
CREATE INDEX idx2 AS
    SELECT category, MAX(amount), region
    FROM sales
    GROUP BY category, region
    ORDER BY category, MAX(amount), region

-- Aggregate at the beginning
CREATE INDEX idx3 AS
    SELECT MIN(amount), category, region
    FROM sales
    GROUP BY category, region
    ORDER BY MIN(amount), category, region

Aggregate Index Limitations

  • Only one aggregate function per index

  • Aggregate columns must be integer types (bigint) for most functions (SUM, COUNT, MIN, MAX)

  • MIN_EVER and MAX_EVER work with strings and other comparable types

  • Expressions in GROUP BY are supported

  • WHERE clauses can be used with aggregate indexes for filtered aggregates

INDEX ON Syntax

The INDEX ON syntax creates an index on an existing view or table using a traditional columnar specification. This syntax is particularly useful when combined with views that define filtering or aggregation logic.

Basic Form

CREATE INDEX indexName ON source(columns) [INCLUDE(valueColumns)] [OPTIONS(...)]

Where: - source is a table or view name - columns specifies the index key columns (with optional ordering) - INCLUDE clause adds covered columns stored as values - OPTIONS clause specifies index-specific options

Examples

Simple Value Index

CREATE INDEX idx_price ON products(price)

Multi-Column Index

CREATE INDEX idx_category_price ON products(category, price)

Covering Index with INCLUDE

CREATE INDEX idx_category_price_covering ON products(category, price)
    INCLUDE(name, stock)

This creates an index with category and price in the key, and name and stock as covered values.

Aggregate Index (SUM)

First define a view with the aggregation:

CREATE VIEW v_sales_by_category AS
    SELECT category, SUM(amount) AS total_amount
    FROM sales
    GROUP BY category

CREATE INDEX idx_sales_by_category ON v_sales_by_category(category)
    INCLUDE(total_amount)

Aggregate Index (COUNT)

CREATE VIEW v_count_by_region AS
    SELECT region, COUNT(*) AS record_count
    FROM sales
    GROUP BY region

CREATE INDEX idx_count_by_region ON v_count_by_region(region)
    INCLUDE(record_count)

Filtered Index

First define a view with the filter:

CREATE VIEW v_expensive_products AS
    SELECT name, price
    FROM products
    WHERE price > 100

CREATE INDEX idx_expensive_products ON v_expensive_products(price)

Custom Ordering

CREATE INDEX idx_category_desc ON products(category DESC, price ASC)

NULL Ordering

CREATE INDEX idx_rating_nulls_last ON products(rating ASC NULLS LAST)

Column Ordering and NULL Handling

When creating an index using the INDEX ON syntax, each key column can specify sorting criteria and null semantics to control how values are ordered in the index.

Sorting Criteria

Each key column supports the following sort orders:

  • ASC (ascending) - Values are sorted from smallest to largest (default if not specified)

  • DESC (descending) - Values are sorted from largest to smallest

NULL Semantics

You can control where NULL values appear in the sort order:

  • NULLS FIRST - NULL values appear before non-NULL values

  • NULLS LAST - NULL values appear after non-NULL values

Default NULL behavior: - For ASC ordering: NULLS FIRST is the default - For DESC ordering: NULLS LAST is the default

Syntax Options

The ordering clause for each column can take the following forms:

  1. Sort order only: columnName ASC or columnName DESC

  2. Sort order with null semantics: columnName ASC NULLS LAST or columnName DESC NULLS FIRST

  3. Null semantics only: columnName NULLS FIRST or columnName NULLS LAST (uses default ASC ordering)

Examples

-- Ascending order with nulls last
CREATE INDEX idx_price ON products(price ASC NULLS LAST)

-- Descending order with nulls first
CREATE INDEX idx_rating ON products(rating DESC NULLS FIRST)

-- Specify only null semantics (ascending is implicit)
CREATE INDEX idx_stock ON products(stock NULLS LAST)

-- Mixed ordering across multiple columns
CREATE INDEX idx_complex ON products(
    category ASC NULLS FIRST,
    price DESC NULLS LAST,
    name ASC
)

VECTOR INDEX Syntax

The VECTOR INDEX syntax creates an index specifically designed for vector similarity search using the HNSW (Hierarchical Navigable Small World) algorithm. Vector indexes enable efficient approximate nearest neighbor (ANN) search on high-dimensional vector data.

Basic Form

CREATE VECTOR INDEX indexName USING HNSW ON source(vectorColumn)
    [PARTITION BY(partitionColumns)]
    [OPTIONS(...)]

Where: - source is a table or view name - vectorColumn is a column of type vector(dimensions, float) - PARTITION BY clause specifies partitioning columns (optional but recommended) - OPTIONS clause specifies HNSW-specific configuration options

Note

The PARTITION BY clause is only applicable to vector indexes. It is not supported for regular value indexes created with the INDEX ON syntax. Partitioning helps organize vectors by category or tenant, improving query performance for vector similarity searches pertaining specific category or tenant.

Examples

Simple Vector Index

CREATE TABLE products(
    id bigint,
    name string,
    embedding vector(128, float),
    primary key(id)
)

CREATE VECTOR INDEX idx_embedding USING HNSW ON products(embedding)

Vector Index with Partitioning

Partitioning is recommended for better performance and to organize vectors by category or tenant:

CREATE TABLE products(
    id bigint,
    category string,
    embedding vector(256, float),
    primary key(id)
)

CREATE VECTOR INDEX idx_embedding USING HNSW ON products(embedding)
    PARTITION BY(category)

Vector Index with Custom Options

CREATE VECTOR INDEX idx_embedding USING HNSW ON products(embedding)
    PARTITION BY(category)
    OPTIONS (
        CONNECTIVITY = 16,
        M_MAX = 32,
        EF_CONSTRUCTION = 200,
        METRIC = COSINE_METRIC
    )

Vector Index on Filtered View

CREATE VIEW v_active_products AS
    SELECT id, embedding, category
    FROM products
    WHERE status = 'active'

CREATE VECTOR INDEX idx_active_embeddings USING HNSW ON v_active_products(embedding)
    PARTITION BY(category)

Vector Index with RabitQ Quantization

RabitQ is a quantization technique that reduces memory usage for high-dimensional vectors:

CREATE VECTOR INDEX idx_embedding USING HNSW ON products(embedding)
    PARTITION BY(category)
    OPTIONS (
        USE_RABITQ = true,
        RABITQ_NUM_EX_BITS = 4,
        MAINTAIN_STATS_PROBABILITY = 0.01
    )

Vector Index with Statistics Sampling

CREATE VECTOR INDEX idx_embedding USING HNSW ON products(embedding)
    PARTITION BY(category)
    OPTIONS (
        USE_RABITQ = true,
        SAMPLE_VECTOR_STATS_PROBABILITY = 0.05
    )

Vector Index Options

HNSW Algorithm Parameters

CONNECTIVITY (or M)

The number of bi-directional links created for each node during construction. Higher values improve recall but increase memory usage and construction time. Default: 16.

M_MAX

Maximum number of connections per layer. Default: derived from CONNECTIVITY.

EF_CONSTRUCTION

The size of the dynamic candidate list during index construction. Higher values improve index quality but increase construction time. Default: 200.

Distance Metrics

METRIC

The distance metric used for similarity search. Available options:

  • EUCLIDEAN_METRIC - L2 distance (default)

  • MANHATTAN_METRIC - L1 distance

  • DOT_PRODUCT_METRIC - Dot product (for normalized vectors)

  • EUCLIDEAN_SQUARE_METRIC - Squared L2 distance

  • COSINE_METRIC - Cosine similarity (recommended for embeddings)

RabitQ Quantization Options

USE_RABITQ

Enable RabitQ quantization to reduce memory usage. Default: false.

RABITQ_NUM_EX_BITS

Number of extra bits for RabitQ quantization. Higher values improve accuracy but increase memory usage. Valid range: 0-8. Default: 4.

Statistics and Monitoring

MAINTAIN_STATS_PROBABILITY

Probability of maintaining statistics during updates when using RabitQ. Default: 0.01.

SAMPLE_VECTOR_STATS_PROBABILITY

Probability of sampling vectors for statistics collection. Default: 0.0 (disabled).

Vector Index Limitations

  • Only one vector column can be indexed per vector index

  • The indexed column must be of type vector(dimensions, float)

  • Vector dimensions must be specified at table creation time

  • INCLUDE clause is not supported for vector indexes

  • Partitioning improves performance but is optional

Comparing Both Syntaxes

The two syntaxes are functionally equivalent and produce identical index structures. The choice between them is primarily a matter of style and organizational preference.

Value Index Comparison

These two approaches create identical indexes:

INDEX AS SELECT:

CREATE INDEX idx_category_price AS
    SELECT category, price, name
    FROM products
    ORDER BY category, price

INDEX ON:

CREATE INDEX idx_category_price ON products(category, price)
    INCLUDE(name)

Aggregate Index Comparison

These two approaches create identical aggregate indexes:

INDEX AS SELECT:

CREATE INDEX idx_sales_by_category AS
    SELECT category, SUM(amount)
    FROM sales
    GROUP BY category

INDEX ON:

CREATE VIEW v_sales_by_category AS
    SELECT category, SUM(amount) AS total_amount
    FROM sales
    GROUP BY category

CREATE INDEX idx_sales_by_category ON v_sales_by_category(category)
    INCLUDE(total_amount)

Parameters

Common Parameters

indexName

The name of the index. Must be unique within the schema template.

UNIQUE (optional)

Specifies that the index should enforce uniqueness constraints.

INDEX AS SELECT Parameters

query

A SELECT statement that defines the index structure. The query must be incrementally maintainable.

  • For value indexes: Must include an ORDER BY clause specifying the key columns

  • For aggregate indexes: Must include GROUP BY with a single aggregate function

  • May include WHERE clause for filtered indexes

INDEX ON Parameters

source

The name of the table or view to index.

columns

A comma-separated list of column names that form the index key. Each column can optionally specify:

  • Sort order: ASC (default) or DESC

  • NULL handling: NULLS FIRST or NULLS LAST

Example: category DESC, price ASC NULLS LAST

INCLUDE(valueColumns) (optional)

A comma-separated list of additional columns to store in the index as values (not part of the key). This creates a covering index that can satisfy queries without accessing the base table.

OPTIONS(...) (optional)

Index-specific configuration options. Available options depend on the index type.

Limitations

Both syntaxes share the same limitations because they use the same underlying index implementation:

  • Indexes must be incrementally maintainable

  • Aggregate indexes support only one aggregate function per index

  • Aggregate indexes require integer types for the aggregated column

  • The query structure must allow computing index updates from individual record changes

  • See Defining indexes for detailed limitations

See Also

  • create-schema-template - Schema template definition

  • Defining indexes - Detailed index definition and limitations

  • CREATE VIEW - Creating views for use with INDEX ON