CREATE INDEX¶
Clause in a schema template definition to create an index. The Record Layer supports two different syntaxes for creating indexes:
INDEX AS SELECT - A query-based syntax inspired by materialized views
INDEX ON - A traditional syntax that creates indexes on views or tables
Both syntaxes create indexes that are maintained incrementally. For any given record insert, update, or delete, the system constructs the difference that needs to be applied to each index in order to update it without needing to completely rebuild it. This means there are limitations to what kinds of indexes can be created. See Defining indexes for more details.
Syntax¶
INDEX AS SELECT Syntax¶
The INDEX AS SELECT syntax uses a query to define the index structure. This syntax is inspired by materialized
views and allows you to define indexes using familiar SQL SELECT queries with GROUP BY for aggregate indexes.
Basic Form¶
CREATE INDEX indexName AS query
The query must be a SELECT statement that returns the columns to be indexed. The index structure is derived
from the query’s SELECT list and ORDER BY clause.
Examples¶
Simple Value Index
CREATE INDEX idx_price AS
SELECT price
FROM products
ORDER BY price
Covering Value Index
CREATE INDEX idx_category_price AS
SELECT category, price, name
FROM products
ORDER BY category, price
This creates an index with category and price in the key, and name as a covered column.
Aggregate Index (SUM)
CREATE INDEX idx_sales_by_category AS
SELECT category, SUM(amount)
FROM sales
GROUP BY category
Aggregate Index (COUNT)
COUNT(*) counts all records in each group:
CREATE INDEX idx_count_by_region AS
SELECT region, COUNT(*)
FROM sales
GROUP BY region
COUNT(column) counts non-NULL values:
CREATE INDEX idx_count_non_null AS
SELECT category, COUNT(quantity)
FROM sales
GROUP BY category
Aggregate Index (MIN/MAX)
MIN and MAX support permuted ordering, where the aggregate value can appear at different positions in the key:
CREATE INDEX idx_max_by_category AS
SELECT category, MAX(amount)
FROM sales
GROUP BY category
ORDER BY category, MAX(amount)
The aggregate can also appear in the middle of the key:
CREATE INDEX idx_cat_max_region AS
SELECT category, MAX(amount), region
FROM sales
GROUP BY category, region
ORDER BY category, MAX(amount), region
Aggregate Index (MIN_EVER/MAX_EVER)
MIN_EVER and MAX_EVER track the minimum/maximum value ever seen, even after deletions. These are useful for maintaining historical extrema:
CREATE INDEX idx_min_ever_by_category AS
SELECT category, MIN_EVER(price)
FROM products
GROUP BY category
CREATE INDEX idx_max_ever_by_region AS
SELECT region, MAX_EVER(sales_amount)
FROM transactions
GROUP BY region
For legacy compatibility with older versions, you can use the LEGACY_EXTREMUM_EVER attribute:
CREATE INDEX idx_min_ever_legacy AS
SELECT category, MIN_EVER(price)
FROM products
GROUP BY category
WITH ATTRIBUTES LEGACY_EXTREMUM_EVER
Bitmap Aggregate Index
Bitmap indexes use specialized functions for efficient set operations and are particularly useful for filtering on high-cardinality columns:
CREATE INDEX idx_bitmap_by_category AS
SELECT bitmap_construct_agg(bitmap_bit_position(id)) AS bitmap,
category,
bitmap_bucket_offset(id) AS offset
FROM products
GROUP BY category, bitmap_bucket_offset(id)
For ungrouped bitmap indexes:
CREATE INDEX idx_bitmap_all AS
SELECT bitmap_construct_agg(bitmap_bit_position(id)) AS bitmap,
bitmap_bucket_offset(id) AS offset
FROM products
GROUP BY bitmap_bucket_offset(id)
Aggregate with Expressions in GROUP BY
You can use expressions in GROUP BY clauses:
CREATE INDEX idx_sum_by_expression AS
SELECT amount + 100, MAX(quantity)
FROM sales
GROUP BY amount + 100
CREATE INDEX idx_multi_expression AS
SELECT category_id + region_id, status + 10, MIN(price)
FROM products
GROUP BY category_id + region_id, status + 10
Filtered Index
CREATE INDEX idx_expensive_products AS
SELECT name, price
FROM products
WHERE price > 100
ORDER BY price
Multiple Grouping Columns
CREATE INDEX idx_sales_by_category_region AS
SELECT category, region, SUM(amount)
FROM sales
GROUP BY category, region
Descending Order
CREATE INDEX idx_price_desc AS
SELECT price
FROM products
ORDER BY price DESC
Mixed Ordering
CREATE INDEX idx_category_desc_price_asc AS
SELECT category, price
FROM products
ORDER BY category DESC, price ASC
NULL Ordering
CREATE INDEX idx_rating_nulls_last AS
SELECT rating
FROM products
ORDER BY rating ASC NULLS LAST
CREATE INDEX idx_supplier_desc_nulls_first AS
SELECT supplier
FROM products
ORDER BY supplier DESC NULLS FIRST
Aggregate Index Capabilities¶
Aggregate indexes support the following aggregate functions:
Supported Aggregate Functions
SUM(column)- Sum of valuesCOUNT(*)- Count of all rowsCOUNT(column)- Count of non-NULL valuesMIN(column)- Minimum value (supports permuted ordering)MAX(column)- Maximum value (supports permuted ordering)MIN_EVER(column)- Historical minimum (persists across deletions)MAX_EVER(column)- Historical maximum (persists across deletions)BITMAP_CONSTRUCT_AGG(bitmap_bit_position(column))- Bitmap construction for set operations
Permuted Ordering
MIN and MAX indexes support permuted ordering, meaning the aggregate value can appear at any position in the ORDER BY clause:
-- Aggregate at the end (standard)
CREATE INDEX idx1 AS
SELECT category, MAX(amount)
FROM sales
GROUP BY category
ORDER BY category, MAX(amount)
-- Aggregate in the middle
CREATE INDEX idx2 AS
SELECT category, MAX(amount), region
FROM sales
GROUP BY category, region
ORDER BY category, MAX(amount), region
-- Aggregate at the beginning
CREATE INDEX idx3 AS
SELECT MIN(amount), category, region
FROM sales
GROUP BY category, region
ORDER BY MIN(amount), category, region
Aggregate Index Limitations
Only one aggregate function per index
Aggregate columns must be integer types (bigint) for most functions (SUM, COUNT, MIN, MAX)
MIN_EVER and MAX_EVER work with strings and other comparable types
Expressions in GROUP BY are supported
WHERE clauses can be used with aggregate indexes for filtered aggregates
INDEX ON Syntax¶
The INDEX ON syntax creates an index on an existing view or table using a traditional columnar specification.
This syntax is particularly useful when combined with views that define filtering or aggregation logic.
Basic Form¶
CREATE INDEX indexName ON source(columns) [INCLUDE(valueColumns)] [OPTIONS(...)]
Where:
- source is a table or view name
- columns specifies the index key columns (with optional ordering)
- INCLUDE clause adds covered columns stored as values
- OPTIONS clause specifies index-specific options
Examples¶
Simple Value Index
CREATE INDEX idx_price ON products(price)
Multi-Column Index
CREATE INDEX idx_category_price ON products(category, price)
Covering Index with INCLUDE
CREATE INDEX idx_category_price_covering ON products(category, price)
INCLUDE(name, stock)
This creates an index with category and price in the key, and name and stock as covered values.
Aggregate Index (SUM)
First define a view with the aggregation:
CREATE VIEW v_sales_by_category AS
SELECT category, SUM(amount) AS total_amount
FROM sales
GROUP BY category
CREATE INDEX idx_sales_by_category ON v_sales_by_category(category)
INCLUDE(total_amount)
Aggregate Index (COUNT)
CREATE VIEW v_count_by_region AS
SELECT region, COUNT(*) AS record_count
FROM sales
GROUP BY region
CREATE INDEX idx_count_by_region ON v_count_by_region(region)
INCLUDE(record_count)
Filtered Index
First define a view with the filter:
CREATE VIEW v_expensive_products AS
SELECT name, price
FROM products
WHERE price > 100
CREATE INDEX idx_expensive_products ON v_expensive_products(price)
Custom Ordering
CREATE INDEX idx_category_desc ON products(category DESC, price ASC)
NULL Ordering
CREATE INDEX idx_rating_nulls_last ON products(rating ASC NULLS LAST)
Column Ordering and NULL Handling¶
When creating an index using the INDEX ON syntax, each key column can specify sorting criteria and null semantics
to control how values are ordered in the index.
Sorting Criteria
Each key column supports the following sort orders:
ASC(ascending) - Values are sorted from smallest to largest (default if not specified)DESC(descending) - Values are sorted from largest to smallest
NULL Semantics
You can control where NULL values appear in the sort order:
NULLS FIRST- NULL values appear before non-NULL valuesNULLS LAST- NULL values appear after non-NULL values
Default NULL behavior:
- For ASC ordering: NULLS FIRST is the default
- For DESC ordering: NULLS LAST is the default
Syntax Options
The ordering clause for each column can take the following forms:
Sort order only:
columnName ASCorcolumnName DESCSort order with null semantics:
columnName ASC NULLS LASTorcolumnName DESC NULLS FIRSTNull semantics only:
columnName NULLS FIRSTorcolumnName NULLS LAST(uses default ASC ordering)
Examples
-- Ascending order with nulls last
CREATE INDEX idx_price ON products(price ASC NULLS LAST)
-- Descending order with nulls first
CREATE INDEX idx_rating ON products(rating DESC NULLS FIRST)
-- Specify only null semantics (ascending is implicit)
CREATE INDEX idx_stock ON products(stock NULLS LAST)
-- Mixed ordering across multiple columns
CREATE INDEX idx_complex ON products(
category ASC NULLS FIRST,
price DESC NULLS LAST,
name ASC
)
VECTOR INDEX Syntax¶
The VECTOR INDEX syntax creates an index specifically designed for vector similarity search using the HNSW
(Hierarchical Navigable Small World) algorithm. Vector indexes enable efficient approximate nearest neighbor (ANN)
search on high-dimensional vector data.
Basic Form¶
CREATE VECTOR INDEX indexName USING HNSW ON source(vectorColumn)
[PARTITION BY(partitionColumns)]
[OPTIONS(...)]
Where:
- source is a table or view name
- vectorColumn is a column of type vector(dimensions, float)
- PARTITION BY clause specifies partitioning columns (optional but recommended)
- OPTIONS clause specifies HNSW-specific configuration options
Note
The PARTITION BY clause is only applicable to vector indexes. It is not supported for regular value indexes
created with the INDEX ON syntax. Partitioning helps organize vectors by category or tenant, improving
query performance for vector similarity searches pertaining specific category or tenant.
Examples¶
Simple Vector Index
CREATE TABLE products(
id bigint,
name string,
embedding vector(128, float),
primary key(id)
)
CREATE VECTOR INDEX idx_embedding USING HNSW ON products(embedding)
Vector Index with Partitioning
Partitioning is recommended for better performance and to organize vectors by category or tenant:
CREATE TABLE products(
id bigint,
category string,
embedding vector(256, float),
primary key(id)
)
CREATE VECTOR INDEX idx_embedding USING HNSW ON products(embedding)
PARTITION BY(category)
Vector Index with Custom Options
CREATE VECTOR INDEX idx_embedding USING HNSW ON products(embedding)
PARTITION BY(category)
OPTIONS (
CONNECTIVITY = 16,
M_MAX = 32,
EF_CONSTRUCTION = 200,
METRIC = COSINE_METRIC
)
Vector Index on Filtered View
CREATE VIEW v_active_products AS
SELECT id, embedding, category
FROM products
WHERE status = 'active'
CREATE VECTOR INDEX idx_active_embeddings USING HNSW ON v_active_products(embedding)
PARTITION BY(category)
Vector Index with RabitQ Quantization
RabitQ is a quantization technique that reduces memory usage for high-dimensional vectors:
CREATE VECTOR INDEX idx_embedding USING HNSW ON products(embedding)
PARTITION BY(category)
OPTIONS (
USE_RABITQ = true,
RABITQ_NUM_EX_BITS = 4,
MAINTAIN_STATS_PROBABILITY = 0.01
)
Vector Index with Statistics Sampling
CREATE VECTOR INDEX idx_embedding USING HNSW ON products(embedding)
PARTITION BY(category)
OPTIONS (
USE_RABITQ = true,
SAMPLE_VECTOR_STATS_PROBABILITY = 0.05
)
Vector Index Options¶
HNSW Algorithm Parameters¶
CONNECTIVITY(orM)The number of bi-directional links created for each node during construction. Higher values improve recall but increase memory usage and construction time. Default: 16.
M_MAXMaximum number of connections per layer. Default: derived from CONNECTIVITY.
EF_CONSTRUCTIONThe size of the dynamic candidate list during index construction. Higher values improve index quality but increase construction time. Default: 200.
Distance Metrics¶
METRICThe distance metric used for similarity search. Available options:
EUCLIDEAN_METRIC- L2 distance (default)MANHATTAN_METRIC- L1 distanceDOT_PRODUCT_METRIC- Dot product (for normalized vectors)EUCLIDEAN_SQUARE_METRIC- Squared L2 distanceCOSINE_METRIC- Cosine similarity (recommended for embeddings)
RabitQ Quantization Options¶
USE_RABITQEnable RabitQ quantization to reduce memory usage. Default: false.
RABITQ_NUM_EX_BITSNumber of extra bits for RabitQ quantization. Higher values improve accuracy but increase memory usage. Valid range: 0-8. Default: 4.
Statistics and Monitoring¶
MAINTAIN_STATS_PROBABILITYProbability of maintaining statistics during updates when using RabitQ. Default: 0.01.
SAMPLE_VECTOR_STATS_PROBABILITYProbability of sampling vectors for statistics collection. Default: 0.0 (disabled).
Vector Index Limitations¶
Only one vector column can be indexed per vector index
The indexed column must be of type
vector(dimensions, float)Vector dimensions must be specified at table creation time
INCLUDE clause is not supported for vector indexes
Partitioning improves performance but is optional
Comparing Both Syntaxes¶
The two syntaxes are functionally equivalent and produce identical index structures. The choice between them is primarily a matter of style and organizational preference.
Value Index Comparison¶
These two approaches create identical indexes:
INDEX AS SELECT:
CREATE INDEX idx_category_price AS
SELECT category, price, name
FROM products
ORDER BY category, price
INDEX ON:
CREATE INDEX idx_category_price ON products(category, price)
INCLUDE(name)
Aggregate Index Comparison¶
These two approaches create identical aggregate indexes:
INDEX AS SELECT:
CREATE INDEX idx_sales_by_category AS
SELECT category, SUM(amount)
FROM sales
GROUP BY category
INDEX ON:
CREATE VIEW v_sales_by_category AS
SELECT category, SUM(amount) AS total_amount
FROM sales
GROUP BY category
CREATE INDEX idx_sales_by_category ON v_sales_by_category(category)
INCLUDE(total_amount)
Parameters¶
Common Parameters¶
indexNameThe name of the index. Must be unique within the schema template.
UNIQUE(optional)Specifies that the index should enforce uniqueness constraints.
INDEX AS SELECT Parameters¶
queryA SELECT statement that defines the index structure. The query must be incrementally maintainable.
For value indexes: Must include an ORDER BY clause specifying the key columns
For aggregate indexes: Must include GROUP BY with a single aggregate function
May include WHERE clause for filtered indexes
INDEX ON Parameters¶
sourceThe name of the table or view to index.
columnsA comma-separated list of column names that form the index key. Each column can optionally specify:
Sort order:
ASC(default) orDESCNULL handling:
NULLS FIRSTorNULLS LAST
Example:
category DESC, price ASC NULLS LASTINCLUDE(valueColumns)(optional)A comma-separated list of additional columns to store in the index as values (not part of the key). This creates a covering index that can satisfy queries without accessing the base table.
OPTIONS(...)(optional)Index-specific configuration options. Available options depend on the index type.
Limitations¶
Both syntaxes share the same limitations because they use the same underlying index implementation:
Indexes must be incrementally maintainable
Aggregate indexes support only one aggregate function per index
Aggregate indexes require integer types for the aggregated column
The query structure must allow computing index updates from individual record changes
See Defining indexes for detailed limitations
See Also¶
create-schema-template - Schema template definition
Defining indexes - Detailed index definition and limitations
CREATE VIEW - Creating views for use with INDEX ON