GROUP BY

Groups rows that have the same values in specified columns into aggregated rows, typically used with aggregate functions.

Syntax

GROUP BY expression AS alias , HAVING having-expression

The GROUP BY clause is used in SELECT statements:

SELECT column1, aggregate_function(column2)
FROM table_name
GROUP BY column1
HAVING aggregate_function(column2) > value

Parameters

GROUP BY expression [AS alias], ...

Groups rows based on the values of one or more expressions. Each unique combination of expression values creates a separate group.

expression

Can be:

  • Column names

  • Nested field references (e.g., struct_column.field)

  • Expressions or calculations

alias (optional)

An optional alias for the grouping expression

HAVING condition (optional)

Filters groups after aggregation. The condition can reference:

  • Grouped columns

  • Aggregate functions (e.g., AVG(salary) > 100000)

  • Combinations using AND, OR, NOT

Unlike WHERE which filters rows before grouping, HAVING filters groups after aggregation.

Returns

Returns one row per unique combination of grouped values. When used with aggregate functions, computes aggregate values for each group.

Examples

Setup

For these examples, assume we have an employees table:

CREATE TABLE employees(
    id BIGINT,
    department STRING,
    role STRING,
    salary BIGINT,
    PRIMARY KEY(id))

CREATE INDEX dept_idx AS SELECT department FROM employees ORDER BY department
CREATE INDEX role_idx AS SELECT role FROM employees ORDER BY role

INSERT INTO employees VALUES
    (1, 'Engineering', 'Developer', 100000),
    (2, 'Engineering', 'Developer', 110000),
    (3, 'Engineering', 'Manager', 150000),
    (4, 'Sales', 'Representative', 80000),
    (5, 'Sales', 'Manager', 120000)

GROUP BY Single Column

Count employees by department:

SELECT department, COUNT(*) AS employee_count
FROM employees
GROUP BY department

department

employee_count

"Engineering"

3

"Sales"

2

GROUP BY Multiple Columns

Count employees by department and role:

SELECT department, role, COUNT(*) AS employee_count
FROM employees
GROUP BY department, role

department

role

employee_count

"Engineering"

"Developer"

2

"Engineering"

"Manager"

1

"Sales"

"Representative"

1

"Sales"

"Manager"

1

GROUP BY with Aggregate Functions

Calculate average salary by department:

SELECT department, AVG(salary) AS avg_salary
FROM employees
GROUP BY department

department

avg_salary

"Engineering"

120000.0

"Sales"

100000.0

Calculate multiple aggregates:

SELECT department,
       COUNT(*) AS employee_count,
       MIN(salary) AS min_salary,
       MAX(salary) AS max_salary,
       AVG(salary) AS avg_salary
FROM employees
GROUP BY department

department

employee_count

min_salary

max_salary

avg_salary

"Engineering"

3

100000

150000

120000.0

"Sales"

2

80000

120000

100000.0

GROUP BY with HAVING Clause

Filter groups using HAVING:

SELECT department, AVG(salary) AS avg_salary
FROM employees
GROUP BY department
HAVING AVG(salary) > 110000

department

avg_salary

"Engineering"

120000.0

The HAVING clause filters groups after aggregation, unlike WHERE which filters rows before grouping.

GROUP BY with Column Aliases

Use aliases for grouped columns:

SELECT department AS dept, COUNT(*) AS total
FROM employees
GROUP BY department AS dept

dept

total

"Engineering"

3

"Sales"

2

Important Notes

Index Requirement

GROUP BY requires an appropriate index to execute. Without a suitable index, the query will fail with an “unable to plan” error.

Example index creation:

CREATE INDEX dept_idx AS SELECT department FROM employees ORDER BY department

See Indexes for details on creating indexes that support GROUP BY operations.

Column Selection Rules

  • Only columns in the GROUP BY clause or aggregate functions can appear in the SELECT list

  • Selecting non-grouped, non-aggregated columns will result in error SQLSTATE 42803 (GROUPING_ERROR)

Invalid example:

-- ERROR: id is neither grouped nor aggregated
SELECT id, department, COUNT(*)
FROM employees
GROUP BY department

Valid example:

-- OK: all non-aggregated columns are grouped
SELECT department, role, COUNT(*)
FROM employees
GROUP BY department, role

Nested Fields

GROUP BY supports grouping on nested struct fields:

SELECT address.city, COUNT(*) AS resident_count
FROM people
GROUP BY address.city

Execution Model

FRL does not perform in-memory grouping. All GROUP BY operations must be backed by an appropriate index. This is a fundamental architectural constraint that ensures queries can execute efficiently over large datasets. An aggregate index will yield the best performance, but an index ordered by the desired grouping column will also work.

See Also