GROUP BY¶
Groups rows that have the same values in specified columns into aggregated rows, typically used with aggregate functions.
Syntax¶
The GROUP BY clause is used in SELECT statements:
SELECT column1, aggregate_function(column2)
FROM table_name
GROUP BY column1
HAVING aggregate_function(column2) > value
Parameters¶
GROUP BY expression [AS alias], ...Groups rows based on the values of one or more expressions. Each unique combination of expression values creates a separate group.
expressionCan be:
Column names
Nested field references (e.g.,
struct_column.field)Expressions or calculations
alias(optional)An optional alias for the grouping expression
HAVING condition(optional)Filters groups after aggregation. The condition can reference:
Grouped columns
Aggregate functions (e.g.,
AVG(salary) > 100000)Combinations using AND, OR, NOT
Unlike WHERE which filters rows before grouping, HAVING filters groups after aggregation.
Returns¶
Returns one row per unique combination of grouped values. When used with aggregate functions, computes aggregate values for each group.
Examples¶
Setup¶
For these examples, assume we have an employees table:
CREATE TABLE employees(
id BIGINT,
department STRING,
role STRING,
salary BIGINT,
PRIMARY KEY(id))
CREATE INDEX dept_idx AS SELECT department FROM employees ORDER BY department
CREATE INDEX role_idx AS SELECT role FROM employees ORDER BY role
INSERT INTO employees VALUES
(1, 'Engineering', 'Developer', 100000),
(2, 'Engineering', 'Developer', 110000),
(3, 'Engineering', 'Manager', 150000),
(4, 'Sales', 'Representative', 80000),
(5, 'Sales', 'Manager', 120000)
GROUP BY Single Column¶
Count employees by department:
SELECT department, COUNT(*) AS employee_count
FROM employees
GROUP BY department
|
|
|---|---|
|
|
|
|
GROUP BY Multiple Columns¶
Count employees by department and role:
SELECT department, role, COUNT(*) AS employee_count
FROM employees
GROUP BY department, role
|
|
|
|---|---|---|
|
|
|
|
|
|
|
|
|
|
|
|
GROUP BY with Aggregate Functions¶
Calculate average salary by department:
SELECT department, AVG(salary) AS avg_salary
FROM employees
GROUP BY department
|
|
|---|---|
|
|
|
|
Calculate multiple aggregates:
SELECT department,
COUNT(*) AS employee_count,
MIN(salary) AS min_salary,
MAX(salary) AS max_salary,
AVG(salary) AS avg_salary
FROM employees
GROUP BY department
|
|
|
|
|
|---|---|---|---|---|
|
|
|
|
|
|
|
|
|
|
GROUP BY with HAVING Clause¶
Filter groups using HAVING:
SELECT department, AVG(salary) AS avg_salary
FROM employees
GROUP BY department
HAVING AVG(salary) > 110000
|
|
|---|---|
|
|
The HAVING clause filters groups after aggregation, unlike WHERE which filters rows before grouping.
GROUP BY with Column Aliases¶
Use aliases for grouped columns:
SELECT department AS dept, COUNT(*) AS total
FROM employees
GROUP BY department AS dept
|
|
|---|---|
|
|
|
|
Important Notes¶
Index Requirement¶
GROUP BY requires an appropriate index to execute. Without a suitable index, the query will fail with an “unable to plan” error.
Example index creation:
CREATE INDEX dept_idx AS SELECT department FROM employees ORDER BY department
See Indexes for details on creating indexes that support GROUP BY operations.
Column Selection Rules¶
Only columns in the GROUP BY clause or aggregate functions can appear in the SELECT list
Selecting non-grouped, non-aggregated columns will result in error SQLSTATE 42803 (GROUPING_ERROR)
Invalid example:
-- ERROR: id is neither grouped nor aggregated
SELECT id, department, COUNT(*)
FROM employees
GROUP BY department
Valid example:
-- OK: all non-aggregated columns are grouped
SELECT department, role, COUNT(*)
FROM employees
GROUP BY department, role
Nested Fields¶
GROUP BY supports grouping on nested struct fields:
SELECT address.city, COUNT(*) AS resident_count
FROM people
GROUP BY address.city
Execution Model¶
FRL does not perform in-memory grouping. All GROUP BY operations must be backed by an appropriate index. This is a fundamental architectural constraint that ensures queries can execute efficiently over large datasets. An aggregate index will yield the best performance, but an index ordered by the desired grouping column will also work.
See Also¶
Aggregate Functions - Functions used with GROUP BY
Indexes - Creating indexes for GROUP BY
SELECT Statement - Full SELECT syntax