======== GROUP BY ======== .. _group_by: Groups rows that have the same values in specified columns into aggregated rows, typically used with aggregate functions. Syntax ====== .. raw:: html :file: GROUP_BY.diagram.svg The GROUP BY clause is used in SELECT statements: .. code-block:: sql SELECT column1, aggregate_function(column2) FROM table_name GROUP BY column1 HAVING aggregate_function(column2) > value Parameters ========== ``GROUP BY expression [AS alias], ...`` Groups rows based on the values of one or more expressions. Each unique combination of expression values creates a separate group. ``expression`` Can be: - Column names - Nested field references (e.g., ``struct_column.field``) - Expressions or calculations ``alias`` (optional) An optional alias for the grouping expression ``HAVING condition`` (optional) Filters groups after aggregation. The condition can reference: - Grouped columns - Aggregate functions (e.g., ``AVG(salary) > 100000``) - Combinations using AND, OR, NOT Unlike WHERE which filters rows before grouping, HAVING filters groups after aggregation. Returns ======= Returns one row per unique combination of grouped values. When used with aggregate functions, computes aggregate values for each group. Examples ======== Setup ----- For these examples, assume we have an ``employees`` table: .. code-block:: sql CREATE TABLE employees( id BIGINT, department STRING, role STRING, salary BIGINT, PRIMARY KEY(id)) CREATE INDEX dept_idx AS SELECT department FROM employees ORDER BY department CREATE INDEX role_idx AS SELECT role FROM employees ORDER BY role INSERT INTO employees VALUES (1, 'Engineering', 'Developer', 100000), (2, 'Engineering', 'Developer', 110000), (3, 'Engineering', 'Manager', 150000), (4, 'Sales', 'Representative', 80000), (5, 'Sales', 'Manager', 120000) GROUP BY Single Column ----------------------- Count employees by department: .. code-block:: sql SELECT department, COUNT(*) AS employee_count FROM employees GROUP BY department .. list-table:: :header-rows: 1 * - :sql:`department` - :sql:`employee_count` * - :json:`"Engineering"` - :json:`3` * - :json:`"Sales"` - :json:`2` GROUP BY Multiple Columns -------------------------- Count employees by department and role: .. code-block:: sql SELECT department, role, COUNT(*) AS employee_count FROM employees GROUP BY department, role .. list-table:: :header-rows: 1 * - :sql:`department` - :sql:`role` - :sql:`employee_count` * - :json:`"Engineering"` - :json:`"Developer"` - :json:`2` * - :json:`"Engineering"` - :json:`"Manager"` - :json:`1` * - :json:`"Sales"` - :json:`"Representative"` - :json:`1` * - :json:`"Sales"` - :json:`"Manager"` - :json:`1` GROUP BY with Aggregate Functions ----------------------------------- Calculate average salary by department: .. code-block:: sql SELECT department, AVG(salary) AS avg_salary FROM employees GROUP BY department .. list-table:: :header-rows: 1 * - :sql:`department` - :sql:`avg_salary` * - :json:`"Engineering"` - :json:`120000.0` * - :json:`"Sales"` - :json:`100000.0` Calculate multiple aggregates: .. code-block:: sql SELECT department, COUNT(*) AS employee_count, MIN(salary) AS min_salary, MAX(salary) AS max_salary, AVG(salary) AS avg_salary FROM employees GROUP BY department .. list-table:: :header-rows: 1 * - :sql:`department` - :sql:`employee_count` - :sql:`min_salary` - :sql:`max_salary` - :sql:`avg_salary` * - :json:`"Engineering"` - :json:`3` - :json:`100000` - :json:`150000` - :json:`120000.0` * - :json:`"Sales"` - :json:`2` - :json:`80000` - :json:`120000` - :json:`100000.0` .. _having: GROUP BY with HAVING Clause ----------------------------- Filter groups using HAVING: .. code-block:: sql SELECT department, AVG(salary) AS avg_salary FROM employees GROUP BY department HAVING AVG(salary) > 110000 .. list-table:: :header-rows: 1 * - :sql:`department` - :sql:`avg_salary` * - :json:`"Engineering"` - :json:`120000.0` The HAVING clause filters groups after aggregation, unlike WHERE which filters rows before grouping. GROUP BY with Column Aliases ------------------------------ Use aliases for grouped columns: .. code-block:: sql SELECT department AS dept, COUNT(*) AS total FROM employees GROUP BY department AS dept .. list-table:: :header-rows: 1 * - :sql:`dept` - :sql:`total` * - :json:`"Engineering"` - :json:`3` * - :json:`"Sales"` - :json:`2` Important Notes =============== Index Requirement ----------------- **GROUP BY requires an appropriate index to execute.** Without a suitable index, the query will fail with an "unable to plan" error. Example index creation: .. code-block:: sql CREATE INDEX dept_idx AS SELECT department FROM employees ORDER BY department See :ref:`Indexes ` for details on creating indexes that support GROUP BY operations. Column Selection Rules ---------------------- * Only columns in the GROUP BY clause or aggregate functions can appear in the SELECT list * Selecting non-grouped, non-aggregated columns will result in error SQLSTATE 42803 (GROUPING_ERROR) **Invalid example**: .. code-block:: sql -- ERROR: id is neither grouped nor aggregated SELECT id, department, COUNT(*) FROM employees GROUP BY department **Valid example**: .. code-block:: sql -- OK: all non-aggregated columns are grouped SELECT department, role, COUNT(*) FROM employees GROUP BY department, role Nested Fields ------------- GROUP BY supports grouping on nested struct fields: .. code-block:: sql SELECT address.city, COUNT(*) AS resident_count FROM people GROUP BY address.city Execution Model --------------- FRL does not perform in-memory grouping. All GROUP BY operations must be backed by an appropriate index. This is a fundamental architectural constraint that ensures queries can execute efficiently over large datasets. An aggregate index will yield the best performance, but an index ordered by the desired grouping column will also work. See Also ======== * :ref:`Aggregate Functions ` - Functions used with GROUP BY * :ref:`Indexes ` - Creating indexes for GROUP BY * :ref:`SELECT Statement