Joins¶
Joins combine rows from two or more tables based on a related column. The FDB Record Layer supports the standard SQL join keywords INNER JOIN (or just JOIN), LEFT OUTER JOIN (or LEFT JOIN), and RIGHT OUTER JOIN (or RIGHT JOIN). As an alternative to the INNER JOIN syntax, inner joins can also be expressed using comma-separated table references in the FROM clause with join conditions specified in the WHERE clause.
Important
The FDB Record Layer does not support the FULL OUTER JOIN and CROSS JOIN variants.
Basic Join Syntax¶
Cross Join (Cartesian Product)¶
The FDB Record Layer does not support the CROSS JOIN keyword. List multiple tables separated by commas instead:
SELECT columns FROM table1, table2
This produces a Cartesian product of all rows from both tables.
Inner Join with ON¶
Use the ON clause to specify join conditions:
SELECT columns
FROM table1 INNER JOIN table2
ON table1.column = table2.column
This is equivalent to a SELECT from comma-separated sources with a WHERE clause:
SELECT columns
FROM table1, table2
WHERE table1.column = table2.column
Inner Join with USING¶
The USING clause is a shorthand for when the joined tables share identically named columns. The columns are specified in the USING clause:
CREATE TABLE a(c1, c2)
CREATE TABLE b(c1, c3)
SELECT columns
FROM a INNER JOIN b USING(c1)
This is equivalent to:
SELECT columns
FROM a INNER JOIN b ON a.c1 = b.c1
And also equivalent to:
SELECT columns
FROM a, b WHERE a.c1 = b.c1
An important feature of INNER JOIN … USING is that it hides duplicate columns from the output:
CREATE TABLE a(c1, c2)
CREATE TABLE b(c1, c3)
SELECT *
FROM a INNER JOIN b USING(c1)
In this case, SELECT * returns only three columns:
c1 |
c2 |
c3 |
However, the joining columns can still be accessed using qualified names:
SELECT a.c1, b.c1
FROM a INNER JOIN b USING(c1)
This returns two identical columns:
a.c1 |
b.c1 |
INNER JOIN USING maintains the standard column order from left to right, excluding duplicates:
CREATE TABLE a(c1, c2, c5, c6)
CREATE TABLE b(c1, c3, c5, c7)
SELECT *
FROM a INNER JOIN b USING(c1, c5)
This returns six columns: all columns from a (c1, c2, c5, c6) followed by the non-duplicate columns from b (c3, c7).
c1 |
c2 |
c5 |
c6 |
c3 |
c7 |
Examples¶
Setup¶
For these examples, assume the following tables:
CREATE TABLE emp(
id BIGINT,
fname STRING,
lname STRING,
dept_id BIGINT,
PRIMARY KEY(id)
)
CREATE TABLE dept(
id BIGINT,
name STRING,
PRIMARY KEY(id)
)
CREATE TABLE project(
id BIGINT,
name STRING,
dsc STRING,
emp_id BIGINT,
PRIMARY KEY(id)
)
INSERT INTO emp VALUES
(1, 'Jack', 'Williams', 1),
(2, 'Thomas', 'Johnson', 1),
(3, 'Emily', 'Martinez', 1),
(5, 'Daniel', 'Miller', 2),
(8, 'Megan', 'Miller', 3)
INSERT INTO dept VALUES
(1, 'Engineering'),
(2, 'Sales'),
(3, 'Marketing')
INSERT INTO project VALUES
(1, 'OLAP', 'Support OLAP queries', 3),
(2, 'SEO', 'Increase visibility on search engines', 8),
(3, 'Feedback', 'Turn customer feedback into items', 5)
Simple Two-Table Join¶
Join employees with their departments:
SELECT fname, lname
FROM emp INNER JOIN dept
ON emp.dept_id = dept.id
AND dept.name = 'Engineering'
|
|
|---|---|
|
|
|
|
|
|
Consecutive Joins¶
Join across three tables to find departments and their projects:
SELECT dept.name, project.name
FROM emp INNER JOIN dept ON emp.dept_id = dept.id
INNER JOIN project ON project.emp_id = emp.id
|
|
|---|---|
|
|
|
|
|
|
The result of the first join (employees and departments) is joined to projects.
Join with Subquery¶
Use a derived table (subquery) in a join:
SELECT fname, lname
FROM (
SELECT fname, lname, dept_id
FROM emp
WHERE EXISTS (SELECT * FROM project WHERE emp_id = emp.id)
) AS sq INNER JOIN dept
ON sq.dept_id = dept.id
AND dept.name = 'Sales'
|
|
|---|---|
|
|
This finds employees who are assigned to projects and work in the Sales department.
Nested Joins¶
Join subqueries that themselves contain joins:
SELECT sq.name, project.name
FROM (
SELECT dept.name, emp.id
FROM emp INNER JOIN dept
ON emp.dept_id = dept.id
) AS sq INNER JOIN project
ON project.emp_id = sq.id
|
|
|---|---|
|
|
|
|
|
|
The subquery first joins employees with departments; the result is then joined with projects.
Join with CTEs¶
Use Common Table Expressions in joins:
WITH c1(w, z) AS (SELECT id, col1 FROM t1),
c2(a, b) AS (SELECT id, col1 FROM t1 WHERE id IN (1, 2))
SELECT * FROM c1, c2
This creates two CTEs and joins them using a cross join.
Self-Join¶
Join a table to itself:
SELECT * FROM Table1, Table1 WHERE col1 = 10
A self-join can be used to find relationships within the same table. Use aliases to distinguish between the two references:
SELECT t1.fname, t2.fname
FROM emp t1, emp t2
WHERE t1.dept_id = t2.dept_id
AND t1.id < t2.id
Semi-Join with EXISTS¶
Use EXISTS to implement a semi-join (find rows that have matching rows in another table):
SELECT fname, lname
FROM emp
WHERE EXISTS (
SELECT * FROM project WHERE emp_id = emp.id
)
|
|
|---|---|
|
|
|
|
|
|
This finds all employees who have at least one project assigned, without returning duplicate employee rows.
Join with User-Defined Functions¶
User-defined functions can be used like tables in the FROM clause and joined with conditions in the WHERE clause:
SELECT A.col1, A.col2, B.col1, B.col2
FROM f1(103, 'b') A, f1(103, 'b') B
WHERE A.col1 = B.col1
Important Notes¶
Table Aliases¶
Use aliases to:
Distinguish between multiple references to the same table.
Shorten long table names.
Reference columns from specific tables in multi-table joins.
SELECT e.fname, d.name
FROM emp e, dept d
WHERE e.dept_id = d.id
Join Conditions¶
Join conditions should be specified in the
WHEREclause (for comma-separated tables) or theONclause (forINNER JOIN,LEFT OUTER JOIN, andRIGHT OUTER JOIN).Use
ANDto combine multiple join conditions and filters.Omitting join conditions produces a Cartesian product (all combinations of rows).
See Also¶
JOIN - JOIN syntax
SELECT - SELECT statement syntax
WHERE - WHERE clause filtering
Subqueries - Subqueries and correlated subqueries
WITH - Common Table Expressions