There are a number of different types of joins. Most joins can be either defined as an implicit join or an explicit join. An implicit join is one where there is no explicit JOIN statement yet there is a join implicit in the WHERE clause of the query. An explicit join is one where there is actually a JOIN statement in the FROM clause of the query.
An explicit join is also known as a qualified join, and uses the syntax
SELECT <columns>
FROM <table>
<join type> JOIN <table>
ON <join condition>
Cross joins
Given two tables, each row in the inner table joins each row of the outer table. I can think of almost no examples where you'd want to use this, but the implicit join syntax is:
SELECT * FROM
OrgUnit, Person
The explicit join syntax is:
SELECT * FROM OrgUnit
CROSS JOIN Person
A cross join is also known as a Cartesian product or a Cartesian join. I have never needed to use it! Note that this is the only join that does not use the ON syntax, as it's not needed and if you needed to restrict the data set then you'd need to filter in the WHERE clause of the query. Of course, if your filter does a comparison against two related columns of the datasets, then you have an INNER join, described below.
Inner joins
An "inner" join is named because given two tables, the query will return each row in the inner table that has a match in the outer table.
To show an example of this, I'll have another look at my two tables I introduced in the cross join section. They are OrgUnit and Person.
OrgUnitID | OrgUnitName |
---|---|
1 | Manufacturing |
2 | Accounting |
3 | Human Resources |
4 | Tiger team |
5 | IT |
6 | Consultancy |
7 | Special projects |
PersonID | FirstName | LastName | PhoneNumber | Address | OrgUnitID |
---|---|---|---|---|---|
1 | Chris | Sasquatch | 03 9998 7656 | 5 Curumbah St, Gladstone, VIC, 3145 | 5 |
2 | Jill | Anderson | 03 9765 3975 | 23 Sesqatonia Rd, Smallville, VIC, 3087 | 6 |
3 | Robert | Manaheim | 03 7659 8765 | 18a Xyz St, Whoknowswhere, VIC, 3009 | 2 |
4 | Mysterious | Guy | 02 8765 4321 | 78 Unknown St, NSW, 2000 | 1 |
5 | Captain | Obvious | 02 1111 1111 | 54 Duh Lane, NSW, 2119 | 4 |
6 | Mickey | Mouse | 07 7654 6665 | 1 Disney Rd, QLD, 7654 | 7 |
7 | Donald | Duck | 07 7654 6665 | 5 Drewery Lane, WA, 8097 | NULL |
8 | Daffy | Duck | 08 7654 3214 | 8 St Kilda Street, WA, 8008 | 7 |
9 | Bugs | Bunny | 08 7654 3214 | 1004 Killawee Rd, NT, 6543 | 7 |
If I wanted to find out all the Organization Units that each invididual resides in, I would run the following explicit query:
SELECT
p.FirstName,
p.LastName,
p.Address,
p.PhoneNumber,
o.OrgUnitName
FROM Person p
INNER JOIN OrgUnit
ON p.OrgUnitID=o.OrgUnitID
The implicit query is:
SELECT
p.FirstName,
p.LastName,
p.Address,
p.PhoneNumber,
o.OrgUnitName
FROM Person p, OrgUnit o
WHERE p.OrgUnitID=o.OrgUnitID
This gives the following results:
FirstName | LastName | PhoneNumber | Address | OrgUnitName |
---|---|---|---|---|
Chris | Sasquatch | 03 9998 7656 | 5 Curumbah St, Gladstone, VIC, 3145 | IT |
Jill | Anderson | 03 9765 3975 | 23 Sesqatonia Rd, Smallville, VIC, 3087 | Consultancy |
Robert | Manaheim | 03 7659 8765 | 18a Xyz St, Whoknowswhere, VIC, 3009 | Accounting |
Mysterious | Guy | 02 8765 4321 | 78 Unknown St, NSW, 2000 | Manufacturing |
Captain | Obvious | 02 1111 1111 | 54 Duh Lane, NSW, 2119 | Tiger team |
Mickey | Mouse | 07 7654 6665 | 1 Disney Rd, QLD, 7654 | Special projects |
Daffy | Duck | 08 7654 3214 | 8 St Kilda Street, WA, 8008 | Special projects |
Bugs | Bunny | 08 7654 3214 | 1004 Killawee Rd, NT, 6543 | Special projects |
Outer joins
An "outer" join is so named because given two tables, each row in the outer table is returned even if it does not meet the ON condition. There is no implicit join syntax for this sort of join - you must use the ANSI syntax.
There are a number of forms of outer joins: left outer join, right outer join and full outer join. Let's explore each of these.
Left outer joins
A left outer join returns the "left" rows of an outer join regardless of whether they met the ON condition or not. If the right hand table does not meet the join condition, then a NULL is displayed.
Let's say that I wanted to find all the people in my database and if they are part of an organization unit. I would do this with the following query:
SELECT
p.FirstName,
p.LastName,
p.Address,
p.PhoneNumber,
o.OrgUnitName
FROM Person p
LEFT OUTER JOIN OrgUnit
ON p.OrgUnitID=o.OrgUnitID
This would return the following results:
FirstName | LastName | PhoneNumber | Address | OrgUnitName |
---|---|---|---|---|
Chris | Sasquatch | 03 9998 7656 | 5 Curumbah St, Gladstone, VIC, 3145 | IT |
Jill | Anderson | 03 9765 3975 | 23 Sesqatonia Rd, Smallville, VIC, 3087 | Consultancy |
Robert | Manaheim | 03 7659 8765 | 18a Xyz St, Whoknowswhere, VIC, 3009 | Accounting |
Mysterious | Guy | 02 8765 4321 | 78 Unknown St, NSW, 2000 | Manufacturing |
Captain | Obvious | 02 1111 1111 | 54 Duh Lane, NSW, 2119 | Tiger team |
Mickey | Mouse | 07 7654 6665 | 1 Disney Rd, QLD, 7654 | Special projects |
Daffy | Duck | 08 7654 3214 | 8 St Kilda Street, WA, 8008 | Special projects |
Bugs | Bunny | 08 7654 3214 | 1004 Killawee Rd, NT, 6543 | Special projects |
Donald | Duck | 07 7654 6665 | 5 Drewery Lane, WA, 8097 | NULL |
I have highlighted in bold red the row that is returned in the left outer join that was not returned in the inner join. As you can see, Donald Duck exists in the left hand table, but is not part of a organization unit. With the left outer join he is still returned in the output, however as he is not part of any org unit it displays a NULL value.
Right outer joins
A right outer join is the counterpart of the left outer join. It returns the "right" rows of an outer join regardless of whether they met the ON condition or not. If the left hand table does not meet the join condition, then a NULL is displayed.
Let's say that I wanted to find all org units and if they have any members show who they are. I would run the following query:
SELECT
p.FirstName, p.LastName, o.OrgUnitName
FROM Person p
RIGHT OUTER JOIN OrgUnit
ON p.OrgUnitID=o.OrgUnitID
This returns the following results:
FirstName | LastName | OrgUnitName |
---|---|---|
Mysterious | Guy | Manufacturing |
Robert | Manaheim | Accounting |
NULL | NULL | Human Resources |
Captain | Obvious | Tiger team |
Chris | Sasquatch | IT |
Jill | Anderson | Consultancy |
Mickey | Mouse | Special projects |
Daffy | Duck | Special projects |
Bugs | Bunny | Special projects |
Again I have highlighted in bold red the important row that is returned by the right outer join. The human resources org unit of this company has nobody in it (silly rabbits!) but we still want to know all the org units of the company. Thus as the table OrgUnits is on the right hand side of the join we return the organization but NULLs for the left hand side table rows.
Full outer joins
A full outer join allows us to display each row in both left and right tables. If there is a row in the right hand table that does not match the ON join condition, then it displays the right hand table row and NULL values for the left hand table. If there is a row in the left hand table that does not match the ON join condition, then it displays the left hand table row and NULL values for the right hand table.
This may seem a bit like a cross join, but it is not. A cross join matches every row for the left and right hand tables regardless of the relationship between the tables. The full outer join maintains the relationships between the tables, it's just if either side does not have a matching row then it displays the row on the right/left hand side regardless.
If I wanted to see all the OrgUnits and each member of the OrgUnit, but I also wanted to see all the people in the database regardless of whether they belong to an OrgUnit, I would run the following query:
SELECT
o.OrgUnitName,
p.FirstName,
p.LastName,
p.PhoneNumber,
p.Address,
o.OrgUnitName
FROM Person p
FULL OUTER JOIN OrgUnit
ON p.OrgUnitID=o.OrgUnitID
FirstName | LastName | PhoneNumber | Address | OrgUnitName |
---|---|---|---|---|
Chris | Sasquatch | 03 9998 7656 | 5 Curumbah St, Gladstone, VIC, 3145 | IT |
Jill | Anderson | 03 9765 3975 | 23 Sesqatonia Rd, Smallville, VIC, 3087 | Consultancy |
NULL | NULL | NULL | NULL | Human Resources |
Robert | Manaheim | 03 7659 8765 | 18a Xyz St, Whoknowswhere, VIC, 3009 | Accounting |
Mysterious | Guy | 02 8765 4321 | 78 Unknown St, NSW, 2000 | Manufacturing |
Captain | Obvious | 02 1111 1111 | 54 Duh Lane, NSW, 2119 | Tiger team |
Mickey | Mouse | 07 7654 6665 | 1 Disney Rd, QLD, 7654 | Special projects |
Daffy | Duck | 08 7654 3214 | 8 St Kilda Street, WA, 8008 | Special projects |
Bugs | Bunny | 08 7654 3214 | 1004 Killawee Rd, NT, 6543 | Special projects |
Donald | Duck | 07 7654 6665 | 5 Drewery Lane, WA, 8097 | NULL |
Semi and anti-semi joins
The last sort of joins I want to talk about are semi-joins and anti-semi joins.
A semi-join will return rows from a table when there is one or more corresponding rows in the joining table. Interestingly, there isn't actually any explicit syntax for semi-joins, instead you must use the EXISTS statement.
For instance, let's say that I want to find all OrgUnits that have at least one Person. We could do the following:
SELECT o.OrgUnitName
FROM OrgUnit o
INNER JOIN Person p
However, this will return more than one record for the OrgUnit "Special projects".
Instead, a semi-join will do this for you. The syntax is:
SELECT o.OrgUnitID
FROM OrgUnit AS o
WHERE EXISTS
(
SELECT p.OrgUnitID FROM Person AS p
WHERE p.OrgUnitID = o.OrgUnitID
)
You can also do this by using an IN statement:
SELECT o.OrgUnitName
FROM OrgUnit AS o
WHERE o.OrgUnitID IN
(
SELECT p.OrgUnitID FROM Person AS p
WHERE p.OrgUnitID = o.OrgUnitID
)
An anti-semi join however, does the exact opposite. It will find any rows that do not have at least one matching row in the second table. To do this, use the NOT EXISTS statement.
Going back to our previous example, if we wanted to find all the departments that do not have any members, we would do the following:
SELECT o.OrgUnitID
FROM OrgUnit AS o
WHERE NOT EXISTS
(
SELECT p.OrgUnitID FROM Person AS p
WHERE p.OrgUnitID = o.OrgUnitID
)
With the NOT IN statement it would be:
SELECT o.OrgUnitName
FROM OrgUnit AS o
WHERE o.OrgUnitID NOT IN
(
SELECT p.OrgUnitID FROM Person AS p
WHERE p.OrgUnitID = o.OrgUnitID
)
Note 1: I picked this IN trick up at the following blog.
Note 2: I asked over at SQL Server Central whether they thought it would be a good idea to add a TOP 1 clause to the query. i.e. the correlated subquery becomes:
SELECT TOP 1 p.OrgUnitID FROM Person AS p
WHERE p.OrgUnitID = o.OrgUnitID
However, it was politely suggested I test myself. Fair enough I suppose :-)
Another way of doing a FULL OUTER join
Interestingly, we can get a FULL OUTER join by doing a left outer join of Table1 to Table2, then do a UNION ALL of a left anti-semi join of Table2 to Table1. Or in other words:
Table1 LEFT OUTER JOIN Table2
UNION ALL
Table2 LEFT ANTI-SEMI JOIN Table1
I picked this up from the book Inside Microsoft SQL Server 2005: Query Tuning and Optimization. It's useful because a nested loop can't do a FULL OUTER join, except that you can transform the FULL OUTER join into joins that the nested loop can handle. Which I thought was quite neat, in a totally geeky way of course :-)
Just realised I have a problem in one of the queries. In the semi-join example, it reads:
ReplyDeleteSELECT o.OrgUnitName
FROM OrgUnit o
INNER JOIN Person p
That should be:
SELECT o.OrgUnitName
FROM OrgUnit o
INNER JOIN Person p
on p.OrgUnitID = o.OrgUnitID
Don't want to edit the article as it might muck up the formatting!