SQL EXIST

August 27, 2020

A way to think about `EXIST` is by com­par­ing it to `JOIN`. The dif­fer­ence between them is that `EXIST` is SQL’s way of express­ing semi­joins (and anti­joins). I tried read­ing a few resources but noth­ing really clicked for me, and so I wrote some imper­a­tive Python code to illus­trate these con­cepts:

``````managers = [('Bob', 1), ('Sally', 2)]
employees = [('Andy', 1), ('Mark', 1), ('Henry', None)]

def join(t1, t2):
for u in t1:
for v in t2:
if u[1] == v[1]:
yield (u, v)

def semijoin(t1, t2):
for u in t1:
for v in t2:
if u[1] == v[1]:
yield u
break

def antijoin(t1, t2):
for u in t1:
matched = False
for v in t2:
if u[1] == v[1]:
matched = True
break
if not matched:
yield u``````

Here, we have a `manager` table which con­tains a list of man­agers and their IDs, and a `employee` table which con­tains a list of employ­ees and their man­ag­er IDs. (we can ignore the fact that man­agers are also employ­ees).

There are two man­agers, Bob and Sally. Bob has two employ­ees under him, Sally has no employ­ees under her. Henry has no man­ag­er cur­rent­ly assigned to him.

Let’s see what hap­pens when we do this:

``````print(list(join(managers, employees)))
>>> [(('Bob', 1), ('Andy', 1)), (('Bob', 1), ('Mark', 1))]``````

An (inner) join returns a list of all man­ag­er-employ­ee pairs.

``````print(list(semijoin(managers, employees)))
>>> [('Bob', 1)]``````

A semi­join returns all man­agers that have at least one employ­ee.

The reason this is the case here is because semi­joins short-cir­cuit on the pred­i­cate. The pred­i­cate here is `manager.id = employee.manager_id`, which means that as soon as we find an employ­ee whose man­ag­er is the cur­rent one we’re con­sid­er­ing, we can output the cur­rent man­ag­er and move on to the next man­ag­er. This con­cept extends sim­i­lar­ly for any arbi­trary pred­i­cate.

``````print(list(antijoin(managers, employees)))
>>> [('Sally', 2)]``````

An anti­join return all man­agers that do not have at least one employ­ee.

An anti­join works in the exact­ly oppo­site way from a semi­join. Anti­joins short-cir­cuit on the nega­tion of the pred­i­cate — as soon as we find an employ­ee whose man­ag­er is the cur­rent one we’re con­sid­er­ing, we skip the man­ag­er. If we’ve checked all the employ­ees and no employ­ee’s man­ag­er is the cur­rent man­ag­er, we can output the cur­rent man­ag­er.

In this way, the set of rows output by a semi­join is the exact com­ple­men­tary of the set of rows output by an anti­join.

Note that joins are com­mu­ta­tive, but semi­joins and anti­joins are not. If you flip the tables around:

``````print(list(semijoin(employees, managers)))
>>> [('Andy', 1), ('Mark', 1)]``````

This returns all employ­ees that have a man­ag­er.

``````print(list(antijoin(employees, managers)))
>>> [('Henry', None)]``````

This returns all employ­ees that do not have a man­ag­er.

Also note that joins can do the work of a semi­join:

``````# let's implement semijoin with join
def semijoinviajoin(t1, t2):
results = join(t1, t2)
return set([i[0] for i in results])

print(list(semijoinviajoin(employees, managers)))
>>> [('Bob', 1)]``````

What we did here was to pro­ject only the columns from the first table and dedu­pli­cate the results. Of course, this involves more memory to store the entire Carte­sian prod­uct of the join and more time to dedu­pli­cate the results.

These are the equiv­a­lent SQL queries for `join`, `semijoin`, and `antijoin`:

``````SELECT manager.id, employee.id
FROM manager
JOIN employee
ON manager.id = employee.manager_id;``````
``````SELECT id
FROM manager
WHERE EXISTS (
SELECT 1 FROM employee
WHERE manager.id = employee.manager_id
);``````
``````SELECT id
FROM manager
WHERE NOT EXISTS (
SELECT 1 FROM employee
WHERE manager.id = employee.manager_id
);``````

A good rule of thumb from a com­ment on one of the arti­cles I linked above:

I usu­al­ly follow anoth­er rule which might be help­ful. If you do not need columns from a table, then move it from FROM clause to WHERE clause.

Although this expla­na­tion is not as rig­or­ous, I hope it’s still useful as a com­pan­ion piece along­side other more accu­rate and elab­o­rate arti­cles.