SQL EXIST

August 27, 2020

A way to think about EXIST is by com­par­ing it to JOIN. The dif­fer­ence between them is that EXIST is SQL’s way of express­ing semi­joins (and anti­joins). I tried read­ing a few resources but noth­ing really clicked for me, and so I wrote some imper­a­tive Python code to illus­trate these con­cepts:

managers = [('Bob', 1), ('Sally', 2)]
employees = [('Andy', 1), ('Mark', 1), ('Henry', None)]

def join(t1, t2):
    for u in t1:
        for v in t2:
            if u[1] == v[1]:
                yield (u, v)

def semijoin(t1, t2):
    for u in t1:
        for v in t2:
            if u[1] == v[1]:
                yield u
                break

def antijoin(t1, t2):
    for u in t1:
        matched = False
        for v in t2:
            if u[1] == v[1]:
                matched = True
                break
        if not matched:
            yield u

Here, we have a manager table which con­tains a list of man­agers and their IDs, and a employee table which con­tains a list of employ­ees and their man­ag­er IDs. (we can ignore the fact that man­agers are also employ­ees).

There are two man­agers, Bob and Sally. Bob has two employ­ees under him, Sally has no employ­ees under her. Henry has no man­ag­er cur­rent­ly assigned to him.

Let’s see what hap­pens when we do this:

print(list(join(managers, employees)))
>>> [(('Bob', 1), ('Andy', 1)), (('Bob', 1), ('Mark', 1))]

An (inner) join returns a list of all man­ag­er-employ­ee pairs.

What about:

print(list(semijoin(managers, employees)))
>>> [('Bob', 1)]

A semi­join returns all man­agers that have at least one employ­ee.

The reason this is the case here is because semi­joins short-cir­cuit on the pred­i­cate. The pred­i­cate here is manager.id = employee.manager_id, which means that as soon as we find an employ­ee whose man­ag­er is the cur­rent one we’re con­sid­er­ing, we can output the cur­rent man­ag­er and move on to the next man­ag­er. This con­cept extends sim­i­lar­ly for any arbi­trary pred­i­cate.

print(list(antijoin(managers, employees)))
>>> [('Sally', 2)]

An anti­join return all man­agers that do not have at least one employ­ee.

An anti­join works in the exact­ly oppo­site way from a semi­join. Anti­joins short-cir­cuit on the nega­tion of the pred­i­cate — as soon as we find an employ­ee whose man­ag­er is the cur­rent one we’re con­sid­er­ing, we skip the man­ag­er. If we’ve checked all the employ­ees and no employ­ee’s man­ag­er is the cur­rent man­ag­er, we can output the cur­rent man­ag­er.

In this way, the set of rows output by a semi­join is the exact com­ple­men­tary of the set of rows output by an anti­join.


Note that joins are com­mu­ta­tive, but semi­joins and anti­joins are not. If you flip the tables around:

print(list(semijoin(employees, managers)))
>>> [('Andy', 1), ('Mark', 1)]

This returns all employ­ees that have a man­ag­er.

print(list(antijoin(employees, managers)))
>>> [('Henry', None)]

This returns all employ­ees that do not have a man­ag­er.


Also note that joins can do the work of a semi­join:

# let's implement semijoin with join
def semijoinviajoin(t1, t2):
    results = join(t1, t2)
    return set([i[0] for i in results])
    
print(list(semijoinviajoin(employees, managers)))
>>> [('Bob', 1)]

What we did here was to pro­ject only the columns from the first table and dedu­pli­cate the results. Of course, this involves more memory to store the entire Carte­sian prod­uct of the join and more time to dedu­pli­cate the results.


These are the equiv­a­lent SQL queries for join, semijoin, and antijoin:

SELECT manager.id, employee.id
FROM manager
JOIN employee
ON manager.id = employee.manager_id;
SELECT id
FROM manager
WHERE EXISTS (
  SELECT 1 FROM employee
  WHERE manager.id = employee.manager_id
);
SELECT id
FROM manager
WHERE NOT EXISTS (
  SELECT 1 FROM employee
  WHERE manager.id = employee.manager_id
);

A good rule of thumb from a com­ment on one of the arti­cles I linked above:

I usu­al­ly follow anoth­er rule which might be help­ful. If you do not need columns from a table, then move it from FROM clause to WHERE clause.

Although this expla­na­tion is not as rig­or­ous, I hope it’s still useful as a com­pan­ion piece along­side other more accu­rate and elab­o­rate arti­cles.