Notes on "SQLite: The Database at the Edge of the Network"

August 09, 2020

Recent­ly watched a great pre­sen­ta­tion and intro­duc­tion to SQLite by its cre­ator, Dr. Richard Hipp.

A short sum­ma­ry of the video below.

SQLite is used in many places

Strong con­tender for most-widely deployed appli­ca­tion in the world; cer­tain­ly within the top five. SQLite is deployed in:

  • Every smart­phone (Android and iOS)
  • Every major brows­er (Chrome, Fire­fox, Safari)
  • Every instal­la­tion of Python and PHP
  • Many pop­u­lar desk­top appli­ca­tions

Some people have sug­gest­ed zlib as a viable com­peti­tor, but tech­ni­cal­i­ties abound on whether N appli­ca­tions using a shared copy of zlib counts N times or 1. SQLite is usu­al­ly sta­t­i­cal­ly linked by each appli­ca­tion that uses it.

Replete with fea­tures

Only one writer

A major con­trib­u­tor to SQLite’s com­pact­ness and reli­a­bil­i­ty is its self-imposed lim­i­ta­tion on lim­it­ing writes to a single process. There is no such lim­i­ta­tion on the number of con­cur­rent read­ers, how­ev­er.

SQLite can scale larger than you think

  • Up to 64-way joins
  • Up to 140TB per data­base/​file
  • Up to 125 data­bas­es per con­nec­tion

Well-tested

SQLite is used in the flight sofware for the A350 XWB family of air­craft. It has DO-178B (Soft­ware Con­sid­er­a­tions in Air­borne Sys­tems and Equip­ment Cer­ti­fi­ca­tion) cer­ti­fi­ca­tion, which means that it has 100% MC/​DC (mod­i­fied con­di­tion/​deci­sion cov­er­age) code cov­er­age. Every branch oper­a­tion in the code must be tested. SQLite has 22,000 branch opcodes, all told, and every one of them must be tested for cor­rect­ness to ensure that all pos­si­ble out­comes are expect­ed, and that no redun­dant branch­es are pre­sent. A sig­nif­i­cant por­tion of the test suite is gen­er­at­ed from high level specs.

To quote from the pre­sen­ta­tion:

I just ran a test run the other day. It did 177 mil­lion dif­fer­ent tests, all of which have to pass. And that’s just one plat­form. That’s one in three test suites, on a single plat­form. We have to do that on mul­ti­ple plat­forms: var­i­ous dif­fer­ent flavours of Unix, Win­dows, VxWorks, OpenB­SD, Mac. And we do it on dif­fer­ent proces­sors: 64-bit SPARC, which is big-endian, 32-bit Pow­er­PC, we do ARM, of course we do Intel.

The longest test pack­age, the one with 177 mil­lion tests, takes about 8 hours on a modern work­sta­tion. We’ve anoth­er that runs for 16 (hours)…

It takes three, four, five days, we took four days because we didn’t run into any seri­ous prob­lems. It takes around four days to run a full release-and-test cycle, three guys work­ing on it.

SQLite is an engine that lives with the appli­ca­tion

… as opposed to Post­gres and friends, in which the data­base engine lives togeth­er with the data on the other side of the net­work away from the appli­ca­tion. Because of this, it leads to the next point, which is:

SQLite does not com­pete with tra­di­tion­al client/​server data­bas­es

If:

  • Your data is stored remote­ly, or
  • Your data is too large to store on a single machine, or
  • You need sup­port for con­cur­rent writ­ers

A client/​server data­base engine should be used. Instead, SQLite com­petes with the usage of flat files (e.g. a bunch of JSON files to store local appli­ca­tion data).

A nifty use case of SQLite is as a cache-aside cache of a tra­di­tion­al client/​server data­base.

You can store blobs in SQLite

Adobe, when using SQLite for Light­room thumb­nails, did a bench­mark com­par­ing read and write speeds for blobs stored dire­cly on disk as com­pared to being stored in SQLite. For blobs small­er than 50kB, across var­i­ous DB page sizes, an up to 2x increase in through­put was observed. The reason for this is that at small­er blob sizes, the over­head of fopen starts to dom­i­nate (every­thing is in a single file in SQLite). The bench­mark is pub­lished here.

Useful tools exist

See sqldiff for diff­ing 2 SQLite data­bas­es.