Adding ripgrep and find Support to org-roam

June 06, 2020

Despite having used Emacs for one and a half years, both for fun and professionally, I hadn’t had an impetus to advance my understanding of Elisp beyond scavenging others’ dotfiles. This changed two weeks ago with my first Elisp PR. I was chatting with a good friend and ex-colleague of mine, Jethro, about an Emacs package he wrote called org-roam, which has exploded in popularity in the Emacs world. He talked about how much there was to do to maintain the project, and I figured it would be a good opportunity to help out and stop procrastinating on learning Elisp.

I won’t spend this post talking about what org-roam does. Rather, this post is a short commentary on the PR.

The link for the PR is here: https://org-roam/org-roam/pull/664.

The aim of the PR is to add support for using shell commands when looking for org-roam files located recursively in a directory. In org-roam, this is accomplished with the org-roam--list-files function, which prior to this PR used a pure Elisp implementation, shown below:

(defun org-roam--list-files (dir)
  "Return all Org-roam files located within DIR, at any nesting level.
Ignores hidden files and directories."
  (let ((regex (concat "\\.\\(?:" (mapconcat #'regexp-quote org-roam-file-extensions "\\|") "\\)\\(?:\\.gpg\\)?\\'"))
        result)
    (dolist (file (directory-files-recursively dir regex) result)
      (when (and (file-readable-p file) (org-roam--org-file-p file))
        (push file result)))))

Here, org-roam-file-extensions is typically a list like '("org"). The function first constructs a regex that matches all files ending with .org or .org.gpg, and then calls directory-files-recursively with that regex. Each match is kept only if it is readable and passes org-roam--org-file-p.
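To make the regex concrete, here is a minimal sketch that builds the same pattern from an explicit list and tries it on a few file names. The function name my/org-ext-regex and the sample paths are made up for illustration, and the list '("org") stands in for org-roam-file-extensions, assuming extensions are given without the leading dot (which is what the regex construction implies).

```elisp
;; Hypothetical helper: builds the extension regex the same way as the
;; function above, but from an explicit argument instead of the user option.
(defun my/org-ext-regex (exts)
  "Return a regex matching names ending in one of EXTS, optionally plus .gpg."
  (concat "\\.\\(?:" (mapconcat #'regexp-quote exts "\\|") "\\)\\(?:\\.gpg\\)?\\'"))

(let ((regex (my/org-ext-regex '("org"))))
  (list (string-match-p regex "notes/a.org")       ; non-nil: plain Org file
        (string-match-p regex "notes/a.org.gpg")   ; non-nil: encrypted Org file
        (string-match-p regex "notes/a.org.bak"))) ; nil: wrong suffix
```

The trailing \\' anchors the match at the end of the string, which is why the .bak file is rejected.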

Since we want to delegate the file searching to a shell command, it would be prudent to let the user specify the tool used (or not, in which case we fall back to the pure Elisp implementation). This is accomplished with a new user option called org-roam-list-files-commands:

(defcustom org-roam-list-files-commands '(find rg)
  "Commands that will be used to find Org-roam files.

It should be a list of symbols or cons cells representing any of the following
supported file search methods.

The commands will be tried in order until an executable for a command is found.
The Elisp implementation is used if no command in the list is found.

  `rg'
    Use ripgrep as the file search method.
    Example command: rg /path/to/dir --files -g \"*.org\" -g \"*.org.gpg\"

  `find'
    Use find as the file search method.
    Example command:
    find /path/to/dir -type f \( -name \"*.org\" -o -name \"*.org.gpg\" \)

By default, `executable-find' will be used to look up the path to the
executable. If a custom path is required, it can be specified together with the
method symbol as a cons cell. For example: '(find (rg . \"/path/to/rg\"))."
  :type '(set (const :tag "find" find)
              (const :tag "rg" rg)))

org-roam-list-files-commands is defined as a list of symbols or cons cells, which are tried in order. If an entry is the symbol rg or find, we attempt to use the respective executable as found by executable-find. Otherwise, if the executable lives in a custom location, it can be specified with a cons cell whose car is the symbol and whose cdr is an absolute path to the executable, e.g. (find . "/path/to/find").
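As a usage illustration, a user's init file might set the option along these lines. This is only a sketch: the custom path is a made-up example, and the defvar is there solely so the snippet evaluates without org-roam loaded.

```elisp
;; Declared here only so the snippet evaluates on its own; with org-roam
;; loaded, the defcustom above provides the variable.
(defvar org-roam-list-files-commands '(find rg))

;; Prefer ripgrep, then find, then the Elisp implementation.
(setq org-roam-list-files-commands '(rg find))

;; Use a ripgrep binary from a custom location (hypothetical path).
(setq org-roam-list-files-commands '((rg . "/opt/tools/bin/rg") find))

;; An empty list means no external command is tried, so the Elisp
;; implementation is always used.
(setq org-roam-list-files-commands nil)
```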

(defun org-roam--list-files (dir)
  "Return all Org-roam files located recursively within DIR.
Use external shell commands if defined in `org-roam-list-files-commands'."
  (let (path exe)
    (cl-dolist (cmd org-roam-list-files-commands)
      (pcase cmd
        (`(,e . ,path)
         (setq path (executable-find path)
               exe  (symbol-name e)))
        ((pred symbolp)
         (setq path (executable-find (symbol-name cmd))
               exe (symbol-name cmd)))
        (wrong-type
         (signal 'wrong-type-argument
                 `((consp symbolp)
                   ,wrong-type))))
      (when path (cl-return)))
    (if path
        (let ((fn (intern (concat "org-roam--list-files-" exe))))
          (unless (fboundp fn) (user-error "%s is not an implemented search method" fn))
          (funcall fn path dir))
      (org-roam--list-files-elisp dir))))

We then update the body of org-roam--list-files to iterate over org-roam-list-files-commands using cl-dolist, pattern matching on each value with pcase. If the value matches a cons cell (`(,e . ,path)), we use the path specified in the cdr. Otherwise, if the value matches a symbol (pred symbolp), we attempt to find the path of the executable with (executable-find (symbol-name cmd)). We exit early when a path has been found (when path (cl-return)). If a value is neither a cons cell nor a symbol, we signal an error to the user with signal 'wrong-type-argument.
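The matching logic can be tried in isolation with a small sketch like the one below. It is not the PR's code: my/first-available is a hypothetical helper, and it takes the lookup function as an argument (instead of calling executable-find directly) so that it can be exercised without any particular program installed. The pattern variables are deliberately given names that differ from any outer variable, so nothing is shadowed.

```elisp
(require 'cl-lib)

(defun my/first-available (commands lookup)
  "Return (NAME . PATH) for the first entry in COMMANDS that LOOKUP resolves.
COMMANDS holds symbols or (SYMBOL . PATH) cons cells.  LOOKUP is a function
such as `executable-find'.  Return nil if nothing resolves."
  (cl-dolist (cmd commands)
    (let ((found
           (pcase cmd
             ;; Cons cell: look up the custom path stored in the cdr.
             (`(,name . ,custom)
              (cons (symbol-name name) (funcall lookup custom)))
             ;; Bare symbol: look up the executable by its name.
             ((pred symbolp)
              (cons (symbol-name cmd) (funcall lookup (symbol-name cmd))))
             ;; Anything else is a configuration error.
             (other
              (signal 'wrong-type-argument `((consp symbolp) ,other))))))
      ;; Stop at the first entry whose lookup succeeded.
      (when (cdr found) (cl-return found)))))
```

Calling (my/first-available '(find rg) #'executable-find) would return a pair such as the name "find" and wherever find lives on the machine, or nil if neither program is installed.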

Once the path is found, we use a little bit of magic to “reflect” on the function name with intern. If exe is rg, we invoke org-roam--list-files-rg with the path and the given directory using funcall. If exe is find instead, we invoke org-roam--list-files-find. Because of this, adding support for a new shell tool is as simple as adding a new org-roam--list-files-$SHELL_TOOL function and specifying it in org-roam-list-files-commands.
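The name-based dispatch can be sketched on its own as follows. Everything prefixed with my/ is hypothetical and exists only for this illustration; the backend returns a fixed, made-up result rather than running any program.

```elisp
;; Hypothetical backend following a my/list-files-NAME naming convention.
(defun my/list-files-demo (_executable dir)
  "Pretend to list files under DIR; return a fixed, made-up result."
  (list (concat dir "/a.org")))

(defun my/dispatch (name executable dir)
  "Call the backend function named my/list-files-NAME with EXECUTABLE and DIR."
  (let ((fn (intern (concat "my/list-files-" name))))
    ;; `intern' always yields a symbol; `fboundp' checks that a function
    ;; is actually defined under that name before we call it.
    (unless (fboundp fn)
      (user-error "%s is not an implemented search method" fn))
    (funcall fn executable dir)))

(my/dispatch "demo" "/usr/bin/demo" "/notes") ; calls `my/list-files-demo'
```

The trade-off of this design is that the link between a command name and its function is purely a naming convention, so a typo only surfaces at run time through the fboundp check.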

If no suitable path is found, we fall back to the pure Elisp implementation, which is the first function in this post, renamed to org-roam--list-files-elisp.

The org-roam--list-files-rg and org-roam--list-files-find functions are given below. They are straightforward functions that construct the command strings and run them.

(defun org-roam--list-files-rg (executable dir)
  "Return all Org-roam files located recursively within DIR, using ripgrep, provided as EXECUTABLE."
  (let* ((globs (org-roam--list-files-search-globs org-roam-file-extensions))
         (command (s-join " " `(,executable ,dir "--files"
                                            ,@(mapcar (lambda (glob) (concat "-g " glob)) globs)))))
    (org-roam--shell-command-files command)))

The full shell command used for rg is:

rg /path/to/dir --files -g "*.org" -g "*.org.gpg"

(defun org-roam--list-files-find (executable dir)
  "Return all Org-roam files located recursively within DIR, using find, provided as EXECUTABLE."
  (let* ((globs (org-roam--list-files-search-globs org-roam-file-extensions))
         (command (s-join " " `(,executable ,dir "-type f \\("
                                            ,(s-join " -o " (mapcar (lambda (glob) (concat "-name " glob)) globs)) "\\)"))))
    (org-roam--shell-command-files command)))

The full shell command used for find is:

find /path/to/dir -type f \( -name "*.org" -o -name "*.org.gpg" \)
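To see how the pieces join into that string, here is a small sketch that uses only built-in functions (string-join and mapconcat) in place of s-join. The name my/find-command is hypothetical, and the globs are passed in already wrapped in double quotes, as the PR's glob helper produces them.

```elisp
(require 'subr-x) ; provides `string-join' on older Emacs versions

(defun my/find-command (executable dir globs)
  "Build a find command string that searches DIR for files matching GLOBS."
  (string-join
   (list executable dir "-type f \\("
         ;; One -name test per glob, combined with -o (logical or).
         (mapconcat (lambda (glob) (concat "-name " glob)) globs " -o ")
         "\\)")
   " "))

;; Produces the find command shown above.
(my/find-command "find" "/path/to/dir" '("\"*.org\"" "\"*.org.gpg\""))
```

Note that the directory is inserted verbatim, so a path containing spaces would need quoting before being handed to the shell.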

On top of the bulk of the implementation above, there are a few utility functions as well:

(defun org-roam--list-files-search-globs (exts)
  "Given EXTS, return a list of search globs.
E.g. (\"org\") => (\"*.org\" \"*.org.gpg\")"
  (append
   (mapcar (lambda (ext) (s-wrap (concat "*." ext) "\"")) exts)
   (mapcar (lambda (ext) (s-wrap (concat "*." ext ".gpg") "\"")) exts)))

as well as:

(defun org-roam--shell-command-files (cmd)
  "Run CMD in the shell and return a list of files. If no files are found, an empty list is returned."
  (seq-filter #'s-present? (split-string (shell-command-to-string cmd) "\n")))
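For readers without s.el at hand, the behaviour of these two utilities can be approximated with built-ins only. The sketch below is an illustration under that assumption, with hypothetical my/ names, and is not the code from the PR.

```elisp
(defun my/search-globs (exts)
  "Return double-quoted globs for EXTS, followed by their .gpg variants."
  (append
   (mapcar (lambda (ext) (format "\"*.%s\"" ext)) exts)
   (mapcar (lambda (ext) (format "\"*.%s.gpg\"" ext)) exts)))

(defun my/split-lines (output)
  "Split OUTPUT on newlines, dropping empty strings."
  ;; The third argument (omit-nulls) discards the empty string that the
  ;; trailing newline of shell output would otherwise produce.
  (split-string output "\n" t))

(my/search-globs '("org"))         ; plain glob first, then the .gpg glob
(my/split-lines "a.org\nb.org\n")  ; two file names, no trailing ""
```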


In the original PR, I did some rudimentary benchmarks using Jethro’s public braindump, running each implementation 1000 times:

(benchmark 1000 '(org-roam--list-files-rg "./jethrokuan/braindump"))
"Elapsed time: 9.012230s (0.399595s in 5 GCs)"

(benchmark 1000 '(org-roam--list-files-find "./jethrokuan/braindump"))
"Elapsed time: 5.543965s (0.318566s in 4 GCs)"

(benchmark 1000 '(org-roam--list-files-elisp "./jethrokuan/braindump"))
"Elapsed time: 55.781495s (3.220956s in 41 GCs)"

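Both shell commands were several times faster than the Elisp implementation (roughly 6x for rg and 10x for find). Since then, others have done more elaborate benchmarks, which you can read about here.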
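For anyone wanting to repeat this kind of measurement, a minimal sketch using benchmark-run from Emacs’s built-in benchmark library could look like the following. The helper name my/time-listing and the temporary directory are illustrative assumptions, and it times directory-files-recursively rather than the org-roam functions so that it runs without org-roam installed.

```elisp
(require 'benchmark)

(defun my/time-listing (dir)
  "Return the seconds taken to list .org files under DIR 100 times."
  ;; `benchmark-run' returns (ELAPSED GC-COUNT GC-ELAPSED); keep ELAPSED.
  (car (benchmark-run 100
         (directory-files-recursively dir "\\.org\\'"))))

;; Example: time 100 listings of a fresh, empty temporary directory.
(my/time-listing (make-temp-file "bench-" t))
```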

It took me around three days to get the PR to a presentable state, thanks to awesome feedback from Jethro and progfolio. I’ve learned a lot of Elisp, and hope to continue learning more and contributing!