An Exercise in Macrology
Ask any advocate of the lisp family of languages what the most important feature of the language is, and they will almost always say that it is homoiconicity. A homoiconic language represents data and programs in the same form. When I first heard this as a new grad I had recently read “Godel, Escher, Bach, An Eternal Golden Braid” by Douglas Hofstadter. It’s a wonderful book, which speculates that artificial intelligence could emerge from a recursive, self-referential process, and introduces lisp as a vehicle for such a process. At the time, I thought this meant that lisp enable self-modifying code, which many consider to be a dangerous practice. My career went in another direction, towards hardware design, so I didn’t really get back to the question of the application of homoicnoicity until much later.
Following lispy languages over the years as a hobby and as a scripting language for EDA tools, it became clear that homoiconicity enables powerful macro facilities which can be used to create domain specific languages (DSLs). A DSL allows very clear and concise programs to be written for the problem space at hand. A DSL is created by developing a set of macros which are transformed behind the scenes at compile time to the base language. Homoiconicity means the macro language and the base language are one and the same. The full power of the language is available to write macros.
While the consensus is that DSLs make lisp great, most examples and tutorials on lisp macros focus on simple convenience macros and how to avoid some of the pitfalls in macro development. There are few examples of DSL development which turn up in a search. To that end, I bring you a practical DSL and describe the development process which led to its final form.
The idea for the project came from a discussion on my employer’s #xrdl slack channel. The wordl word puzzle and variants such as quordle were near the height of their popularity, and the community of world-class optimization engineers in the company were soon discussing how to solve the puzzle algorithmically. I was reading Doug Hoyte’s definitive book on Common Lisp macro development, “Let Over Lambda” at the time, and saw an opportunity to write some analysis tools for wordl as a DSL.
Wordl is a simple game which was a huge hit during the pandemic and it continues to be popular. The New York Times purchased it from the original author and it’s featured as one of their daily word games. A player has 6 chances to guess the word of the day. Correct letters in the right position are colored green. Letters which are in the word but in the wrong position are colored yellow.
My starting point was a random word generator I had written for writing prompts. Unix-like OS’s almost always contain a nice dictionary for things like spelling checks. Under MacOSX it is in /usr/share/dict/words. The DSL provides a way to do a filtered search of the dictionary for 5 letter words. Here’s an example:
(wordl-filter (p #\s 0) (x #\e 1) (x #\e 2) (c #\x) (n #\j))
It’s simple and it works. In this case it returns a list of 5 letter words which start with s, don’t contain an e in the second or third position, contain an x, and don’t contain a j. It can be used to represent any of the constraints in the wordl display, but it’s verbose and repetitive. There’s also no way to specify logical expressions combining the predicates. Everything is implicitly combined with a logical and.
Here’s the code. As promised, it’s lisp code processing lisp code. Forms created with defmacro are executed before the code is evaluated, converting macros into expanded lisp code. Macros often make use of the quasi-quite (`()) operator which allows the content of a list literal to be selectively evaulated. The ‘,’ causes a single form to be evaluated, ‘,@’ splices an evaluated form into a list. ‘words-of-length’ is a function which filters the dictionary vector to include only words of a specific length, which defaults to 5. It hides the details of reading the dictionary file, which aren’t relevant here.
;; dsl for constraints
;; (p #\l n) ; position
;; (x #\l n) ; exclude position
;; (c #\l) ; contains
;; (n #\l) ; does not contain
(defmacro wordl-filter (&rest preds)
"filter 5 letter words based on a list of predicates"
(let ((forms '()))
(dolist (pred preds)
(let ((pred-type (car pred))
(pred-args (cdr pred)))
(cond ((eq pred-type 'p)
(push `(eq ,(car pred-args) (aref wut ,(cadr pred-args))) forms))
((eq pred-type 'x)
(push `(not (eq ,(car pred-args) (aref wut ,(cadr pred-args)))) forms))
((eq pred-type 'c)
(push `(find ,(car pred-args) wut) forms))
((eq pred-type 'n)
(push `(not (find ,(car pred-args) wut)) forms)))))
(push '(lower-case-p (aref wut 0)) forms) ; eliminate proper names
`(remove-if-not (lambda (wut) (and ,@forms)) (words-of-length))))
Here’s how the DSL expression above looks after macro expansion:
(remove-if-not
(lambda (wut)
(and (lower-case-p (aref wut 0)) (not (find #\j wut)) (find #\x wut)
(not (eq #\e (aref wut 2))) (not (eq #\e (aref wut 1)))
(eq #\s (aref wut 0))))
(words-of-length))
It works but I wasn’t really satisfied. There’s plenty of room for incremental improvement. Next, I added the ability to handle logical expressions which combine the predicates. I used a locally scoped recursive function to expand predicates inside an arbitrary logical expression.
(defmacro wordl-filter (&rest preds)
"Filter 5 letter words based on a list of predicates"
(labels ((process-pred (pred)
(cond ((eq (car pred) 'p)
`(eq ,(cadr pred) (aref wut ,(caddr pred))))
((eq (car pred) 'x)
`(not (eq ,(cadr pred) (aref wut ,(caddr pred)))))
((eq (car pred) 'c)
`(find ,(cadr pred) wut))
((eq (car pred) 'n)
`(not (find ,(cadr pred) wut)))
((eq (car pred) 'and)
`(and ,@(mapcar #'process-pred (cdr pred))))
((eq (car pred) 'or)
`(or ,@(mapcar #'process-pred (cdr pred))))
((eq (car pred) 'not)
`(not ,(process-pred (second pred))))
(t (error "Unknown predicate type")))))
(let ((forms (mapcar #'process-pred preds)))
(push '(lower-case-p (aref wut 0)) forms) ; eliminate proper names
`(remove-if-not (lambda (wut) (and ,@forms)) (words-of-length)))))
This addresses logical expressions, but there’s still lots of room for improvement. It doesn’t handle arbitrary lisp expressions. It is essentially manually parsing the lisp code. It also violates the DRY principle (Don’t Repeat Yourself). Lots of code is duplicated.
As Doug Hoyte points out in “Let Over Lambda”, lisp already knows how to parse lisp, so why not use lisp’s parser rather than rolling your own. This is where a very handy form call macrolet comes into play. Some people pronounce it like the GM automobile brand name, although it’s a version of let which is used to define locally scoped macros. Here’s the final version which supports arbitrary lisp expressions. It includes several improvements which were added incrementally once the macrolet structure was in place:
Characters don’t need to be specified with #\, they are converted by the macro. Expressions are easier to type and look cleaner.
Predicates can be passed multiple characters rather than needing to be repeated for each character (p 0 a b c) instead of (p #\a 0) (p #\b 0) (p #\c 0).
The c (contains) predicate now includes a combiner (any/all) to specify how to handle multiple characters. (c any a b c) or (c all a b c)
Predicate definitions reference other predicate definitions rather than repeating the code. It’s DRYer.
An e (exists) predicate was added. It’s used to check whether a word exists in the dictionary.
Glob style pattern matching was added. Five letter strings with ‘*’ wildcards can be used to get a list of matches.
It adds the s predicate which returns a list of words which contain one or more of a list of substrings. (s “ed” “ing”)
The file is read directly and results are sorted to list words with higher frequency letters first. It uses the sum of the first letter, last letter and overall dictionary letter counts for each word.
(defmacro wordl-filter (&rest filter-expr)
"filter dictionary words based on logical predicate expressions"
`(macrolet ((s2c (sym) `(char-downcase (aref (string ',sym) 0)))
(e (word) `(string= ,word wut))
(c (combiner &rest chars)
(let ((tests nil))
(dolist (char chars) (push `(find (s2c ,char) wut) tests))
`(,(cond ((eq combiner 'any) 'or)
((eq combiner 'all) 'and))
,@tests)))
(n (&rest chars) `(not (c any ,@chars)))
(p (pos &rest chars)
(let ((tests nil))
(dolist (char chars) (push `(eql (s2c ,char) (aref wut ,pos)) tests))
`(or ,@tests)))
(x (pos &rest chars) `(not (p ,pos ,@chars)))
(s (&rest strings)
(let ((tests nil))
(dolist (str strings) (push `(search ,str wut) tests))
`(or ,@tests)))
(g (glob-word)
(assert (= 5 (length glob-word)) (glob-word)
"glob must be length 5: ~S" glob-word)
(let ((tests nil))
(dotimes (pos 5)
(let ((char (aref glob-word pos)))
(unless (char-equal #\* char) (push `(p ,pos ,char) tests))))
`(and ,@tests))))
(word-letter-freq-sort (remove-if-not (lambda (wut)
(and (= 5 (length wut))
(lower-case-p (aref wut 0))
,@filter-expr))
(read-word-file "/usr/share/dict/words")))))