CLClojure: An Experimental Port of Clojure to Common Lisp
Background
Porting Clojure seems to be the thing to do these days. After the clojurescript compiler came out, people had a really good blueprint for hosting clojure in other languages. The fact that the clojurescript compiler is defined, largely, in terms of protocols makes it a bit closer to the "clojure in clojure" goal. As a result, you should be able to bootstrap the language (as they did with JavaScript) with a minimum of primitives. Most of the language is defined in Clojure, so voila.
We currently have Clojure ports targeting Scheme, C, C-via-Scheme, Python, etc. I've found Common Lisp to be a particularly slick environment with some cool tools and a fairly active community. I think it would be interesting to port the ideas from Clojure to Common Lisp, ultimately implementing a compliant Clojure in Common Lisp.
Layout
The current WIP is broken into several Common Lisp files (only tested on SBCL). You can look at the dependencies in clclojure.asd. The general layout and organization follows (subject to change):
common-utils
Contains a plethora of utility functions and convenient clojure analogues (or direct ports) useful for porting clojure source. Primary features include multi-arity lambdas via lambda* and multi-arity functions via defun*. Also provides a with-recur form that allows for an idiom similar to Clojure's loop/recur. This is typically pulled in by all other packages for general utility. Some limited sequence functionality overlaps with sequences.lisp, and may be replaced in the future.
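A hypothetical usage sketch of lambda*, assuming its surface syntax mirrors Clojure's multi-arity fn (each clause pairs an arglist with a body; the exact syntax is not confirmed by this document):

```lisp
;; Hypothetical syntax -- each clause is (arglist body...), dispatching
;; on argument count as in Clojure's multi-arity fn:
(defparameter *greet*
  (lambda* ((name)
            (format nil "hi, ~a" name))
           ((name greeting)
            (format nil "~a, ~a" greeting name))))

;; (funcall *greet* "rich")          -- one-arg body
;; (funcall *greet* "rich" "hello")  -- two-arg body
```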
sequences
Provides an implementation of clojure's lazy sequence abstraction. Implemented in CLOS, and porting the seq protocols. Provides a subset of the clojure core sequence library for CL used in bootstrapping and metaprogramming. The intent is not to have a full port, but a minimally useful subset. Integrates the seq abstraction with CLOS sequence types.
pvector
A port of the java implementation of clojure's persistent vector, based on Hash Array Mapped Tries (HAMTs). Serves as a stand-alone data structure with local functions. Also defines subvec, which is the O(1) slice of a persistent vector.
cowmap
A simple persistent map implemented by copy-on-write operations over a standard hash table. This is a minimal implementation to support bootstrapping, and not intended for performance or production use.
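A minimal sketch of the copy-on-write idea described above (illustrative only; function names here are hypothetical, not the library's actual API):

```lisp
;; Copy the backing table, mutate only the copy: the original "map"
;; is left untouched, giving persistence at O(n) cost per update.
(defun copy-table (ht)
  (let ((new (make-hash-table :test (hash-table-test ht))))
    (maphash (lambda (k v) (setf (gethash k new) v)) ht)
    new))

(defun cow-assoc (ht k v)
  (let ((new (copy-table ht)))
    (setf (gethash k new) v)
    new))
```

The O(n) copy per operation is exactly why this is fine for bootstrapping but not for production use.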
pmap (wip)
A port of the java implementation of clojure's persistent hashmap. Incomplete.
protocols
Defines forms for defprotocol and deftype in Common Lisp. Protocols are implemented as bundles of generic functions and args, stored in a CLOS struct. Protocols are registered, and define satisfies? predicates for the protocol to ensure valid implementations of reify, deftype, extend-type, and extend-protocol. Protocol definitions expect vector args (a rewrite is pending to extend these to sequences to allow simple CL-based protocol definitions).
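A hypothetical sketch of the surface syntax, assuming it tracks Clojure's (the protocol and type names are illustrative, with vector args per the note above):

```lisp
;; Hypothetical syntax -- not the library's verified API:
(defprotocol ICounted
  (-count [coll] "Return the number of elements in coll."))

(deftype pair [a b]
  ICounted
  (-count [coll] 2))

;; Registration would then supply a predicate, so that, hypothetically,
;; (satisfies? ICounted some-pair) reports a valid implementation.
```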
reader
Provides a consolidated place for reader macro support. Primary purpose is to enable vectors, maps, and sets for bootstrapping. Original experiments also supported characters. Intentionally hijacks the read-table. Needs to be ported to named-readtables, or excluded entirely once bootstrapping of clojure.tools.reader is done.
eval
Due to the custom evaluation rules for clojure's data literals, simple reader macros are not enough to get literal support. This package defines a generic function, custom-eval, and an API for installing/uninstalling custom evaluation support in SBCL. The implementation swaps the function definition for SB-IMPL::simple-eval-in-lexenv and adds a final condition to the non-list eval rules: if custom-eval is supported, then the generic function is invoked. This allows user-defined extension of the evaluation semantics in a minimal way, and retains low-level support (e.g. from the compiler).
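A minimal sketch of the extension point just described. The defgeneric name comes from the text; the default method and the commented persistent-vector method are assumptions about how a literal type might opt in:

```lisp
;; The generic function that literal types specialize:
(defgeneric custom-eval (obj)
  (:documentation
   "Evaluate OBJ according to user-defined data-literal semantics."))

;; By default, non-list objects evaluate to themselves, matching CL's
;; usual rule:
(defmethod custom-eval (obj) obj)

;; A hypothetical persistent-vector method would evaluate each element,
;; mirroring Clojure's (eval [x y]) => [(eval 'x) (eval 'y)]:
;; (defmethod custom-eval ((v pvec))
;;   (apply #'persistent-vector (mapcar #'eval (vector-to-list v))))
```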
literals
Consolidated place to extend custom evaluation rules for vectors, maps, and other literals.
keywordfunc
Initial hack at implementing keywords-as-functions support in CL. This currently works only in lexical forms, and is not a global modification. Exposes the macro with-key-fn, which will scan the body for keywords in function position, build a table of keyword functions, and fbind them. The more commonly used feature is the keyaccess type, which is used in the lexical package.
lexical
Implements a lexical environment ala let*, called unified-let*, that will unify lisp-2 bindings into a lisp-1 lexical environment for the body. This is the foundational method for implementing clojure-style let.
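A hypothetical usage sketch of the intended lisp-1 semantics (the surface syntax is assumed, not taken from the source):

```lisp
;; In a lisp-1 environment, a binding holding a function can sit in
;; operator position directly -- no funcall, no separate function
;; namespace:
(unified-let* ((double (lambda (x) (* 2 x)))
               (y      (double 20)))   ;called like a regular function
  y)
;; would evaluate to 40 under the assumed semantics
```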
bootstrap
Implements a core set of clojure special forms and brings along the protocol definitions from cljs.core. Eventually, the entire cljs.core library will be pulled in here and evaluated. This should hopefully provide a sufficiently robust environment to then bootstrap the rest of clojure via tools.reader and tools.analyzer, leveraging extant CL implementations from the earlier bootstrapping process where it makes sense.
symbols (wip)
Nascent thought piece on implementing clojure-specific vars and namespaces. Clojure's qualified symbols and keywords are a tad different and require some thought toward integration with CL.
Goals
Bridge the gap between the cool stuff in clojure and Common Lisp.
Implement really useful bits of clojure in portable common lisp, and provide them as stand-alone libraries.
This includes lazy sequences, the generic sequence abstraction, and the fundamental persistent hash-array-mapped-trie data structures in clojure:
- persistent vectors
- persistent maps
- persistent sets.
Extend the generic sequence abstraction and other idioms to Common Lisp's built-in mutable structures.
Common Lisp already has a sequence library, but I think Clojure's is more general and can be trivially extended to new types. Common Lisp's irrational locking down of SEQUENCE is a hurdle here. The HYPERSPEC will never be updated in my lifetime :) So generic functions are the current way to bridge this stuff.
- Protocols are really nice, as are Clojure's arbitrary-dispatch multimethods.
- Data literals are also highly useful. I think Clojure nailed the choice of literals, so providing reader macros for these guys would be very nice.
Possibly wrap a Common Lisp STM implementation, or cheat and use something like lparallel or just delegate to clojure.core.async (like clojurescript).
Bootstrap a fully functional Clojure onto Common Lisp.
Learn.
So far, this project continues to be a great way to learn about both CL and Clojure, to include implementation hurdles, support for PLT stuff, long-standing warts from decisions made ~25 years or more ago, and work-arounds. Nothing is insurmountable so far, which is pretty cool.
Status
Began work porting Clojure's persistent data structures from Java some years ago, while simultaneously learning Common Lisp (and by necessity Java :/ ).
- Got a working persistent vector implementation, with compatible clojure literals about a year ago.
- Started working on persistent hash maps around November 2012.
- Built a temporary implementation of clojure-style protocols in Common Lisp ~ Dec 2012.
- Pulled the bits into an actual ASDF system and pushed everything to GitHub in August 2013.
- Implemented a baby clojure evaluator that __should__ bridge the lisp1/lisp2 gap between clojure and the Common Lisp host. Unsure if this is going to work out in the long term, but eh.
- It's real trivial at the moment.
- Working on library code in my spare time, still need to port the persistent map.
Revisited 2017
- implemented some rudimentary stuff
- vector reader macros not fleshed out; they work fine at the REPL, but fail to return a vector (instead returning a form to create the vector). Trivial but important oversight.
- Still hijacking the global read-table. Pulled in named-readtables to help, but haven't integrated.
- Working on variadic functions, explored funcallable classes, refined protocols, deftype.
- cleaned up the build process (more work to be done here)
Revisiting 2018
- got reader macros matured (vector literals produced properly now),
- got protocol definitions and implementations respecting vectors, albeit in a hacky way. We still allow lists for args…
- working on deftype, then getting the bootstrap from CLJS (core protocols and functions) to compile.
- Working on leveraging named-readtables for better delineation of clojure source, to include file-level (somewhat Racket-like).
- Also intend to leverage named-readtables to get a case-sensitive reader (via :modern), and to enable CL interop trivially with a macro that allows the CL reader inside a body (and vice versa, for inline clojure if it makes sense).
- refining the approach after reading a lot, looking at some other sources of inspiration (including early proto-clojure->java compiler from Hickey in CL)
- Basic def and fn work. Protocols work. Metadata mostly works. Deftype does too.
- got let, albeit without destructuring. Still useful for bootstrapping!
- Initial implementation of reify working, wrapped deftype in a version compatible with cl:deftype
Revisiting 2019
Reconciled some problems with data literals, clojure's evaluation model, and CL's model.
The issue is tricky and - even after using Clojure for 8+ years - something I completely missed.
So folks laud CL for having awesome reader macros, talking about how they allow you to implement any language, etc. Banking on these comments, I went about doing that (Clojure). As it turns out… IF you want to define a language with new data literals, and those literals have their OWN evaluation semantics, reader macros cannot help you (they only help to a point).
- I think racket probably better supports this based on research, but I have no proof.
When we read a vector in clojure (not a syntax-quoted one), we actually get a literal vector data structure (not a list of `(vector arg1 arg2 arg3), but an actual vector of [(eval arg1) (eval arg2) …]).
I chased my tail on this twice now, and finally realized that in clojure, (eval [x y]) -> [(eval 'x) (eval 'y)].
So when you read a vector and eval it, you actually have to apply those evaluation semantics to build up the resulting vector. In CL, eval just passes the vector through.
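The mismatch can be seen with standard CL vectors, which are self-evaluating: eval hands the elements back unevaluated, unlike Clojure.

```lisp
(defparameter *y* 3)

;; A list form evaluates its arguments:
(eval '(vector 1 *y*))  ;=> #(1 3)

;; A vector literal is self-evaluating; the symbol inside is NOT
;; evaluated:
(eval #(1 *y*))         ;=> #(1 *Y*)
```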
This poses a problem in CL, since yes, you can trivially extend the reader to handle vectors, but you CANNOT get the evaluation semantics to line up, since CL provides no mechanism to do so (unlike, say, reader macros). You can of course define your whole infrastructure, including your own eval, etc., but I'm interested in minimally bootstrapping a functional enough clojure to get the language tooling ported, so hacking on the running lisp implementation of `eval` is attractive.
I tried having the reader macro return `(vector ,@args) instead of building the literal vector directly ala (eval `(apply #'vector ,@args)), which worked (mostly!), except that for things like macros, instead of dealing with a vector for, say, bindings, you'd now be dealing with the syntax-quoted list (vector arg1 arg2 ....).
I thought about hacking all the macros to recognize this and coerce the vector sexpr into an actual vector, but it's like squeezing air in a balloon.
Rather, simply extending eval (sb-impl::simple-eval-in-lexenv) to understand that there are other data literals (as expressed by, say, a generic function) that have their own evaluation semantics as forms allows one to get up to par with using clojure literals, e.g. in macros.
It should be minor surgery (and actually portable), but since the goal is to bootstrap something that can read the source for clojure.tools.reader and clojure.tools.analyzer, etc. and emit a pure common-lisp implementation, I "think" it'd be a one-time deal.
- Currently implemented in eval.lisp, which provides clclojure.eval and an API for allowing custom evaluation semantics defined by the generic function custom-eval.
- Also implemented a companion package, clclojure.literals, which enables custom evaluation, and extends the semantics for the bootstrapped persistent structures.
Quasiquoting Data Literals
So we have evaluation semantics consistent with Clojure for vectors, maps (and trivially sets when they arrive).
The backtick implementation in sbcl's backq.lisp is currently befuddling due to lack of familiarity. Based on the existing implementation in readers.lisp, I got unquote working fine with user-defined literals like vectors and maps, but unquote-splicing is throwing. Trying to hack around it from the reader level, e.g. coerce to a normal quasiquoted sexpr then eval that, rather than hacking backq.lisp as I did clclojure.eval. Lack of familiarity is stunting progress a bit.
I had to dig into the innards of their eval and backquote implementations to jank some stuff together, but it works. It's life, Jim, but not as we know it… I literally got done hacking the quasiquote stuff in reader.lisp (clclojure.reader:quasify) to make things work, and had no idea how I got it to work, since my previous intuitions had failed. SBCL uses a "comma" struct to indicate splicing operands inside a quasiquote, and there's some weird interplay with macroexpand. So I kludged stuff together through trial and error, to build an arglist to shove at my vector and map constructors.
The other hack is wiring a generic method into sbcl's default eval implementation. This admits defining custom evaluation rules for new data literals, like vectors and maps.
Surprisingly, SBCL has some well documented/written source. I learned a bit plumbing through it, although their compiler stuff (and the IR-1 intermediate language) is still magic to me. Thankfully, it appears that hacking the simple-eval-in-lexenv function is enough to wire into the compiler (I'm guessing the compiler form leverages it downstream, but I'm not sophisticated enough yet to see how it all weaves together).
Once the CL native reader and analyzer are in place, then the normal eval semantics would take hold (unless there's a desire to maintain the bootstrapping methodology and allow clojure literals like persistent vectors, maps, and sets to have eval semantics for interop or something, I dunno).
Refactoring
There is a lot of "learning" debris throughout the code base that is not on the critical path for bootstrapping (or no longer is). It needs to be pruned and moved elsewhere so that a consistent, clear path of required code remains.
Sequences and Hacks
So I decided (for ease of metaprogramming my bootstrap), to clean up the lame sequence implementation I had from years ago and write a proper one. This went very well.
I even got up to the point where I'd implemented idiomatic lazy sequences, integrated with CLOS, and started getting reduce working. Then I ran into a snag:
I wanted to use CL's built-in reduce for its sequence types, since it was already efficient. Clojure has the notion of a simple protocol, InternalReduce, which basically says "hey, reduce yourself!" or else it falls back to (likely slower) seq-based first/rest reduction. So this is desirable (plus nice interop points).
I got everything working, but one thing was wrong: Clojure implements a short-circuiting variant of reduce, via invoking (reduced the-result). This creates a little wrapped type that can be quickly checked during the reduction loop to see if it's a reduced item, which signals to stop reducing. The reduced item is then unpacked (via deref) and returned as the result.
Problem: the CL folks didn't think of this decades ago. So how to bootstrap this functionality onto the extant concrete implementation?
I almost gave up, then remembered the vaunted warnings from Java land about idiots using Exceptions for control flow. This built on the idea that I could just throw an error if I detected a reduced value, and handle the reduction gracefully outside of the built-in cl:reduce. My ugly hack (although using the elegant tools of signals and handler-case) emerged:
We just leverage the excellent low-level construct block to define an escapable context we can return-from. From inside this block, we wrap the call to reduce with a lambda that - upon detecting a reduced item - invokes return-from with the dereferenced value of the reduced item.
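The escape technique can be sketched like this, with hypothetical names (a reduced wrapper struct and a reduce* driver; not the library's actual definitions):

```lisp
;; A wrapper marking "stop reducing, this is the answer":
(defstruct reduced val)

;; Run cl:reduce inside a block; the stepping lambda bails out via
;; return-from the moment it sees a reduced result.
(defun reduce* (f init xs)
  (block done
    (reduce (lambda (acc x)
              (let ((res (funcall f acc x)))
                (if (reduced-p res)
                    (return-from done (reduced-val res))
                    res)))
            xs :initial-value init)))

;; Sum until the running total exceeds 10:
(reduce* (lambda (acc x)
           (let ((r (+ acc x)))
             (if (> r 10) (make-reduced :val r) r)))
         0 '(1 2 3 4 5 6))
;=> 15
```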
As another fun time, I also ran into problems with CLOS and multiple dispatch that were a little unsavory. I had to learn about call-next-method, and realized that CLOS doesn't have a method preference hierarchy by default (clojure does); these kinds of problems create interesting lisp OOP solutions… I also managed to crash SBCL several times because of it (I was trying to leverage SBCL's extensible sequences, where - unlike other lisps - you can inherit from the sequence base class, which I wanted my lazyseq classes to do). This ended up failing since, going the generic method route, I wrote a specializer on 'sequence, which overrode the other specializer on 'lazyseq, since 'lazyseq inherited from 'sequence. Despite getting some of it worked out, I still managed to crash SBCL several times (stochastically…), and reverted to NOT having 'lazyseq extend 'sequence for now.
Loop / Recur (currently with-recur)
Basis of loop/recur. Combined with lambda* and defun*, this will allow clojure-style recur forms in function bodies (with a trivial extension to loop/recur). I'm not checking for tail position yet… probably need to do that to ensure correctness. Might have to code walk.
This took many tries (dissecting dolist and some simple loop forms), as well as trying and failing to implement a (more elegant) version with macrolet, and another that used a simple labels form for recur (labels blew the stack… I think TCO only happens there if compiled).
CL has some nice lower level flow control primitives like block and tagbody.
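A bare-bones sketch of the tagbody/go technique (my own illustration, not the library's actual expansion): rebind the loop state, then jump back to the top tag, giving constant stack depth regardless of compilation.

```lisp
;; Loop by mutating the bound state and jumping back to :recur --
;; the moral equivalent of (loop [x x] ... (recur (1+ x))):
(defun count-up (x limit)
  (let ((args (list x)))
    (tagbody
     :recur
       (destructuring-bind (x) args
         (if (< x limit)
             (progn (setf args (list (1+ x)))
                    (go :recur))
             (return-from count-up x))))))

;; (count-up 2 10) steps x from 2 up to the limit and returns it.
```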
- "BTW, does your code work on non-tail `recur`s? (Is there even such a thing? idk.) Example:"
(with-recur (x 2)
(if (< x 4)
(progn
(recur (1+ x))
(format t "back from recur~%"))
x))
I think, based on the semantics of go, it will, since it's effectively a goto that jumps to the tag in the tagbody.
However, I'm currently working on tail-call detection to enforce clojure's semantics.
The new version is now like clojure's
(loop [x 2]
(if (< x 10)
(recur (inc x))
x))
with a sequence of bindings denoting [arg1 init1, arg2 init2, …]
(with-recur (x 2)
(if (< x 10)
(recur (1+ x))
x))
Started off optimizing to see if I could avoid emitting with-recur for functions that don't invoke it - so, code-walk and look for recur forms. Then I realized I needed to detect tail calls… so I started going down that rabbit hole.
Curious to see if there's a simple way to walk the code instead of denoting the grammar of tail calls - something empirical to mechanically demonstrate that the evaluation is independent of further calls. Say, build a DAG representing the call graph, demonstrating that recur only ever occurs at leaves.
My Cro-Magnon approach is to just define special cases for each form and check that way. Tricky stuff…
Tail Call Detection / Enforcement
Defined some rules (currently for CL, but clclojure should work on top of these with little to no changes) that follow the tail call semantics for special forms.
In common-utils, these functions inform with-recur about the viability of the code at compile time and throw errors if invalid tail calls are detected:
detect-recur
Simple, naive code walker that walks an expression looking for (recur ...).
tail-children
Defines the (currently static) semantics of the different special forms, detailing whether they have children and in what position (:tail or :non-tail). Returns a list of pairs of (:tail|:non-tail child-expr).
categorize-tails
Walks an expression, accumulating a list of call-site structs, which encode what kind of call was made (:recur, :illegal-recur) and what the expr was, for use in reporting errors.
summary-tails
Reduces the output of categorize-tails into a pair of (t|nil illegals), where illegals is a list of illegal call-sites, and t|nil reports whether "any" instances of recur calls were detected.
with-recur then leverages summary-tails to determine if there are errors in the input at compile time, and signals an illegal tail call if so. If there aren't errors, but there are no instances of recur calls in the expression, with-recur just optimizes down to a semantically equivalent let*, so it can be invoked regularly with no added codegen (e.g. in lambda* or other recur targets) if recur never shows up. Otherwise, a form with a local function recur is emitted and things perform as expected, with proper tail calls enforced.
Symbol Interning and Mucking With MetaProgramming
Discovered, through use of case to match on symbols while dissecting lists during code walking, that my functions wouldn't work outside of the package they were originally defined in. Subsequently, macros depending on said functions (with-recur being the foremost example) failed to behave as expected. It turns out this has to do with how CL interns apparently unqualified symbols in a package.
Say I implement a function to examine the first element of a list and dispatch depending on the symbol detected there:
(defun detect-foo-bar (expr)
(case (first expr)
(foo :foo)
(bar :bar)
(t nil)))
This function will work great so long as I'm inside of the package where I defined it.
The case test for each clause uses common-lisp:eql, so that notion of symbol equality applies to the args. The side effect introduced here is that the symbols 'foo and 'bar are actually… 'current-package::foo and 'current-package::bar, rather than being free-floating, unqualified symbols. If we're at the repl, they even print as such, without any indication that they're qualified. This is standard behavior.
The tricky part is that if we then export detect-foo-bar and try to use it from other-package,
CL-USER> (original-package:detect-foo-bar '(foo 2 3 4))
NIL
we get no joy, despite the input expression "obviously" having the symbol 'foo in it. To compound matters, the original function we dutifully tested at the REPL works fine in original-package.
The reason is that 'foo in original-package, where we defined the function and where we macroexpanded case to implement it, is actually 'original-package::foo, and the symbol we're trying to match against, coming from the list read and evaluated in CL-USER, is actually 'CL-USER::FOO:
CL-USER> (symbol-package 'foo)
#<PACKAGE "COMMON-LISP-USER">
Under these conditions, we realize the symbols are not in the same package, thus not eql:
CL-USER> (eql 'foo 'original-package:foo)
NIL
Clojure doesn't have this problem due to its semantics for unqualified symbols across namespaces:
Clojure 1.10.1
user=> (def foo 'foo)
#'user/foo
user=> (ns blah)
nil
blah=> (def foo 'foo)
#'blah/foo
blah=> (= foo user/foo)
true
So CL's behavior was a bit of a surprise, particularly in how it seemingly janks up defining utility functions that work on symbols and dissect lists, e.g. for metaprogramming.
Implementing (and then testing from another package) with-recur and its auxiliary functions was the first notable time this popped up. The fact that it can fail silently (in the case of case) engenders additional caution (absent a better solution that I'm unaware of).
So, my current solution is to define a new equality test based on unqualified symbols and their symbol-name equality. In common-utils:
symbol?
Determines if the input is strictly a symbol, and not a keyword.
symbol=
Determines if both inputs are symbol?, and whether their symbol-names are string=.
seql
Wraps the typical common-lisp:eql with a custom path for inputs that are symbol?.
custom-case
Allows defining common-lisp:case macros with a custom user-defined test.
scase
Implements case using seql via custom-case, and is a drop-in replacement for existing uses of case that relied unintentionally on interned symbols for case clauses.
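The name-based test can be sketched as follows (a simplified reimplementation for illustration, not the exact common-utils source):

```lisp
;; A symbol, but not a keyword:
(defun symbol? (x)
  (and (symbolp x) (not (keywordp x))))

;; Equal by name alone -- the home package is ignored:
(defun symbol= (a b)
  (and (symbol? a) (symbol? b)
       (string= (symbol-name a) (symbol-name b))))

;; Package identity no longer matters, only the printed name:
(eql     'foo (make-symbol "FOO")) ;=> NIL
(symbol= 'foo (make-symbol "FOO")) ;=> T
```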
With these in place, and extant code for with-recur and friends ported to use scase, we now have a working implementation of clojure-style loop/recur.
Variadic Protocols
We now appear to have functioning variadic protocols, as well as marker protocols (protocols with no methods).
These work via a simple dispatch mechanism, via common-utils:lambda*.
The generic function is somewhat opaque in the args, only taking (obj &rest args), but the dispatch mechanism conforms to the protocol definition, allowing multiple dispatches per clojure's definition.
Behind the scenes, we have a simple lambda* that's created for the implementation at defmethod time, which actually carries out the implementation. It is invoked inside the body of the defmethod specializer, and simply applied to the obj and args, allowing the lambda* dispatch mechanism to take over.
This is good for bootstrapping, but the lack of integration with slime and e.g. arg detection (ala CIDER) is not desirable long-term.
For dev purposes, it's meaningless though, since we're bootstrapping pretty simply. I don't plan to hack a custom slime mode to enable documenting multiple-arity functions just for the bootstrap.
This opens most of the remaining doors for our basic bootstrap. We can now leverage most of cljs.core to get the basic libraries lifted in with little effort (given the current state of reader, seqs, etc.).
It should also be possible to work backwards from the protocol implementations to get us into an even simpler bootstrapping environment. For the time being, we'll continue wrapping existing CL functionality though.
Generic Symbol Equality (#'seql)
One nagging feature that probably needs to be addressed is the necessity of using common-utils:seql for symbol equality testing, due to CL's somewhat baroque equality and the impediment it causes toward simple metaprogramming.
This is prolific: even hash tables aren't trivially extended with a custom equality test out of the box, although that's not a vast hurdle.
Lexical Literals
So, everything worked fine when evaluating literals from the null lexical environment, based on the custom-eval semantics in clclojure.eval:custom-eval and the related wiring into sbcl's eval.
This did not carry over into expressions with lexically bound forms that were also data literals. Results kept returning weird quoted forms for values rather than the actual evaluated forms.
It ended up being a long and tricky process to tease out the underlying cause, and to diagnose a quickish fix.
As it turns out, CL is pretty canny about how it handles lexical scope. To satisfy the compiler, we'd like to pass off friendly, common forms that SBCL expects. Unfortunately, if literals show up in various places where lexical scope is defined (such as the right-hand side bindings of let*, which is emitted by clclojure.base:let), we end up with problems.
We also end up with problems if the literal is the result of (or any piece of) the body of expressions that extend lexical scope.
We could theoretically fix all this at a somewhat high level, with a code-walker that recognizes our clj forms and does the requisite transforms at macroexpansion time. This is cool, but not necessarily what we want for helpful user feedback, e.g. macroexpanding a form expecting to see clojure literals.
Thankfully, we have access to the evaluator, and can continue to slightly tweak it to suit our needs. Namely, where lexical forms are defined, e.g. common-lisp:let* and common-lisp:let, we can add a mechanical transform to the expression (not unlike a manual macro) that walks the expression and expands our literals into literal-constructor forms that are readily recognized by sbcl's evaluation machinery (and the compiler).
So, the technique is to define another generic function that users can express lexical expressions for, clclojure.eval:let-expr. The semantics would be:
(def y 3)
(let [x [2 y]] [x]) => [[2 3]]
In CL
(let* ((x [2 y]))
[x])
is compiled to
(let* ((x (persistent-vector 2 y)))
(persistent-vector x))
Importantly, we don't want to muck with the literals at macroexpansion time (e.g. for debugging and visualizing the code), but we do want to intercept them prior to going to evaluation. This sounds very much like a possible compiler macro (or other eval-when context), and may be trivially refactored into one.
Absent that, we define a function, clclojure.eval:custom-eval-bindings, which acts as an intercessor, and takes an expression to walk and a lexical environment to pass along in case we ever get to use it (this conforms to the function signature of sbcl's sb-impl::%simple-eval).
We detect various types of expressions that could define literals in the lexical scope, and walk the expression based on the form's semantics, recursively, replacing any literals with the appropriate constructor-based, canonically evaluable bindings - iff the value supports clclojure.eval:let-expr?.
This gets us 90% of the way to happy, but there still exists a problem with (fn ...) forms, since they actually build up lambdas.
Funny enough, CL doesn't like to just emit (lambda (x) ...) from a defmacro expression.
Nay, if we quasiquote it, we get #'(lambda (x) ...), which is effectively (function (lambda (x) ...)).
Since we're emitting these forms, they end up being opaque when we look at custom-eval-bindings. So, we have to process them at macroexpansion time, e.g. in the (fn ...) macro. We do so by macroexpanding the body of the lambda, and similarly walking the form with custom-eval-bindings. Then, we emit the ultimately sharp-quoted lambda with the new body. This works well, with the small tradeoff that fully macroexpanding a (fn ...) macro will expand into the more verbose form without any literals.
Multimethods (WIP)
We still need to implement multimethods, but that'll be a simple variation on the dispatch stuff. These show up in both the reader and analyzer, so they're next.
Loose protocol implementation
Right now, defprotocol expects implementations for all function arities. I think Clojure allows partial implementation, or generates stubs via abstract methods. We need to implement this relaxation, e.g. for IFn, so we can get bootstrapping advanced.
Revisiting Lexical Literals
So the original implementation for this fixed one thing only to break another.
In an attempt to comport with the SBCL compiler's pre-baked notion (read: ignorance) of the evaluation rules for non-list things, I went down the route of defining a custom code walker to intercept calls to let and friends. This interception would walk the code, and invoke generic functions to coerce custom literals (like vectors) to eval-friendly forms (read: a constructor and args) so I could avoid hacking the compiler.
This worked great and fixed a major problem, where lexical literals were effectively just uneval'd and quoted directly. Now, at eval time, we get custom literals working as in Clojure.
Downside
I broke the expectation that we would be using "actual" persistent vectors as syntactic objects in macros (namely defprotocol, fn, defn). This primarily manifested as an ugly regression in extend-protocol, where CL would complain that we're trying to apply a generic function from the pvector namespace to something non-pvector.
The short of it is that, depending on the order of eval and macroexpansion, and if we have a (let ...) binding anywhere, we stand a good chance of coercing our actual [vector] into a (persistent-vector vector) for eval's sake, thus breaking the expectation that literals will be preserved both for eval and for code.
Fix?
To get this working, I went down a couple of paths, including looking at hacking into the compiler. Then I realized: if I can just tag the transformed literals, I can identify them and invert the process inside of macros. You'd end up with a ping-ponging of the custom-eval-bindings walker munging data literals into evalable forms, and then macros - prior to expansion - traversing their args and coercing the forms back into actual vectors, maps, sets, etc. Design-wise it's ugly, but performance-wise it should be negligible…
So the eventual solution involves using the code walker from SBCL and a couple of generic functions. We now - by convention - mark our literals with the identity function literal, which just wraps the aforementioned constructor. We then define a function recover-literals that can walk a form and detect invocations of literal, replacing them with the result of calling symbolic on the subform, which evals the arg for literal and gives us back our custom literal.
=recover-literals= serves as an auxiliary function used in the macro-defining-macro
=defmacro/literal-walker=, which quietly rebinds the macro definition's args
in a lexical binding where the args (inputs) are bound to their =recover-literals=
results. This basically ensures that, when using =defmacro/literal-walker=,
we always have access to data literals. Simultaneously, our literals now also
work seamlessly in =eval= with custom evaluation semantics, including in lexical
bindings.
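To make the round trip concrete, here is a much-simplified sketch of the tagging/recovery idea. The names =literal= and =recover-literals= mirror the text, but the bodies are illustrative stand-ins (plain =eval= stands in for the =symbolic= generic function), not the actual clclojure implementation:

```lisp
;; LITERAL is the identity marker that wraps an eval-friendly
;; constructor form, e.g. (literal (persistent-vector 1 2)).
(defun literal (x) x)

;; RECOVER-LITERALS walks a form and inverts the marking: each
;; (literal <ctor-form>) invocation is replaced by the evaluated
;; constructor, handing macros back real data literals.
;; Simplified: proper lists only, plain EVAL instead of SYMBOLIC.
(defun recover-literals (form)
  (cond ((and (consp form) (eq (first form) 'literal))
         (eval (second form)))
        ((consp form)
         (mapcar #'recover-literals form))
        (t form)))
```

Walking =(let ((v (literal (list 1 2 3)))) v)= would then hand a macro back a form whose binding site holds the actual data value rather than the constructor call.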
So far, all tests in examples are passing.
Revisiting Quasiquoting
The method I hacked together for quasiquoting does not work as intended with lexical
values in quoted literals. I did some naive variant that's bone-headed in hindsight,
where I tried to eval stuff at read time to build an actual non-sexp literal data structure.
Turns out, this is not what Clojure does. Clojure does depart from CL's quasiquotation
(via =syntax-quote=), in that for quoted literals it emits the actual sexpr construction
of the literal. This is similar to my solution for working around lexically scoped
literals. The only challenge in SBCL is that the =`= reader macro defaults to emitting
a =quasiquote= form. So if we leverage the built-in machinery, we get the quoted form
of the sexpr vector constructors expanded from the original quasiquoted literal.
The solution is to hack the =`= macro and rebind it to something that's smart enough
to detect that we're actually quoting a literal. From there, we should just emit the
expanded quasiquote result without the preceding =quasiquote= form, or otherwise
expand the quoted result.
Initial testing with persistent vectors is promising... almost there.
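As a sketch of what "emitting the sexpr construction" means in practice, assuming =persistent-vector= is the pvector constructor (the helper name below is hypothetical, not the actual reader code):

```lisp
;; Instead of producing a quoted datum, the quasiquote handler for a
;; [..] literal emits the constructor call, so unquoted subforms are
;; evaluated in the surrounding lexical environment at runtime.
(defun vector-literal->form (elements)
  `(persistent-vector ,@elements))

;; A quasiquoted `[1 ~x] would then expand along the lines of
;; (persistent-vector 1 x), rather than a quoted, frozen datum.
```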
Nested Quasiquoting mostly works
Significant overhaul of the old (wrong) quasiquote infrastructure. Eliminated all read-time eval; we now emit sexprs consistent with Clojure's implementation (slightly different in form, though).
We now have a cleaner quasiquotation model that's extensible as well.
Generalized usage of sequence lib
Ensured the sequences lib is portable enough to be used, since it shows up quite a bit in clojure.core bootstrapping (e.g. macrology). Need to define additional core functions that are seq-friendly or coercing.
Destructuring
Destructuring is pretty fundamental and shows up in a lot of places.
So common forms like =loop=, =for=, =let=, =fn=, =defmacro=, and friends
all leverage it. We should be able to port it fairly straightforwardly
from clojure.core.
Revisited Loop/Recur and variadic version
While porting more core library functions over, I exposed bugs in the extant labels/macrolet-based version of the function implementation.
Did some experimenting and switched to the earlier loop-focused
=with-recur= macro. After more testing and experimentation,
I switched over to inlining arity bodies for multiple-arity
function definitions. So named functions work, =recur= works,
and =loop= (without destructuring) is implemented (over =loop*=).
Our functions now work fine with =lazy-seq=, where before
they were creating infinite loops and, later, not
recognizing function symbols for self-recur.
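For flavor, here is a minimal labels-based sketch of the loop/recur idiom. The name =with-recur*= is hypothetical; the real =with-recur= in common-utils is the loop-focused version described above, and this sketch leans on the host's tail-call elimination rather than a true jump:

```lisp
(defmacro with-recur* (bindings &body body)
  "Bind BINDINGS as in LET and run BODY; (recur v1 v2 ...)
re-enters BODY with the variables rebound, like Clojure's loop/recur."
  (let ((vars  (mapcar #'first bindings))
        (inits (mapcar #'second bindings)))
    ;; RECUR is just a local function whose parameters are the loop
    ;; vars; a call in tail position is the re-entry.
    `(labels ((recur ,vars ,@body))
       (recur ,@inits))))

;; (with-recur* ((i 1) (acc 0))
;;   (if (> i 100) acc (recur (1+ i) (+ acc i))))  ; sums 1..100
```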
ex-info
En route to =destructure=, I implemented exception handling
via =ex-info= and friends. =try=/=catch=/=finally= is a macro
away.
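A plausible sketch of =ex-info= on top of the CL condition system; the slot and reader names below are my guesses, not necessarily what clclojure uses:

```lisp
;; EX-INFO as a condition class carrying a message plus a data map,
;; in the spirit of clojure.core/ex-info. Readers are starred to
;; avoid clashing with any host definitions.
(define-condition ex-info (error)
  ((message :initarg :message :reader ex-message*)
   (data    :initarg :data    :reader ex-data*))
  (:report (lambda (c stream)
             (format stream "~a ~s" (ex-message* c) (ex-data* c)))))

;; The EX-INFO function builds the condition; ERROR signals it, and a
;; handler can pull the data back out, as in Clojure.
(defun ex-info (msg data)
  (make-condition 'ex-info :message msg :data data))
```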
TCO detecting false positives
Need to revisit the tail-call detection code that compiles
=with-recur= invocations, since we seem to be mistaking
simple =with-let= and =if-let= forms for non-tail
calls for some reason. For now, I've got warnings
rendering, where in reality we want an error to signal (the prior
behavior).
Rudimentary Parity
We're approaching basic functionality where I can just copy/paste source from clojure.core and have it work without issue. It turns out this is a great way to explore the range of language features from the ground up and implement them as needed. It might be possible to demo some Advent of Code solutions in clojure-in-common-lisp before too long...
try
Defined try/catch/finally semantics from clojure.core/try in =common-utils:try=, which
clclojure.base exports. Trivial to implement using =handler-case= and =restart-case=.
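A sketch of how such a =try= can be assembled. This simplified version (hypothetical name =try*=, keyword-style clauses) uses =handler-case= for the catch clause and =unwind-protect= for finally; the actual =common-utils:try= differs and also involves =restart-case=:

```lisp
(defmacro try* (expr &key catch finally)
  "EXPR is evaluated; CATCH is (condition-type var form...);
FINALLY is a list of cleanup forms that always run."
  (let ((guarded (if catch
                     (destructuring-bind (type var &rest handler) catch
                       ;; catch clause -> handler-case handler
                       `(handler-case ,expr (,type (,var) ,@handler)))
                     expr)))
    (if finally
        ;; finally clause -> unwind-protect cleanup forms
        `(unwind-protect ,guarded ,@finally)
        guarded)))
```

For example, =(try* (/ 1 0) :catch (division-by-zero e (declare (ignore e)) :oops))= evaluates to =:oops=.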
~ and ~@
Realized that =sb-impl::comma-charmacro= handles this and splicing (the =~@= form), so all
you have to do is add another reader macro binding for =~= using the same function as =,=.
cond
Implemented Clojure's cleaner version of =cond=. Found a subtle error regarding
the definition and use of =cons=. Since we shadow =common-lisp:cons=, we ended up
with an infinite ping-pong of the protocol extension of =-conj= for cons types.
The fix was to ensure we use =common-lisp:cons= for all the necessary non-seq cons-cell
methods.
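Clojure's flat test/expr pairing is straightforward to express as a recursive macro. A minimal sketch, named =clj-cond= here to sidestep the =common-lisp:cond= shadowing issue just described (the real clclojure version shadows =cond= directly):

```lisp
(defmacro clj-cond (&rest clauses)
  "Clojure-style cond: flat test/expr pairs, no per-clause parens.
Returns NIL when no test succeeds."
  (when clauses
    (destructuring-bind (test expr &rest more) clauses
      `(if ,test
           ,expr
           (clj-cond ,@more)))))

;; (clj-cond (oddp 2)  :odd
;;           (evenp 2) :even)   ; no trailing (t ...) clause needed
```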
Hurdles
A couple of big hurdles:
Lisp1 vs Lisp2.
- Rely on macros (like =def=) to unify symbol and function namespaces, leveraging CL's environment.
  - This is mostly implemented via =unified-let*= in lexical.lisp, which is then used in =clclojure.base:defn= and elsewhere.
  - The only thing missing is arbitrary support for keyword access based on function application, which I have a working prototype of, but it's not valid outside of clojure macros.
    - A better solution is to address this as a compiler pass in the analyzer.
- The longer route would be writing a custom eval, compiler, etc. Doesn't look necessary at the moment.
Lexical Scope vs. Top Level
So, Common Lisp as a Lisp-2 has multiple namespaces, foremost of which are function and value. We already have the top-level (that is, null lexical environment) symbol and function namespaces unified by using setf on symbol-function to make it identical to symbol-value... but...
- Lexical environments don't work that way!
- symbol-function and symbol-value only work on non-lexical symbols.
- The initial hack was to unify the namespaces by traversing the vars in the let bindings and unifying prior to continued binding.
  - This had a false-positive success, since the symbol modifications were actually pointing to non-lexically-scoped symbols (stuff from prior REPL interactions).
- The fix is to use a combination of let* and labels, which CAN unify the symbol-value and symbol-function namespaces for lexical vars.
- Defined a macro, unified-let*, that does this for us:
  - We parse the bindings, detecting if any symbol points to a literal keyword.
  - We ensure keyaccess objects are compiled during macroexpansion, which creates the plumbing for keyword access if it didn't already exist.
  - We create function definitions for the vars that point to keywords, where the function bindings invoke the funcallable keyword accessor directly.
- We need to apply this as an implicit body for fn forms as well.
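The core of the let*-plus-labels trick can be shown in isolation. This is a hand-expanded sketch of the idea, not the output of the real unified-let*:

```lisp
;; Bind F as a value with LET*, then mirror it into the function
;; namespace with LABELS, so both (f ...) and plain f resolve to the
;; same thing inside the body, approximating Lisp-1 semantics.
;; Inside the LABELS body, value-position F still sees the LET*
;; binding, because LABELS only shadows the function namespace.
(defun unified-example ()
  (let* ((f (lambda (x) (* x x))))
    (labels ((f (&rest args) (apply f args)))
      (list (f 3)              ; function-position call
            (funcall f 4)))))  ; value-position use
```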
Persistent structures.
- If we get defprotocol, deftype, etc. up and running, these are already defined in CLJS core.
- For bootstrapping, we're using the original port of Persistent Vector from JVM clojure, also creating a dumb wrapper on top.
- Need to add a meta field to the struct, and atomic locks at some point (unless cljs provides this...).
Protocols.
- Already implemented as generic functions.
- Explore alternative implementations if generic functions aren't speedy enough (doubtful, but I haven't profiled anything).
- protocol definitions need to be namespace/package qualified.
- Looks like they are already, at the package level.
- Need better error messaging on missing protocols
- TODO: get-protocol should signal.
- Namespace qualified symbols needed.
- Currently there's a potential for collisions, since the protocol registry (a hashtable) doesn't associate the protocol name with the package it was defined in.
- We can approximate qualified symbols using the dot syntax, even though CL won't resolve those symbols to a clojure namespace directly.
- Get variadic protocol implementation verified.
Deftype.
- shadows CL deftype.
- current implementation follows defprotocol; appears to work for non-variadic protocol impls.
- Look at walking method impl bodies to eliminate unnecessary slot bindings (currently we generally bind all non-shadowed slots).
Multimethods.
- Initial ideas for multiple dispatch following Clojure's implementation.
Metadata
- Symbols and collections (anything that supports IMeta) can have a persistent map of metadata in clojure.
- Used to communicate a lot of information, including type hinting, source code, documentation, etc.
- Current expectation is to leverage property lists for symbols, and unify with generic functions behind a protocol.
Namespaces
- CL has packages. The initial hack would be to leverage packages as an analogue to ns.
- The longer-term solution is to implement our own ns system via objects.
- Rather leverage CL where possible.
- The current implementation of def exports by default.
- Looking at introspection possibilities for more easily tagging meta, arglists. Custom function objects (experimental) are looking good for this.
- Should we maintain a parallel collection of
namespaces?
- Based on def and derived forms, we ought to be able to hook in and register stuff.
- Allows the reflection and introspection we care about / use in clojure.
- Need to translate between require, refer, import (clojure) and import (cl).
Reader.
CL macros use =,= and =,@= in place of Clojure's =~= and =~@=.
We'll need to either cook the Common Lisp reader, or build a separate clojure reader that performs the appropriate replacements.
- Looks like a simple tweak to the =`= reader macro will suffice; we'll just flip the symbols as appropriate.
- TODO: quasiquoting in clojure =(let [])= inside of macros is not splicing; need to revisit quoted-children. ex: =`[,@a] => [,@A]= (incorrect).
- This now appears to be working, via =clclojure.readers:quasify=.
- Should be compatible if we switch the reader macros for =,= and =,@= around too.
- =@= is a literal for =#'deref= in clojure; comma is whitespace in clojure.
- Similarly, we'll flip the read-table.
[] denote vectors
- already have a reader macro in reader.lisp
- vectors read correctly and obey clojure eval semantics.
- +There's an incorrect read going on: [x 2] will currently read when it should throw, since x is not defined.+
- TODO: revisit quoted-children for vectors and the reader fn bracket-reader. If we're not reading a quoted vec, we need to eval the args to the persistent-vector ctor.
- Quasiquoting works except for splicing.
{} denote maps
- already have a reader macro in readers.lisp.
- Janky but useful bootstrapping implementation in cowmap.lisp, which is a copy-on-write wrapper around a CL hashtable.
  - This should be enough to bootstrap, where clojure.core actually defines a persistent map implementation for us.
  - We can always opt for an optimized implementation if it makes sense to go for a built-in.
- Similar problems with quasiquoting.
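The copy-on-write idea is simple enough to sketch. This toy version (names hypothetical; the real cowmap.lisp differs) copies the backing table on every assoc, which is acceptable for bootstrapping:

```lisp
(defstruct cowmap
  (table (make-hash-table :test #'equal)))

(defun cowmap-get (m k &optional default)
  (gethash k (cowmap-table m) default))

(defun cowmap-assoc (m k v)
  "Return a NEW cowmap with K -> V; M is left untouched, giving
persistent-map semantics at copy-on-write cost."
  (let ((fresh (make-hash-table :test #'equal)))
    ;; copy every entry, then add the new association
    (maphash (lambda (key val) (setf (gethash key fresh) val))
             (cowmap-table m))
    (setf (gethash k fresh) v)
    (make-cowmap :table fresh)))
```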
#{} denote sets
- Easy enough to add a dispatch in reader.lisp
- Trivial implementation based on cowmap, pending.
^ denotes metadata for the reader
- Trivial to implement as a reader macro, BUT we need to get metadata working generally.
- with-meta currently mutates the symbol's plist. Ideally we'd need a corresponding =alter-var-root!= to do that.
- Need to implement metadata slots for the other clojure structures (data literals, persistent lists).
\c vs. #\c for chars
- Added initial support for clj->cl, need to define printable representation for chars as well.
- Current holdup is defining a print-method for STANDARD-CHAR, which claims the class doesn't exist.
- Looking at print-escape and friends to see if we can hack this. We may need our own printer, outside of the provided REPL.
- defined a reader macro that coerces \c to #\c.
- Maybe less useful for bootstrapping.
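The coercion can be done with a single macro character. A simplified sketch (single characters only, ignoring named characters like =\newline=, and installed in a copied readtable so the session isn't hijacked):

```lisp
(defun clj-char-reader (stream char)
  "Read \\c as the character c, mirroring Clojure's literal syntax."
  (declare (ignore char))
  ;; the character following the backslash IS the literal
  (read-char stream t nil t))

;; Install on a copy of the standard readtable rather than mutating
;; the global one.
(defparameter *clj-chars* (copy-readtable nil))
(set-macro-character #\\ #'clj-char-reader nil *clj-chars*)
```

Note this steals backslash's single-escape role, so symbol and string escapes behave differently under this readtable.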
reading/printing booleans
- Similar issues as characters. PAIP has a good chapter on this for similar issues with Scheme.
.- field access
- wrap into calls to CLOS slot access
ns.name/fn
- detect / in symbol name, coerce to qualified internal symbol ns.name::fn
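A sketch of that detection/coercion step (hypothetical helper; real handling would also deal with namespace aliases, missing packages, and case sensitivity):

```lisp
(defun qualify-clj-symbol (name)
  "Given a token like \"some.ns/foo\", intern FOO in the package
named SOME.NS; tokens without / intern in the current package."
  (let ((slash (position #\/ name)))
    (if (and slash (< 0 slash (1- (length name))))
        (let* ((pkg-name (string-upcase (subseq name 0 slash)))
               (pkg (or (find-package pkg-name)
                        (error "No package named ~a" pkg-name))))
          ;; ns.name/fn -> ns.name::fn
          (intern (string-upcase (subseq name (1+ slash))) pkg))
        (intern (string-upcase name)))))
```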
(.someMethod object)
- .hello is a valid function name in CL...
- Reader macro for .?
- need to incorporate . form ala: (. obj someMethod)
::qualified-key
- ???
- is used as a delimiter for package symbols in CL.
- need to parse the symbol name and dispatch...
- ::blah -> :BLAH for now... the CL reader eliminates the redundant colon.
- Should be simple to modify the reader macro for :
(aget obj field) from cljs...
- keep? This doesn't work the same in clj jvm...
Keyword access
Keywords are applicable in Clojure. That means they show up in the function position in forms. This won't jibe with CL directly.
- Possible option: a reader macro for lists; detect the presence of a keyword in function-call position, if not quoted.
- Replace keyword with a funcallable thing that has a print-method looking like a keyword?
- Or try to hack eval (dubious).
- custom read / eval / print...
Looks like we can just leverage the function namespace to get around this... keywords are "just" symbols...
- (defun :a (m) (gethash :a m))
- (defun (setf :A) (new-value m) (setf (gethash :A m) new-value))
- (defparameter ht (make-hash-table))
- (setf (:A ht) 3)
So the workaround is:
- Need a reader macro for lists.
- If we see a keyword in the function position,
- we define a function for the keyword via defun.
- define a setf expander that provides hashmap access (alternately, something that's not mutating).
Looks like that may work inside the existing ecosystem... Significant progress / experimentation in clclojure.keyworfuncs. However, it's looking like, to get "full" access (even with some tricky pseudo-keyword class, macros, and the above stuff), we're probably better off hacking eval / compile.
Destructuring.
This may be a bit tricky, although there are a limited number of clojure forms.
- Given the goal is to read tools.analyzer and tools.reader, we may get by with an un-prettied version of the original source that elides destructuring (e.g. via macroexpansion on the CLJ side) into something simpler to digest for clclojure's bootstrap.
Seq library.
This shouldn't be too hard. I already have a lazy list lib prototype as well as generic functions for the basic ops. I think I'll try to use the protocols defined in the clojurescript version as much as possible, rather than baking a lot of the seq abstraction into the host language like clojure does.
- For simple bootstrapping, this is fine, but we already get all of this with the CLJS core implementation. So, get the minimums there and gain everything else...
Strings
CL strings are mutable (character arrays); clj/cljs strings are not...
- Can we inherit from string to create an immutable type that outlaws setf?
- I think most string operations (ala the sequence-based ones) return copies (which are mutable).
- We can prevent setf in clojure mode, providing a pure API for string manipulation...
  - As well as an impure one... hmm.
Regexes
Leverage CL-PPCRE
- check for reader macro collisions...
Compilation / Analysis
Currently, we project everything to (portable) CL, and let the host do the dirty work for compilation/analysis.
- Future, port clojure.tools.analyzer
- Maybe use that for some of our efforts...
- If we have enough ported, look at using clojure.tools.reader to help as well.
Usage
Currently, just clone the repository. Assuming you have ASDF set up properly, you should be able to evaluate =(require 'clclojure)= at your Lisp REPL and it'll compile (if you're in the project dir in lisp).
Alternately, you can compile and load the .asd file, then use quicklisp to load it from the repl. (ql:quickload :clclojure)
You currently get persistent vectors, literals, defprotocol, extend-protocol.
You can mess with the currently limited clojure evaluator in the :clclojure.base package, although I'm moving in a different direction now that I think I can exploit CL better.
You can see rudimentary examples and usage in the :clclojure.example package (example.lisp). TODO: shift to named-readtables to keep clclojure from hijacking your session. Right now, we take over the REPL... user beware!
I'm a bit new to building Common Lisp projects, so the packaging will likely change as I learn. At some point, if anything useful comes out of this experiment, I may see if it can get pushed to Quicklisp.
License
Eclipse Public License, just like Clojure.