Implementation of a Lisp Engine Replacement

Owner: ???

Effort: ???

Dependencies: ???

Let's take a look at the sort of work that would be required if we were to replace the existing Elisp engine in XEmacs with some other engine, for example, the Clisp engine. I'm assuming here, of course, that we are not going to be changing the interface here at the same time, which is to say that we will be keeping the same Elisp language that we currently have as the extension language for XEmacs, except perhaps for incremental changes that we will make, such as lexical scoping and proper structure support in an attempt to gradually move the language towards an upwardly-compatible goal, such as Common Lisp. I am writing this page primarily as food for thought. I feel fairly strongly that actually doing this work would be a big waste of effort that would inevitably become a huge time sink on the part of nearly everyone involved in XEmacs development, and not only for the ones who were supposed to be actually doing the engine change. I feel that most of the desired changes that we want for the language and/or the engine can be achieved with much less effort and time through incremental changes to the existing code base.

First of all, in order to make a successful Lisp engine change in XEmacs, it is vitally important that the work be done through a series of incremental stages where at the end of each stage XEmacs can be compiled and run, and it works. It is tempting to try to make the change all at once, but this would be disastrous. If the resulting product worked at all, it would inevitably contain a huge number of subtle and extremely difficult to track down bugs, and it would be next to impossible to determine which of the myriad changes made introduced the bug.

Now let's look at what the possible stages of implementation could be.

An Extra C Preprocessing Stage

The first step would be to introduce another preprocessing stage for the XEmacs C code, which is done before the C compiler itself is invoked on the code, and before the standard C preprocessor runs. The C preprocessor is simply not powerful enough to do many of the things we would like to do in the C code. The existing results of this have been a combination of a lot of hacked up and tricky-to-maintain stuff (such as the DEFUN macro, and the associated DEFSUBR), as well as code constructs that are difficult to write. (Consider for example, attempting to do structured exception handling, such as catch/throw and unwind-protect constructs), as well as code that is potentially or actually unsafe (such as the uses of alloca), which could easily cause stack overflow with large amounts of memory allocated in this fashion.) The problem is that the C preprocessor does not allow macros to have the power of an actual language, such as C or Lisp. What our own preprocessor should do is allow us to define macros, whose definitions are simply functions written in some language which are executed at compile time, and whose arguments are the actual argument for the macro call, as well as an environment which should have a data structure representation of the C code in the file and allow this environment to be queried and modified. It can be debated what the language should be that these extensions are written in. Whatever the language chosen, it needs to be a very standard language and a language whose compiler or interpreter is available on all of the platforms that we could ever possibly consider putting XEmacs to, which is basically to say all the platforms in existence. One obvious choice is C, because there will obviously be a C compiler available, because it is needed to compile XEmacs itself. Another possibility is Perl, which is already installed on most systems, and is universally available on all others. This language has powerful text processing facilities which would probably make it possible to implement the macro definitions more quickly and easily; however, this might also encourage bad coding practices in the macros (often simple text processing is not appropriate, and more sophisticated parsing or recursive data structure processing needs to be done instead), and we'd have to make sure that the nested data structure that comprises the environment could be represented well in Perl. Elisp would not be a good choice because it would create a bootstrapping problem. Other possible languages, such as Python, are not appropriate, because most programmers are unfamiliar with this language (creating a maintainability problem) and the Python interpreter would have to be included and compiled as part of the XEmacs compilation process (another maintainability problem). Java is still too much in flux to be considered at this point.

The macro facility that we will provide needs to add two features to the language: the ability to define a macro, and the ability to call a macro. One good way of doing this would be to make use of special characters that have no meaning in the C language (or in C++ for that matter), and thus can never appear in a C file outside of comments and strings. Two obvious characters are the @ sign and the $ sign. We could, for example, use @ defined to define new macros, and the $ sign followed by the macro name to call a macro. (Proponents of Perl will note that both of these characters have a meaning in Perl. This should not be a problem, however, because the way that macros are defined and called inside of another macro should not be through the use of any special characters which would in effect be extending the macro language, but through function calls made in the normal way for the language.)

The program that actually implements this extra preprocessing stage needs to know a certain amount about how to parse C code. In particular, it needs to know how to recognize comments, strings, character constants, and perhaps certain other kinds of C tokens, and needs to be able to parse C code down to the statement level. (This is to say it needs to be able to parse function definitions and to separate out the statements, if blocks, while blocks, etc. within these definitions. It probably doesn't, however need to parse the contents of a C expression.) The preprocessing program should work first by parsing the entire file into a data structure (which may just contain expressions in the form of literal strings rather than a data structure representing the parsed expression). This data structure should become the environment parameter that is passed as an argument to macros as mentioned above. The implementation of the parsing could and probably should be done using lex and yacc. One good idea is simply to steal some of the lex and yacc code that is part of GCC.

Here are some possibilities that could be implemented as part of the preprocessing:

A proper way of doing the DEFUN macros. These could, for example, take an argument list in the form of a Lisp argument list (complete with keyword parameters and other complex features) and automatically generate the appropriate subr structure, the appropriate C function definition header, and the appropriate call to the DEFSUBR initialization function.
A truly safe and easy to use implementation of the alloca function. This could allocate the memory in any fashion it chooses (calling malloc using a large global array, or a series of such arrays, etc.) an insert in the appropriate places to automatically free up this memory. (Appropriate places here would be at the end of the function and before any return statements. Non-local exits can be handled in the function that actually implements the non-local exit.)
If we allow for the possibility of having an arbitrary Lisp engine, we can't necessarily assume that we can call Lisp primitives implemented in C from other C functions by simply making a function all. Perhaps something special needs to happen when this is done. This could be handled fairly easily by having our new and improved DEFUN macro define a new macro for use when calling a primitive.

Make the Existing Lisp Engine be Self-contained.

The goal of this stage is to gradually build up a self-contained Lisp engine out of the existing XEmacs core, which has no dependencies on any of the code elsewhere in the XEmacs core, and has a well-defined and black box-style interface. (This is to say that the rest of the C code should not be able to access the implementation of the Lisp engine, and should make as few assumptions as possible about how this implementation works). The Lisp engine could, and probably should, be built up as a separate library which can be compiled on its own without any of the rest of the XEmacs C code, and can be tested in this configuration as well.

The creation of this engine library should be done as a series of subsets, each of which moves more code out of the XEmacs core and into the engine library, and XEmacs should be compilable and runnable between each sub-step. One possible series of sub-steps would be to first create an engine that does only object allocation and garbage collection, then as a second sub-step, move in the code that handles symbols, symbol values, and simple binding, and then finally move in the code that handles control structures, function calling, byte-code execution, exception handling, etc. (It might well be possible to further separate this last sub-step).

Removal of Assumptions About the Lisp Engine Implementation

Currently, the XEmacs C code makes all sorts of assumptions about the implementation of the Lisp engine, particularly in the areas of object allocation, object representation, and garbage collection. A different Lisp engine may well have different ways of doing these implementations, and thus the XEmacs C code must be rid of any assumptions about these implementations. This is a tough and tedious job, but it needs to be done. Here are some examples:

GCPRO must go. The GCPRO mechanism is tedious, error-prone, unmaintainable, and fundamentally unsafe. As anyone who has worked on the C Core of XEmacs knows, figuring out where to insert the GCPRO calls is an exercise in black magic, and debugging crashes as a result of incorrect GCPROing is an absolute nightmare. Furthermore, the entire mechanism is fundamentally unsafe. Even if we were to use the extra preprocessing stage detailed above to automatically generate GCPRO and UNGCPRO calls for all Lisp object variables occurring anywhere in the C code, there are still places where we could be bitten. Consider, for example, code which calls cons and where the two arguments to this functions are both calls to the append function. Now the append function generates new Lisp objects, and it also calls QUIT, which could potentially execute arbitrary Lisp code and cause a garbage collection before returning control to the append function. Now in order to generate the arguments to the cons function, the append function is called twice in a row. When the first append call returns, new Lisp data has been created, but has no GCPRO pointers to it. If the second append call causes a garbage collection, the Lisp data from the first append call will be collected and recycled, which is likely to lead to obscure and impossible-to-debug crashes. The only way around this would be to rewrite all function calls whose parameters are Lisp objects in terms of temporary variables, so that no such function calls ever contain other function calls as arguments. This would not only be annoying to implement, even in a smart preprocessor, but would make the C code become incredibly slow because of all the constant updating of the GCPRO lists.
The only proper solution here is to completely do away with the GCPRO mechanism and simply do conservative garbage collection over the C stack. There are already portable implementations of conservative pointer marking over the C stack, and these could easily be adapted for use in the Elisp garbage collector. If, as outlined above, we use an extra preprocessing stage to create a new version of alloca that allocates its memory elsewhere than actually on the C stack, and we ensure that we don't declare any large arrays as local variables, but instead use alloca, then we can be guaranteed that the C stack is small and thus that the conservative pointer marking stage will be fast and not very likely to find false matches.
Removing the GCPRO declarations as just outlined would also remove the assumption currently made that garbage collection can occur only in certain places in the C code, rather than in any arbitrary spot. (For example, any time an allocation of Lisp data happens). In order to make things really safe, however, we also have to remove another assumption as detailed in the following item.
Lisp objects might be relocatable. Currently, the C code assumes that Lisp objects other than string data are not relocatable and therefore it's safe to pass around and hold onto the actual pointers for the C structures that implement the Lisp objects. Current code, for example, assumes that a Lisp_Object of type buffer and a C pointer to a struct buffer mean basically the same thing, and indiscriminately passes the two kinds of buffer pointers around. With relocatable Lisp objects, the pointers to the C structures might change at any time. (Remember, we are now assuming that a garbage collection can happen at basically any point). All of the C code needs to be changed so that Lisp objects are always passed around using a Lisp object type, and the underlying pointers are only retrieved at the time when a particular data element out of the structure is needed. (As an aside, here's another reason why Lisp objects, instead of pointers, should always be passed around. If pointers are passed around, it's conceivable that at the time a garbage collection occurs, the only reference to a Lisp object (for example, a deleted buffer) would be in the form of a C pointer rather than a Lisp object. In such a case, the conservative pointer marking mechanism might not notice the reference, especially if, in an attempt to eliminate false matches and make the code generally more efficient, it will be written so that it will look for actual Lisp object references.)
I would go a step farther and completely eliminate the macros that convert a Lisp object reference into a C pointer. This way the only way to access an element out of a Lisp object would be to use the macro for that element, which in one atomic operation de-references the Lisp object reference and retrieves the value contained in the element. We probably do need the ability to retrieve actual C pointers, though. For example, in the case where an array is stored in a Lisp object, or simply for efficiency purposes where we might want some code to retrieve the C pointer for a Lisp object, and work on that directly to avoid a whole bunch of extra indirections. I think the way to do this would be through the use of a special locking construct implemented as part of the extra preprocessor stage mentioned above. This would essentially be what you might call a lock block, just like a while block. You'd write the word lock followed by a parenthesized expression that retrieves the C pointer and stores it into a variable that is scoped only within the lock block and followed in turn by some code in braces, which is the actual code associated with the lock block, and which can make use of this pointer. While the code inside the lock block is executing, that particular pointer and the object pointed to by it is guaranteed not to be relocated.
If all the XEmacs C code were converted according to these rules, there would be no restrictions on the sorts of implementations that can be used for the garbage collector. It would be possible, for example, to have an incremental asynchronous relocating garbage collector that operated continuously in another thread while XEmacs was running.
The C implementation of Lisp objects might not, and probably should not, be visible to the rest of the XEmacs C code. It should theoretically be possible, for example, to implement Lisp objects entirely in terms of association lists, rather than using C structures in the standard way. (This may be an extreme example, but it's good to keep in mind an example such as this when cleaning up the XEmacs C code). The changes mentioned in the previous item would go a long way towards removing this assumption. The only places where this assumption might still be made would be inside of the lock blocks where an actual pointer is retrieved. (Also, of course, we'd have to change the way that Lisp objects are defined in C so that this is done with some function calls and new and improved macros rather than by having the XEmacs C code actually define the structures. This sort of thing would probably have to be done in any case once the allocation mechanism is moved into a separate library.) With some thought it should be possible to define the lock block interface in such a way as to remove any assumptions about the implementation of Lisp objects.
C code may not be able to call Lisp primitives that are defined in C simply by making standard C function calls. There might need to be some wrapper around all such calls. This could be achieved cleanly through the extra preprocessing step mentioned above, in line with the example described there.

Actually Replacing the Engine.

Once we've done all of the work mentioned in the previous steps (and admittedly, this is quite a lot of work), we should have an XEmacs that still uses what is essentially the old and previously existing Lisp engine, but which is ready to have its Lisp engine replaced. The replacement might proceed as follows:

Identify any further changes that need to be made to the engine interface that we have defined as a result of the previous steps so that features and idiosyncrasies of various Lisp engines that we examine could be properly supported.
Pick a Lisp engine and write an interface layer that sits on top of this Lisp engine and makes it adhere to what I'll now call the XEmacs Lisp engine interface.
Strongly consider creating, if we haven't already done so, a test suite that can test the XEmacs Lisp engine interface when used with a stand-alone Lisp engine.
Test the hell out of the Lisp engine that we've chosen when combined with its XEmacs Lisp engine interface layer as a stand-alone program.
Now finally attach this stand-alone program to XEmacs itself. Debug and fix any further problems that ensue (and there inevitably will be such problems), updating the test suite as we go along so that if it were run again on the old and buggy interfaced Lisp engine, it would note the bug.

Ben Wing

Conform with <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
Automatically validated by PSGML