
43. Future Work



43.1 Future Work – General Suggestions

Jamie Zawinski’s XEmacs Wishlist

This document is based on Jamie Zawinski’s xemacs wishlist. Throughout this page, “I” refers to Jamie.

The list has been substantially reformatted and edited to fit the needs of this site. If you have any soul at all, you’ll go check out the original. OK? You should also check out some other wishlists.

About the List

I’ve ranked these (roughly) from easiest to hardest; though of all of them, I think the debugger improvements would be the most useful. I think the combination of emacs+gdb is the best Unix development environment currently available, but it’s still lamentably primitive and extremely frustrating (much like Unix itself), especially if you know what kinds of features more modern integrated debuggers have.

XEmacs Wishlist

Improve the keyboard macro system.

Keyboard macros are one of the most useful concepts that emacs has to offer, but there’s room for improvement.

Make it possible to embed one macro inside of another.

Often, I’ll define a keyboard macro, and then realize that I’ve left something out, or that there’s more that I need to do; for example, I may define a macro that does something to the current line, and then realize that I want to apply it to a lot of lines. So, I’d like this to work:

 
C-x (           ; start macro #1
...             ; (do stuff)
C-x )           ; done with macro #1
...             ; (do stuff)
C-x (           ; start macro #2
C-x e           ; execute macro #1 (splice it into macro #2)
C-s foo         ; move forward to the next spot
C-x )           ; done with macro #2
C-u 1000 C-x e  ; apply the new macro

That is, simply, one should be able to wrap new text around an existing macro. I can’t tell you how many times I’ve defined a complex macro but left out the “C-n C-a” at the end...

Yes, you can accomplish this with M-x name-last-kbd-macro, but that’s a pain. And it’s also more permanent than I’d often like.

Make it possible to correct errors when defining a macro.

Right now, the act of defining a macro stops if you get an error while defining it, and all of the characters you’ve already typed into the macro are gone. It needn’t be that way. I think that, when that first error occurs, the user should be given the option of taking the last command off of the macro and trying again.

The macro-reader knows where the bounds of multi-character command sequences are, and it could even keep track of the corresponding undo records; rubbing out the previous entry on the macro could also undo any changes that command had made. (This should also work if the macro spans multiple buffers, and should restore window configurations as well.)

You’d want multi-level undo for this as well, so maybe the way to go would be to add some new key sequence which was used only as the back-up-inside-a-keyboard-macro-definition command.

I’m not totally sure that this would end up being very usable; maybe it would be too hard to deal with. Which brings us to:

Make it possible to edit a keyboard macro after it has been defined.

I only just discovered edit-kbd-macro (C-x C-k). It is very, very cool.

The trick it does of showing the command which will be executed is somewhat error-prone, as it can only look up things in the current map or the global map; if the macro changed buffers, it wouldn’t be displaying the right commands. (One of the things I often use macros for is operating on many files at once, by bringing up a dired buffer of those files, editing them, and then moving on to the next.)

However, if the act of recording a macro also kept track of the actual commands that had gotten executed, it could make use of that info as well.

Another way of editing a macro, other than as text in a buffer, would be to have a command which single-steps a macro: you would lean on the space bar to watch the macro execute one character (command?) at a time, and then when you reached the point you wanted to change, you could do some gesture to either: insert some keystrokes into the middle of the macro and then continue; or to replace the rest of the macro from here to the end; or something.

Another similar hack might be to convert a macro to the equivalent lisp code, so that one could tweak it later in ways that would be too hard to do from the keyboard (wrapping parts of it in while loops or something.) (M-x insert-kbd-macro isn’t really what I’m talking about here: I mean insert the list of commands, not the list of keystrokes.)

Save my wrists!

In the spirit of the ‘teach-extended-commands-p’ variable, it would be interesting if emacs would keep track of what are the commands I use most often, perhaps grouped by proximity or mode – it would then be more obvious which commands were most likely candidates for placement on a toolbar, or popup menu, or just a more convenient key binding.

Bonus points if it figures out that I type “bt\n” and “ret\ny\n” into my ‘*gdb*’ buffer about a hundred thousand times a day.
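
In the meantime, a crude version of this kind of tracking can be sketched in a few lines of Lisp; the variable and function names here are hypothetical:

(defvar command-use-counts (make-hash-table :test 'eq)
  "Hash table mapping command symbols to use counts.")

(defun record-command-use ()
  ;; Bump the counter for the command that just ran.
  (when (symbolp this-command)
    (puthash this-command
             (1+ (or (gethash this-command command-use-counts) 0))
             command-use-counts)))

(add-hook 'post-command-hook 'record-command-use)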

XmCreateFileSelectionBox

The thing that “File/Open...” pops up has excellent hack value, but as a user interface, it’s an abomination. Isn’t it time someone added a real file selection dialog already? (For the Motifly-challenged, the Athena-based file selector that GhostView uses seems adequate.)

Improve the toolbar system.

It’s great that XEmacs has a toolbar, but it’s damn near impossible to customize it.

Make it easy to define new toolbar buttons.

Currently, to define a toolbar button that has a text equivalent, one must edit a pixmap, and put the text there! That’s prohibitive. One should be able to add some kind of generic toolbar button, with a plain icon or none at all, but which has a text label, without having to use a paint program.

Make it easy to have customized, mode-local toolbars.

In my c-mode-hook, for example, I can add a couple of new keybindings, and delete a few others, and to do that, I don’t have to duplicate the entire definition of the c-mode-map. Making mode-local additions and subtractions to the toolbars should be as easy.
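
For reference, this is all it takes on the keymap side; the particular bindings are just for illustration:

(add-hook 'c-mode-hook
          #'(lambda ()
              ;; add a binding...
              (define-key c-mode-map "\C-cc" 'compile)
              ;; ...and remove another, without copying c-mode-map
              (define-key c-mode-map "\C-c\C-b" nil)))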

Make it easy to have customized, mode-local popup menus.

The same situation holds for the right-mouse-button popup menu; one should be able to add new commands to those menus without difficulty. One problem is that each mode which does have a popup menu implements it in a different way...

Make the External Widget work.

About half of the work is done to make a replacement for the XmText widget which offloads editing responsibility to an external Emacs process. Someone should finish that. The benefit here would be that then, any Motif program could be linked such that all editing happened with a real Emacs behind it. (If you’re Athena-minded, flavor with Text instead of XmText – it’s probably easy to make it work with both.)

The part of this that is done already is the ability to run an Emacs screen on a Window object that has been created by another process (this is what the ‘ExternalClient.c’ and ‘ExternalShell.c’ stuff is). What is left to be done is adding the text-widget-editor aspects of this.

First, the emacs screen being displayed on that window would have to be one without a modeline, and one which behaved sensibly in the context of “I am a small multi-line text area embedded in a dialog box” as opposed to “I am a full-on text editor and lord of all that I survey.”

Second, the API that the (non-emacs-aware) user of the XmText widget expects would need to be implemented: give the caller the ability to pull the edited text string back out, and so on. The idea here being, hooking up emacs as the widget editor should be as transparent as possible.

Bring the debugger interface into the eighties.

Some of you may have seen my ‘gdb-highlight.el’ package, which I posted to gnu.emacs.sources last month. I think it’s really cool, but there should be a lot more work in that direction. For those of you who haven’t seen it, what it does is watch text that gets inserted into the ‘*gdb*’ buffer and make very nearly everything clickable, with a context-sensitive menu. Generally, the types of objects that are noticed include function names, stack frames, and breakpoints.

Any time one of those objects is presented in the ‘*gdb*’ buffer, it is mousable. Clicking middle button on it takes some default action (edits the function, selects the stack frame, disables the breakpoint, ...) Clicking the right button pops up a menu of commands, including commands specific to the object under the mouse, and/or other objects on the same line.

So that’s all well and good, and I get far more joy out of what this code does for me than I expected, but there are still a bunch of limitations. The debugger interface needs to do much, much more.

Make gdbsrc-mode not suck.

The idea behind gdbsrc-mode is on the side of the angels: one should be able to focus on the source code and not on the debugger buffer, absolutely. But the implementation is just awful.

First and foremost, it should not change “modes” (in the more general sense). Any commands that it defines should be on keys which are exclusively used for that purpose, not keys which are normally self-inserting. I can’t be the only person who regularly has occasion to actually edit the sources which the debugger has chosen to display! Switching into and out of gdbsrc-mode is prohibitive.

I want to be looking at my sources at all times, yet I don’t want to have to give up my source-editing gestures. I think the right way to accomplish this is to put the gdbsrc commands on the toolbar and on popup menus; or to let the user define their own keys (I could see devoting my <kp_enter> key to “step”, or something common like that.)

Also it’s extremely frustrating that one can’t turn off gdbsrc mode once it has been loaded, without exiting and restarting emacs; that alone means that I’d probably never take the time to learn how to use it, without first having taken the time to repair it...

Make it easier to access variable values.

I want to be able to double-click on a variable name to highlight it, and then drag it to the debugger window to have its value printed.

I want gestures that let me write as well as read: for example, to store value A into slot B.

Make all breakpoints visible.

Any time there is a running gdb which has breakpoints, the buffers holding the lines on which those breakpoints are set should have icons in them. These icons should be context-sensitive: I should be able to pop up a menu to enable or disable them, to delete them, to change their commands or conditions.

I should also be able to move them. It’s annoying when you have a breakpoint with a complex condition or command on it, and then you realize that you really want it to be at a different location. I want to be able to drag-and-drop the icon to its new home.

Make a debugger status display window.
  • I want a window off to the side that shows persistent information – it should have a pane which is a drag-editable, drag-reorderable representation of the elements on gdb’s “display” list; they should be displayed here instead of being just dumped in with the rest of the output in the ‘*gdb*’ buffer.
  • I want a pane that displays the current call-stack and nothing else. I want a pane that displays the arguments and locals of the currently-selected frame and nothing else. I want these both to update as I move around on the stack.
  • Since the unfortunate reality is that excavating this information from gdb can be slow, it would be a good idea for these panes to have a toggle button on them which meant “stop updating”, so that when I want to move fast, I can, but I can easily get the display back when I need it again.

The reason for all of this is that I spend entirely too much time scrolling around in the ‘*gdb*’ buffer; with gdb-highlight, I can just click on a line in the backtrace output to go to that frame, but I find that I spend a lot of time looking for that backtrace: since it’s mixed in with all the other random output, I waste time looking around for things (and usually just give up and type “bt” again, then thrash around as the buffer scrolls, and try to find the lower frames that I’m interested in, as they have invariably scrolled off the window already...)

Save and restore breakpoints across emacs/debugger sessions.

This would be especially handy given that gdb leaks like a sieve, and with a big program, I only get a few dozen relink-and-rerun attempts before gdb has blown my swap space.

Keep breakpoints in sync with source lines.

When a program is recompiled and then reloaded into gdb, the breakpoints often end up in less-than-useful places. For example, when I edit text which occurs in a file anywhere before a breakpoint, emacs is aware that the breakpoint’s line itself hasn’t changed, merely its position relative to the top of the file. Gdb doesn’t know this, so your breakpoints end up getting set in the wrong places (usually the maximally inconvenient places, like after a loop instead of inside it). But emacs knows, so emacs should inform the debugger and move the breakpoints back to the places they were intended to be.

(Possibly the OOBR stuff does some of this, but can’t tell, because I’ve never been able to get it to do anything but beep at me and mumble about environments. I find it pretty funny that the manual keeps explaining to me how intuitive it is, without actually giving me a clue how to launch it...)

Add better dialog box features.

It’d be nice to be able to create more complex dialog boxes from emacs-lisp: ones with checkboxes, radio button groups, text fields, and popup menus.

Add embeddable dialog boxes.

One of the things that the now-defunct Energize code (the C side of it, that is) could do was embed a dialog box between the toolbar and the main text area – buffers could have control panels associated with them, that had all kinds of complex behavior.

Make the mark-stack be visible.

You know, I’ve encountered people who have been using emacs for years, and never use the mark stack for navigation. I can’t live without it; “C-u C-SPC” is among my most common gestures.

  1. It would be a lot easier to realize what’s going to happen if the marks on the mark stack were visible. They could be displayed as small “caret” glyphs, for example; something large enough to be visible, but not easily mistaken for a character or for the cursor.
  2. The marks and the selected region should be visible in the scrollbar as well – I don’t remember where I first saw this idea, but it’s very cool: there’s a second, less-strongly-rendered “thumb” in the scrollbar which indicates the position and size of the selection; and there are tiny tick-marks which indicate the positions of the saved points.
  3. Markers which are in registers (point-to-register, C-x /) should be displayed differently (more prominent.)
  4. It’d be cool if you could pick up markers and move them around, to adjust the points you’ll be coming back to later.

Write a new garbage collector.

The emacs GC is very primitive; it is also, fortunately, a rather well isolated module, and it would not be a very big task to swap it with a new one (once that new one was written, that is.) Someone should go bone up on modern GC techniques, and then just dive right in...

Add support for lexical scope to the emacs-lisp runtime.

Yadda yadda, this list goes to eleven.


Subject: Re: XEmacs wishlist
Date: Wed, 14 May 1997 16:18:23 -0700
From: Jamie Zawinski <jwz@netscape.com>
Newsgroups: comp.emacs.xemacs, comp.emacs

Andreas Schwab wrote:

Use ‘C-u C-x (’:

start-kbd-macro:
Non-nil arg (prefix arg) means append to last macro defined; this begins by re-executing that macro as if you typed it again.

Cool, I didn’t know it did that...

But it only lets you append. I often want to prepend, or embed the macro multiple times (motion 1, C-x e, motion 2, C-x e, motion 3.)

21.2 Showstoppers

Author: Ben Wing

I. DISTRIBUTION ISSUES

A. Unified Source Tarball.

Packages go under root/lib/xemacs/xemacs-packages, no one ever has to mess with --package-path, and the result can be moved from one directory to another pre- or post-install.

Unified Binary Tarballs with Packages.

Same principles as above.

If people complain, we can also provide split binary tarballs (architecture dependent and independent) and place these files in a subdirectory so as not to confuse the majority just looking for one tarball.

Under Windows, we need to provide a WISE-style GUI setup program. It’s already there but needs some work so you can select "all" packages easily (should be the default).

C. Parallel Root and Package Trees.

If the user downloads the main source and the packages separately, he will naturally untar them into the same directory. This results in the parallel root and package structure. We should support this as a "last resort," i.e., if we find no packages anywhere and are about to resign ourselves to not having packages, then look for a parallel package tree. The user who sets things up like this should be able to either run in place or "make install" and get a proper installed XEmacs. Never should the user have to touch --package-path.

II. WINDOWS PRINTING

Looks like the internals are done but not the GUI. This must be working in 21.2.

III. WINDOWS MULE

Basic support should be there. There’s already a patch to get things started and I’ll be doing more work to make this real.

IV. GUTTER ETC.

This stuff needs to be "stable" and generally free from bugs. Any APIs we create need to be well-reviewed or marked clearly as experimental.

V. PORTABLE DUMPER

Last bits need to be cleaned up. This should be made the "default" for a while to flush out problems. Under Microsoft Windows, Portable Dumper must be the default in 21.2 because of the problems with the existing dump process.

COMMENT: I’d like to feature freeze this pretty soon and create a 21.3 tree where all of my major overhauls of Mule-related stuff will go in. At the same time, or around then, we need to do the move-around in the repository (or create a new one) and "upgrade" to the latest CVS server.



43.2 Future Work – Elisp Compatibility Package

Author: Ben Wing

A while ago I created a package called Sysdep, which aimed to be a forward compatibility package for Elisp. The idea was that instead of having to write your package using the oldest version of Emacs that you wanted to support, you could use the newest XEmacs API, and then simply load the Sysdep package, which would automatically define the new API in terms of older APIs as necessary. The idea of this package was good, but its design wasn’t perfect, and it wasn’t widely adopted. I propose a new package called Compat that corrects the design flaws in Sysdep, and hopefully will be adopted by most of the major packages.

In addition, this package will provide macros that can be used to bracket code as necessary to disable byte compiler warnings generated as a result of supporting the APIs of different versions of Emacs; or rather the Compat package strives to provide useful constructs to make doing this support easier, and these constructs have the side effect of not causing spurious byte compiler warnings. The idea here is that it should be possible to create well-written, clean, and understandable Elisp that supports both older and newer APIs, and has no byte compiler warnings. Currently many warnings are unavoidable, and as a result, they are simply ignored, which also causes a lot of legitimate warnings to be ignored.

The approach taken by the Sysdep package to make sure that the newest API was always supported was fairly simple: when the Sysdep package was loaded, it checked for the existence of new API functions, and if they weren’t defined, it defined them in terms of older API functions that were defined. This had the advantage that the checks for which API functions were defined were done only once at load time rather than each time the function was called. However, the fact that the new APIs were globally defined caused a lot of problems with unwanted interactions, both with other versions of the Sysdep package provided as part of other packages, and simply with compatibility code of other sorts in packages that would determine whether an API existed by checking for the existence of certain functions within that API. In addition, the Sysdep package did not scale well because it defined all of the functions that it supported, regardless of whether or not they were used.

The Compat package remedies the first problem by ensuring that the new APIs are defined only within the lexical scope of the packages that actually make use of the Compat package. It remedies the second problem by ensuring that only definitions of functions that are actually used are loaded. This all works roughly according to the following scheme:

  1. Part of the Compat package is a module called the Compat generator. This module is actually run as an additional step during byte compilation of a package that uses Compat. This can happen either through the makefile or through the use of an eval-when-compile call within the package code itself. What the generator does is scan all of the Lisp code in the package, determine which function calls are made that the Compat package knows about, and generates custom compat code that conditionally defines just these functions when the package is loaded. The custom compat code can either be written to a separate Lisp file (for use with multi-file packages), or inserted into the beginning of the Lisp file of a single file package. (In the latter case, the package indicates where this generated code should go through the use of magic comments that mark the beginning and end of the section. Some will say that doing this trick is bad juju, but I have done this sort of thing before, and it works very well in practice).
  2. The functions in the custom compat code have their names prefixed with both the name of the package and the word compat, ensuring that there will be no name space conflicts with other functions in the same package, or with other packages that make use of the Compat package.
  3. The actual definitions of the functions in the custom compat code are determined at run time. When the equivalent API already exists, the wrapper functions are simply defined directly in terms of the actual functions, so that the only run time overhead from using the Compat package is one additional function call. (Alternatively, even this small overhead could be avoided by retrieving the definitions of the actual functions and supplying them as the definitions of the wrapper functions. However, this appears to me to not be completely safe. For example, it might have bad interactions with the advice package).
  4. The code that wants to make use of the custom compat code is bracketed by a call to the construct compat-execute. What this actually does is lexically bind all of the function names that are being redefined with macro functions by using the Common Lisp macro macrolet. (The definition of this macro is in the CL package, but in order for things to work on all platforms, the definition of this macro will presumably have to be copied and inserted into the custom compat code).
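
As a concrete illustration, here is roughly what the generated custom compat code and the wrapping might look like for a hypothetical package foo that calls line-beginning-position; all of the foo- names are invented, and the real generator would emit its own macrolet forms:

(require 'cl)  ; for macrolet

;; Generated wrapper: the choice is made once, at load time.
(defalias 'foo-compat-line-beginning-position
  (if (fboundp 'line-beginning-position)
      ;; New API exists: one extra function call of overhead.
      'line-beginning-position
    ;; Old API: emulate with older primitives.
    #'(lambda (&optional n)
        (save-excursion
          (forward-line (if n (1- n) 0))
          (point)))))

;; Package code wrapped by compat-execute, which would expand to a
;; macrolet that lexically redirects the calls:
(macrolet ((line-beginning-position (&rest args)
             `(foo-compat-line-beginning-position ,@args)))
  (defun foo-line-so-far ()
    (buffer-substring (line-beginning-position) (point))))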

In addition, the Compat package should define the macro compat-if-fboundp. (Similar macros such as compile-when-fboundp and compile-case-fboundp could be defined using similar principles.) The compat-if-fboundp macro behaves just like an (if (fboundp ...) ...) clause when executed, but in addition, when it’s compiled, it ensures that the code inside the if-true sub-block will not cause any byte compiler warnings about the function in question being unbound. I think that the way to implement this would be to make compat-if-fboundp be a macro that does what it’s supposed to do, but which defines its own byte code handler, which ensures that the particular warning in question will be suppressed. (Actually ensuring that just the warning in question is suppressed, and not any others, might be rather tricky. It certainly requires further thought.)
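
At run time the macro is trivial; it is only the compile-time warning suppression that needs the byte-compiler cooperation described above. A sketch of the run-time half:

(defmacro compat-if-fboundp (fn then &rest else)
  "Like (if (fboundp FN) THEN ELSE...), with FN unquoted.
The real version would also arrange for the byte compiler not to
warn about FN being unbound within THEN."
  `(if (fboundp ',fn) ,then ,@else))

;; e.g. (compat-if-fboundp temp-directory
;;           (temp-directory)
;;         (or (getenv "TMPDIR") "/tmp"))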

Note: An alternative way of avoiding both warnings about unbound functions and warnings about obsolete functions is to just call the function in question by using funcall, instead of calling the function directly. This seems rather inelegant to me, though, and doesn’t make it obvious why the function is being called in such a roundabout manner. Perhaps the Compat package should also provide a macro compat-funcall, which works exactly like funcall, but which indicates to anyone reading the code why the code is expressed in such a fashion.
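
The macro itself would be a one-liner; its entire value is in documenting intent at the call site:

(defmacro compat-funcall (fn &rest args)
  "Call FN like `funcall', marking a deliberate compatibility dodge."
  `(funcall ,fn ,@args))

;; e.g. (compat-funcall 'set-keymap-parents map (list global-map))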

If you’re wondering how to implement the part of the Compat generator where it scans Lisp code to find function calls for functions that it wants to do something about, I think the best way is to simply process the code using the Lisp function read and recursively descend any lists, looking for function names as the first element of any list encountered. This might extract a few more functions than are actually called, but it is almost certainly safer than doing anything trickier, like byte compiling the code and attempting to look for function calls in the result. (It could also be argued that the names of the functions should be extracted not only from the first element of lists, but anywhere a symbol occurs, for example to catch places where a function is called using funcall or apply. However, such uses of functions would not be affected by the surrounding macrolet call, and so there doesn’t appear to be any point in extracting them.)
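
A sketch of that scan, reading top-level forms from the file and descending sub-lists (the function names and the KNOWN argument are hypothetical):

(defun compat-scan-file (file known)
  "Return the symbols from KNOWN called in function position in FILE."
  (let ((found nil))
    (with-temp-buffer
      (insert-file-contents file)
      (condition-case nil
          (while t
            (setq found (compat-scan-form (read (current-buffer))
                                          known found)))
        (end-of-file nil)))
    found))

(defun compat-scan-form (form known found)
  ;; Record FORM's head if it is a known function name, then descend.
  (when (consp form)
    (let ((head (car form)))
      (when (and (symbolp head) (memq head known)
                 (not (memq head found)))
        (setq found (cons head found))))
    (while (consp form)
      (setq found (compat-scan-form (car form) known found))
      (setq form (cdr form))))
  found)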



43.3 Future Work – Drag-n-Drop

Author: Ben Wing

Abstract: I propose completely redoing the drag-n-drop interface to make it powerful and extensible enough to support such concepts as drag-over and drag-under visuals and context menus invoked when a drag is done with the right mouse button; to allow drop handlers to be defined for all sorts of graphical elements, including buffers, extents, mode lines, toolbar items, menubar items, glyphs, etc.; and to allow different packages to add and remove drop handlers for the same drop sites without interfering with each other. The changes are extensive enough that I think they can only be implemented in version 22, and the drag-n-drop interface should remain experimental until then.

The new drag-n-drop interface centers around the twin concepts of drop site and drop handler. A drop site specifies a particular graphical element where an object can be dropped onto, and a drop handler encapsulates all of the behavior that happens when such an object is dragged over and dropped onto a drop site.

Each drop site has an object associated with it which is passed to functions that are part of the drop handlers associated with that site. The type of this object depends on the graphical element that comprises the drop site. The drop site object can be a buffer, an extent, a glyph, a menu path, a toolbar item path, etc. (These last two object types are defined in Lisp Interface Changes in the sections on menu and toolbar API changes. If we wanted to allow drops onto other kinds of drop sites, for example mode lines, we would have to create corresponding path objects). Each such object type should be able to be accessed using the generalized property interface defined above, and should have a property called drop-handlers associated with it that specifies all of the drop handlers associated with the drop site. Normally, this property is not accessed directly, but instead by using the drop handler API defined below, and Lisp packages should not make any assumptions about the format of the data contained in the drop-handlers property.

Each drop handler has an object of type drop-handler associated with it, whose primary purpose is to be a container for the various properties associated with a particular drop handler. These could include, for example, a function invoked when the drop occurs, a context menu invoked when a drop occurs as a result of a drag with the right mouse button, functions invoked when a dragged object enters, leaves, or moves within a drop site, the shape that the mouse pointer changes to when an object is dragged over a drop site that allows this particular object to be dropped onto it, the MIME types (actually a regular expression matching the MIME types) of the allowable objects that can be dropped onto the drop site, a package tag (a symbol specifying the package that created the drop handler, used for identification purposes), etc. The drop handler object is passed to the functions that are invoked as a result of a drag or a drop, most likely indirectly as one of the properties of the drag or drop event passed to the function. Properties of a drop handler object are accessed and modified in the standard fashion using the generalized property interface.

A drop handler is added to a drop site using the add-drop-handler function. The drop handler itself can either be created separately using the make-drop-handler function and then passed in as one of the parameters to add-drop-handler, or it will be created automatically by the add-drop-handler function, if the drop handler argument is omitted, but keyword arguments corresponding to the valid keyword properties for a drop handler are specified in the add-drop-handler call. Other functions, such as find-drop-handler, add-drop-handler (when specifying a drop handler before which the drop handler in question is to be added), remove-drop-handler etc. should be defined with obvious semantics. All of these functions take or return a drop site object which, as mentioned above, can be one of several object types corresponding to graphical elements. Defined drop handler functions locate a particular drop handler using either the MIME-type or package-tag property of the drop handler, as defined above.
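
To make this concrete, a registration call might look like the following sketch; the keyword names and the drop-event-object accessor are my guesses at the proposed API, and my-play-sound is hypothetical:

(add-drop-handler
 (current-buffer)            ; the drop site object
 :mime-type "^audio/"        ; regexp over allowable MIME types
 :package-tag 'my-package    ; identifies the registering package
 :drop-function
 #'(lambda (drop-event drop-handler)
     (my-play-sound (drop-event-object drop-event))))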

Logically, the drop handlers associated with a particular drop site are an ordered list. The first drop handler whose specified MIME type matches the MIME type of the object being dragged or dropped controls what happens to this object. This is important particularly because the specified MIME type of the drop handler can be a regular expression that, for example, matches all audio objects with any sub-type.

In the current drag-n-drop API, there is a distinction made between objects with an associated MIME type and objects with an associated URL. I think that this distinction is arbitrary, and should not exist. All objects should have a MIME type associated with them, and a new XEmacs-specific MIME type should be defined for URLs, file names, etc. as necessary. I am not even sure that this is necessary, however, as the MIME specification may specify a general concept of a pointer or link to an object, which is exactly what we want. Also in some cases (for example, the name of a file that is locally available), the pointer or link will have another MIME type associated with it, which is the type of the object that is being pointed to. I am not quite sure how we should handle URL and file name objects being dragged, but I am positive that it needs to be integrated with the mechanism used when an object itself is being dragged or dropped.

As is described in a separate page, the misc-user-event event type should be removed and split up into a number of separate event types. Two such event types would be drag-event and drop-event. A drop event is used when an object is actually dropped, and a drag event is used if a function is invoked as part of the dragging process. (Such a function would typically be used to control what are called drag under visuals, which are changes to the appearance of the drop site reflecting the fact that a compatible object is being dragged over it). The drag events and drop events encapsulate all of the information that is pertinent to the drag or drop action occurring, including such information as the actual MIME type of the object in question, the drop handler that caused a function to be invoked, the mouse event (or possibly even a keyboard event) corresponding to the user’s action that is causing the drag or drop, etc. This event is always passed to any function that is invoked as a result of the drag or drop. There should never be any need to refer to the current-mouse-event variable, and in fact, this variable should not be changed at all during a drag or a drop.



43.4 Future Work – Standard Interface for Enabling Extensions

Author: Ben Wing

Abstract: Apparently, if you know the name of a package (for example, fusion), you can load it using the require function, but there’s no standard way to turn it on or turn it off. The only way to figure out how to do that is to go read the source file, where hopefully the comments at the start tell you the appropriate magic incantations that you need to run in order to turn the extension on or off. There really needs to be standard functions, such as enable-extension and disable-extension, to do this sort of thing. It seems like a glaring omission that this isn’t currently present, and it’s really surprising to me that nobody has remarked on this.

The easy part of this is defining the interface, and I think it should be done as soon as possible. When the package is loaded, it simply calls some standard function in the package system, and passes it the names of enable and disable functions, or perhaps just one function that takes an argument specifying whether to enable or disable. In any case, this data is kept in a table which is used by the enable-extension and disable-extension function. There should also be functions such as extension-enabled-p and enabled-extension-list, and so on with obvious semantics. The hard part is actually getting packages to obey this standard interface, but this is mitigated by the fact that the changes needed to support this interface are so simple.
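
The table-driven core of this could be quite small; here is a minimal sketch, in which register-extension and the table layout are my own guesses at the interface (the fusion names stand in for the hypothetical package from the abstract):

(defvar extension-table (make-hash-table :test 'eq)
  "Maps extension names to (ENABLE-FN DISABLE-FN ENABLED-FLAG).")

(defun register-extension (name enable-fn disable-fn)
  "Called by a package when it is loaded."
  (puthash name (list enable-fn disable-fn nil) extension-table))

(defun enable-extension (name)
  (let ((entry (gethash name extension-table)))
    (unless entry (error "Unknown extension: %s" name))
    (funcall (nth 0 entry))
    (setcar (nthcdr 2 entry) t)))

(defun disable-extension (name)
  (let ((entry (gethash name extension-table)))
    (unless entry (error "Unknown extension: %s" name))
    (funcall (nth 1 entry))
    (setcar (nthcdr 2 entry) nil)))

(defun extension-enabled-p (name)
  (nth 2 (gethash name extension-table)))

;; e.g. (register-extension 'fusion 'fusion-on 'fusion-off)
;;      (enable-extension 'fusion)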

I have been conceiving of these enabling and disabling functions as turning the feature on or off globally. It’s probably also useful to have a standard interface for turning an extension on or off in just a particular buffer. Perhaps then the appropriate interface would involve registering a single function that takes an argument specifying what to do: turn off globally, turn on globally, turn on or off in the current buffer, etc.

Part of this interface should specify the correct way to define global key bindings. The correct rule for this, of course, is that the key bindings should not happen when the package is loaded, which is often how things are currently done, but only when the extension is actually enabled. The key bindings should go away when the extension is disabled. I think that in order to support this properly, we should expand the keymap interface slightly, so that in addition to other properties associated with each key binding is a list of shadow bindings. Then there should be a function called define-key-shadowing, which is just like define-key but which also remembers the previous key binding in a shadow list. Then there can be another function, something like undefine-key, which restores the binding to the most recently added item on the shadow list. There are already hash tables associated with each key binding, and it should be easy to stuff additional values, such as a shadow list, into the hash table. Probably there should also be functions called global-set-key-shadowing and global-unset-key-shadowing with obvious semantics.
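
Here is a minimal sketch of the bookkeeping, using a plain alist of shadow stacks instead of the per-keymap hash tables the text suggests; the function names follow the proposal:

(defvar key-shadow-alist nil
  "Alist mapping (KEYMAP . KEYS) to a stack of shadowed bindings.")

(defun define-key-shadowing (keymap keys def)
  "Like `define-key', but remember the old binding on a shadow stack."
  (let* ((slot (cons keymap keys))
         (entry (assoc slot key-shadow-alist)))
    (unless entry
      (setq entry (cons slot nil))
      (setq key-shadow-alist (cons entry key-shadow-alist)))
    (setcdr entry (cons (lookup-key keymap keys) (cdr entry)))
    (define-key keymap keys def)))

(defun undefine-key (keymap keys)
  "Restore the most recently shadowed binding for KEYS in KEYMAP."
  (let ((entry (assoc (cons keymap keys) key-shadow-alist)))
    (when (and entry (cdr entry))
      (define-key keymap keys (car (cdr entry)))
      (setcdr entry (cdr (cdr entry))))))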

Once this interface is defined, it should be easy to expand the custom package so it knows about this interface. Then it will be possible to put all sorts of extensions on the options menu so that they can be turned on and off very easily, and when you save the options out to a file, the settings for whether these extensions are enabled are saved with it. A whole lot of custom junk that’s been added to a lot of different packages could be removed. After doing this, we might want to think of a way to classify extensions according to how likely we think the user will want to use them. This way we can avoid the problem of having a list of 100 extensions and the user not being able to figure out which ones might be useful. Perhaps the most useful extensions would appear immediately on the extensions menu, the less useful ones in a submenu of that, and another submenu might contain even less useful extensions. Of course the package authors might not be too happy with this, but the users probably will be. I think this at least deserves a thought, although it’s possible we might simply want to maintain on the web site a list of extensions, with a judgment of, first, how commonly a user might want each extension, and second, how well written and bug-free the package is. Both of these sorts of judgments could be obtained by doing user surveys if need be.



43.5 Future Work – Better Initialization File Scheme

Author: Ben Wing

Abstract: A proposal is outlined for converting XEmacs to use the .xemacs subdirectory for its initialization files instead of putting them in the user’s home directory. In the process, a general pre-initialization scheme is created whereby all of the initialization parameters, such as the location of the initialization files, whether these files are loaded or not, where the initial frame is created, etc. that are currently specified by command line arguments, by environment variables, and other means, can be specified in a uniform way using Lisp code. Reasonable default behavior for everything will still be provided, and the older, simpler means can be used if desired. Compatibility with the current location and name of the initialization file, and the current ill-chosen use for the .xemacs directory is maintained, and the problem of how to gracefully migrate a user from the old scheme into the new scheme while still allowing the user to use GNU Emacs or older versions of XEmacs is solved. A proposal for changing the way that the initial frame is mapped is also outlined; this would allow the user’s initialization file to control the way that the initial frame appears without resorting to hacks, while still making echo area messages visible as they appear, and allowing the user to debug errors in the initialization file.

Principles in the new scheme

  1. XEmacs has a defined pre-initialization process. This process, whose purpose is to compute the values of the parameters that control how the initialization process proceeds, occurs as early as possible after the Lisp engine has been initialized, and in particular, it occurs before any devices have been opened, or before any initialization parameters are set that could reasonably be expected to be changed. In fact, the pre-initialization process should take care of setting these parameters. The code that implements the pre-initialization process should be written in Lisp and should be called from the Lisp function normal-top-level, and the general way that the user customizes this process should also be done using Lisp code.
  2. The pre-initialization process involves a number of properties, for example the directory containing the user initialization files (normally the .xemacs subdirectory), the name of the user init file, the name of the custom init file, where and what type the initial device is, whether and when the initial frame is mapped, etc. A standard interface is provided for getting and setting the values of these properties using functions such as set-pre-init-property, pre-init-property, etc. At various points during the pre-initialization process, the value of many of these properties can be undecided, which means that at the end of the process, the value of these properties will be derived from other properties in some fashion that is specific to each property.
  3. The default values of these properties are set first from the registry under Windows, then from environment variables, then from command line switches, such as -q and -nw.
  4. One of the command line switches is -pre-init, whose value is a Lisp expression to be evaluated at pre-initialization time, similar to the -eval command line switch. This allows any pre-initialization property to be set from the command line.
  5. Let’s define the term determine, applied to a pre-initialization property, to mean: if the value of the property is undecided, it is computed and set according to a rule that is specific to the property. Then, after the pre-init properties are initialized from the registry, from the environment variables, and from command line arguments, two of the pre-init properties (specifically the init file directory and the location of the pre-init file) are determined. The purpose of the pre-init file is to contain Lisp code that is run at pre-initialization time, and to control how the initialization proceeds. It is a bit similar to the standard init file, but the code in the pre-init file shouldn’t do anything other than set pre-init properties. Executing any code that does I/O might not produce expected results, because the only device that will exist at the time is probably a stream device connected to the standard I/O of the XEmacs process. (A sketch of such a file appears after this list.)
  6. After the pre-init file has been run, all of the rest of the pre-init properties are determined, and these values are then used to control the initialization process. Some of the rules used in determining specific properties are:
    1. If the .xemacs sub-directory exists, and it’s not obviously a package root (which probably means that it contains a file like init.el or pre-init.el, or if neither of those files is present, then it doesn’t contain any sub-directories or files that look like what would be in a package root), then it becomes the value of the init file directory. Otherwise the user’s home directory is used.
    2. If the init file directory is the user’s home directory, then the init file is called .emacs. Otherwise, it’s called init.el.
    3. If the init file directory is the user’s home directory, then the pre-init file is called .xemacs-pre-init.el. Otherwise it’s called pre-init.el. (One of the reasons for this rule has to do with the dialog box that might be displayed at startup. This will be described below.)
    4. If the init file directory is the user’s home directory, then the custom init file is called .xemacs-custom-init.el. Otherwise, it’s called custom-init.el.
  7. After the first normal device is created, but before any frames are created on it, the XEmacs initialization code checks to see if the old init file scheme is being used, which is to say that the init file directory is the same as the user’s home directory. If that’s the case, then normally a dialog box comes up (or a question is asked on the terminal if XEmacs is being run in a non-windowing mode) which asks if the user wants to migrate his initialization files to the new scheme. The possible responses are ‘Yes’, ‘No’, and ‘No, and don’t ask this again’. If this last response is chosen, then the file .xemacs-pre-init.el in the user’s home directory is created or appended to with a line of Lisp code that sets a pre-init property indicating that this dialog box shouldn’t come up again. If the Yes option is chosen, then any package root files in .xemacs are moved into .xemacs/packages, the file .emacs is moved into .xemacs/init.el, and .emacs in the home directory becomes a symlink to this file. This way some compatibility is still maintained with GNU Emacs and older versions of XEmacs. The code that implements this has to be written very carefully to make sure that it doesn’t accidentally delete or mess up any of the files that get moved around.
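
Under this scheme, a user’s pre-init file might contain nothing but a few property settings, along these lines (the property names are illustrative; only set-pre-init-property itself is part of the proposal):

;; ~/.xemacs/pre-init.el -- runs before any devices are created
(set-pre-init-property 'init-file-name "init.el")
(set-pre-init-property 'initial-frame-unmapped-p nil)
(set-pre-init-property 'suppress-migration-dialog t)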

The custom init file

The custom init file is where the custom package writes its options. This obviously needs to be a separate file from the standard init file. It should also be loaded before the init file rather than after, as is usually done currently, so that the init file can override these options if it wants to.

Frame mapping

In addition to the above scheme, the way that XEmacs handles mapping the initial frame should be changed. However, this change should perhaps be delayed to a later version of XEmacs because of the user-visible changes that it entails and the possible breakage in people’s init files that might occur. (For example, if the rest of the scheme is implemented in 21.2, then this part of the scheme might want to be delayed until version 22.) The basic idea is that the initial frame is not created before the initialization file is run; instead, a banner frame is created containing the XEmacs logo, a button that allows the user to cancel the execution of the init file, and an area where messages that are output in the process of running this file are displayed. This area should contain a number of lines, which makes it better than the current scheme where only the last message is visible. After the init file is done, the initial frame is mapped. This way the init file can make face changes and other such modifications that affect the initial frame, and then have the initial frame correctly come up with these changes, without any of the frame dancing or other problems that exist currently.

There should be a function that allows the initialization file to explicitly create and map the first frame if it wants to. There should also be a pre-init property that controls whether the banner frame appears (of course it defaults to true), a property controlling when the initial frame is created (before or after the init file, defaulting to after), and a property controlling whether the initial frame is mapped (normally true, but false if the -unmapped command line argument is given).

If an error occurs in the init file, then the initial frame should always be created and mapped at that time so that the error is displayed and the debugger has a place to be invoked.



43.6 Future Work – Keyword Parameters

Author: Ben Wing

NOTE: These changes are partly motivated by the various user-interface changes elsewhere in this document, and partly for Mule support. In general the various APIs in this document would benefit greatly from built-in keywords.

I would like to make keyword parameters an integral part of Elisp. The idea here is that you use the &key identifier in the parameter list of a function and all of the following parameters specified are keyword parameters. This means that when these arguments are specified in a function call, they are immediately preceded in the argument list by a keyword, which is a symbol beginning with the ‘:’ character. This allows any argument to be specified independently of any other argument with no need to place the arguments in any particular order. This is particularly useful for functions that take many optional parameters; using keyword parameters makes the code much cleaner and easier to understand.

The cl package already provides keyword parameters of a sort, but I would like to make this more integrated and usable in a standard fashion. The interface that I am proposing is essentially compatible with the keyword interface in Common Lisp, but it may be a subset of the Common Lisp functionality, especially in the first implementation. There is one departure from the Common Lisp specification that I would like to make, in order to make it much easier to add keyword parameters to existing functions with optional parameters, and in general, to make optional and keyword parameters coexist more easily. The Common Lisp specification indicates that if a function has both optional and keyword parameters, the optional parameters are always processed before the keyword parameters. This means, for example, that if a function has three required parameters, two optional parameters, and some number of keyword parameters following, and the program attempts to call this function by passing in the three required arguments and then some keyword arguments, the first keyword specified and the argument following it get assigned to the first and second optional parameters as specified in the function definition. This is certainly not what is intended, and means that if a function defines both optional and keyword parameters, any calls of this function must specify nil for all of the optional arguments before using any keywords. If the function definition is later changed to add more optional parameters, all existing calls to this function that use any keyword arguments will break. This problem goes away if we simply process keyword parameters before the optional parameters.
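
The pitfall is easy to demonstrate with the cl package’s defun*, which follows the Common Lisp rule of filling optionals first (the function is invented for illustration):

(require 'cl)

(defun* make-thing (name &optional width height &key color)
  (list name width height color))

(make-thing "box" :color 'red)
;; => ("box" :color red nil) -- :color and red were soaked up by the
;; optional parameters, and the COLOR keyword was never seen.
;; Under the proposed keywords-first rule, this call would instead
;; return ("box" nil nil red).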

The primary changes needed to support the keyword syntax are:

  1. The subr object type needs to be modified to contain additional slots for the number and names of any keyword parameters.
  2. The implementation of the funcall function needs to be modified so that it knows how to process keyword parameters. This is the only place that will require very much intricate coding, and much of the logic that would need to be added can be lifted directly from the cl code.
  3. A new macro, similar to the DEFUN macro, and probably called DEFUN_WITH_KEYWORDS, needs to be defined so that built-in Lisp primitives containing keywords can be created. The DEFUN_WITH_KEYWORDS macro should take an additional parameter which is a string, consisting of the part of the lambda list declaration for this primitive that begins with the &key specifier. This string is parsed in the DEFSUBR macro during XEmacs initialization, and is converted into the appropriate structure that needs to be stored into the subr object. In addition, the max_args parameter of the DEFUN macro needs to be incremented by the number of keyword parameters, and these parameters are passed to the C function simply as extra parameters at the end. The DEFSUBR macro can sort out the actual number of required, optional and keyword parameters that the function takes, once it has parsed the keyword parameter string. (An alternative that might make the declaration of a primitive a little bit easier to understand would involve adding another parameter to the DEFUN_WITH_KEYWORDS macro that specifies the number of keyword parameters. However, this would require some additional complexity in the preprocessor definition of the DEFUN_WITH_KEYWORDS macro, and probably isn’t worth implementing.)
  4. The byte compiler would have to be modified slightly so that it knows about keyword parameters when it parses the parameter declaration of a function. For example, so that it issues the correct warnings concerning calls to that function with incorrect arguments.
  5. The make-docfile program would have to be modified so that it generates the correct parameter lists for primitives defined using the DEFUN_WITH_KEYWORDS macro.
  6. Possibly other aspects of the help system that deal with function descriptions might have to be modified.
  7. A helper function might need to be defined to make it easier for primitives that use both the &rest and &key specifiers to parse their argument lists.

Internal API for C primitives with keywords - necessary for many of the new Mule APIs being defined.

 
  DEFUN_WITH_KEYWORDS (Ffoo, "foo", 2, 5, 6, ALLOW_OTHER_KEYWORDS,
      (ichi, ARG_NIL), (ni, ARG_NIL), (san, ARG_UNBOUND), 0,
      (arg1, arg2, arg3, arg4, arg5)
      )
  {
    ...
  }
  
  -> C fun of 12 args:
  
  (arg1, ... arg5, ichi, ..., roku, other keywords)
  
  An example declaration:
  
  DEFUN_WITH_KEYWORDS (Ffoo, "foo", 1, 2, 0, (bar, baz) <- arg list
  [ MIN ARGS, MAX ARGS, something that could be REST, SPECIFY_DEFAULT or
  REST_SPEC]
  
  [#KEYWORDS [ ALLOW_OTHER, SPECIFY_DEFAULT, ALLOW_OTHER_SPECIFY_DEFAULT
  6, ALLOW_OTHER_SPECIFY_DEFAULT,
  
  (ichi, 0) (ni, 0), (san, DEFAULT_UNBOUND), (shi, "t"), (go, "5"),
  (roku, "(current-buffer)")
  <- specifies arguments, default values (string to be read into Lisp
     data during init; then forms evalled at fn ref time.
  
  ,0 <- [INTERACTIVE SPEC] )
  
  LO = Lisp_Object
  
  -> LO Ffoo (LO bar, LO baz, LO ichi, LO ni, LO san, LO shi, LO go,
              LO roku, int numkeywords, LO *other_keywords)
  
  #define DEFUN_WITH_KEYWORDS (fun, funstr, minargs, maxargs, argspec, \
           #args, num_keywords, keywordspec, keywords, intspec) \
  LO fun (DWK_ARGS (maxargs, args) \
          DWK_KEYWORDS (num_keywords, keywordspec, keywords))
  
  #define DWK_KEYWORDS (num_keywords, keywordspec, keywords) \
          DWK_KEYWORDS ## keywordspec (keywords)
          DWK_OTHER_KEYWORDS ## keywordspec)
  
  #define DWK_KEYWORDS_ALLOW_OTHER (x,y)
          DWK_KEYWORDS (x,y)
  
  #define DWK_KEYWORDS_ALLOW_OTHER_SPECIFICATIONS (x,y)
          DWK_KEYWORDS_SPECIFY_DEFAULT (x,y)
  
  #define DWK_KEYWORDS_SPECIFY_DEFAULT (numkey, key)
          ARGLIST_CAR ## numkey key
  
  #define ARGLT_GRZ (x,y) LO CAR x, LO CAR y


43.7 Future Work – Property Interface Changes

Author: Ben Wing

In my past work on XEmacs, I already expanded the standard property functions of get, put, and remprop to work on objects other than symbols and defined an additional function object-plist for this interface. I’d like to expand this interface further and advertise it as the standard way to make property changes in objects, especially the new objects that are going to be defined in order to support the added user interface features of version 22. My proposed changes are as follows:

  1. A new concept associated with each property called a default value is introduced. (This concept already exists, but not in a well-defined way.) The default value is the value that the property assumes for certain value retrieval functions such as get when it is unbound, which is to say that its value has not been explicitly specified. Note: the way to make a property unbound is to call remprop. Note also that for some built-in properties, setting the property to its default value is equivalent to making it unbound.
  2. The behavior of the get function is modified. If the get function is called on a property that is unbound and the third, optional default argument is nil, then the default value of the property is returned. If the default argument is not nil, then whatever was specified as the value of this argument is returned. For the most part, this is upwardly compatible with the existing definition of get because all user-defined properties have an initial default value of nil. Code that calls the get function and specifies nil for the default argument, and expects to get nil returned if the property is unbound, is almost certainly wrong anyway.
  3. A new function, get1 is defined. This function does not take a default argument like the get function. Instead, if the property is unbound, an error is signaled. Note: get can be implemented in terms of get1.
  4. New functions property-default-value and property-bound-p are defined with the obvious semantics.
  5. An additional function property-built-in-p is defined which takes two arguments, the first one being a symbol naming an object type, and the second one specifying a property, and indicates whether the property name has a built-in meaning for objects of that type.
  6. It is not necessary, or even desirable, for all object types to allow user-defined properties. It is always possible to simulate user-defined properties for an object by using a weak hash table. Therefore, whether an object allows a user to define properties or not should depend on the meaning of the object. If an object does not allow user-defined properties, the put function should signal an error, such as undefined-property, when given any property other than those that are predefined.
  7. A function called user-defined-properties-allowed-p should be defined with the obvious semantics. (See the previous item.)
  8. Three more functions should be defined, called built-in-property-name-list, property-name-list, and user-defined-property-name-list.
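
Under these rules, a session might look like the following sketch, assuming x is some object whose priority property is built in with a default value of 0 (get1 and the property-* functions are the proposed additions):

(put x 'priority 5)
(get x 'priority)                     ; => 5
(remprop x 'priority)                 ; the property is now unbound
(property-bound-p x 'priority)        ; => nil
(get x 'priority)                     ; => 0, the default value
(get x 'priority 'missing)            ; => missing, the supplied default
(get1 x 'priority)                    ; signals an error
(property-default-value x 'priority)  ; => 0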

Another idea:

 
(define-property-method
  ;; predicate: an object-type symbol, or (cons :KEYWORD), which
  ;; matches all lists beginning with :KEYWORD
  predicate

  :put putfun
  :get
  :remprop
  :object-props
  :clear-properties
  :map-properties)

e.g. (define-property-method 'hash-table
       :put #'(lambda (obj key value) (puthash key value obj)))


43.8 Future Work – Toolbars


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

43.8.1 Future Work – Easier Toolbar Customization

Author: Ben Wing

Abstract: One of XEmacs’ greatest strengths is its ability to be customized endlessly. Unfortunately, it is often too difficult to figure out how to do this. There has been some recent work like the Custom package, which helps in this regard, but I think there’s a lot more work that needs to be done. Here are some ideas (which certainly could use some more thought).

Although there is currently an edit-toolbar package, it is not well integrated with XEmacs, and in general it is much too hard to customize the way toolbars look. I would like to see an interface that works a bit like the way things work under Windows, where you can right-click on a toolbar to get a menu of options that allows you to change aspects of the toolbar. The general idea is that if you right-click on an item itself, you can do things to that item, whereas if you right-click on a blank part of a toolbar, you can change the properties of the toolbar. Some of the items on the right-click menu for a particular toolbar button should be specified by the button itself. Others should be standard. For example, there should be an Execute item which simply does what would happen if you left-click on a toolbar button. There should probably be a Delete item to get rid of the toolbar button and a Properties item, which brings up a property sheet that allows you to do things like change the icon and the command string that’s associated with the toolbar button.

The options to change the appearance of the toolbar itself should probably appear both on the context menu for specific buttons and on the menu that appears when you click on a blank part of the toolbar. That way, if there isn’t a blank part of the toolbar, you can still change the toolbar appearance. As for what these options should be, Outlook Express, for example, has three menu items. The first, Buttons, pops up a window for editing the toolbar; for us this could pop up a new frame running edit-toolbar.el. The second, Align, contains a submenu with Top, Bottom, Left, and Right entries, which would work just like setting the default toolbar position. The third, Text Labels, lets you select whether captions are shown. All three of these are useful and easy to implement in XEmacs. These things also need to be integrated with Custom, so that a user can control whether these options apply to all sessions and, in such a case, can save the settings out to an options file. edit-toolbar.el in particular needs to integrate with Custom; currently it has some sort of hokey scheme of its own, which it saves out to a .toolbar file. Another useful option, once we draw the captions dynamically rather than using pre-generated ones, would be the ability to change the font size of the captions. I’m sure that Kyle, for one, would appreciate this.

(This is incomplete.....)


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

43.8.2 Future Work – Toolbar Interface Changes

Author: Ben Wing

I propose changing the way that toolbars are specified to make them more flexible.

  1. A new format for the vector that specifies a toolbar item is allowed. In this format, the first three items of the vector are required and are, respectively, a caption, a glyph list, and a callback. The glyph list and callback arguments are the same as in the current toolbar item specification, and the caption is a string specifying the caption text placed below the toolbar glyph. The caption text is required so that toolbar items can be identified for the purpose of retrieving and changing their property values. Putting the caption first also makes it easy to distinguish between the new and the old toolbar item vector formats: in the old format, the first item, the glyph list, is either a list or a symbol, while in the new format the first item is a string. In the new format, following the three required items come optional keyword items, specified using keywords in the same format as the menu item vector format. The keywords that should be predefined are :help-echo, :context-menu, :drop-handlers, and :enabled-p. The :enabled-p and :help-echo keyword arguments are the same as the third and fourth items in the old toolbar item vector format. The :context-menu keyword is a list in standard menu format that specifies additional items that will appear when the context menu for the toolbar item is popped up. (Typically, this happens when the right mouse button is clicked on the toolbar item.) The :drop-handlers keyword is for use by the new drag-n-drop interface (see Drag-n-Drop Interface Changes), and is not normally specified or modified directly.
  2. Conceivably, there could also be keywords that are associated with a toolbar itself, rather than with a particular toolbar item. These keyword properties would be specified using keywords and arguments that occur before any toolbar item vectors, similarly to how things are done in menu specifications. Possible properties could include :captioned-p (whether the captions are visible under the toolbar), :glyphs-visible-p (whether the toolbar glyphs are visible), and :context-menu (additional items that will appear on the context menus for all toolbar items and additionally will appear on the context menu that is popped up when the right mouse button is clicked over a portion of the toolbar that does not have any toolbar buttons in it). The current standard practice with regard to such properties seems to be to have separate specifiers, such as left-toolbar-width, right-toolbar-width, left-toolbar-visible-p, right-toolbar-visible-p, etc. It could easily be argued that there should be no such toolbar specifiers and that all such properties should be part of the toolbar instantiator itself. In this scheme, the only separate specifiers that would exist for individual properties would be default values. There are a lot of reasons why an interface change like this makes sense. For example, currently when VM sets its toolbar, it also sets the toolbar width and similar properties. If you change which edge of the frame the VM toolbar occurs in, VM will also have to go and modify all of the position-specific toolbar specifiers for all of the other properties associated with a toolbar. It doesn’t really seem to make sense to me for the user to be specifying the width and visibility and such of specific toolbars that are attached to specific edges, because the user should be free to move the toolbars around and expect that all of the toolbar properties automatically move with the toolbar. (It is also easy to imagine, for example, that a toolbar might not be attached to the edge of the frame at all, but might be floating somewhere on the user’s screen.) With an interface where these properties are separate specifiers, all of this has to be done manually. Currently, having the various toolbar properties be inside of toolbar instantiators makes them difficult to modify, but this will be different with the API that I propose below.
  3. I propose an API for modifying toolbar and toolbar item properties, as well as making other changes to toolbar instantiators, such as inserting or deleting toolbar items. This API is based around the concept of a path. There are two kinds of paths here – toolbar paths and toolbar item paths. Each kind of path is an object (of type toolbar-path and toolbar-item-path, respectively) whose properties specify the location in a toolbar instantiator where changes to the instantiator can be made. A toolbar path, for example, would be created using the make-toolbar-path function, which takes a toolbar specifier (or optionally, a symbol, such as left, right, default, or nil, which refers to a particular toolbar), and optionally, parameters such as the locale and the tag set, which specify which actual instantiator inside of the toolbar specifier is to be modified. A toolbar item path is created similarly using a function called make-toolbar-item-path, which takes a toolbar specifier and a string naming the caption of the toolbar item to be modified, as well as, of course, optionally the locale and tag set parameters and such.

    The usefulness of these path objects is as arguments to functions that will use them as pointers to the place in a toolbar instantiator where the modification should be made. Recall, for example, the generalized property interface described above. If a function such as get or put is called on a toolbar path or toolbar item path, it will use the information contained in the path object to retrieve or modify a property located at the end of the path. The toolbar path objects can also be passed to new functions that I propose defining, such as add-toolbar-item, delete-toolbar-item, and find-toolbar-item. These functions should be parallel to the functions for inserting, deleting, finding, etc. items in a menu. The toolbar item path objects can also be passed to the drop-handler functions defined in Drag-n-Drop Interface Changes to retrieve or modify the drop handlers that are associated with a toolbar item. (The idea here is that you can drag an object and drop it onto a toolbar item, just as you could onto a buffer, an extent, a menu item, or any other graphical element.) A usage sketch of this API appears after this list.

  4. We should at least think about allowing for separate default and buffer-local toolbars. The user should either be able to position these toolbars one above the other, or side by side, occupying a single toolbar line. In the latter case, the boundary between the toolbars should be draggable, and if a toolbar takes up more room than is allocated for it, there should be arrows that appear on one or both sides of the toolbar so that the items in the toolbar can be scrolled left or right. (For that matter, this sort of interface should exist even when there is only one toolbar that is on a particular toolbar line, because the toolbar may very well have more items than can be displayed at once, and it’s silly in such a case if it’s impossible to access the items that are not currently visible).
  5. The default context menu for toolbars (which should be specified using a specifier called default-toolbar-context-menu according to the rules defined above) should contain entries allowing the user to modify the appearance of a toolbar. Entries would include, for example, whether the toolbar is captioned, whether the glyphs for the toolbar are visible (if the toolbar is captioned but its glyphs are not visible, the toolbar appears as nothing but text; you can set things up this way, for example, in Netscape), an option that brings up a package for editing the contents of a toolbar, an option to allow the caption face to be changed (perhaps through an edit-faces or Custom interface), etc.
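
The following sketch shows how the proposed item format and path-based API might look in use. None of these functions or keywords exists yet; make-toolbar-path, make-toolbar-item-path, add-toolbar-item, and delete-toolbar-item are the proposals from this section, and the icon and callback names are invented for illustration:

 
; New-style toolbar item vector: caption first, then glyph list and
; callback, then optional keywords.
(setq open-button
      ["Open" open-icon-list toolbar-open
       :help-echo "Open a file"
       :context-menu (["Open in New Frame..." find-file-other-frame])])

; Path objects point at a place inside a toolbar instantiator...
(setq tb-path   (make-toolbar-path 'default))
(setq item-path (make-toolbar-item-path 'default "Open"))

; ...and are handed to the generalized property functions and to the
; proposed editing functions.
(put tb-path :captioned-p nil)       ; hide captions on this toolbar
(get item-path :help-echo)           ; => "Open a file"
(add-toolbar-item tb-path open-button 0)
(delete-toolbar-item tb-path "Open")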

[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

43.9 Future Work – Menu API Changes

Author: Ben Wing

  1. I propose making a specifier for the menubar associated with the frame. The specifier should be called default-menubar and should replace the existing current-menubar variable. This would increase the power of the menubar interface and bring it in line with the toolbar interface. (In order to provide proper backward compatibility, we might have to complete the symbol value handler mechanism.)
  2. I propose an API for modifying menu instantiators similar to the API proposed above for toolbar instantiators. A new object called a menu path (of type menu-path) can be created using the make-menu-path function, and specifies a location in a particular menu instantiator where changes can be made. The first argument to make-menu-path specifies which menu to modify and can be a specifier, a value such as nil (which means to modify the default menubar associated with the selected frame), or perhaps some other kind of specification referring to some other menu, such as the context menus invoked by the right mouse button. The second argument to make-menu-path, also required, is a list of zero or more strings that specifies the particular menu or menu item in the instantiator that is being referred to. The remaining arguments are optional and would be a locale, a tag set, etc. The menu path object can be passed to get, put, or other standard property functions to access or modify particular properties of a menu or a menu item. It can also be passed to expanded versions of the existing functions such as find-menu-item, delete-menu-item, add-menu-button, etc. (It is really a shame that add-menu-item is an obsolete function, because it is a much better name than add-menu-button.) Finally, the menu path object can be passed to the drop-handler functions described in Drag-n-Drop Interface Changes to access or modify the drop handlers that are associated with a particular menu item. (A short sketch of this API appears after this list.)
  3. New keyword properties should be added to the menu item vector. These include :help-echo, :context-menu and :drop-handlers, with similar semantics to the corresponding keywords for toolbar items. (It may seem a bit strange at first to have a context menu associated with a particular menu item, but it is a user interface concept that exists both in Open Look and in Windows, and really makes a lot of sense if you give it a bit of thought). These properties may not actually be implemented at first, but at least the keywords for them should be defined.
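
A short sketch of the proposed menu path API (make-menu-path does not yet exist, and the property accesses shown are illustrative):

 
(setq path (make-menu-path nil '("File" "Open...")))  ; nil = default menubar
(get path :active)                    ; read a property of that menu item
(put path :help-echo "Open a file")   ; set one of the proposed new properties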

[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

43.10 Future Work – Removal of Misc-User Event Type

Author: Ben Wing

Abstract: This page describes why the misc-user event type should be split up into a number of different event types, and how to do this.

The misc-user event should not exist as a single event type. It should be split up into a number of different event types: one for scrollbar events, one for menu events, and one or two for drag-n-drop events. Possibly there will be other event types created in the future. The reason for this is that the misc-user event was a bad design choice when I made it, and it has only gotten worse with Oliver’s attempts to add features to it to make it be used for drag-n-drop. I know that there was originally a separate drag-n-drop event type, and it was folded into the misc-user event type on my recommendation, but I have now realized the error of my ways. I had originally created a single event type in an attempt to prevent some Lisp programs from breaking because they might have a case statement over various event types, and would not be able to handle new event types appearing. I think now that these programs simply need to be written in a way to handle new event types appearing. It’s not very hard to do this. You just use predicates instead of doing a case statement over the event type. If we preserve the existing predicate called misc-user-event-p, and just make sure that it evaluates to true when given any user event type other than the standard simple ones, then most existing code will not break either when we split the event types up like this, or if we add any new event types in the future.
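
For instance, a dispatcher written with predicates keeps working when event types are split or added. A minimal sketch using existing XEmacs predicates (the my-handle-* functions are placeholders):

 
(defun my-dispatch-event (event)
  (cond ((key-press-event-p event)    (my-handle-key event))
        ((button-press-event-p event) (my-handle-button event))
        ((misc-user-event-p event)    (my-handle-misc event))
        (t nil)))   ; new event types fall through harmlessly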

More specifically, the only clean way to design the misc-user event type would be to add a sub-type field to it, and then have the nature of all the other fields in the event type be dependent on this sub-type. But then in essence, we’d just be reimplementing the whole event-type scheme inside of misc-user events, which would be rather pointless.


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

43.11 Future Work – Mouse Pointer


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

43.11.1 Future Work – Abstracted Mouse Pointer Interface

Author: Ben Wing

Abstract: We need to create a new image format that allows standard pointer shapes to be specified in a way that works on all windowing systems. I suggest that this be called pointer, which has one tag associated with it, named :data, and whose value is a string. The possible strings that can be specified here are predefined by XEmacs, and are guaranteed to work across all windowing systems. This means that we may need to provide our own definition for pointer shapes that are not standard on some systems. In particular, there are a lot more standard pointer shapes under X than under Windows, and most of these pointer shapes are fairly useful. There are also a few pointer shapes (the hand, for example, I think) that exist on Windows but not on X. Converting the X pointer shapes to Windows should be easy because the definitions of the pointer shapes are simply XBM files, which we can read under Windows. Going the other way might be a little more difficult, but it should still not be that hard.
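
Under this proposal, specifying a portable pointer might look something like the following sketch. The pointer format and the set of shape names are the proposal itself, not an existing API, and my-pointer-glyph is a hypothetical glyph:

 
(set-glyph-image my-pointer-glyph
                 [pointer :data "watch"])  ; name from the predefined set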

While we’re at it, we should change the image format currently called cursor-font to x-cursor-font, because it only works under X Windows. We also need to change the format called resource to be mswindows-resource. At least in the case of cursor-font, the old value should be maintained for compatibility as an obsolete alias. The resource format was added so recently that it’s possible that we can just change it.


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

43.11.2 Future Work – Busy Pointer

Author: Ben Wing

Automatically make the mouse pointer switch to a busy shape (e.g. a watch cursor) when XEmacs has been "busy" for more than, say, 2 seconds. Define the busy time as the time since the last time that XEmacs was ready to receive input from the user. An implementation might be:

  1. Set up an asynchronous timeout, to signal after the busy time; these are triggered through a call to QUIT so they will be triggered even when the code is busy doing something.
  2. We already have an "emacs_is_blocking" flag when we are waiting for input. In the same place, when we are about to block and wait for input (regardless of whether input is already present), maybe call a hook, which in this case would remove the timer and put back the normal mouse shape. Then when we exit the blocking stage (we got some input), call another hook, which in this case will start the timer. Note that we don’t want these "blocking" hooks to be triggered just because of an accept-process-output or some similar thing that retrieves events, only to put them back onto a queue for later processing. Maybe we want some sort of flag that’s bound by those routines saying that we aren’t really waiting for input. Making that flag Lisp-accessible allows it to be set by similar sorts of Lisp routines (if there are any?) that loop retrieving events but defer them, or only drain the queue, or whatnot. #### Think about whether it would make some sense to try and be more clever in our determinations of what counts as "real waiting for user input", e.g. whether the event gets dispatched (unfortunately this occurs way too late; we want to know to remove the busy cursor before getting an event), maybe whether there are any events waiting to be processed or we’ll truly block, etc. (E.g., one possibility: if there is input on the queue already when we "block" for input, don’t remove the busy-wait pointer, but trigger the removal of it when we dispatch a user event.) A sketch of the timer half of this appears below.
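
Here is a rough sketch of the timer half of this, using the existing add-timeout and disable-timeout functions; the two hooks described above and the pointer-setting functions are hypothetical:

 
(defvar my-busy-timeout nil)
(defconst my-busy-delay 2)   ; seconds of busyness before the watch appears

; Called from the (hypothetical) hook run when XEmacs stops blocking.
(defun my-busy-timer-start ()
  (setq my-busy-timeout
        (add-timeout my-busy-delay
                     #'(lambda (ignore) (my-set-busy-pointer)) ; hypothetical
                     nil)))

; Called from the (hypothetical) hook run when XEmacs is about to block.
(defun my-busy-timer-stop ()
  (when my-busy-timeout
    (disable-timeout my-busy-timeout)
    (setq my-busy-timeout nil))
  (my-restore-normal-pointer))   ; hypothetical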

[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

43.12 Future Work – Extents


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

43.12.1 Future Work – Everything should obey duplicable extents

Author: Ben Wing

A lot of functions don’t properly track duplicable extents. For example, the concat function does, but the format function does not, and extents in keymap prompts are not displayed either. All of the functions that generate strings or string-like entities should track the extents that are associated with the strings. Currently this is difficult because there is no general mechanism implemented for doing this. I propose such a general mechanism, which would not be hard to implement, and would be easy to use in other functions that build up strings.

The basic idea is that we create a C structure that is analogous to a Lisp string in that it contains string data and lists of extents for that data. Unlike standard Lisp strings, however, this structure (let’s call it lisp_string_struct) can be incrementally updated, and its allocation is handled explicitly so that no garbage is generated. (This is important, for example, in the event-handling code, which would want to use this structure but needs to not generate any garbage for efficiency reasons.) Both the string data and the list of extents in this string are handled using dynarrs so that it is easy to incrementally update this structure. Functions should exist to create and destroy instances of lisp_string_struct, to generate a Lisp string from a lisp_string_struct and vice versa, to append a sub-string of a Lisp string to a lisp_string_struct, to append plain characters to a lisp_string_struct, etc. The only thing possibly tricky about implementing these functions is the copying of extents from a Lisp string into a lisp_string_struct. However, there is already a function copy_string_extents() that does basically this exact thing, and it should be easy to create a modified version of it.
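
To illustrate the user-visible inconsistency being fixed, here is a sketch using the existing string-extent primitives (exact results may vary between versions):

 
(setq s (copy-sequence "hello"))
(set-extent-property (make-extent 0 5 s) 'duplicable t)
(concat s " world")     ; the result string carries the extent over
(format "%s world" s)   ; the result currently loses it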


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

43.13 Future Work – Version Number and Development Tree Organization

Author: Ben Wing

Abstract: The purpose of this proposal is to present a coherent plan for how development branches in XEmacs are managed. This will cover such issues as stable versus experimental branches, creating new branches, synchronizing patches between branches, and how version numbers are assigned to branches.

A development branch is defined to be a linear series of releases of the XEmacs code base, each of which is derived from the previous one. When the XEmacs development tree is forked and two branches are created where there used to be one, the branch that is intended to be more stable and have fewer changes made to it is considered the one that inherits the parent branch, and the other branch is considered to have begun at the branching point. The less stable of the two branches will eventually be forked again, while this usually will not happen to the more stable of the two, whose development will eventually come to an end. This means that every branch has a definite ending point. For example, the 20.x branch began when the released 19.13 code tree was split into a 19.x and a 20.x branch, and the 20.x branch will end when the last 20.x release (probably numbered 20.5 or 20.6) is made.

I think that there should always be three active development branches at any time. These branches can be designated the stable, the semi-stable, and the experimental branches. This situation has existed in the current code tree ever since the 21.0 development branch was split. In this situation, the stable branch is the 20.x series. The semi-stable branch is the 21.0 release and the stability releases that follow. The experimental branch is the branch that was created as the result of the 21.0 development branch split. Typically, the stable branch has been released for a long period of time. The semi-stable branch has been released for a short period of time, or is about to be released, and the experimental branch has not yet been released, and will probably not be released for a while. The conditions that should hold in all circumstances are:

  1. There should be three active branches.
  2. The experimental branch should never be in feature freeze.

The reason for the second condition is to ensure that active development can always proceed and is never throttled, as is happening currently at the end of the 21.0 release cycle. What this means is that as soon as the experimental branch is deemed to be stable enough to go into feature freeze:

  1. The current stable branch is made inactive and all further development on it ceases.
  2. The semi-stable branch, which by now should have been released for a fair amount of time, and should be fairly stable, gets renamed to the stable branch.
  3. The experimental branch is forked into two branches, one of which becomes the semi-stable branch, and the other, the experimental branch.

The stable branch is always in high resistance, which is to say that the only changes that can be made to the code are important bug fixes involving a small amount of code where it should be clear just by reading the code that no destabilizing code has been introduced. The semi-stable branch is in low resistance, which means that no major features can be added, but except right before a release fairly major code changes are allowed. Features can be added if they are sufficiently small, if they are deemed sufficiently critical due to severe problems that would exist if the features were not added (for example, replacement of the unexec mechanism with a portable solution would be a feature that could be added to the semi-stable branch provided that it did not involve an overly radical code re-architecture, because otherwise it might be impossible to build XEmacs on some architectures or with some compilers), or if the primary purpose of the new feature is to remedy an incompleteness in a recent architectural change that was not finished in a prior release due to lack of time (for example, abstracting the mouse pointer and list-of-colors interfaces, which were left out of 21.0). There is no feature resistance in place in the experimental branch, which allows full development to proceed at all times.

In general, both the stable and semi-stable branches will contain previous net releases. In addition, there will be beta releases in all three branches, and possibly development snapshots between the beta releases. It’s obviously necessary to have a good version numbering scheme in order to keep everything straight.

First of all, it needs to be immediately clear from the version number whether the release is a beta release or a net release. Steve has proposed getting rid of the beta version numbering system, which I think would be a big mistake. Furthermore, the net release version number and beta release version number should be kept separate, just as they are now, to make it completely clear where any particular release stands. There may be alternate ways of phrasing a beta release other than something like 21.0 beta 34, but in all such systems, the beta number needs to be zero for any release version. Three possible alternative systems, none of which I like very much, are:

  1. The beta number is simply an extra number in the regular version number. Then, for example, 21.0 beta 34 becomes 21.0.34. The problem is that the release version, which would simply be called 21.0, appears to be earlier than 21.0 beta 34.
  2. The beta releases appear as later revisions of earlier releases. Then, for example, 21.1 beta 34 becomes 21.0.34, and 21.0 beta 34 would have to become 21.-1.34. This has both the obvious ugliness of negative version numbers and the problem that it makes beta releases appear to be associated with their previous releases, when in fact they are more closely associated with the following release.
  3. Simply make the beta version number be negative. In this scheme, you’d start with something like -1000 as the first beta, and then 21.0 beta 34 would get renumbered to 21.0.-968. Obviously, this is a crazy and convoluted scheme as well, and we would be best to avoid it.

Currently, the between-beta snapshots are not numbered, but I think that they probably should be. If appropriate scripts are set up to automate beta releases, it should be very easy to have a version number automatically updated whenever a snapshot is made. The number could be added either as a separate snapshot number, giving 21.0 beta 34 pre 1, which comes before 21.0 beta 34; or we could make the beta number be floating point, in which case the same snapshot would have to be called 21.0 beta 33.1. The latter solution seems quite kludgey to me.

There also needs to be a clear way to distinguish, when a net release is made, which branch the release is a part of. Again, three solutions come to mind:

  1. The major version number reflects which development branch the release is in and the minor version number indicates how many releases have been made along this branch. In this scheme, 21.0 is always the first release of the 21 series development branch, and when this branch is split, the child branch that becomes the experimental branch gets version numbers starting with 22. This scheme is the simplest, and it’s the one I like best.
  2. We move to a three-part version number. In this scheme, the first two numbers indicate the branch, and the third number indicates the release along the branch. In this scheme, we have numbers like 21.0.1, which would be the second release in the 21.0 series branch, and 21.1.2, which would be the third release in the 21.1 series branch. The major version number then gets increased only very occasionally, and only when a sufficiently major architectural change has been made, particularly one that causes compatibility problems with code written for previous branches. I think schemes like this are unnecessary in most circumstances, because usually either the major version number ends up changing so often that the second number is always either zero or one, or the major version number never changes, and as such becomes useless. By the time the major version number would change, the product itself has changed so much that it often gets renamed. Furthermore, it is clear that the two version number scheme has been used throughout most of the history of Emacs, and recently we have been following the two number scheme also. If we introduced a third revision number, at this point it would both confuse existing code that assumed there were two numbers, and would look rather silly given that the major version number is so high and would probably remain at the same place for quite a long time.
  3. A third scheme that would attempt to cross the two schemes would keep the same concept of major version number as for the three number scheme, and would compress the second and third numbers of the three number scheme into one number by using increments of ten. For example, the current 21.x branch would have releases No. 21.0, 21.1, etc. The next branch would be No. 21.10, 21.11, etc. I don’t like this scheme very much because it seems rather kludgey, and also because it is not used in any other product as far as I know.
  4. Another scheme that would combine the second and third numbers in the three number scheme would be to have the releases in the current 21.x series be numbered 21.0, then 21.01, then 21.02, etc. The next series is 21.1, then 21.11, then 21.12, etc. This is similar to the way that version numbers are done for DOS and Windows. I also think that this scheme is fairly silly because, like the previous scheme, its only purpose is to avoid increasing the major version number very much. But given that we already have a fairly large major version number, there doesn’t seem to be any particular problem with increasing this number by one every year or two. Some people will object that by doing this, it becomes impossible to tell when a change is so major that it causes a lot of code breakage, but past releases have not been accurate indicators of this. For example, 19.12 caused a lot of code breakage, but 20.0 caused less, and 21.0 caused less still. In the GNU Emacs world, there were byte code changes made between 19.28 and 19.29, but as far as I know, not between 19.29 and 20.0.

With three active development branches, synchronizing code changes between the branches is obviously somewhat of a problem. To make things easier, I propose a few general guidelines:

  1. Merging between different branches need not happen that often. It should not happen more often than necessary to avoid undue burden on the maintainer, but needs to be done at all defined checkpoints. These checkpoints need to be noted in all of the places that track changes along the branch, for example, in all of the change logs and in all of the CVS tags.
  2. Every code change that can be considered a self-contained unit, no matter how large or small, needs to have a change log entry, preferably a single change log entry associated with it. This is an absolute requirement. There should be no code changes without an associated change log entry. Otherwise, it is highly likely that patches will not be correctly synchronized across all versions, and will get lost. There is no need for change log entries to contain unnecessary detail though, and it is important that there be no more change log entries than necessary, which means that two or more change log entries associated with a single patch need to be grouped together if possible. This might imply that there should be one global change log instead of change logs in each directory, or at the very least, the number of separate change logs should be kept to a minimum.
  3. The patch that is associated with each change log entry needs to be kept around somewhere. The reason for this is that when synchronizing code from some branch to some earlier branch, it is necessary to go through each change log entry and decide whether a change is worthy to make it into a more stable branch. If so, the patch associated with this change needs to be individually applied to the earlier branch.
  4. All changes made in more stable branches get merged into less stable branches unless the change really is completely unnecessary in the less stable branch because it is superseded by some other change. This will probably mean more developers making changes to the semi-stable branch than to the experimental branch. This means that developers should strive to do their development in the most stable branch that they expect their code to go into. An alternative to this which is perhaps more workable is simply to insist that all developers make all patches based off of the experimental branch, and then later merge these patches down to the more stable branches as necessary. This means, however, that submitted patches should never be combinations of two or more unrelated changes. Whenever such patches are submitted, they should either be rejected (which should apply to anybody who should know better, which probably means everybody on the beta list and anybody else who is a regular contributor), or the maintainer or some other designated party needs to filter the combined patch into separate patches, one per logical change.
  5. The maintainer should keep all the patches around in some data base, and the patches should be given an identifier consisting of the author of the patch, the date the patch was submitted, and some other identifying characteristic, such as a number, in case there is more than one patch on the same date by the same author. The database should hopefully be correctly marked at all times with something indicating which branches the patch has been applied to, and this database should hopefully be publicly visible so that patch authors can determine whether their patches have been applied, and whether their patches have been received, so that patches do not get needlessly resubmitted.
  6. Global automatable changes such as textual renaming, reordering, and additions or deletions of parameters in function calls should still be allowed, even with multiple development branches. (Sometimes these are necessary for code cleanliness, and in the long run, they save a lot of time, even though they may cause some headaches in the short term.) In general, when such changes are made, they should occur in a separate beta version that contains only such changes and no other patches, and the changes should be made in both the semi-stable and experimental branches at the same time. The description of the beta version should make it very clear that the beta is comprised of such changes. The reason for doing these things is to make it easier for people to diff between beta versions in order to figure out the changes that were made without the diff getting cluttered up by these code cleanliness changes that don’t change any actual behavior.

[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

43.14 Future Work – Improvements to the xemacs.org Website

Author: Ben Wing

The xemacs.org web site is the face that XEmacs presents to the outside world. In my opinion, its most important function is to present information about XEmacs in such a way that solicits new XEmacs users and co-contributors. Existing members of the XEmacs community can probably find out most of the information they want to know about XEmacs regardless of what shape the web site is in, or for that matter, perhaps even if the web site doesn’t exist at all. However, potential new users and co-contributors who go to the XEmacs web site and find it out of date and/or lacking the information that they need are likely to be turned away and may never return. For this reason, I think it’s extremely important that the web site be up-to-date, well-organized, and full of information that an inquisitive visitor is likely to want to know.

The current XEmacs web site needs a lot of work if it is to meet these standards. I don’t think it’s reasonable to expect one person to do all of this work and make continual updates as needed, especially given the dismal record that the XEmacs web site has had. The proper thing to do is to place the web site itself under CVS and allow many of the core members to remotely check files in and out. This way, for example, Steve could update the part of the site that contains the current release status of XEmacs. (Much of this could be done by a script that Steve executes when he sends out a beta release announcement which automatically HTML-izes the mail message and puts it in the appropriate place on the web site. There are programs that are specifically designed to convert email messages into HTML, for example mhonarc.) Meanwhile, the xemacs.org mailing list administrator (currently Jason Mastaler, I think) could maintain the part of the site that describes the various mailing lists and other addresses at xemacs.org. Someone like me (perhaps through a proxy typist) could maintain the part of the site that specifies the future directions that XEmacs is going in, etc., etc.

Here are some things that I think it’s very important to add to the web site.

  1. A page describing in detail how to get involved in the XEmacs development process, how to submit and where to submit various patches to the XEmacs core or associated packages, how to contact the maintainers and core developers of XEmacs and the maintainers of various packages, etc.
  2. A page describing exactly how to download, compile, and install XEmacs, and how to download and install the various binary distributions. This page should particularly cover in detail how exactly the package system works from an installation standpoint and how to correctly compile and install under Microsoft Windows and Cygwin. This latter section should cover what compilers are needed under Microsoft Windows and Cygwin, and how to get and install the Cygwin components that are needed.
  3. A page describing where to get the various ancillary libraries that can be linked with XEmacs, such as the JPEG, TIFF, PNG, X-Face, DBM, and other libraries. This page should also cover how to correctly compile and install these libraries, including under Microsoft Windows (or at least it should contain pointers to where this information can be found). Also, it should describe anything that needs to be specified as an option to configure in order for XEmacs to link with and make use of these libraries or of Motif or CDE. Finally, this page should list which versions of the various libraries are required for use with the various different beta versions of XEmacs. (Remember, this can change from beta to beta, and someone needs to keep a watchful eye on this.)
  4. Pointers to any other sites containing information on XEmacs. This would include, for example, Hrvoje’s XEmacs on Windows FAQ and my Architecting XEmacs web site. (Presumably, most of the information in this section will be temporary. Eventually, these pages should be integrated into the main XEmacs web site).
  5. A page listing the various sub-projects in the XEmacs development process and who is responsible for each of these sub-projects, for example development of the package system, administration of the mailing lists, maintenance of stable XEmacs versions, maintenance of the CVS web interface, etc. This page should also list all of the packages that are archived at xemacs.org and who is the maintainer or maintainers for each of these packages.

Other Places with an XEmacs Presence

We should try to keep an XEmacs presence in all of the major places on the web that are devoted to free software or to the "open source" community. This includes, for example, the open source web site at http://opensource.oreilly.com (I’m already in the process of contacting this site), the Freshmeat site at http://www.freshmeat.net, the various announcement news groups (for example, comp.os.linux.announce, and the Windows announcement news group) etc.


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

43.15 Future Work – Keybindings


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

43.15.1 Future Work – Keybinding Schemes

Author: Ben Wing

Abstract: We need a standard mechanism that allows different global key binding schemes to be defined. Ideally, this would be the keyboard action interface that I have proposed; however, that would require a lot of work on the part of mode maintainers and other external Elisp packages, and will not be ready in the short term. So I propose a very kludgy interface, along the lines of what is done in Viper currently. Perhaps we can rip that key-munging code out of Viper and make a separate extension that implements a global key binding scheme munging feature. This way a key binding scheme could rearrange all the default keys and have all sorts of other code, which depends on the standard keys being in their default location, still work.


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

43.15.2 Future Work – Better Support for Windows Style Key Bindings

Author: Ben Wing

Abstract: This page describes how we could create an XEmacs extension that modifies the global key bindings so that a Windows user would feel at home when using the keyboard in XEmacs. Some of these bindings don’t conflict with standard XEmacs keybindings and should be added by default, or at the very least under Windows, and probably under X Windows as well. Other key bindings would need to be implemented in a Windows compatibility extension that can be enabled and disabled on the fly, following the conventions outlined in Standard interface for enabling extensions. Ideally, this should be implemented using the keyboard action interface, but that will not be available in the short term, so we will have to resort to some awful kludges, following the model of Michael Kifer’s Viper mode.

We really need to make XEmacs provide standard Windows key bindings as much as possible. Currently, for example, there are at least two packages that allow the user to make a selection using the shifted arrow keys, and neither package works all that well, or is maintained. There should be one well-written piece of code that does this, and it should be a standard part of XEmacs. In fact, it should be turned on by default under Windows, and probably under X as well. (As an aside here, one point of contention in how to implement this involves what happens if you select a region using the shifted arrow keys and then hit the regular arrow keys. Does the region remain selected or not? I think there should be a variable that controls which of these two behaviors you want. We can argue over what the default value of this variable should be. The standard Windows behavior here is to keep the region selected, but move the insertion point elsewhere, which is unfortunately impossible to implement in XEmacs.)

Some thought should be given to what to do about the standard Windows control and alt key bindings. Under NTEmacs, there is a variable that controls whether the alt key behaves like the Emacs meta key, or whether it is passed on to the menu as in standard Windows programs. We should surely implement this and put this option on the Options menu. Making Alt-f for example, invoke the File menu, is not all that disruptive in XEmacs, because the user can always type ESC f to get the meta key functionality. Making Control-x, for example, do Cut, is much, much more problematic, of course, but we should consider how to implement this anyway. One possibility would be to move all of the current Emacs control key bindings onto control-shift plus a key, and to make the simple control keys follow the Windows standard as much as possible. This would mean, for example, that we would have the following keybindings:
Control-x ==> Cut
Control-c ==> Copy
Control-v ==> Paste
Control-z ==> Undo
Control-f ==> Find
Control-a ==> Select All
Control-s ==> Save
Control-p ==> Print
Control-y ==> Redo (this functionality is available in XEmacs with Kyle Jones’ redo.el package, but it should be better integrated)
Control-n ==> New
Control-o ==> Open
Control-w ==> Close Window

The changes described in the previous paragraph should be put into an extension named windows-keys.el (see Standard interface for enabling extensions) so that it can be enabled and disabled on the fly using a menu item and can be selected as the default for a particular user in their custom options file. Once this is implemented, the Windows installer should also be modified so that it brings up a dialog box that allows the user to make a selection of which key binding scheme they would prefer as the default, either the XEmacs standard bindings, Vi bindings (which would be Viper mode), Windows-style bindings, Brief, CodeWright, Visual C++, or whatever we manage to implement.
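
A sketch of how windows-keys.el might install a few of these bindings. The file and the keymap name are proposals; the commands shown on the right are existing XEmacs commands:

 
(defvar windows-keys-map (make-sparse-keymap))

(define-key windows-keys-map [(control x)] 'kill-primary-selection)   ; Cut
(define-key windows-keys-map [(control c)] 'copy-primary-selection)   ; Copy
(define-key windows-keys-map [(control v)] 'yank-clipboard-selection) ; Paste
(define-key windows-keys-map [(control z)] 'undo)                     ; Undo
(define-key windows-keys-map [(control s)] 'save-buffer)              ; Save
(define-key windows-keys-map [(control o)] 'find-file)                ; Open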


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

43.15.3 Future Work – Misc Key Binding Ideas

Author: Ben Wing


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

43.16 Future Work – Byte Code Snippets

Author: Ben Wing


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

43.16.1 Future Work – Autodetection

There are various proposals contained here.

New Implementation of Autodetection Mechanism

Author: Ben Wing

The current autodetection mechanism in XEmacs Mule has many problems. For one thing, it is wrong too much of the time. Another problem, although easily fixed, is that the priority lists are fixed rather than varying depending on the particular locale. Finally, it doesn’t warn the user when it’s not sure of the encoding or when a mistake is made during decoding. In both of the latter situations, the user should be presented with a list of likely encodings and given the choice, rather than XEmacs simply proceeding anyway and producing a result that is likely to be wrong and may cause data corruption when the file is saved out again.

All coding systems are categorized according to their type. Currently this includes ISO 2022, Big5, Shift-JIS, UTF-8, and a few others. In the future there will be many more types defined, and this mechanism will be generalized so that it is easily extendable by the Lisp programmer.

In general, each coding system type defines a series of subtypes which are handled differently for the purpose of detection. For example, ISO 2022 defines many different subtypes such as 7 bit, 8 bit, locking shift, designating and so on. UCS2 may define subtypes such as normal and byte reversed.

The detection engine works conceptually by calling the detection methods of all of the defined coding system types in parallel on successive chunks of data (which may, for example, be 4K in size, although the size makes no difference except for optimization purposes) and watching the results until either a definite answer is determined or the end of data is reached. The way the definite answer is determined is defined below.

The detection method of the coding system type is passed some data and a chunk of memory, which the method uses to store its current state (and which is maintained separately for each coding system type by the detection engine between successive calls to the coding system type’s detection method). Its return value should be an alist consisting of a list of all of the defined subtypes for that coding system type, along with a level of likelihood and a list of additional properties indicating certain features detected in the data. The extra properties returned are defined entirely by the particular coding system type and are used only in the algorithm described below under “user control.” The levels of likelihood, however, have a standard meaning as follows:

Level 4 means “near certainty” and typically indicates that a signature has been detected, usually at the beginning of the data, indicating that the data is encoded in this particular coding system type. An example of this would be the byte order mark at the beginning of UCS2 encoded data or the GZIP mark at the beginning of GZIP data.

Level 3 means “highly likely” and indicates that tell-tale signs have been discovered in the data that are characteristic of this particular coding system type. Examples of this might be ISO 2022 escape sequences or the current Unicode end of line markers at regular intervals.

Level 2 means “strongly statistically likely” indicating that statistical analysis concludes that there’s a high chance that this data is encoded according to this particular type. For example, this might mean that for UCS2 data, there is a high proportion of null bytes or other repeated bytes in the odd-numbered bytes of the data and a high variance in the even-numbered bytes of the data. For Shift-JIS, this might indicate that there were no illegal Shift-JIS sequences and a fairly high occurrence of common Shift-JIS characters.

Level 1 means “weak statistical likelihood” meaning that there is some indication that the data is encoded in this coding system type. In fact, there is a reasonable chance that it may be some other type as well. This means, for example, that no illegal sequences were encountered and at least some data was encountered that is purposely not in other coding system types. For Shift-JIS data, this might mean that some bytes in the range 128 to 159 were encountered in the data.

Level 0 means “neutral” which is to say that there’s either not enough data to make any decision or that the data could well be interpreted as this type (meaning no illegal sequences), but there is little or no indication of anything particular to this particular type.

Level -1 means “weakly unlikely”, meaning that some data was encountered that could conceivably be part of the coding system type but is probably not. For example, excessively long line-lengths or very rarely-encountered sequences.

Level -2 means “strongly unlikely” meaning that typically a number of illegal sequences were encountered.
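
To make the proposed return value concrete, here is a purely hypothetical result from an ISO 2022 detection method that has just seen 7-bit data containing designation escape sequences; each entry is a subtype, a likelihood level, and a property list (all names invented for illustration):

 
'((iso-7-bit         3 (seven-bit-escapes-seen))
  (iso-8-bit         0 ())
  (iso-lock-shift   -1 ())
  (iso-designating   2 (g1-designation-seen)))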

The algorithm to determine when to stop and indicate that the data has been detected as a particular coding system uses a priority list, which is typically specified as part of the language environment determined from the current locale or the user’s choice. This priority list consists of a list of coding system subtypes, along with a minimum level required for positive detection and optionally additional properties that need to be present. Using the return values from all of the detection methods called, the detection engine looks through this priority list until it finds a positive match. In this priority list, along with each subtype is a particular coding system to return when the subtype is encountered. (For example, in a Japanese-language environment particular subtypes of ISO 2022 will be associated with the Japanese coding system version of those subtypes). It is perfectly legal and quite common in fact, to list the same subtype more than once in the priority list with successively lower requirements. Other facts that can be listed in the priority list for a subtype are “reject”, meaning that the data should never be detected as this subtype, or “ask”, meaning that if the data is detected to be this subtype, the user will be asked whether they actually mean this. This latter property could be used, for example, towards the bottom of the priority list.
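
A hypothetical priority list for a Japanese language environment under this scheme might look as follows; the syntax and all names are invented for illustration:

 
; Each entry: (SUBTYPE CODING-SYSTEM . REQUIREMENTS)
'((iso-7-bit  iso-2022-jp :min-level 3)
  (shift-jis  shift-jis   :min-level 2)
  (euc        euc-jp      :min-level 2)
  (shift-jis  shift-jis   :min-level 1 :ask t)
  (big5       big5        :reject t))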

In addition there is a global variable which specifies the minimum number of characters required before any positive match is reported. There may actually be more than one such variable for different sources of data, for example, detection of files versus detection of subprocess data.

Whenever a file is opened and detected to be a particular coding system, the subtype, the coding system and the associated level of likelihood will be prominently displayed either in the echo area or in a status box somewhere.

If no positive match is found according to the priority list, or if the matches that are found have the “ask” property on them, then the user will be presented with a list of choices of possible encodings and asked to choose one. This list is typically sorted first by level of likelihood, and then within this, by the order in which the subtypes appear in the priority list. This list is displayed in a special kind of dialog box or other buffer allowing the user, in addition to just choosing a particular encoding, to view what the file would look like if it were decoded according to the type.

Furthermore, whenever a file is decoded according to a particular type, the decoding engine keeps track of status values that are output by the coding system type’s decoding method. Generally, this status will be in the form of errors or warnings of various levels, some of which may be severe enough to stop the decoding entirely, and some of which may either indicate definitely malformed data but from which it’s possible to recover, or simply data that appears rather questionable. If any of these status values are reported during decoding, the user will be informed of this and asked “are you sure?” As part of the “are you sure” dialog box or question, the user can display the results of the decoding to make sure it’s correct. If the user says “no, they’re not sure,” then the same list of choices as previously mentioned will be presented.

RFC: Autodetection

Also appeared under heading "Implementation of Coding System Priority Lists in Various Locales" ?

Author: Stephen Turnbull

Date: 11/1/1999 2:48 AM

 
>>>>> "Hrvoje" == Hrvoje Niksic <hniksic@srce.hr> writes:

    [Ben sez:]

    >> You are perfectly free to set up your XEmacs like this, but
    >> XEmacs/Mule will autodetect by default if there is no
    >> Content-Type: info and no reason to believe we are dealing with
    >> binary files.

    Hrvoje> In that case, it will be a serious mistake to make
    Hrvoje> --with-mule the default, ever.  I think more care should
    Hrvoje> be shown in meeting the need of European users.

Hrvoje, I don’t understand what you are worrying about. I suspect you are worrying about Handa’s hyperactive and obstinate Mule, not what Ben has in mind. Yes, Ben has said "better guessing," but that’s simply not reasonable without substantial language environment information. I think trying to detect Latin-1 vs. Latin-2 in the POSIX locale would be a big mistake; I think trying to guess Big 5 vs. Shift JIS in a European locale would be a big mistake.

If Ben doesn’t mean "more appropriate use of language environment information" when he writes "better guessing," I, as much as you, want to see how he plans to do that. Ben? ("Yes/no/oops I need to think about it" is good enough if you have specifics you intend to put in the RFC you’re planning to present.)

Let me give a formal proposal of what I would like to see in the autodetection specification.

  1. Definitions
    1. Autodetection means detecting and making available to Mule the external file’s encoding. See (5), below. It doesn’t imply any specific actions based on that information.
    2. The default case is POSIX locale, and no environment information in ~/.emacs.

      N.B. This will cause breakage for all 1-byte users because the default case can no longer assume Latin-1. You may be able to use the TTY font or the Xt -font option to fake this, and default to iso8859-1; I would hope that we would not use such a kludge in the beta versions, although it might be satisfactory for general use. In particular, encodings like VISCII (Vietnamese) and I believe KOI-8 (Cyrillic) are not ISO-2022-clean, but using C1 control characters as a heuristic for detecting binary files is useful.

      If we do allow it, I think that XEmacs should bitch and warn that the practices of implicitly specifying the language environment by -font and of defaulting on TTYs are deprecated and likely to be obsoleted.

    3. The European case is any Latin-* locale, either implied by setlocale() and friends or set in ~/.emacs. Latin-1 is specifically not given precedence over other Latin-*, or non-Latin or non-ISO-8859 for that matter. I suspect but am not sure that this case extends to all ISO-8859 encodings, and possibly to non-ISO-8859 single-byte encodings like KOI-8r (in particular when combined in a class with ISO-8859 encodings).
    4. The CJK case is any CJK locale. Japanese is specifically not given precedence over other Asian locales.
    5. For completeness, define the Unicode case (Unicode unfortunately has lots of junk such as precomposed characters, language tags, and directionality indicators in it; we probably don’t care yet, but we should also not claim compliance) and the general case (which has a lot of features similar to Unicode, but lacks the advantage of a unified encoding). This proposal has no idea how to handle the special features of these, or even if that matters. The general case includes stuff that nobody here really knows how it works, like Tibetan and Ethiopic.

    Each of the following cases is given in the order of priority of detection. I’m not sure I’m serious about the top priority given the (optional) Unicode detection. This may be appropriate if Ben is right that ISO-2022 is going to disappear, but possibly not until then (two two-byte sequences out of 65536 is probably 1.99 too many). It probably isn’t too risky if (6)(c) is taken pretty seriously; a Unicode file should contain _no_ private use characters unless the encoding is explicitly specified, and that’s a block of 1/10 of the code space, which should help a lot in detecting binary files.

  2. Default locale
    1. Some Unicode (fixed width; maybe UTF-8, too?) may optionally be detected by the byte-order-mark magic (if the first two bytes are 0xFE 0xFF, the file is Unicode text; if 0xFF 0xFE, it is wrong-endian Unicode; the UTF-8 signature, either-endian, is 0xEF 0xBB 0xBF). This is probably an optimization that should not be on by default yet. (A sketch of this check appears after this proposal.)
    2. ISO-2022 encodings will be detected as long as they use explicit designation of all non-ASCII character sets. This means that many 7-bit ISO-2022 encodings would be detected (eg, ISO-2022-JP), but EUC-JP and X Compound Text would not, because they implicitly designate character sets.

      N.B. Latin-1 will be detected as binary, as for any Latin-*.

      N.B. An explicit ISO-2022 designation is semantically equivalent to a Content-Type: header. It is more dangerous because shorter, but I think we should recognize them by default despite the slight risk; XEmacs is a text editor.

      N.B. This is unlikely to be as dangerous as it looks at first glance. Any file that includes an 8-bit-set byte before the first valid designation should be detected as binary.

    3. Binary files will be detected (eg, presence of NULs, other non-whitespace control characters, absurdly long lines, and presence of bytes >127).
    4. Everything else is ASCII.
    5. Newlines will be detected in text files.
  3. European locales
    1. Unicode may optionally be detected by the byte-order-mark magic.
    2. ISO-2022 encodings will be detected as long as they use explicit designation of all non-ASCII character sets.
    3. A locale-specific class of 1-byte character sets (eg, ’(Latin-1)) will be detected.

      N.B. The reason for permitting a class is for cases like Cyrillic where there are both ISO-8859 encodings and incompatible encodings (KOI-8r) in common use. If you want to write a Latin-1 v. Latin-2 detector, be my guest, but I don’t think it would be easy or accurate.

    4. Binary files will be detected per (2)(c), except that only 8-bit bytes out of the encoding’s range imply binary.
    5. Everything else is ASCII.
    6. Newlines will be detected in text files.
  4. CJK locales
    1. Unicode may optionally be detected by the byte-order-mark magic.
    2. ISO-2022 encodings will be detected as long as they use explicit designation of all non-ASCII character sets.
    3. A locale-specific class of multi-byte and wide-character encodings will be detected. N.B. No 1-byte character sets (eg, Latin-1) will be detected. The reason for a class is to allow the Japanese to let Mule do the work of choosing EUC v. SJIS.
    4. Binary files will be detected per (3)(d).
    5. Everything else is ASCII.
    6. Newlines will be detected in text files.
  5. Unicode and general locales; multilingual use
    1. Hopefully a system general enough to handle (2)–(4) will handle these, too, but we should watch out for gotchas like Unicode “plane 14” tags which (I think _both_ Ben and Olivier will agree) have no place in the internal representation, and thus must be treated as out-of-band control sequences. I don’t know if all such gotchas will be as easy to dispose of.
    2. An explicit coding system priority list will be provided to allow multilingual users to autodetect both Shift JIS and Big 5, say, but this ability is not promised by Mule, since it would involve (eg) heuristics like picking a set of code points that are frequent in Shift JIS and uncommon in Big 5 and betting that a file containing many characters from that set is Shift JIS.
  6. Relationship to decoding semantics
    1. Autodetection should be run on every input stream unless the user explicitly disables it.
    2. The (conceptual) default procedure is:

      Read the file into the buffer.

      Announce the result of autodetection to the user.

      The user may request decoding, with the autodetected encoding(s) given priority in a list of available encodings.

      Optimizations (see (e) below) should avoid introducing data corruption that this default procedure would avoid.

      Obviously, it can’t be perfect if any autodecoding is done; users like Hrvoje should have an easily available option to revert to this default (or an optimized approximation which doesn’t actually read the whole file into a buffer) or simply display everything as binary (with the “font” for binary files a user option).

    3. This implies that we should be detecting conditions in the tail of the file which violate the implicit assumptions of the autodetected coding system (eg, illegal UTF-8 sequences in a UTF-8 file, including those corresponding to surrogates); such conditions should raise a warning, and the buffer should probably be made read-only and the user prompted.

      This could be taken to extremes, like checking by table whether all characters in a Japanese file are actually legitimate JIS codes; that’s insane (and would cause corporate encodings to be recognized as binary). But we should think about the idea that autodetection shouldn’t mean XEmacs can’t change its mind.

    4. A flexible means for the user to delegate the decision (conditional on the result of autodetection) to decode or not to XEmacs or a Lisp program should be provided (eg, the coding priority list and/or a file-coding-alist).
    5. Optimized operations (eg, the current lstreams) should be provided, with the recognition that if they depend on sampling the file they are risky.
    6. Mule should provide a reasonable set of default delegations (as in (d) above) for as many locales as possible.
  7. Implementation
    1. I think all the decision logic suggested above can be accomplished through a coding-priority-list and appropriate initializations for different language environments, and a file-coding-alist.
    2. Many of the tests on the file’s tail shouldn’t be very expensive; in particular, all of the ones I’ve suggested are O(n) although they might involve moderate-sized auxiliary tables for efficiency (eg, 64kB for a single Unicode-oriented test).
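To make the byte-order-mark magic in (2)(a) concrete, here is a minimal Lisp sketch; the function name is hypothetical, and a real implementation would live in the C-level detection code.

 
  (defun bom-detect (octets)
    "Guess a Unicode flavor from a byte-order mark starting OCTETS.
  Return a symbol, or nil if no BOM is present.  Sketch only."
    (let ((b0 (and (> (length octets) 0) (char-to-int (aref octets 0))))
          (b1 (and (> (length octets) 1) (char-to-int (aref octets 1))))
          (b2 (and (> (length octets) 2) (char-to-int (aref octets 2)))))
      (cond ((and (eql b0 #xFE) (eql b1 #xFF)) 'utf-16)    ; big-endian
            ((and (eql b0 #xFF) (eql b1 #xFE)) 'utf-16-le) ; wrong-endian
            ((and (eql b0 #xEF) (eql b1 #xBB) (eql b2 #xBF)) 'utf-8))))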

Other comments:

Given Hrvoje’s objections, it might be reasonable to require in the future that any autodetection that could cause data loss (any coding system that involves escape sequences, and only those AFAIK: by design, translation to Unicode is invertible) by default prompt the user (presumably with a novice-like ability to retain the prompt, always default to binary, or always default to the autodetected encoding), at least in locales that don’t need such autodetection (POSIX, Latin-any).

Ben thinks that we can remember the input data; I think it’s going to be hard to comprehensively test that a highly optimized version works. Good design will help, but ISO-2022 is enormously complex, and there are many encodings that violate even its lax assumptions. On the other hand, memory is the only way to get non-rewindable streams right.

Hrvoje himself said he would like to have an XEmacs that distinguishes between Latin-1 and Latin-2 text. Where it is possible to do that, this is exactly what autodetection of ISO-2022 and Unicode gives you. Many people would want that, even at some risk of binary corruption.

    >> Once again I remind you that XEmacs is a text editor.  There
    >> are lots of files that potentially may have Japanese etc. in
    >> them without this marked, e.g. C or Elisp files in the XEmacs
    >> source.  Surely you’re not arguing that we interpret even these
    >> files as binary by default?

    Hrvoje> I am.  If I want to see Japanese, I’ll setup my
    Hrvoje> environment that way.  But I don’t, and neither do 99% of
    Hrvoje> Croatian users.  I can’t speak for French, Italian, and
    Hrvoje> others, but I’d assume similar.

    Hrvoje> If there is Japanese in the source files, I will see it as
    Hrvoje> escape sequences, which is perfectly fine, because I don’t
    Hrvoje> read Japanese.

And some (European) people will have their terminals scrambled, because Shift-JIS contains sequences that can change the state of XTerm (as do fixed-width Unicode and Big5). This may also be a problem with some Windows-12xx encodings; I’m not sure they all are ISO-2022-clean. (This isn’t a problem for XEmacs native X11 frames or native MS-Windows frames, and the XEmacs sources themselves are all in 7-bit ISO-2022 now IIRC. But it is a potential source of great frustration for many users.)

I think that should be considered too, although it is presumably lower priority than the data corruption of binary files.

Response to RFC: Autodetection

Author: Ben Wing

Date: 11/1/1999 7:24 AM

Stephen, thank you very much for writing this up. I think it is a good start, and definitely moving in the direction I would like to see things going: more proposals, less arguing. (aka “more light, less heat”) However, I have some suggestions for cleaning this up:

You should try to make it more layered. For example, you might have one section devoted to the workings of autodetection, which starts out like this (the section numbers below are totally arbitrary):

Section 5

Autodetect() is a function whose arguments are (1) a readable stream, (2) some hints indicating how the autodetection is to proceed, and (3) a value indicating the maximum number of characters to examine at the beginning of the stream. (Possibly, the value in (3) may be some special symbol indicating that we only go as far as the next line, or a certain number of lines ahead; this would be used as part of "continuous autodetection", e.g. we are decoding the results of an interactive terminal session, where the user may periodically switch encodings, line terminations, etc. as different programs get run and/or telnet or similar sessions are entered into and exited.) We assume the stream is rewindable; if not, insert a "rewinding" stream in front of the non-rewinding stream; this kind of stream automatically buffers the data as necessary. [You can use pseudo-code terminology here. No need for straight C or ELisp.]

[Then proceed to describe what the hints look like – e.g. you could portray it as a property list or whatever. The idea is that, for each locale, there is a corresponding hints value that is used at least by default. The hints structure also has to be set up to allow for two or more competing hints specifications to be merged together. For example, the extension of a file might provide an additional hint or hints about how to interpret the data of that file, and the caller of autodetect(), when calling autodetect() on such a file, would need to have a way of gracefully merging the default hints corresponding to the locale with the more specific hints provided by the extension. Furthermore, users like Hrvoje might well want to provide their own hints to supplement and override parts of the generic hints – e.g. "I don’t ever want to see non-European encodings decoded; treat them as binary instead".]

[Then describe algorithmically how the autodetection works. First, you could describe it more generally, i.e. presenting an algorithmic overview, then you could discuss in detail exactly how autodetection of a particular type of external encoding works – e.g. "for iso2022, we first look for an escape character, followed by a byte in this range [. ... .] etc."]
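To make the shape of the interface concrete, a Lisp-level call into such an autodetect() might look as follows. This is purely illustrative; every name in it is hypothetical.

 
  (autodetect stream
              ;; hints: locale defaults merged with hints derived from
              ;; the file's extension and the user's own overrides
              (merge-autodetect-hints
               (locale-default-hints (current-xemacs-locale))
               (extension-hints file)
               user-autodetect-hints)
              ;; maximum number of octets to examine, or a symbol such
              ;; as `line' for continuous autodetection
              65536)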

Section 6

This section describes the concept of a locale in XEmacs, and how it is derived from the user’s environment. A locale in XEmacs is a pair, a country and a language, together determining the handling of locale-specific areas of XEmacs. All locale-specific areas in XEmacs make use of this XEmacs locale, and do not attempt to derive the locale from any other sources. The user is free to change the current locale at any time; accessor and mutator functions are provided to do this so that various locale-specific areas can optionally be changed together with it.

[Then you describe how the XEmacs locale is extracted from .emacs, from setlocale(), from the LANG environment variables, from -font, or wherever else. All other sections assume this dirty work is done and never even mention it]

Section 7

[Here you describe the default autodetect() hints value corresponding to each possible locale. You should probably use a schematic description here, e.g. an actual Lisp property list, liberally commented.]
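For instance, a schematic entry for a Japanese locale might look something like the following sketch; the property names are invented for illustration.

 
  (japanese
   ;; coding systems to try, most likely first
   :priority (iso-2022-jp euc-jp shift-jis)
   ;; honor a leading Unicode byte-order mark
   :unicode-bom t
   ;; what to assume when nothing positive is detected
   :fallback binary)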

Section 8 etc.

[Other sections cover anything I’ve missed. By being very careful to separate out the layers, you simultaneously introduce more rigor (easier to catch bugs) and make it easier for someone else to understand it completely.]

Better Algorithm, More Flexibility, Different Levels of Certainty

Much More Flexible Coding System Priority List, per-Language Environment

User Ability to Select Encoding when System Unsure or Encounters Errors

Another Autodetection Proposal

Author: Ben Wing

However, in general the detection code has major problems and needs lots of work.

ben [at least that’s what sjt thinks]

*****

Author: Stephen Turnbull

While this is clearly something of an improvement over earlier designs, it doesn’t deal with the most important issue: to do better than categories (which in the medium term is mostly going to mean "which flavor of Unicode is this?"), we need to look at statistical behavior rather than ruling out categories via presence of specific sequences. This means the stream processor should

  1. keep octet distributions (octet, 2-, 3-, 4-octet sequences) in some kind of compressed form,
  2. look for "skip features" (eg, characteristic behavior of leading bytes for UTF-7, UTF-8, UTF-16, Mule code),
  3. pick up certain "simple" regexps,
  4. provide "triggers" to determine when statistical detectors should be invoked, such as octet count, and
  5. use "magic" like Unicode signatures or file(1) magic.

–sjt
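As a minimal illustration of point (1), a single-octet distribution could be collected as follows; multi-octet sequences would need the compressed representation the list asks for. The function name is hypothetical.

 
  (defun octet-distribution (octets)
    "Return a 256-entry vector counting each octet's occurrences in OCTETS."
    (let ((counts (make-vector 256 0))
          (i 0))
      (while (< i (length octets))
        (let ((b (char-to-int (aref octets i))))
          (aset counts b (1+ (aref counts b))))
        (setq i (1+ i)))
      counts))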


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

43.16.2 Future Work – Conversion Error Detection

"No Corruption" Scheme for Preserving External Encoding when Non-Invertible Transformation Applied

Author: Ben Wing

A preliminary and simple implementation is:

But you could implement it much more simply and usefully by just determining, for any text being decoded into mule-internal, whether we can go back and read the source again. If not, remember the entire file (GNUS message, etc.) in text properties. Then, implement the UI interface (like Netscape’s) on top of that. This way, you have something that at least works, but it might be inefficient. All we would need to do is work on making the underlying implementation more efficient.

A more detailed proposal for avoiding binary file corruption is

Basic idea: A coding system is a filter converting an entire input stream into an output stream. The resulting stream can be said to be "correspondent to" the input stream. Similarly, smaller units can correspond. These could potentially include zero width intervals on either side, but we avoid this. Specifically, the coding system works like:

 
loop (input) {

  Read bytes till we have enough to generate a translated character
  or characters.

  This establishes a "correspondence" between the whole input and
  output, more or less in minimal chunks.

}

We then do the following processing:

  1. Eliminate correspondences where one or the other of the I/O streams has a zero-width interval by combining with an adjacent interval;
  2. Group together all adjacent "identity" correspondences into as large groups as possible;
  3. Use text properties to store the non-identity correspondences on the characters. For identity correspondences, use a simple text property on all of them that contains no data but just indicates that the whole string of text is identity-corresponded. (How do we define "identity"? Latin-1, or could it be something else, e.g. Latin-2?)
  4. Figure out the procedures when text is inserted/deleted and copied or pasted.
  5. Figure out how to save the file out making use of the correspondences. Allow ways of saving without correspondences, and doing a "save to buffer with and without correspondences." We need to be clever when dealing with modal coding systems, parsing the correspondences to get the internal state right.
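A sketch of how step (3) might record correspondences with text properties; the property names here are hypothetical.

 
  ;; Non-identity correspondence: remember the exact external bytes
  ;; that decoded into the characters between START and END.
  (put-text-property start end 'external-bytes raw-bytes buffer)

  ;; An identity run carries no data, just a marker property saying
  ;; that the whole stretch corresponds byte-for-byte.
  (put-text-property start end 'identity-correspondence t buffer)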

Another Error-Catching Idea

Author: Ben Wing

Nov 4, 1999

Finally, I don’t think "save the input" is as hard as you make it out to be. Conceptually, in fact, it’s simple: for each minimal group of bytes where you cannot absolutely guarantee that an external->internal transformation is reversible, you put a text property on the corresponding internal character indicating the bytes that generated this character. We also put a text property on every character, indicating the coding system that caused the transformation. This latter text property is extremely efficient (e.g. in a buffer with no data pasted from elsewhere, it will map to a single extent over all the buffer), and the former cases should not be prevalent enough to cause a lot of inefficiency, esp. if we define what "reversible" means for each coding system in such a way that it correctly handles the most common cases. The hardest part, in fact, is making all the string/text handling in XEmacs be robust w.r.t. text properties.

Strategies for Error Annotation and Coding Orthogonalization

Author: Stephen Turnbull

We really want to separate out a number of things. Conceptually, there is a nested syntax.

At the top level is the ISO 2022 extension syntax, including charset designation and invocation, and certain auxiliary controls such as the ISO 6429 direction specification. These are octet-oriented, with the single exception (AFAIK) of the "exit Unicode" sequence which uses the UTF’s natural width (1 byte for UTF-7 and UTF-8, 2 bytes for UCS-2 and UTF-16, and 4 bytes for UCS-4 and UTF-32). This will be treated as a (deprecated) special case in Unicode processing.

The middle layer is ISO 2022 character interpretation. This will depend on the current state of the ISO 2022 registers, and assembles octets into the character’s internal representation.

The lowest level is translating system control conventions. At present this is restricted to newline translation, but one could imagine doing tab conversion or line wrapping here. "Escape from Unicode" processing would be done at this level.

At each level the parser will verify the syntax. In the case of a syntax error or warning (such as a redundant escape sequence that affects no characters), the parser will take some action, typically inserting the erroneous octets directly into the output and creating an annotation which can be used by higher level I/O to mark the affected region.

This should make it possible to do something sensible about separating newline convention processing from character construction, and about preventing ISO 2022 escape sequences from being recognized inappropriately.

The basic strategy will be to have octet classification tables, and switch processing according to the table entry.

It’s possible that, by doing the processing with tables of functions or the like, the parser can be used for both detection and translation.
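In outline, the classification tables might look like this sketch; the classes are invented for illustration, and the real tables would be C arrays.

 
  (defconst octet-class-table
    (let ((v (make-vector 256 'graphic)))
      (aset v #x1b 'escape)   ; ESC introduces ISO 2022 sequences
      (aset v #x0a 'newline)  ; LF and CR matter for newline detection
      (aset v #x0d 'newline)
      (aset v #x00 'control)  ; NULs suggest a binary file
      v)
    "Map each octet to a class; processing switches on the class.")

  (defun classify-octet (b)
    (aref octet-class-table b))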

Handling Writing a File Safely, Without Data Loss

Author: Ben Wing

When writing a file, we need error detection; otherwise somebody will create a Unicode file without realizing that the coding system of the buffer is Raw, and then lose all the non-ASCII/Latin-1 text when it’s written out. We need two levels:

  1. first, a "safe-charset" level that checks before any actual encoding to see if all characters in the document can safely be represented using the given coding system. FSF has a "safe-charset" property of coding systems, but it’s stupid because this information can be automatically derived from the coding system, at least the vast majority of the time. What we need is some sort of alternative-coding-system-precedence-list, langenv-specific, where everything on it can be checked for safe charsets and then the user given a list of possibilities. When the user does "save with specified encoding", they should see the same precedence list. Again as with other precedence lists, there’s also a global one, and presumably all coding systems not on either list get appended to the end (and perhaps not checked at all when doing safe-checking?). Safe-checking should work something like this: compile a list of all charsets used in the buffer, along with a count of chars used. That way, "slightly unsafe" coding systems can perhaps be presented at the end, which will lose only a few characters and are perhaps what the users were looking for. (A sketch of this check appears after this list.)

    [sjt sez this whole step is a crock. If a universal coding system is unacceptable, the user had better know what he/she is doing, and explicitly specify a lossy encoding. In principle, we can simply check for characters being writable as we go along. Eg, via an "unrepresentable character handler." We still have the buffer contents. If we can’t successfully save, then ask the user what to do. (Do we ever simply destroy previous file version before completing a write?)]

  2. when actually writing out, we need error checking in case an individual char in a charset can’t be written even though the charsets are safe. again, the user gets the choice of other reasonable coding systems.

    [sjt – something is very confused, here; safe charsets should be defined as those charsets all of whose characters can be encoded.]

  3. same thing (error checking, list of alternatives, etc.) needs to happen when reading! all of this will be a lot of work!
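Here is a rough sketch of the check in (1). It assumes the Mule function charsets-in-region; the predicate coding-system-can-encode-charset-p is hypothetical.

 
  (defun safe-coding-systems-for-buffer (buffer candidates)
    "Return the coding systems in CANDIDATES that can encode all of BUFFER.
  Sketch only; `coding-system-can-encode-charset-p' is hypothetical."
    (let ((used (save-excursion
                  (set-buffer buffer)
                  (charsets-in-region (point-min) (point-max))))
          (safe nil))
      (dolist (cs candidates (nreverse safe))
        (let ((ok t))
          (dolist (charset used)
            (unless (coding-system-can-encode-charset-p cs charset)
              (setq ok nil)))
          (if ok (push cs safe))))))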

Author: Stephen Turnbull

I don’t much like Ben’s scheme.

  1. This isn’t an issue of I/O, it’s a coding issue. It can happen in many places, not just on stream I/O. Error checking should take place on all translations.
  2. The two-pass algorithm should be avoided if possible. In some cases (eg, output to a tty) we won’t be able to go back and change the previously output data.
  3. The whole idea of having a buffer full of arbitrary characters which we’re going to somehow shoehorn into a file based on some twit user’s less than informed idea of a coding system is kind of laughable from the start. If we’re going to say that a buffer has a coding system, shouldn’t we enforce restrictions on what you can put into it?
  4. What’s the point of having safe charsets if some of the characters in them are unsafe?
  5. What makes you think we’re going to have a list of charsets? It seems to me that there might be reasons to have user-defined charsets (eg, "German" vs "French" subsets of ISO 8859/15).
  6. The idea of having language environment determine precedence doesn’t seem very useful to me. Users who are working with a language that corresponds to the language environment are not going to run into safe-charset problems; it’s users who are outside of their usual language environment who run into trouble. Also, the reason for specifying anything other than a universal coding system is normally restrictions imposed by other users or applications.
  7. The statistical feedback isn’t terribly useful. Users rarely "want" a coding system, they want their file saved in a useful way. We could add a FORCE argument to conversions for those who really want a specific coding system. But mostly, a user might want to edit out a few unsafe characters. So (up to some maximum) we should keep a list of unsafe text positions, and provide a convenient function for traversing them.

–sjt


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

43.16.3 Future Work – Unicode

Author: Ben Wing

Following is an old proposal. Unicode has been implemented already, in a different fashion; but there are some ideas here for more general support, e.g. properties of Unicode characters other than their mappings to particular charsets.

We recognize 128, [256], 128x128, [256x256] for source charsets;

for Unicode, 256x256 or 16x256x256.

In all cases, use tables of tables and substitute a default subtable if an entire row is empty.

If destination is Unicode, either 16 or 32 bits.

If destination is charset, either 8 or 16 bits.

For the moment, since we only do 94, 96, 94x94 or 96x96, only do 128 or 128x128 for source charsets and use the range 33-126 or 32-127. (Except ASCII - we special case that and have no table because we can algorithmically translate)

Also have a 16x256x256 table -> 32 bits of Unicode char properties.

A particular charset contains two associated mapping tables, for both directions.

API is set-unicode-mapping:

 
(set-unicode-mapping
     unicode char
     unicode charset-code charset-offset
     unicode vector of char
     unicode list of char
     unicode string of char
     unicode vector or list of codes charset-offset)

Establishes a mapping between a unicode codepoint (a fixnum) and one or more chars in a charset. The mapping is automatically established in both directions. Chars in a charset can be specified either with an actual character or with a codepoint (i.e. a fixnum) and the charset it’s within. If a sequence of chars or charset codepoints is given, multiple mappings are established for consecutive unicode codepoints starting with the given one. Charset codepoints are specified as most-significant x 256 + least-significant, with both bytes in the range 33-126 (for 94 or 94x94) or 32-127 (for 96 or 96x96), unless an offset is given, which will be subtracted from each byte. (The most common values are 128, for codepoints given with the high bit set, or -32, for codepoints given as 1-94 or 0-95.)
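For example, assuming the API above, mapping U+3042 (HIRAGANA LETTER A) to the JIS X 0208 character at position codes 36/34 might look like this:

 
  (set-unicode-mapping #x3042 (make-char 'japanese-jisx0208 36 34))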

Other APIs:

 
(write-unicode-mapping file charset)

Write the mapping table for a particular charset to the specified file. The tables are written in an internal format that allows for efficient loading, for portability across platforms and XEmacs invocations, for conserving space, for appending multiple tables one directly after another with no need for a directory anywhere in the file, and for recognizing a file as being in this format (via a magic sequence at the beginning). The data will be appended at the end of the file, so that multiple tables can be written to a file; remove the file first to avoid this.

 
(write-unicode-properties file unicode-codepoint length)

Write the Unicode properties (not including charset mappings) for the specified range of contiguous Unicode codepoints to the end of the file (i.e. append mode) in a binary format similar to what was mentioned in the write-unicode-mapping description and with the same features.

Extension to set-unicode-mapping:

 
(set-unicode-mapping
  list-or-vector-of-unicode-codepoints char
  ""                                   charset-code charset-offset
  ""                                   sequence of char
  ""                                   list-or-vector-of-codes charset-offset)

The first two forms are conceptually the inverse of the forms above to specify characters for a contiguous range of Unicode codepoints. These new forms let you specify the Unicode codepoints for a contiguous range of chars in a charset. "Contiguous" here means that if we run off the end of a row, we go to the first entry of the next row, rather than to an invalid codepoint. For example, in a 94x94 charset, valid rows and columns are in the range 0x21-0x7e; after 0x457c 0x457d 0x457e comes 0x4621, not something like 0x457f, which is invalid.

The final two forms are the most general, letting you specify an arbitrary set of both Unicode points and charset chars, and the two are matched up just like a series of individual calls. However, if the lists or vectors do not have the same length, an error is signaled.

 
(load-unicode-mapping file &optional charset)

If charset is omitted, loads all charset mapping tables found and returns a list of the charsets found. If charset is specified, searches through the file for the appropriate mapping tables. (This is extremely fast because each entry in the file gives an offset to the next one). Returns t if found.

 
(load-unicode-properties file unicode-codepoint)
 
(list-unicode-entries file)
 
(autoload-unicode-mapping charset)

...

(unfinished)


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

43.16.4 Future Work – BIDI Support

Author: Ben Wing

  1. Use text properties to handle nesting levels and overrides; BIDI-specific text properties (as per the Unicode BIDI algorithm) are computed at text insertion time.
  2. Lisp API for reordering a display line at redisplay time, possibly substitution of different glyphs (esp. mirroring of glyphs).
  3. Lisp API called after a display line is laid out, but only when reordering may be necessary (display engine checks for non-uniform BIDI text properties; can handle internally a line that’s completely in one direction)
  4. Default direction is a buffer-local variable
  5. We concentrate on implementing Unicode BIDI algorithm.
  6. Display support for mirroring of entire window
  7. Display code keeps track of mirroring junctures so it can display double cursor.
  8. Entire layout of screen (on a per window basis) is exported as a Lisp API, for visual editing (also very useful for other purposes e.g. proper handling of word wrapping with proportional fonts, complex Lisp layout engines e.g. W3)
  9. Logical, visual, etc. cursor movement handled entirely in Lisp, using aforementioned API, plus a specifier for controlling how cursor is shown (e.g. split or not).

[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

43.16.5 Future Work – Localized Text/Messages

NOTE: There is existing message translation in X Windows of menu names. This is handled through X resources. The files are in ‘PACKAGES/mule-packages/locale/app-defaults/LOCALE/Emacs’, where LOCALE is ‘ja’, ‘fr’, etc.

See lib-src/make-msgfile.lex.

Long comment from jwz, some additions from ben marked "ben":

(much of this comment is outdated, and a lot of it is actually implemented)


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

43.16.6 Proposal for How This All Ought to Work

Author: Jamie Zawinski

this isn’t implemented yet, but this is the plan-in-progress

In general, it’s accepted that the best way to internationalize is for all messages to be referred to by a symbolic name (or number) and come out of a table or tables, which are easy to change.

However, with Emacs, we’ve got the task of internationalizing a huge body of existing code, which already contains messages internally.

For the C code we’ve got two options:

In this case, it’s desirable to make as few changes as possible to the C code, to make it easier to merge the code with the FSF version of emacs which won’t ever have these changes made to it. So we should go with the former option.

The way it has been done (between 19.8 and 19.9) was to use gettext(), but also to make massive changes to the source code. The goal now is to use gettext() at run-time and yet not require a textual change to every line in the C code which contains a string constant. A possible way to do this is described below.

(gettext() can be implemented in terms of catgets() for non-Sun systems, so that in itself isn’t a problem.)

For the Lisp code, we’ve got basically the same options: put everything in a table, or translate things implicitly.

Another kink that lisp code introduces is that there are thousands of third- party packages, so changing the source for all of those is simply not an option.

Is it a goal that if some third party package displays a message which is one we know how to translate, then we translate it? I think this is a worthy goal. It remains to be seen how well it will work in practice.

So, we should endeavor to minimize the impact on the lisp code. Certain primitive lisp routines (the stuff in lisp/prim/, and especially in ‘cmdloop.el’ and ‘minibuf.el’) may need to be changed to know about translation, but that’s an ideologically clean thing to do because those are considered a part of the emacs substrate.

However, if we find ourselves wanting to make changes to, say, RMAIL, then something has gone wrong. (Except to do things like remove assumptions about the order of words within a sentence, or how pluralization works.)

There are two parts to the task of displaying translated strings to the user: the first is to extract the strings which need to be translated from the sources; and the second is to make some call which will translate those strings before they are presented to the user.

The old way was to use the same form to do both, that is, GETTEXT() was both the tag that we searched for to build a catalog, and was the form which did the translation. The new plan is to separate these two things more: the tags that we search for to build the catalog will be stuff that was in there already, and the translation will get done in some more centralized, lower level place.

This program (‘make-msgfile.c’) addresses the first part, extracting the strings.

For the emacs C code, we need to recognize the following patterns:

 
  message ("string" ... )
  error ("string")
  report_file_error ("string" ... )
  signal_simple_error ("string" ... )
  signal_simple_error_2 ("string" ... )
  
  build_translated_string ("string")
  #### add this and use it instead of build_cistring() in some places.
  
  yes_or_no_p ("string" ... )
  #### add this instead of funcalling Qyes_or_no_p directly.

  barf_or_query_if_file_exists	#### restructure this
  check all callers of Fsignal	#### restructure these
  signal_error (Qerror ... )		#### change all of these to error()
  
  And we also parse out the interactive prompts from DEFUN() forms.
  
  #### When we've got a string which is a candidate for translation, we
  should ignore it if it contains only format directives, that is, if
  there are no alphabetic characters in it that are not a part of a `%'
  directive.  (Careful not to translate either "%s%s" or "%s: ".)

For the emacs Lisp code, we need to recognize the following patterns:

 
  (message "string" ... )
  (error "string" ... )
  (format "string" ... )
  (read-from-minibuffer "string" ... )
  (read-shell-command "string" ... )
  (y-or-n-p "string" ... )
  (yes-or-no-p "string" ... )
  (read-file-name "string" ... )
  (temp-minibuffer-message "string")
  (query-replace-read-args "string" ... )

I expect there will be a lot like the above; basically, any function which is a commonly used wrapper around an eventual call to message or read-from-minibuffer needs to be recognized by this program.

 
  (dgettext "domain-name" "string")		#### do we still need this?
  
  things that should probably be restructured:
    princ in ‘cmdloop.el’
    insert in ‘debug.el’
    face-interactive
    ‘help.el’, ‘syntax.el’ all messed up

Author: Ben Wing

ben: (format) is a tricky case. If I use format to create a string that I then send to a file, I probably don’t want the string translated. On the other hand, if the string gets used as an argument to (y-or-n-p) or some such function, I do want it translated, and it needs to be translated before the %s and such are replaced. The proper solution here is for (format) and other functions that call gettext but don’t immediately output the string to the user to add the translated (and formatted) string as a string property of the object, and to have functions that output potentially translated strings look for a "translated string" property. Of course, this will fail if someone does something like

 
   (y-or-n-p (concat (if you-p "Do you " "Does he ")
                     (format "want to delete %s? " filename)))

But you shouldn’t be doing things like this anyway.

ben: Also, to avoid excessive translating, strings should be marked as translated once they get translated, and further calls to gettext don’t do any more translating. Otherwise, a call like

 
   (y-or-n-p (format "Delete %s? " filename))

would cause translation on both the pre-formatted and post-formatted strings, which could lead to weird results in some cases (y-or-n-p has to translate its argument because someone could pass a string to it directly). Note that the "translating too much" solution outlined below could be implemented by just marking all strings that don’t come from a .el or .elc file as already translated.

Menu descriptors: one way to extract the strings in menu labels would be to teach this program about "^(defvar .*menu\n" forms; that’s probably kind of hard, though, so perhaps a better approach would be to make this program recognize lines of the form

 
  "string" ... ;###translate

where the magic token ";###translate" on a line means that the string constant on this line should go into the message catalog. This is analogous to the magic ";###autoload" comments, and to the magic comments used in the EPSF structuring conventions.

So this program manages to build up a catalog of strings to be translated. To address the second part of the problem, of actually looking up the translations, there are hooks in a small number of low-level places in emacs.

Assume the existence of a C function gettext(str) which returns the translation of str if there is one, otherwise returns str.

What should we do about this? We could hack query-replace-read-args to translate its args, but might this be a more general problem? I don’t think we ought to translate all calls to format. We could just change the calling sequence, since this is odd in that the first %s wants to be translated but the second doesn’t.

Solving the "translating too much" problem:

The concern has been raised that in this situation:

then we would display the translation of Help, which would not be correct. We can solve this by adding a bit to Lisp_String objects which identifies them as having been read as literal constants from a .el or .elc file (as opposed to having been constructed at run time as it would in the above case.) To solve this:

More specifically, we do:

Scan specified C and Lisp files, extracting the following messages:

 
   C files:
      GETTEXT (...)
      DEFER_GETTEXT (...)
      DEFUN interactive prompts
   Lisp files:
      (gettext ...)
      (dgettext "domain-name" ...)
      (defer-gettext ...)
      (interactive ...)

The arguments given to this program are all the C and Lisp source files of GNU Emacs. .el and .c files are allowed. There is no support for .elc files at this time, but they may be specified; the corresponding .el file will be used. Similarly, .o files can also be specified, and the corresponding .c file will be used. This helps the makefile pass the correct list of files.

The results, which go to standard output or to a file specified with -a or -o (-a to append, -o to start from nothing), are quoted strings wrapped in gettext(...). The results can be passed to xgettext to produce a .po message file.

However, we also need to do the following:

  1. Definition of Arg below won’t handle a generalized argument as might appear in a function call. This is fine for DEFUN and friends, because only simple arguments appear there; but it might run into problems if Arg is used for other sorts of functions.
  2. snarf() should be modified so that it doesn’t output null strings and non-textual strings (see the comment at the top of ‘make-msgfile.c’).
  3. parsing of (insert) should snarf all of the arguments.
  4. need to add set-keymap-prompt and deal with gettext of that.
  5. parsing of arguments should snarf all strings anywhere within the arguments, rather than just looking for a string as the argument. This allows if statements as arguments to get parsed.
  6. begin_paren_counting() et al. should handle recursive entry.
  7. handle set-window-buffer and other such functions that take a buffer as the other-than-first argument.
  8. there is a fair amount of work to be done on the C code. Look through the code for #### comments associated with ’#ifdef I18N3’ or with an I18N3 nearby.
  9. Deal with get-buffer-process et al.
  10. Many of the changes in the Lisp code marked ’rewritten for I18N3 snarfing’ should be undone once (5) is implemented.
  11. Go through the Lisp code in prim and make sure that all strings are gettexted as necessary. This may reveal more things to implement.
  12. Do the equivalent of (8) for the Lisp code.
  13. Deal with parsing of menu specifications.

[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

43.17 Future Work – Lisp Stream API

Author: Ben Wing

Expose XEmacs internal lstreams to Lisp as stream objects. (In addition to the functions given below, each stream object has properties that can be associated with it using the standard put, get etc. API. For GNU Emacs, where put and get have not been extended to be general property functions, but work only on strings, we would have to create functions set-stream-property, stream-property, remove-stream-property, and stream-properties. These provide the same functionality as the generic get, put, remprop, and object-plist functions under XEmacs)

(Implement properties using a hash table, and generalize this so that it is extremely easy to add a property interface onto any kind of object)

 
(write-stream STREAM STRING)

Write the STRING to the STREAM. This will signal an error if all the bytes cannot be written.

 
(read-stream STREAM &optional N SEQUENCE)

Reads data from STREAM. N specifies the number of bytes or characters, depending on the stream. SEQUENCE specifies where to write the data into. If N is not specified, data is read until end of file. If SEQUENCE is not specified, the data is returned as a string. If SEQUENCE is specified, it must be large enough to hold the data.

 
(push-stream-marker STREAM)

returns ID, probably a stream marker object

 
(pop-stream-marker STREAM)

backs up stream to last marker

 
(unread-stream STREAM STRING)

The only valid STREAM is an input stream in which case the data in STRING is pushed back and will be read ahead of all other data. In general, there is no limit to the amount of data that can be unread or the number of times that unread-stream can be called before another read.

 
(stream-available-chars STREAM)

This returns the number of characters (or bytes) that can definitely be read from the stream without an error. This can be useful, for example, when dealing with non-blocking streams, where an attempt to read too much data will result in a blocking error.

 
(stream-seekable-p STREAM)

Returns true if the stream is seekable. If false, operations such as seek-stream and stream-position will signal an error. However, the functions set-stream-marker and seek-stream-marker will still succeed for an input stream.

 
(stream-position STREAM)

If STREAM is a seekable stream, returns a position which can be passed to seek-stream.

 
(seek-stream STREAM N)

If STREAM is a seekable stream, move to the position indicated by N, otherwise signal an error.

 
(set-stream-marker STREAM)

If STREAM is an input stream, create a marker at the current position, which can later be moved back to. The stream does not need to be a seekable stream. In this case, all successive data will be buffered to simulate the effect of a seekable stream. Therefore use this function with care.

 
(seek-stream-marker STREAM marker)

Move the stream back to the position that was stored in the marker object. (this is generally an opaque object of type stream-marker).

 
(delete-stream-marker MARKER)

Destroys the stream marker and, if the stream is a non-seekable stream and there are no other stream markers pointing to an earlier position, frees up some buffering information.

 
(delete-stream STREAM N)
 
(delete-stream-marker STREAM ID)
 
(close-stream stream)

Writes any remaining data to the stream and closes it and the object to which it’s attached. This also happens automatically when the stream is garbage collected.

 
(getchar-stream STREAM)

Return a single character from the stream. (This may be a single byte depending on the nature of the stream). This is actually a macro with an extremely efficient implementation (as efficient as you can get in Emacs Lisp), so that this can be used without fear in a loop. The implementation works by reading a large amount of data into a vector and then simply using the function AREF to read characters one by one from the vector. Because AREF is one of the primitives handled specially by the byte interpreter, this will be very efficient. The actual implementation may in fact use the function call-with-condition-handler to avoid the necessity of checking for overflow. Its typical implementation is to fetch the vector containing the characters as a stream property, as well as the index into that vector. Then it retrieves the character and increments the value and stores it back in the stream. As a first implementation, we check to see when we are reading the character whether the character would be out of range. If so, we read another 4096 characters, storing them into the same vector, setting the index back to the beginning, and then proceeding with the rest of the getchar algorithm.
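A sketch of that buffering scheme, written in terms of the stream property interface and read-stream described above; a real version would also need to handle end-of-file, and this one evaluates STREAM more than once.

 
  (defmacro getchar-stream (stream)
    "Read one character from STREAM, buffering 4096 at a time.  Sketch only."
    `(let ((buf (get ,stream 'getchar-buffer))
           (idx (get ,stream 'getchar-index)))
       (when (or (null buf) (>= idx (length buf)))
         ;; out of buffered data: read the next 4096 characters
         (setq buf (read-stream ,stream 4096)
               idx 0)
         (put ,stream 'getchar-buffer buf))
       (put ,stream 'getchar-index (1+ idx))
       (aref buf idx)))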

 
(putchar-stream STREAM CHAR)

This is similar to getchar-stream but it writes data instead of reading data.

 
Function make-stream

There are actually two stream-creation functions, which are:

 
(make-input-stream TYPE PROPERTIES)
(make-output-stream TYPE PROPERTIES)

These can be used to create a stream that reads data, or writes data, respectively. PROPERTIES is a property list and the allowable properties in it are defined by the type. Possible types are:

  1. file (this reads data from a file or writes to a file)

    Allowable properties are:

    :file-name

    (the name of the file)

    :create

    (for output streams only, creates the file if it doesn’t already exist)

    :exclusive

    (for output streams only, fails if the file already exists)

    :append

    (for output streams only; starts appending to the end of the file rather than overwriting the file)

    :offset

    (positions in bytes in the file where reading or writing should begin. If unspecified, defaults to the beginning of the file or to the end of the file when :appended specified)

    :count

    (for input streams only, the number of bytes to read from the file before signaling "end of file". If nil or omitted, the number of bytes is unlimited)

    :non-blocking

    (if true, reads or writes will fail if the operation would block. This only makes sense for non-regular files).

  2. process (For output streams only, send data to a process.)

    Allowable properties are:

    :process

    (the process object)

  3. buffer (Read from or write to a buffer.)

    Allowable properties are:

    :buffer

    (the name of the buffer or the buffer object.)

    :start

    (the position to start reading from or writing to. If nil, use the buffer point. If true, use the buffer’s point and move point beyond the end of the data read or written.)

    :end

    (only for input streams, the position to stop reading at. If nil, continue to the end of the buffer.)

    :ignore-accessible

    (if true, the default for :start and :end ignore any narrowing of the buffer.)

  4. string (read from or write to a lisp string)

    Allowable properties are:

    :string

    (the string to read from or write to)

    :offset

    (the position to begin reading from or writing to)

    :length

    (for input streams only, the amount of data to read, defaulting to the rest of the data in the string)

    :resize-string

    (for output streams only; if true, the string is resized as necessary to accommodate data written off the end, otherwise the writes will fail)

  5. memory (For output only, writes data to an internal memory buffer. This is more lightweight than using a Lisp buffer. The function memory-stream-string can be used to convert the memory into a string.)
  6. debugging (For output streams only, write data to the debugging output.)
  7. stream-device (During non-interactive invocations only, read from or write to the initial stream terminal device.)
  8. function (For output streams only, send data by calling a function, exactly as with the STREAM argument to the print primitive.)

    Allowable Properties are:

    :function

    (the function to call. The function is called with one argument, the stream.)

  9. marker (Write data to the location pointed to by a marker and move the marker past the data.)

    Allowable properties are:

    :marker

    (the marker object.)

  10. decoding (As an input stream, reads data from another stream and decodes it according to a coding system. As an output stream decodes the data written to it according to a coding system and then writes results in another stream.)

    Properties are:

    :coding-system

    (the symbol of coding system object, which defines the decoding.)

    :stream

    (the stream on the other end.)

  11. encoding (As an input stream, reads data from another stream and encodes it according to a coding system. As an output stream encodes the data written to it according to a coding system and then writes results in another stream.)

    Properties are:

    :coding-system

    (the symbol of coding system object, which defines the encoding.)

    :stream

    (the stream on the other end.)
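As a usage sketch, reading and decoding a Shift-JIS file by chaining a decoding stream onto a file stream might look like this, assuming the proposed API above (the file name is made up):

 
  (let* ((raw (make-input-stream 'file :file-name "/tmp/kanji.sjis"))
         (in  (make-input-stream 'decoding
                                 :coding-system 'shift-jis
                                 :stream raw)))
    (unwind-protect
        (read-stream in)            ; read and decode the whole file
      (close-stream in)))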

Consider

 
(define-stream-type 'type
  :read-function
  :write-function
  :rewind-
  :seek-
  :tell-
  (?:buffer)

Old Notes:

Expose lstreams as hash (put get etc. properties) table.

 
  (write-stream stream string)
  (read-stream stream &optional n sequence)
  (make-stream ...)
  (push-stream-marker stream)
     returns ID prob a stream marker object
  (pop-stream-marker stream)
     backs up stream to last marker
  (unread-stream stream string)
  (stream-available-chars stream)
  (seek-stream stream n)
  (delete-stream stream n)
  (delete-stream-marker stream ic) can always be poe only nested if you
    have set stream marker
  
  (get-char-stream generalizes stream)
  
  a macro that tries to be efficient perhaps by reading the next
  e.g. 512 characters into a vector and arefing them.  Might check aref
  optimization for vectors in the byte interpreter.
  
  (make-stream 'process :process ... :type write)
  
  Consider
  
  (define-stream-type 'type
    :read-function
    :write-function
    :rewind-
    :seek-
    :tell-
    (?:buffer)

[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

43.18 Future Work – Multiple Values

Author: Ben Wing

On low level, all funs that can return multiple values are defined with DEFUN_MULTIPLE_VALUES and have an extra parameter, a struct mv_context *.

It has to be this way to ensure that only the fun itself, and no called funs, think they’re called in an mv context.

apply, funcall, eval might propagate their mv context to their children?

Might need eval-mv to implement calling a fun in an mv context. Maybe also funcall_mv? apply_mv?

Generally, just set up the context appropriately, call the fun (noticing whether it’s an mv-aware fun), and bind values on the way back or pass them out (e.g. to multiple-value-bind).

Common Lisp multiple values, required for specifier improvements.

The multiple return values from get-specifier should allow the specifier value to be modified in the correct fashion (i.e. should interact correctly with all manner of changes from other callers) using set-specifier. We should check this and see if we need other return values. (how-to-add? inst-list?)
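From Lisp, the intended usage might look like the following sketch, assuming Common Lisp style multiple-value-bind and the get-specifier return values proposed in the Specifiers section below.

 
  (multiple-value-bind (value instantiator locale tag-set)
      (get-specifier (face-property 'default 'background))
    ;; modify the instantiator and put it back where it came from
    (add-spec-to-specifier (face-property 'default 'background)
                           instantiator locale tag-set))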

In C, call multiple-values-context to get the number of expected values, and multiple-value-set (#, value) to set values other than the first.

(Returns Qno_value, or something, if there are no values.)

#### Or should throw? Probably not.
#### What happens if a fn returns no values but the caller expects a value?

Something like funcall_with_multiple_values() for setting up the context.

For efficiency, byte code could notice Ffuncall to m.v. functions and sub in special opcodes during load in processing, if it mattered.


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

43.19 Future Work – Macros

Author: Ben Wing

  1. Option to control whether beep really kills a macro execution.
  2. Recently defined macros are remembered on a stack, so accidentally defining another one doesn’t fuck you up. You can "rotate" anonymous macros or just pick one (numbered) to put on tags, so it works with execute-macro – the menu shows the anonymous macro and lists some keystrokes. Normally numbered, but you can easily assign one to a named fun or to a keyboard sequence, or give it a number (or give it a letter accelerator?).

[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

43.20 Future Work – Specifiers

Author: Ben Wing

Ideas To Work On When Their Time Has Come

NOTE: A preliminary implementation can be done without Multiple Values – instead, create a fun like specifier-instance that returns a list (and will be deleted at some point).

specifier &c changes for glyphs

  1. You can use put, get, etc. on vectors to modify properties within them.
  2. copy-over routines: routines that carefully copy one complex item OVER another one, destroying the second in the process. I wrote one for lists. Need a general copy-over-tree.
  3. improvement to specifier mapping routines e.g.

    map-modifying-instantiator and its force versions below, so that we could implement them in turn.

  4. put-specifier-property (specifier, key, value, &optional locale, tag-set): finds the instantiator in the locale (possibly creating one if necessary), goes into the vector, changes it, and puts it back into the specifier.
  5. Smarter add-spec-to-specifier

    If it notices that it’s just replacing one instantiator with another, then instead of doing copy-tree on the first one and throwing away the other, it can use copy-over-tree to save lots of garbage when repeatedly called.

    ILLEGIBLE: GOTO LOO BUI BUGS LAST PNOTE

  6. When at image instantiate:
  7. Reference counting. Somehow or other, each image instance in the cache needs to keep track of the instantiators that generated it.

It might do this through some sort of special instantiator-reference object. This points to the instantiator, where in the hierarchy the instantiator is, etc. When an instantiator gets removed, this gu*ILLEGIBLE* values report not attached. Somehow that gets communicated back to the image instance in the cache. So somehow or other, the image instances in the cache know who’s using them; when you keep updating the slider value by simply modifying an instantiator, which efficiently changes the internal structure of the specifier, eventually image-instantiate notices that the image instance it points to has no other user and just modifies it. In complex situations, some optimizations get lost, but everything is still correct.

vs.

Andy’s set-image-instance-property, which achieves the same optimizations much more easily, but

  1. falls apart in any more complicated system
  2. only works because of the way the caching system in XEmacs works. Any change (e.g. ILLEGIBLE more of making the caches GQ instead of GQ) is likely to make things stop working right in all but the simplest situation.

Specifier improvements for support of specifier inheritance (necessary for the new font mapping API)

’Fallback should be a locale/domain.

 
(get-specifier specifier &optional locale)

#### If locale is omitted, should it be (current-buffer) or 'global?
#### Should argument not be optional?

If a buffer is specified: find a window showing the buffer by looking

If none, use buffer -> selected frame -> etc.

 
Returns multiple values
  second is instantiator
  third  is locale containing inst.
  fourth is tag set

(restart-specifier-instance ...)

like specifier-instance, but allows restarting the lookup, for implementing inheritance, etc. Obsoletes specifier-matching-find-charset, or whatever it is. The restart argument is opaque and is returned as a multiple value of restart-specifier-instance. (It’s actually an integer, with the low bits holding the locale and the other bits a count into the list attached to the locale.)



43.21 Future Work – Display Tables

Author: Ben Wing

#### It would also be really nice if you could specify that the characters come out in hex instead of in octal. Mule does that by adding a ctl-hexa variable similar to ctl-arrow, but that’s bogus – we need a more general solution. I think you need to extend the concept of display tables into a more general conversion mechanism. Ideally you could specify a Lisp function that converts characters, but this violates the Second Golden Rule and besides would make things way way way way slow.

So instead, we extend the display-table concept, which was historically limited to 256-entry vectors, to one of the following:

  1. a 256-entry vector, for backward compatibility;
  2. a char-table, mapping characters to values;
  3. a range-table, mapping ranges of characters to values;
  4. a list of any of the above.

The fourth option allows you to specify multiple display tables instead of just one. Each display table can specify conversions for some characters and leave others unchanged. The way the character gets displayed is determined by the first display table with a binding for that character. This way, you could call a function enable-hex-display that adds a hex display-table to the list of display tables for the current buffer.
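
A sketch in C of that first-match rule (illustrative only: display_table_lookup and the exact list representation here are assumptions, not existing XEmacs code, though CONSP, XCAR, etc. are the usual internal macros):

 
    /* Walk the list of display tables attached to a buffer; the first
       table with a binding for CH determines how CH is displayed. */
    static Lisp_Object
    display_table_entry_for_char (Lisp_Object tables, Emchar ch)
    {
      for (; CONSP (tables); tables = XCDR (tables))
        {
          Lisp_Object entry = display_table_lookup (XCAR (tables), ch);
          if (!NILP (entry))
            return entry;
        }
      return Qnil;   /* no binding: display CH unchanged */
    }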

#### ...not yet implemented... Also, we extend the concept of "mapping" to include a printf-like spec. Thus you can make all extended characters show up as hex with a display table like this:

 
    #s(range-table data ((256 524288) (format "%x")))

Since more than one display table is possible, you have great flexibility in mapping ranges of characters.



43.22 Future Work – Making Elisp Function Calls Faster

Author: Ben Wing

Abstract: This page describes many optimizations that can be made to the existing Elisp function call mechanism without too much effort. The most important optimizations can probably be implemented with only a day or two of work. I think it’s important to do this work regardless of whether we eventually decide to replace the Lisp engine.

Many complaints have been made about the speed of Elisp, and in particular about the slowness in executing function calls, and rightly so. If you look at the implementation of the funcall function, you’ll notice that it does an incredible amount of work. Now logically, it doesn’t need to be so. Let’s look first from the theoretical standpoint at what absolutely needs to be done to call a Lisp function.

First, let’s look at the situation that would exist if we were smart enough to have made lexical scoping be the default language policy. We know at compile time exactly which code can reference the variables that are the formal parameters for the function being called (specifically, only the code that is part of that function’s definition) and where these references are. As a result, we can simply push all the values of the variables onto a stack, and convert all the variable references in the function definition into stack references. Therefore, binding lexically-scoped parameters in preparation for a function call involves nothing more than pushing the values of the parameters onto a stack and then setting a new value for the frame pointer, at the same time remembering the old one. Because the byte-code interpreter has a stack-based architecture, however, the parameter values have already been pushed onto the stack at the time of the function call invocation. Therefore, binding the variables involves doing nothing at all, other than dealing with the frame pointer.

With dynamic scoping, the situation is somewhat more complicated. Because the parameters can be referenced anywhere, and these references cannot be located at compile time, their values have to be stored into a global table that maps the name of the parameter to its current value. In Elisp, this table is called the obarray. Variable binding in Elisp is done using the C function specbind(). (This stands for "special variable binding" where special is the standard Lisp terminology for a dynamically-scoped variable.) What specbind() does, essentially, is retrieve the old value of the variable out of the obarray, remember the value by pushing it, along with the name of the variable, onto what’s called the specpdl stack, and then store the new value into the obarray. The term "specpdl" means Special Variable Pushdown List, where Pushdown List is an archaic computer science term for a stack that used to be popular at MIT. These binding operations, however, should still not take very much time because of the use of symbols, i.e. because the location in the obarray where the variable’s value is stored has already been determined (specifically, it was determined at the time that the byte code was loaded and the symbol created), so no expensive hash table lookups need to be performed.
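
To make the mechanics concrete, here is a toy, self-contained C model of the push/restore dance that specbind() and unbind_to() perform (an illustrative sketch only; the real code deals with Lisp objects, buffer-local variables, and unwind functions):

 
    #include <stdio.h>

    /* Toy model of dynamic binding via a specpdl-style stack. */
    typedef struct { const char *name; int value; } Symbol;
    typedef struct { Symbol *sym; int old_value; } SpecpdlEntry;

    static SpecpdlEntry specpdl[64];
    static int specpdl_depth = 0;

    static void specbind_toy (Symbol *sym, int new_value)
    {
      /* Remember the old value, then overwrite the global cell. */
      specpdl[specpdl_depth].sym = sym;
      specpdl[specpdl_depth].old_value = sym->value;
      specpdl_depth++;
      sym->value = new_value;
    }

    static void unbind_to_toy (int depth)
    {
      /* Pop bindings, restoring each saved value. */
      while (specpdl_depth > depth)
        {
          specpdl_depth--;
          specpdl[specpdl_depth].sym->value = specpdl[specpdl_depth].old_value;
        }
    }

    int main (void)
    {
      Symbol x = { "x", 1 };
      int depth = specpdl_depth;
      specbind_toy (&x, 2);                    /* like entering a function */
      printf ("bound:   x = %d\n", x.value);   /* prints 2 */
      unbind_to_toy (depth);                   /* like returning */
      printf ("unbound: x = %d\n", x.value);   /* prints 1 */
      return 0;
    }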

An actual function invocation in Elisp does a great deal more work, however, than was just outlined above. Let’s just take a look at what happens when one byte-compiled function invokes another byte-compiled function, checking for places where unnecessary work is being done and determining how to optimize these places.

  1. The byte-compiled function’s parameter list is stored in exactly the format that the programmer entered it in, which is to say as a Lisp list, complete with &optional and &rest keywords. This list has to be parsed for every function invocation, which means that for every element in the list, the element is checked to see whether it’s the &optional or &rest keyword, its surrounding cons cell is checked to make sure that it is indeed a cons cell, the QUIT macro is called, etc. What should be happening here is that the argument list is parsed exactly once, at the time that the byte code is loaded, and converted into a C array. The C array should be stored as part of the byte-code object. The C array should also contain, in addition to the symbols themselves, the number of required and optional arguments. At function call time, the C array can be very quickly retrieved and processed. (A toy version of this load-time parse is sketched after this list.)
  2. For every variable that is to be bound, the specbind() function is called. This actually does quite a lot of things, including:
    1. Checking the symbol argument to the function to make sure it’s actually a symbol.
    2. Checking for specpdl stack overflow, and increasing its size as necessary.
    3. Calling symbol_value_buffer_local_info() to retrieve buffer local information for the symbol, and then processing the return value from this function in a series of if statements.
    4. Actually storing the old value onto the specpdl stack.
    5. Calling Fset() to change the variable’s value.
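
The load-time parse proposed in item 1 might look like this as a toy, self-contained C program (parse_arglist and the use of strings in place of symbols are illustrative, not existing XEmacs code):

 
    #include <stdio.h>
    #include <string.h>

    /* Counts derived once at load time, consulted on every call. */
    struct parsed_arglist { int n_required, n_optional, has_rest; };

    static struct parsed_arglist parse_arglist (const char **args, int n)
    {
      struct parsed_arglist p = { 0, 0, 0 };
      int in_optional = 0;
      for (int i = 0; i < n; i++)
        {
          if (strcmp (args[i], "&optional") == 0)
            in_optional = 1;
          else if (strcmp (args[i], "&rest") == 0)
            p.has_rest = 1;
          else if (p.has_rest)
            ;                       /* the single &rest parameter */
          else if (in_optional)
            p.n_optional++;
          else
            p.n_required++;
        }
      return p;
    }

    int main (void)
    {
      const char *args[] = { "a", "b", "&optional", "c", "&rest", "d" };
      struct parsed_arglist p = parse_arglist (args, 6);
      printf ("required=%d optional=%d rest=%d\n",
              p.n_required, p.n_optional, p.has_rest);   /* 2 1 1 */
      return 0;
    }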

The entire series of calls to specbind() should be inlined and merged into the argument-processing code as a single tight loop, with no function calls in the vast majority of cases (see the sketch following this list). The specbind() logic should be streamlined as follows:

  1. The symbol argument type checking is unnecessary.
  2. The check for the specpdl stack overflow needs to be done only once, not once per argument.
  3. All of the remaining logic should be boiled down as follows:
    1. Retrieve the old value from the symbol’s value cell.
    2. If this value is a symbol-value-magic object, then call the real specbind() to do the work.
    3. Otherwise, we know that nothing complicated needs to be done, so we simply push the symbol and its value onto the specpdl stack, and then replace the value in the symbol’s value cell.
    4. The only logic that we are omitting is the code in Fset() that checks to make sure a constant isn’t being set. These checks should be made at the time that the byte code for the function is loaded and the C array of parameters to the function is created. (Whether a symbol is constant or not is generally known at XEmacs compile time. The only issue here is with symbols whose names begin with a colon. These symbols should simply be disallowed completely as parameter names.)
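
Putting the above together, the merged binding loop might look roughly like this (a sketch only: SYMBOL_VALUE_MAGIC_P, XSYMBOL, and specpdl_ptr are existing internal names, but this code is untested, and the once-per-call stack-overflow check is elided):

 
    /* Bind each formal parameter to the corresponding argument,
       taking the slow path only for magic (e.g. buffer-local) slots. */
    for (i = 0; i < nargs; i++)
      {
        Lisp_Object sym = arglist[i];
        Lisp_Object old = XSYMBOL (sym)->value;
        if (SYMBOL_VALUE_MAGIC_P (old))
          specbind (sym, args[i]);           /* rare, complicated case */
        else
          {
            specpdl_ptr->symbol = sym;       /* push the old value ... */
            specpdl_ptr->old_value = old;
            specpdl_ptr++;
            XSYMBOL (sym)->value = args[i];  /* ... and store the new one */
          }
      }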

Various other optimizations along these lines could be made as well.



43.23 Future Work – Lisp Engine Replacement



43.23.1 Future Work – Lisp Engine Discussion

Author: Ben Wing

Abstract: Recently there has been a great deal of talk on the XEmacs mailing lists about potential changes to the XEmacs Lisp engine. Usually the discussion has centered around the question of which is better, Common Lisp or Scheme. This is certainly an interesting debate topic, but it didn’t seem to have much practical relevance to me, so I vowed to stay out of the discussion. Recently, however, it seems that people are losing sight of the broader picture. For example, nobody seems to be asking the question, “Would an extension language other than Lisp or Scheme (perhaps not a Lisp variant at all) be more appropriate?” Nor does anybody seem to be addressing what I consider to be the most fundamental question: is changing the extension language a good thing to do?

I think it would be a mistake at this point in XEmacs development to begin any project involving fundamental changes to the Lisp engine or to the XEmacs Lisp language itself. It would take a huge amount of effort to complete even part of this project, and would be a major drain on the already-insufficient resources of the XEmacs development community. Most of the gains that are purported to stem from a project such as this could be obtained with far less effort by making more incremental changes to the XEmacs core. I think it would be an even bigger mistake to change the actual XEmacs extension language (as opposed to just changing the Lisp engine, making few, if any, externally visible changes). The only language change that I could possibly imagine justifying would involve switching to some ubiquitous web language, such as Java and JavaScript, or Perl. (Even among those, I think Java would be the only possibility that really makes sense).

In the rest of this document I’ll present the broader issues that would be involved in changing the Lisp engine or extension language. This should make clear why I’ve come to believe as I do.

Is everyone clear on the difference between interface and implementation?

There seems to be a great deal of confusion concerning the difference between interface and implementation. In the context of XEmacs, changing the interface means switching to a different extension language such as Common Lisp, Scheme, Java, etc. Changing the implementation means using a different Lisp engine. There is obviously some relation between these two issues, but there is no particular requirement that one be changed if the other is changed. It is quite possible, for example, to imagine taking the underlying engine for any of the various Lisp dialects in existence, and adapting it so that it implements the same Elisp extension language that currently exists. The vast majority of the purported benefits that we would get from changing the extension language could just as easily be obtained while making minimal changes to the external Elisp interface. This way nearly all existing Elisp programs would continue to work, there would be no need to translate Elisp programs into some other language or to simultaneously support two incompatible Lisp variants, and there would be no need for users or package authors to learn a new extension language that would be just as unfamiliar to the vast majority of them as Elisp is.

Why should we change the Lisp engine?

Let’s go over the possible reasons for changing the Lisp engine.

Speed.

Changing the Lisp engine might make XEmacs faster. However, consider the following.

  1. XEmacs will get faster over time without any development effort at all because computers will get faster.
  2. Perhaps the biggest causes of the slowness of XEmacs are not related to the Lisp engine at all. It has been asserted, for example, that the slowness of XEmacs is primarily due to the redisplay mechanism, to the handling of insertion and deletion of text in a buffer, to the event loop, etc. Nobody has done any real studies to determine what the actual cause of slowness is.
  3. Emacs 18 seems plenty fast enough to most people, yet Emacs 18 had a worse Lisp engine and a worse byte compiler than XEmacs.
  4. Significant speed increases in the execution of Lisp code could be achieved without too much effort by working on the existing byte code interpreter and function call mechanism a bit.

Memory usage.

A new Lisp engine with a better garbage collection mechanism might make more efficient use of memory; for example, through the use of a relocating garbage collector. However, consider this:

  1. A new Lisp engine would probably have a larger memory footprint, perhaps a significantly larger one.
  2. The worst memory problems might not be due to Lisp object inefficiency at all; they could be due mainly to the inefficient buffer representation. Nobody has come up with any concrete numbers on where the real problem lies.

Robustness.

A new Lisp engine might well be more robust. (On the other hand, it might not be. It is not always easy to tell). However, I think that the biggest problems with robustness are in the part of the C code that is not concerned with implementing the Lisp engine. The redisplay mechanism and the unexec mechanism are probably the biggest sources of robustness problems. I think the biggest robustness problems that are related to the Lisp engine concern the use of GCPRO declarations. The entire GCPRO mechanism is ill-conceived and unsafe. The only real way to make this safe would be to do conservative garbage collection over the C stack and to eliminate the GCPRO declarations entirely. But how many of the Lisp engines that are being considered have such a mechanism built into them?

Maintainability.

A new Lisp engine might well improve the maintainability of XEmacs by offloading the maintenance of the Lisp engine. However, we need to make very sure that this is, in fact, the case before embarking on a project like this. We would almost certainly have to make significant modifications to any Lisp engine that we choose to integrate, and without the active and committed support and cooperation of the developers of that Lisp engine, the maintainability problem would actually get worse.

Features.

A new Lisp engine might have built in support for various features that we would like to add to the XEmacs extension language, such as lexical scoping and an object system.

Why would we want to change the extension language?

Possible reasons for changing the extension language include:

More standard.

Switching to a language that is more standard and more commonly in use would be beneficial for various reasons. First of all, a language that is more commonly used and more familiar would make it easier for users to write their own extensions and would, in general, increase the acceptance of XEmacs. Also, an accepted standard has probably had a lot more thought put into it than any language interface created by the XEmacs developers themselves. Furthermore, if our extension language is being actively developed and supported, much of the work that we would otherwise have to do ourselves is transferred elsewhere.

However, both Scheme and Common Lisp flunk the familiarity test. Neither language is being actively used for program development outside of small research communities, and few prospective authors of XEmacs extensions will be familiar with any Lisp variant for real world uses. (I consider the argument that Scheme is often used in introductory programming courses to be irrelevant. Many existing programmers were taught Pascal in their introductory programming courses. How many of them would actually be comfortable writing a program in Pascal?) Furthermore, someone who wants to learn Lisp can’t exactly go to their neighborhood bookstore and pick up a book on this topic.

Ease of use.

There are endless arguments about which language is easiest to use. In practice, this largely boils down to which languages are most familiar.

Object oriented.

The object-oriented paradigm is the dominant one in use today for new languages. User interface concepts in particular are expressed very naturally in an object-oriented system. However, neither Scheme nor Common Lisp has been designed with object orientation in mind. There is a standard object system for Common Lisp, but it is extremely complex and difficult to understand.



43.23.2 Future Work – Lisp Engine Replacement – Implementation

Author: Ben Wing

Let’s take a look at the sort of work that would be required if we were to replace the existing Elisp engine in XEmacs with some other engine, for example, the Clisp engine. I’m assuming here, of course, that we are not going to be changing the interface at the same time, which is to say that we will be keeping the same Elisp language that we currently have as the extension language for XEmacs, except perhaps for incremental changes that we will make, such as lexical scoping and proper structure support, in an attempt to gradually move the language towards an upwardly-compatible goal such as Common Lisp. I am writing this page primarily as food for thought. I feel fairly strongly that actually doing this work would be a big waste of effort that would inevitably become a huge time sink on the part of nearly everyone involved in XEmacs development, and not only for the ones who were supposed to be actually doing the engine change. I feel that most of the desired changes that we want for the language and/or the engine can be achieved with much less effort and time through incremental changes to the existing code base.

First of all, in order to make a successful Lisp engine change in XEmacs, it is vitally important that the work be done through a series of incremental stages where at the end of each stage XEmacs can be compiled and run, and it works. It is tempting to try to make the change all at once, but this would be disastrous. If the resulting product worked at all, it would inevitably contain a huge number of subtle and extremely difficult to track down bugs, and it would be next to impossible to determine which of the myriad changes made introduced the bug.

Now let’s look at what the possible stages of implementation could be.

An Extra C Preprocessing Stage

The first step would be to introduce another preprocessing stage for the XEmacs C code, which is done before the C compiler itself is invoked on the code, and before the standard C preprocessor runs. The C preprocessor is simply not powerful enough to do many of the things we would like to do in the C code. The existing results of this are a combination of hacked-up and tricky-to-maintain constructs (such as the DEFUN macro and the associated DEFSUBR), code constructs that are difficult to write (consider, for example, attempting to do structured exception handling, such as catch/throw and unwind-protect constructs), and code that is potentially or actually unsafe (such as the uses of alloca, which could easily cause stack overflow when large amounts of memory are allocated in this fashion). The problem is that the C preprocessor does not allow macros to have the power of an actual language, such as C or Lisp.

What our own preprocessor should do is allow us to define macros whose definitions are simply functions, written in some language, that are executed at compile time. The arguments to such a function are the actual arguments of the macro call, plus an environment, which should be a data-structure representation of the C code in the file; the macro should be able to query and modify this environment.

It can be debated what language these macro definitions should be written in. Whatever the language chosen, it needs to be a very standard language, and one whose compiler or interpreter is available on all of the platforms that we could ever possibly consider porting XEmacs to, which is basically to say all the platforms in existence. One obvious choice is C, because a C compiler will obviously be available, since one is needed to compile XEmacs itself. Another possibility is Perl, which is already installed on most systems and is universally available on all others. Perl’s powerful text-processing facilities would probably make it possible to implement the macro definitions more quickly and easily; however, this might also encourage bad coding practices in the macros (often simple text processing is not appropriate, and more sophisticated parsing or recursive data-structure processing needs to be done instead), and we’d have to make sure that the nested data structure that comprises the environment could be represented well in Perl. Elisp would not be a good choice because it would create a bootstrapping problem. Other possible languages, such as Python, are not appropriate, because most programmers are unfamiliar with them (creating a maintainability problem) and the Python interpreter would have to be included and compiled as part of the XEmacs compilation process (another maintainability problem). Java is still too much in flux to be considered at this point.

The macro facility that we will provide needs to add two features to the language: the ability to define a macro, and the ability to call a macro. One good way of doing this would be to make use of special characters that have no meaning in the C language (or in C++ for that matter), and thus can never appear in a C file outside of comments and strings. Two obvious characters are the @ sign and the $ sign. We could, for example, use @define to define new macros, and the $ sign followed by the macro name to call a macro. (Proponents of Perl will note that both of these characters have a meaning in Perl. This should not be a problem, however, because the way that macros are defined and called inside of another macro should not be through the use of any special characters that would in effect be extending the macro language, but through function calls made in the normal way for the language.)

The program that actually implements this extra preprocessing stage needs to know a certain amount about how to parse C code. In particular, it needs to know how to recognize comments, strings, character constants, and perhaps certain other kinds of C tokens, and needs to be able to parse C code down to the statement level. (This is to say it needs to be able to parse function definitions and to separate out the statements, if blocks, while blocks, etc. within these definitions. It probably doesn’t, however, need to parse the contents of a C expression.) The preprocessing program should work first by parsing the entire file into a data structure (which may just contain expressions in the form of literal strings rather than a data structure representing the parsed expression). This data structure should become the environment parameter that is passed as an argument to macros as mentioned above. The implementation of the parsing could and probably should be done using lex and yacc. One good idea is simply to steal some of the lex and yacc code that is part of GCC.

Here are some possibilities that could be implemented as part of the preprocessing:

  1. A proper way of doing the DEFUN macros. These could, for example, take an argument list in the form of a Lisp argument list (complete with keyword parameters and other complex features) and automatically generate the appropriate subr structure, the appropriate C function definition header, and the appropriate call to the DEFSUBR initialization function.
  2. A truly safe and easy-to-use replacement for the alloca function. This could allocate the memory in any fashion it chooses (calling malloc, using a large global array or a series of such arrays, etc.) and insert code in the appropriate places to automatically free the memory. (Appropriate places here would be at the end of the function and before any return statements; non-local exits can be handled in the function that actually implements the non-local exit. A toy version of the generated code is sketched after this list.)
  3. If we allow for the possibility of having an arbitrary Lisp engine, we can’t necessarily assume that we can call Lisp primitives implemented in C from other C functions by simply making a function call. Perhaps something special needs to happen when this is done. This could be handled fairly easily by having our new and improved DEFUN macro define a new macro for use when calling a primitive.
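
As a sketch of item 2, here is a toy, self-contained model of the code the preprocessor might generate, assuming a single fixed-size arena; all names here are hypothetical, and a real version would chain additional blocks rather than fail:

 
    #include <stddef.h>
    #include <stdio.h>

    static char arena[1 << 20];      /* hypothetical allocation arena */
    static size_t arena_used = 0;

    static void *arena_alloc (size_t n)
    {
      if (arena_used + n > sizeof arena)
        return NULL;                 /* a real version would chain a new block */
      void *p = arena + arena_used;
      arena_used += n;
      return p;
    }

    /* What the preprocessor might emit for a function that used
       $alloca: save the arena depth on entry, restore it at exits. */
    static int example (size_t len)
    {
      size_t mark = arena_used;          /* inserted at function entry */
      char *buf = arena_alloc (len);     /* was: alloca (len) */
      int ok = (buf != NULL);            /* ... use buf here ... */
      arena_used = mark;                 /* inserted before each return */
      return ok;
    }

    int main (void)
    {
      printf ("%d\n", example (1000));   /* prints 1 */
      return 0;
    }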

Make the Existing Lisp Engine be Self-contained.

The goal of this stage is to gradually build up a self-contained Lisp engine out of the existing XEmacs core, which has no dependencies on any of the code elsewhere in the XEmacs core, and has a well-defined and black box-style interface. (This is to say that the rest of the C code should not be able to access the implementation of the Lisp engine, and should make as few assumptions as possible about how this implementation works). The Lisp engine could, and probably should, be built up as a separate library which can be compiled on its own without any of the rest of the XEmacs C code, and can be tested in this configuration as well.

The creation of this engine library should be done as a series of subsets, each of which moves more code out of the XEmacs core and into the engine library, and XEmacs should be compilable and runnable between each sub-step. One possible series of sub-steps would be to first create an engine that does only object allocation and garbage collection, then as a second sub-step, move in the code that handles symbols, symbol values, and simple binding, and then finally move in the code that handles control structures, function calling, byte-code execution, exception handling, etc. (It might well be possible to further separate this last sub-step).

Removal of Assumptions About the Lisp Engine Implementation

Currently, the XEmacs C code makes all sorts of assumptions about the implementation of the Lisp engine, particularly in the areas of object allocation, object representation, and garbage collection. A different Lisp engine may well have different ways of doing these implementations, and thus the XEmacs C code must be rid of any assumptions about these implementations. This is a tough and tedious job, but it needs to be done. Here are some examples:

  1. GCPRO must go. The GCPRO mechanism is tedious, error-prone, unmaintainable, and fundamentally unsafe. As anyone who has worked on the C Core of XEmacs knows, figuring out where to insert the GCPRO calls is an exercise in black magic, and debugging crashes as a result of incorrect GCPROing is an absolute nightmare. Furthermore, the entire mechanism is fundamentally unsafe. Even if we were to use the extra preprocessing stage detailed above to automatically generate GCPRO and UNGCPRO calls for all Lisp object variables occurring anywhere in the C code, there are still places where we could be bitten. Consider, for example, code which calls cons and where the two arguments to this function are both calls to the append function. Now the append function generates new Lisp objects, and it also calls QUIT, which could potentially execute arbitrary Lisp code and cause a garbage collection before returning control to the append function. In order to generate the arguments to the cons function, the append function is called twice in a row. When the first append call returns, new Lisp data has been created, but no GCPRO pointers point to it. If the second append call causes a garbage collection, the Lisp data from the first append call will be collected and recycled, which is likely to lead to obscure and impossible-to-debug crashes. (This hazard is shown in code just after this list.) The only way around this would be to rewrite all function calls whose parameters are Lisp objects in terms of temporary variables, so that no such function calls ever contain other function calls as arguments. This would not only be annoying to implement, even in a smart preprocessor, but would make the C code become incredibly slow because of all the constant updating of the GCPRO lists.
  2. The only proper solution here is to completely do away with the GCPRO mechanism and simply do conservative garbage collection over the C stack. There are already portable implementations of conservative pointer marking over the C stack, and these could easily be adapted for use in the Elisp garbage collector. If, as outlined above, we use an extra preprocessing stage to create a new version of alloca that allocates its memory elsewhere than actually on the C stack, and we ensure that we don’t declare any large arrays as local variables, but instead use alloca, then we can be guaranteed that the C stack is small and thus that the conservative pointer marking stage will be fast and not very likely to find false matches.
  3. Removing the GCPRO declarations as just outlined would also remove the assumption currently made that garbage collection can occur only in certain places in the C code, rather than in any arbitrary spot. (For example, any time an allocation of Lisp data happens). In order to make things really safe, however, we also have to remove another assumption as detailed in the following item.
  4. Lisp objects might be relocatable. Currently, the C code assumes that Lisp objects other than string data are not relocatable and therefore it’s safe to pass around and hold onto the actual pointers for the C structures that implement the Lisp objects. Current code, for example, assumes that a Lisp_Object of type buffer and a C pointer to a struct buffer mean basically the same thing, and indiscriminately passes the two kinds of buffer pointers around. With relocatable Lisp objects, the pointers to the C structures might change at any time. (Remember, we are now assuming that a garbage collection can happen at basically any point). All of the C code needs to be changed so that Lisp objects are always passed around using a Lisp object type, and the underlying pointers are only retrieved at the time when a particular data element out of the structure is needed. (As an aside, here’s another reason why Lisp objects, instead of pointers, should always be passed around. If pointers are passed around, it’s conceivable that at the time a garbage collection occurs, the only reference to a Lisp object (for example, a deleted buffer) would be in the form of a C pointer rather than a Lisp object. In such a case, the conservative pointer marking mechanism might not notice the reference, especially if, in an attempt to eliminate false matches and make the code generally more efficient, it will be written so that it will look for actual Lisp object references.)
  5. I would go a step farther and completely eliminate the macros that convert a Lisp object reference into a C pointer. This way the only way to access an element out of a Lisp object would be to use the macro for that element, which in one atomic operation de-references the Lisp object reference and retrieves the value contained in the element. We probably do need the ability to retrieve actual C pointers, though. For example, in the case where an array is stored in a Lisp object, or simply for efficiency purposes where we might want some code to retrieve the C pointer for a Lisp object, and work on that directly to avoid a whole bunch of extra indirections. I think the way to do this would be through the use of a special locking construct implemented as part of the extra preprocessor stage mentioned above. This would essentially be what you might call a lock block, just like a while block. You’d write the word lock followed by a parenthesized expression that retrieves the C pointer and stores it into a variable that is scoped only within the lock block and followed in turn by some code in braces, which is the actual code associated with the lock block, and which can make use of this pointer. While the code inside the lock block is executing, that particular pointer and the object pointed to by it is guaranteed not to be relocated.
  6. If all the XEmacs C code were converted according to these rules, there would be no restrictions on the sorts of implementations that can be used for the garbage collector. It would be possible, for example, to have an incremental asynchronous relocating garbage collector that operated continuously in another thread while XEmacs was running.
  7. The C implementation of Lisp objects might not, and probably should not, be visible to the rest of the XEmacs C code. It should theoretically be possible, for example, to implement Lisp objects entirely in terms of association lists, rather than using C structures in the standard way. (This may be an extreme example, but it’s good to keep in mind an example such as this when cleaning up the XEmacs C code). The changes mentioned in the previous item would go a long way towards removing this assumption. The only places where this assumption might still be made would be inside of the lock blocks where an actual pointer is retrieved. (Also, of course, we’d have to change the way that Lisp objects are defined in C so that this is done with some function calls and new and improved macros rather than by having the XEmacs C code actually define the structures. This sort of thing would probably have to be done in any case once the allocation mechanism is moved into a separate library.) With some thought it should be possible to define the lock block interface in such a way as to remove any assumptions about the implementation of Lisp objects.
  8. C code may not be able to call Lisp primitives that are defined in C simply by making standard C function calls. There might need to be some wrapper around all such calls. This could be achieved cleanly through the extra preprocessing step mentioned above, in line with the example described there.
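
Here is the hazard from item 1 in code form (a sketch in XEmacs C style; Fcons and Fappend are real primitives, but the surrounding code is illustrative only):

 
    Lisp_Object args1[2], args2[2];
    /* ... args1 and args2 filled in with lists ... */

    /* Whichever Fappend call runs first conses fresh Lisp data that
       nothing GCPROs.  The other Fappend call can QUIT, run arbitrary
       Lisp, and trigger a garbage collection, collecting the first
       call's result before Fcons ever sees it. */
    Lisp_Object result = Fcons (Fappend (2, args1),
                                Fappend (2, args2));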

Actually Replacing the Engine.

Once we’ve done all of the work mentioned in the previous steps (and admittedly, this is quite a lot of work), we should have an XEmacs that still uses what is essentially the old and previously existing Lisp engine, but which is ready to have its Lisp engine replaced. The replacement might proceed as follows:

  1. Identify any further changes that need to be made to the engine interface that we have defined as a result of the previous steps so that features and idiosyncrasies of various Lisp engines that we examine could be properly supported.
  2. Pick a Lisp engine and write an interface layer that sits on top of this Lisp engine and makes it adhere to what I’ll now call the XEmacs Lisp engine interface.
  3. Strongly consider creating, if we haven’t already done so, a test suite that can test the XEmacs Lisp engine interface when used with a stand-alone Lisp engine.
  4. Test the hell out of the Lisp engine that we’ve chosen when combined with its XEmacs Lisp engine interface layer as a stand-alone program.
  5. Now finally attach this stand-alone program to XEmacs itself. Debug and fix any further problems that ensue (and there inevitably will be such problems), updating the test suite as we go along so that if it were run again on the old and buggy interfaced Lisp engine, it would note the bug.


43.23.3 Future Work – Startup File Modification by Packages

Author: Ben Wing

OK, we need to create a design document for all of this, including:

PRINCIPLE #1: Whenever you have auto-generated stuff, CLEARLY indicate this in comments around the stuff. These comments get searched for, and used to locate the existing generated stuff to replace. Custom currently doesn’t do this.

PRINCIPLE #2: Currently, lots of functions want to add code to the .emacs. (e.g. I get prompted for my mail address from add-change-log-entry, and then prompted if I want to make this permanent). There needs to be a Lisp API for working with arbitrary code to be added to a user’s startup. This API hides all the details of which file to put the fragment in, where in it, how to mark it with magical comments of the right kind so that previous fragments can be replaced, etc.

PRINCIPLE #3: ALL generated stuff should be loaded before any user-written init stuff. This way the user can override the generated settings. Although in the case of customize, it may work when the custom stuff is at the end of the init file, it surely won’t work for arbitrary code fragments (which typically do setq or the like).

PRINCIPLE #4: As much as possible, generated stuff should be placed in separate files from non-generated stuff. Otherwise it’s inevitable that some corruption will result.

PRINCIPLE #5: Packages are encouraged, as much as possible, to work within the customize model and store all their customizations there. However, if they really need to have their own init files, these files should be placed in .xemacs/, given normal names (e.g. ‘saved-abbrevs.el’ not .abbrevs), and there should be some magic comment at the top of the file that causes it to get automatically loaded while loading a user’s init file. (Alternatively, the above-named API could specify a function that lets a package specify that they want such-and-such file loaded from the init file, and have the specifics of this get handled correctly.)

OVERARCHING GOAL: The overarching goal is to provide a unified mechanism for packages to store state and setting information about the user and what they were doing when XEmacs exited, so that the same or a similar environment can be automatically set up the next time. In general, we are working more and more towards being a truly GUI app where users’ settings are easy to change and get remembered correctly and consistently from one session to the next, rather than requiring nasty hacking in elisp.

Hrvoje, do you have any interest in this? How about you, Martin? This seems like it might be up your alley. This stuff has been ad-hocked since kingdom come, and it’s high time that we make this work properly so that it could be relied upon, and a lot of things could "just work".



43.24 Future Work – Better Rendering Support

This section was written by Stephen Turnbull <stephen@xemacs.org>, so don’t blame Ben (or Eric and Matthias, for that matter). Feel free to add, edit, and share the blame, guys!

As of late November 2004, this principally means adding support for the ‘Xft’ library, which provides a more robust font configuration mechanism via Keith Packard’s ‘fontconfig’ library; improved glyph rendering, including antialiasing, via the ‘freetype’ library; and client-side rendering (saving bandwidth and server memory) via the ‘XRender extension’. In fact, patches which provide Xft support have been available for several years, but the authors have been unwilling to deal with several important issues which block integration. These are Mule (and, more generally, face) support; widget support (including the toolbar and menubar); and redisplay refactoring.

However, in late 2003 Eric Knauel <knauel@informatik.uni-tuebingen.de> and Matthias Neubauer <neubauer@informatik.uni-freiburg.de> put forward a relatively complete patch which was robust under daily use in ISO 8859-1 locales, and Stephen Turnbull began work on the integration issues. At this point a (private) CVS branch is available for Stephen’s patch (branch point tag ‘sjt-xft-bp’, branch tag ‘sjt-xft’), and one may be made available for the Knauel-Neubauer patch soon.



43.24.1 Better Rendering Support – Review Criteria

Of course it’s “unfair” to demand that the implementers of a nice feature like anti-aliasing support deal with accumulated cruft of the last few years, but somebody must, sometime soon. Even core developers are complaining about how slow XEmacs is in some applications, and there is reason to believe that some of the problem is in redisplay. Adding more ad hoc features to redisplay will make the whole module more complex and unintelligible. Even if it doesn’t inherently further detract from efficiency, it will surely make reform and refactoring harder.

Similar considerations apply to Mule support. If Xft support is not carefully designed, or implemented with Mule support soon, it will undoubtedly make later Mule implementation far more difficult than it needs to be, and require redundant work be done (e.g., on ‘Options’ menu support).

Besides the design issue (and many users are requesting more flexibility, primarily face support, from the widgets), there is also an aesthetic issue with widget support. It is horribly unimpressive to have clunky bitmapped fonts on the decorations when pleasant antialiased fonts are available in the buffer.

Finally, these issues interact. Widgets and faces are inherently heavyweight objects, requiring orders of magnitude more computation than simply displaying a string in a fixed font. This will have an efficiency impact, of course. And they interact with each other; Mule was designed for use in buffers and display in Emacs windows—but a widget’s content is usually not a buffer, and widgets need not be displayed in a window, but may appear in other contexts, especially in the gutters. So specifiers will probably have to be reworked, in order to properly support display of different faces in non-buffer, non-window contexts.



43.24.2 Better Rendering Support – Implementation

Stephen is thinking in terms of the following components of a comprehensive proposal.

Font configuration

In XEmacs, font configuration is handled via faces. Currently XEmacs uses a special type of font specifier to map XEmacs locales to font names. Especially under X11, this can cause annoying problems because of the unreliability of X servers’ mappings from ‘XLFD’ names to X11 fonts, over which XEmacs has no influence whatsoever. However, the ‘fontconfig’ library which is used with ‘Xft’ provides much more reliable mapping, along with a more reliably parsable naming scheme similar to that used by TrueType fonts on MS Windows and the Macintosh. Since the capabilities of font specifiers and ‘fontconfig’ overlap, we should consider using ‘fontconfig’ instead of ‘XLFD’ names. This implies that use of ‘Xft’’s rendering functionality should be separated from use of ‘fontconfig’.

fontconfig

Fontconfig is dramatically different from the X model in several ways. In particular, the convenient interface in fontconfig always returns a font. However, the font returned need not be anything like the desired font. This means that XEmacs must adopt a strategy of delegating the search to fontconfig, then sanity-checking the return, rather than trying to use the fontconfig API to search using techniques appropriate for the X11 core font API. (This isn’t actually true: fontconfig has more complex interfaces which allow listing a subset of fonts that match a pattern, and these don’t go out of their way to return something no matter what. But the original patches didn’t use this approach.)
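
Concretely, the “delegate, then sanity-check” strategy looks something like this standalone sketch against the fontconfig C API (the sanity check shown is only a placeholder):

 
    #include <stdio.h>
    #include <fontconfig/fontconfig.h>

    int main (void)
    {
      FcInit ();
      /* Parse a fontconfig-style name and fill in defaults. */
      FcPattern *pat = FcNameParse ((const FcChar8 *) "Monospace-12");
      FcConfigSubstitute (NULL, pat, FcMatchPattern);
      FcDefaultSubstitute (pat);

      /* The convenient interface: this always returns *some* font. */
      FcResult result;
      FcPattern *match = FcFontMatch (NULL, pat, &result);

      FcChar8 *family = NULL;
      if (match
          && FcPatternGetString (match, FC_FAMILY, 0, &family) == FcResultMatch)
        /* Sanity check goes here: is FAMILY anything like what we asked for? */
        printf ("matched family: %s\n", family);

      if (match)
        FcPatternDestroy (match);
      FcPatternDestroy (pat);
      FcFini ();
      return 0;
    }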

Font menus

The ‘Options->Font’ and ‘Options->Font Sizes’ menus are broken, by design, not just by ‘Xft’. Although they work better in Eric and Matthias’s patch than in Stephen’s, even their version has the problem that many fonts are unavailable because they don’t match the current size—which is very strange, since ‘Xft’ fonts are of course scalable. But the whole idea of requiring that the font match the size is strange. And the ‘Options->Font Weights’ menu is just disabled, and has been for eons.

X resources

Currently in Stephen’s patch there are five treatments of font resources. There are the ‘XEmacs.face.attributeFont’ resources used to set a single global font specification. In the widgets, some (still) have a ‘font’ resource using the automatic ‘Xt’ resource conversion to ‘FontStruct’; some have separate ‘font’ and ‘fcFontName’ resources, with the former automatically converted to ‘FontStruct’ by ‘Xt’ and the latter left as a string, to be converted by ‘FcNameParse’ later; and some have a single ‘font’ resource which is either converted to ‘FontStruct’ by ‘Xt’ or left as a string, depending on whether ‘Xft’ was enabled by ‘configure’ or not. There is also the ‘xftFont’ resource, which may be retargeted to use an ‘Xt’ converter function, but which is currently just an alias for the ‘fcFontName’ resource.

Stephen thinks that all of these should be converted to use the face approach, perhaps with some way to set specifications for individual widgets, frames, or buffers. This will require some careful design work to incorporate face support in the widgets. We should just accept any or all of the ‘font’, ‘fontSet’, and ‘fontList’ resources, treat them all as lists of font names, either ‘XLFD’- or ‘fontconfig’-style, parse them ourselves (i.e., not use the ‘Xt’ resource manager), and add them to font specifiers as appropriate. But this will require a bit of thought to obey POLA vis-a-vis the usual ‘Xt’ conventions.

Rendering engine objects

With the introduction of the “Xft patch,” the X11, Macintosh, and MS Windows platforms are all able to support multiple font rendering engines in the same binary. Generically, there are several tasks that must be accomplished to render text on the display. In each case the code is rather disorganized, with substantial cross-platform duplication of similar routines. While it may not be worthwhile to go the whole way to ‘RENDERER_HAS_METHOD’ and ‘MAYBE_RENDMETH’, refactoring these modules around the notion of interfacing a “generic rendering engine interface” to “text” seems like a plausible way to focus this work.

Further evidence for this kind of approach is a bug recently fixed in the ‘sjt-xft’ branch. XEmacs was crashing because the Athena Label widget tried to access a nonexistent font in its initialization routine. The font didn’t exist because although no core X11 font corresponding to the spec existed, an Xft font was found. So the XEmacs font instance existed, but it did not specify an X11 core font, only the Xft font. When this object was used to initialize the font for the Label widget, None (0) was passed via XtSetArg, then XtCreateWidget was called, and the internal initialization routine attempted to access that (nonexistent) font while computing an X11 graphics context (GC).

A similar issue applies to colors, but there Xft colors keep the pixel data internally, so (serendipitously) the X11 color (i.e., pixel) member does get updated.

Colors, fonts, and faces

Besides the rendering engine itself, the XEmacs implementations of these objects are poorly supported by current widget implementations, including the traditional menubar and toolbar, as well as the more recent button, tab control, and progress bar widgets. The refactoring suggested under “Rendering engine objects” should be conducted with an eye to making these widgets support faces, perhaps even to the extent of allowing rendering to X pixmaps (which some Athena widgets support, although they will not support rendering via Xft directly). Especially with ‘XRender’ technology this should not be horribly inefficient.

Specifiers, charsets, and languages

Traditionally Mule uses a rather rigid and low-level abstraction, the charset, to characterize font repertoires. Unfortunately, support for a given charset is generally neither necessary nor sufficient to support a language. Worse, although X11’s only means for indicating font repertoires is the font’s registry, the actual repertoire of many fonts is either deficient or font-dependent. The only convenience is that the registry maps directly to a Mule charset in most cases, and vice versa.

To date, XEmacs Mule has supported identification of appropriate fonts to support a language’s repertoire of characters by identifying the repertoire as a subset of a union of charsets. To each charset there is a regular expression matching the registry portion of a font name. Then instantiation of a font proceeds by identifying the specifier domain, and then walking down the list of specifications, matching the regexp against font names until a match is found. That font is requested from the system, and if not found, the process continues similarly until a font that can be loaded is found.

This has several problems. First, there’s no guarantee that the union will be disjoint. This problem manifests itself in the display of Unicode representations of text in the ‘POSIX’ default locale, where glyphs are typically drawn from several inappropriate fonts. A similar problem often occurs, though for a different reason, in multilingual messages composed using ‘Gnus’’s ‘message-mode’ and MIME support. This problem cannot be avoided with the current design; it is quite possible that a font desired in one context will be shadowed by a font intended to get higher priority in a semantically different but syntactically similar (as far as Mule can tell) context. (Of course, one could attach a different face as a text property, but that requires programming support; it can’t be done by user configuration.) The problem is only exacerbated as more and more Unicode fonts, supporting large repertoires with substantial overlap across fonts, are designed and published.

A second problem is that registry names are often inaccurate. For example, the Japanese JIS X 0208 standard was first published in 1978 (as a relabelling of an older standard). It was then revised in 1983, again in 1990, and once again in 2000, with slight changes to the repertoire and mapping in each revision. Technically, these standards can be distinguished in properly named fonts as ‘jisx0208.1978’, ‘jisx0208.1983’, ‘jisx0208.1990’, ‘jisx0208.2000’, but all of them are commonly simply labelled ‘jisx0208’, and Western distributors, of course, generally lack the expertise to correctly relabel them.

A third problem is that you generally can’t tell if there are “holes” in the repertoire until you try to display the glyph.

All of this tends to break standard idioms for handling Mule fonts in ‘init’ files because they depend on charsets being disjoint repertoires.

The TrueType font format (and the later OpenType standard) provides for a proper character-set query (as a Boolean vector indexed by Unicode code points), as well as providing a list of supported languages.

I propose that we take advantage of these latter facilities by allowing a font to be specified either as a string (a font name), or as a list whose head is the font name and whose tail is a list of languages and Mule charsets (for backward compatibility) that the user intends the font to display. This will probably require a change to the specifier code.

As mentioned above, specifiers will probably also have to be enhanced to recognize ‘widget’ locales and domains, instead of the current hack where special ‘widget’ and ‘gui-element’ faces are created.

Customize

Customize needs to deal with all this stuff!!



43.24.3 Better Rendering Support – Current Status

Stephen has a branch containing his stuff in XEmacs CVS. The branch point tag is ‘sjt-xft-bp’, roughly corresponding to XEmacs 21.5.18, and branch tag is ‘sjt-xft’.



43.24.3.1 Bugs Reported in sjt-xft

ChangeLogs

A lot of these, especially for Eric and Matthias’s work, are missing. Mea culpa.

Options->Font
Options->Font Size

These menus don’t work. All fonts are greyed out. All sizes are available, but many (most?) faces don’t change size, in particular, ‘default’ does not.

Antialiased text bleeding outside of reported extent

On my PowerBook G4 Titanium 15" screen, X.org server v6.8.1, dimensions: 1280x833 pixels (433x282 millimeters), resolution: 75x75 dots per inch, depth of root window: 24 planes (yes, those dimensions are broken), with font "Bitstream Vera Sans Mono-16:dpi=75" antialiased text may bleed out of the extent reported by XftTextExtents and other such facilities. This is most obvious with the underscore character in that font. The bottom of the underscore is antialiased, and insertions or deletions in the same line before the underscore leave a series of "phantom" underlines. Except that it doesn’t happen on the very first such insertion or deletion after a window refresh. A similar effect sometimes occurs with deletions at the end of the line (no, I can’t define "sometimes"). See also comments in ‘redisplay-x.c’, functions x_output_string and x_output_display_block. (Mostly duplicated here.)

I think this is probably an Xft bug, but I’m not sure.



43.24.4 Better Rendering Support – Configuration with the Interim Patches

For Stephen’s ‘sjt-xft’ branch, you should keep the following in mind when configuring:



43.24.5 Better Rendering Support – Modern Font Support

NB: This subtree eventually needs to be moved to the Lispref.

This chapter describes integration of the ‘Xft’ font support library into XEmacs. This library is a layer over the separate ‘FreeType’ rendering engine and ‘fontconfig’ font query and selection libraries. ‘FreeType’ provides rendering facilities for modern, good-looking TrueType fonts with hinting and antialiasing, while ‘fontconfig’ provides a coherent interface to font query and selection which is independent of the rendering engine, although currently it is only used in ‘Xft’ to interface to ‘FreeType’.

From the user’s point of view, ‘fontconfig’ provides a naming convention which is precise, accurate, and convenient. Precision means that all properties available in the programming API can be individually specified. Accuracy means that the truename of the font is exactly the list of all properties specified by the font. Thus, the anomalies that occur with XLFDs on many servers (including modern Linux distributions with XFree86 or X.org servers) cannot occur. Convenience is subjective, of course. However, ‘fontconfig’ provides a configuration system which (1) explicitly specifies the defaults and substitutions that will be made in processing user queries, and (2) allows the user to specify search configuration, abbreviations, substitutions, and defaults that override the system’s, in the same format as used by system files. Further, a standard minimal configuration is defined that ensures that at least serif, sans-serif, and monospace fonts are available on all ‘fontconfig’ systems.



43.24.5.1 Modern Font Support – Font Concepts

In modern systems, displays are invariably raster graphic devices, which present an abstract interface of pixel array where each pixel value is a color, and each pixel is individually mutable, and (usually) readable. In XEmacs, such devices are collectively called GUI devices, as opposed to TTY devices which are character stream devices but may support control sequences for setting the color of individual characters, and the insertion position in a rectangular array. Here we are concerned only with control of GUI devices but use TTY devices as a standard for comparison.

A font is an indexed collection of glyphs, which are specifications of character shapes. On a TTY device, these shapes are entirely abstract, and the index is the identity function. Typically fonts are embedded in TTY devices, the user has no control over the font from within the application, and where choice is available, there is limited selection, and no extensibility. Simple, functional, and ... ugly.

On GUI devices, the situation is different in every respect. Glyphs may be provided by the device, the application, or the user. Additional glyphs may be added at will at any of those levels. Arbitrary index functions allow the same glyph to be used to display characters in different languages or using application-specific codes. Glyphs have concrete APIs, allowing fine control of rendering parameters, even user-specified shapes. To provide convenient, consistent handling of collections of glyphs, we need a well-defined font API.

We can separate the necessary properties into two types: those which are common to all glyphs in the collection, or are properties of the collection itself, and those which are glyph-specific. Henceforth, the former are called font properties and the latter glyph properties.

Font properties include identification like the font family, font-wide design parameters like slant and weight, font metrics like size (nominal height) and average width used for approximate layout (such as sizing a popup dialog), and properties like the default glyph that are associated with the font for convenient use by APIs, but aren’t really an intrinsic property of the font as a collection of glyphs. There may also be a kerning table (used to improve spacing of adjacent glyphs).

Glyph properties include the index, and glyph metrics such as ascent, descent, width, offset (the offset to the normal position of the next glyph), and italic correction (used to improve spacing when slanted and unslanted glyphs are juxtaposed). Most important, of course, is the glyph’s shape, which is provided in a format specific to a rendering engine. Common formats include bitmaps (X11 BDF), PostScript programs (Type 1), and collections of spline curves (TrueType). When the shape is not itself a bitmap, it must be rendered to a pixmap, either a region on the display or a separate object which is copied to the display. In that case, the shape may include “multiple masters” or “hints” to allow context-specific rendering which improves the appearance of the glyph on the display.

Note that this use of “glyph” is mostly independent of the XEmacs LISP glyph API (see section Glyphs). It is possible to extract a single glyph from a font and encapsulate it in a Lisp_Glyph object, but the LISP glyph API allows access to only a very few glyph properties, none of them related to the rendering process.

XEmacs LISP does provide an API for selecting and querying fonts, in the form of a fairly complete set of wrappers for ‘fontconfig’ (see section Modern Font Support – fontconfig). It also provides some control of the rendering of text via wrappers for ‘Xft’ APIs (see section Modern Font Support – Xft), but this API is quite incomplete. Also, since the font selection and query facilities of ‘Xft’ are provided by ‘fontconfig’, there is some confusion in the API. For example, use of antialiasing to improve the appearance of rendered glyphs can be enabled or disabled; the API for this is to set the ‘fontconfig’ font property antialias on the font. However, from the point of view of ‘fontconfig’ this is merely a hint, which the rendering engine may or may not respect; the property cannot be used to select only fonts suitable for being antialiased, for example. Similarly, rgba (subpixel geometry) and dpi (pixel density) are conceptually properties of the display, not of the font; they function as hints to the rendering process.

As a final confusing touch, ‘Xft’ also provides some access to the ‘XRender’ extension provided by some modern X servers. This is mostly limited to colors, but rectangle APIs are also provided. These are (of course) completely independent of fonts, but ‘Xft’ is designed for client-side font rendering, and thus uses the ‘XRender’ extension heavily.



43.24.5.2 Modern Font Support – fontconfig

Implementation notes: The functions which initialize the library and handle memory management (e.g., FcInit and FcPatternDestroy) are intentionally not wrapped. (In the latter case, fc-pattern-destroy was provided, but this was ill-considered and will be removed; LISP code should never call this function.) Our thinking about some of the auxiliary constructs used by ‘fontconfig’ is in transition. The FcObjectSet API has been internalized; it is exposed to LISP as a list of strings. The FcFontSet API is still in use, but it too will be internalized, probably as a list (or, alternatively, a vector) of Lisp_fc_pattern objects. Changing the representation of ‘fontconfig’ property names from LISP strings to keywords is under consideration.

If ‘Xft’ (including ‘fontconfig’) support is integrated into the XEmacs build, XEmacs provides the symbol xft at initialization.
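
LISP code can therefore test for the feature before using these wrappers. A minimal sketch (the font name is illustrative):

 
;; Only apply Xft-specific settings when support was compiled in.
(when (featurep 'xft)
  (set-face-font 'default "Bitstream Vera Sans Mono-12"))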

XEmacs provides the following functions wrapping the ‘fontconfig’ library API.

Function: fc-fontset-p object

Returns t if object is of type fc-fontset, nil otherwise. This API is likely to be removed in the near future.

Function: fc-fontset-count fcfontset

Counts the number of fc pattern objects stored in the fc fontset object fcfontset. This API is likely to be removed in the near future.

Function: fc-fontset-ref fcfontset i

Return the fc pattern object at index i in fc fontset object fcfontset. Return nil if the index exceeds the bounds of fcfontset. This API is likely to be removed in the near future.

Function: fc-fontset-destroy fcfontset

Explicitly deallocate fcfontset. Do not call this function from LISP code. You will crash. This API will be removed in the near future.

Function: fc-pattern-p object

Returns t if object is of type fc-pattern, nil otherwise.

Function: fc-pattern-create

Return a fresh and empty fc-pattern object.

Function: fc-name-parse name

Parse string name as a fontconfig font name and return its representation as a fc pattern object.

Function: fc-name-unparse pattern

Unparse pattern object pattern to a string.

‘Xft’ has a similar function which is actually a different API; we provide both for now. (They probably invoke the same code in ‘fontconfig’ internally, but the ‘fontconfig’ implementation is more conveniently called from C.)

Function: xft-name-unparse pattern

Unparse pattern object pattern to a string (using the ‘Xft’ API).

Function: fc-pattern-duplicate pattern

Make a copy of pattern object pattern and return it.

Function: fc-pattern-add pattern property value

Add attributes to the pattern object pattern. property is a string naming the attribute to add, value the value for this attribute.

value may be a string, integer, float, or symbol, in which case the value will be added as an FcChar8[], int, double, or FcBool respectively.
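
For instance, a pattern equivalent to a parsed name can be built up property by property. A sketch (the family and values are illustrative; the comments show the FcValue type chosen by the mapping just described):

 
(let ((pattern (fc-pattern-create)))
  (fc-pattern-add pattern "family" "Bitstream Vera Sans Mono") ; string -> FcChar8[]
  (fc-pattern-add pattern "size" 12.0)                         ; float  -> double
  (fc-pattern-add pattern "antialias" t)                       ; symbol -> FcBool
  pattern)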

Function: fc-pattern-del pattern property

Remove attribute property from pattern object pattern.

Function: fc-pattern-get pattern property &optional id type

From pattern, extract property for the id’th member, of type type. This is the generic interface to FcPatternGet. We don’t support the losing symbol-for-property interface; however, it might be a very good idea to use keywords for property names in LISP.

pattern is an ‘Xft’ (‘fontconfig’) pattern object. property is a string naming a ‘fontconfig’ font property. Optional id is a nonnegative integer indexing the list of values for property stored in pattern, defaulting to 0 (the first value). Optional type is a symbol, one of ’string, ’boolean, ’integer, ’float, ’double, ’matrix, ’charset, or ’void, corresponding to the FcValue types (’float is an alias for ’double).

Symbols with names of the form ‘fc-result-DESCRIPTION’ are returned when the desired value is not available. These are

 
fc-result-type-mismatch   the value found has an unexpected type
fc-result-no-match        there is no such attribute
fc-result-no-id           there is no value for the requested ID

The Lisp types returned will conform to type:

 
string          string
boolean         `t' or `nil'
integer         integer
double (float)  float
matrix          not implemented
charset         not implemented
void            not implemented

The types of the following standard properties are predefined by ‘fontconfig’. The symbol fc-result-type-mismatch will be returned if the property exists but type does not match its predefined type. It is best not to specify a type for predefined properties, since a mistaken type guarantees an error return even though a value of the (correct) predefined type may be present.

Each standard property has a convenience accessor defined in ‘fontconfig.el’, named in the form ‘fc-pattern-get-PROPERTY’ (for example, fc-pattern-get-family). The convenience functions are preferred to fc-pattern-get, since a typo in the string naming a property will result in a silent null return, while a typo in a function name will usually result in a compiler or runtime "not fboundp" error. You may use defsubst to define convenience functions for non-standard properties (see the sketch following the property table below).

 
family         String  Font family name 
style          String  Font style. Overrides weight and slant 
slant          Int     Italic, oblique or roman 
weight         Int     Light, medium, demibold, bold or black 
size           Double  Point size 
aspect         Double  Stretches glyphs horizontally before hinting 
pixelsize      Double  Pixel size 
spacing        Int     Proportional, monospace or charcell 
foundry        String  Font foundry name 
antialias      Bool    Whether glyphs can be antialiased 
hinting        Bool    Whether the rasterizer should use hinting 
verticallayout Bool    Use vertical layout 
autohint       Bool    Use autohinter instead of normal hinter 
globaladvance  Bool    Use font global advance data 
file           String  The filename holding the font 
index          Int     The index of the font within the file 
ftface         FT_Face Use the specified FreeType face object 
rasterizer     String  Which rasterizer is in use 
outline        Bool    Whether the glyphs are outlines 
scalable       Bool    Whether glyphs can be scaled 
scale          Double  Scale factor for point->pixel conversions 
dpi            Double  Target dots per inch 
rgba           Int     unknown, rgb, bgr, vrgb, vbgr, none - subpixel geometry 
minspace       Bool    Eliminate leading from line spacing 
charset        CharSet Unicode chars encoded by the font 
lang           String  List of RFC-3066-style languages this font supports

The FT_Face, Matrix, and CharSet types are unimplemented, so the corresponding properties are not accessible from LISP at this time. If the value of a property has type FT_Face, FcCharSet, or FcMatrix, fc-result-type-mismatch is returned.

The following properties which were standard in ‘Xft’ v.1 are obsolete in ‘Xft’ v.2: encoding, charwidth, charheight, core, and render.
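
As promised above, here is a sketch of fc-pattern-get and the defsubst idiom for non-standard properties. The property name "myprop" is hypothetical, and the exact signatures of the predefined accessors are assumptions for illustration:

 
;; Assume `pattern' is an fc-pattern object, e.g. from fc-name-parse.
(fc-pattern-get pattern "family" 0 'string)  ; generic interface
(fc-pattern-get-family pattern)              ; convenience accessor

;; An accessor for a hypothetical non-standard property "myprop":
(defsubst fc-pattern-get-myprop (pattern &optional id type)
  (fc-pattern-get pattern "myprop" id type))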

Function: fc-pattern-destroy pattern

Explicitly deallocate pattern object pattern. Do not call this function from LISP code. You will crash. This API will be removed in the near future.

Function: fc-font-match device pattern

Return the font on device that most closely matches pattern.

pattern is a ‘fontconfig’ pattern object. device is an X11 device. Returns a ‘fontconfig’ pattern object representing the closest match to the given pattern, or an error code. Possible error codes are fc-result-no-match and fc-result-no-id.
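
A typical use is to resolve a partial specification to a concrete installed font. A sketch (nil is passed as the device, as in the listing example further below):

 
(let ((match (fc-font-match nil (fc-name-parse "monospace-12"))))
  (if (fc-pattern-p match)
      (fc-name-unparse match)  ; the closest installed font
    match))                    ; otherwise an error code symbol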

Function: fc-list-fonts-pattern-objects device pattern properties

List the fonts on device that match pattern for properties. device is an X11 device. pattern is a ‘fontconfig’ pattern to be matched. properties is the list of property names (strings) that should be included in each returned pattern. The result is a ‘fontconfig’ fontset object containing the set of unique matching patterns.

The properties argument does not affect the matching. So, for example,

 
(mapcar #'fc-name-unparse
  (let ((xfl (fc-list-fonts-pattern-objects nil
              (fc-name-parse "FreeMono") '("style")))
        (i 0)
        (fl nil))
    (while (< i (fc-fontset-count xfl))
      (push (fc-fontset-ref xfl i) fl)
      (setq i (1+ i)))
    fl))

will return something like ‘(":style=Bold" ":style=Medium" ":style=Oblique" ":style=BoldOblique")’ if you have the FreeFont package installed. Note that the sets of objects in the target pattern and the returned patterns don’t even intersect.

In using fc-list-fonts-pattern-objects, be careful to include only intrinsic properties of fonts in the pattern. Every property included in the pattern must be matched, or the candidate font will be eliminated from the list; when a font leaves a property unspecified, it is considered a mismatch for any pattern that specifies that property. Thus, inclusion of extraneous properties will result in an empty list. Note that for scalable fonts (at least), size is not an intrinsic property! A specification such as "Bitstream Vera Sans-12" will therefore return an empty list whether or not the font is available, which is probably not what you (as programmer or user) want.
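
The pitfall can be seen directly. A sketch, assuming the Bitstream Vera family is installed:

 
;; Size included: an empty result even though the face is installed.
(fc-fontset-count
 (fc-list-fonts-pattern-objects nil
                                (fc-name-parse "Bitstream Vera Sans-12")
                                '("family")))
;; Family only: the intrinsic property can match.
(fc-fontset-count
 (fc-list-fonts-pattern-objects nil
                                (fc-name-parse "Bitstream Vera Sans")
                                '("family")))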

The list is unsorted. In particular, the pattern ":style=italic,oblique" will not return italic fonts first, then oblique ones. The fonts will be returned in some arbitrary order.

Implementation notes: Fontset objects are slated for removal from the API. In the future fc-list-fonts-pattern-objects will return a list. The device argument is unused, ignored, and may be removed if it’s not needed to match other font-listing APIs. This name will be changed to correspond to Ben’s new nomenclature, probably simply fc-font-list.

Function: fc-font-sort device pattern trim

Return a fontset object listing all fonts sorted by proximity to pattern. device is an X11 device. pattern is a fontconfig pattern to be matched. Optional argument trim, if non-nil, means to trim trailing fonts that do not contribute new characters to the union repertoire.
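
A sketch of its use (nil for the device, as elsewhere in these examples):

 
;; The closest match to a generic sans-serif request comes first.
(let ((sorted (fc-font-sort nil (fc-name-parse "sans-serif") t)))
  (fc-name-unparse (fc-fontset-ref sorted 0)))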

Implementation notes: Fontset objects are slated for removal from the API. In the future fc-font-sort will return a list (or perhaps a vector) of FcPatterns. The device argument is unused, ignored, and may be removed if it’s not needed to match other font-listing APIs.

Function: fc-font-real-pattern fontname xdevice

Temporarily open font fontname (a string) on device xdevice and return the actual fc pattern matched by the Fc library. This function doesn’t make much sense and will be removed from the API.

Function: xlfd-font-name-p fontname

Check whether the string fontname is an XLFD font name.
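
For example (a sketch; the XLFD string is illustrative, and the exact non-nil return value is not specified here):

 
(xlfd-font-name-p "-*-courier-medium-r-*-*-*-120-*-*-*-*-iso8859-1") ; non-nil
(xlfd-font-name-p "Bitstream Vera Sans Mono-12")                     ; nil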

Variable: xft-debug-level

Level of debugging messages to issue to stderr for Xft. A nonnegative integer. Set to 0 to suppress all warnings. Default is 1 to ensure a minimum of debugging output at initialization. Higher levels give more information.
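
For example, to suppress the output entirely:

 
(setq xft-debug-level 0)   ; suppress all Xft warnings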

Variable: xft-version

The major version number of the ‘Xft’ library XEmacs was compiled with.

Variable: xft-xlfd-font-regexp

Regular expression matching XLFD font names.



43.24.5.3 Modern Font Support – Xft

IIRC, we don’t really provide any ‘Xft’ APIs at the LISP level yet.



This document was generated by Aidan Kehoe on December 27, 2016 using texi2html 1.82.