45. Old Future Work

This chapter includes proposals for future work that were later implemented. These proposals are included because they may describe to some extent the actual workings of the implemented code, and because they may discuss relevant design issues, alternative implementations, or work still to be done.

45.1 Old Future Work – A Portable Unexec Replacement

Author: Ben Wing

Abstract: Currently, during the build stage of XEmacs, a bare version of the program (called temacs) is run, which loads up a bunch of Lisp data and then writes out a modified executable file. This process is very tricky to implement and highly system-dependent. It can be replaced by a simple, mostly portable, and easy to implement scheme where the Lisp data is written out to a separate data file.

The scheme makes only three assumptions about the memory layout of a running XEmacs process, which, as far as I know, are met by all current implementations of XEmacs (and they’re also requirements of the existing unexec scheme):

1. The initialized data segments of the various XEmacs modules are all laid out contiguously in memory and are separated from the initialized data segments of libraries that are linked with XEmacs; likewise for uninitialized data segments.
2. The beginning and end of the XEmacs portion of the combined initialized data segment can be programmatically determined; likewise for the uninitialized data segment.
3. The XEmacs portions of the initialized and uninitialized data segments are always loaded at the same place in memory.

Assumption number three means that this scheme is non-relocatable, which is a disadvantage as compared to other, relocatable schemes that have been proposed. However, the advantage of this scheme over them is that it is much easier to implement and requires minimal changes to the XEmacs code base.

First, let’s go over the theory behind the dumping mechanism. The principles that we would like to follow are:

We write out to disk all of the data structures and all of their sub-structures that we have created ourselves, except for data that is expected to change from invocation to invocation (in particular, data that is extracted from the external environment at run time).
We don’t write out to disk any data structures created or initialized by system libraries, by the kernel or by any other code that we didn’t create ourselves, because we can’t count on that code working in the way that we want it to.
At the beginning of the next invocation of our program, we read in all those data structures that we have written out to disk, and then continue as if we had just created and initialized all of that data ourselves.
We make sure that our own data structures don’t have any pointers to system data, or if they do, that we note all of these pointers so that we can re-create the system data and set up pointers to the data again in the next invocation.
During the next invocation of our program, we re-create all of our own data structures that are derived from the external environment.

XEmacs, of course, is already set up to adhere to most of these principles.

In fact, the current dumping process that we are replacing handles a few of these principles slightly differently and adds a few extra ones of its own:

All data structures of all sorts, including system data, are written out. This is the cause of no end of problems, and it is avoidable, because we can ensure that our own data and the system data are physically separated in memory.
Our own data structures that we derive from the external environment are in fact written out and read in, but then are simply overwritten during the next invocation with new data. Before dumping, we make sure to free any such data structure that would cause memory leaks.
XEmacs carefully arranges things so that all static variables in the initialized data are never written to after the dumping stage has completed. This allows for an additional optimization in which we can make static initialized data segments in pre-dumped invocations of XEmacs be read-only and shared among all XEmacs processes on a single machine.

The difficult part in this process is figuring out where our data structures lie in memory so that we can correctly write them out and read them back in. The trick that we use to make this problem solvable is to ensure that the heap used for all dynamically allocated data structures created during the dumping process is located inside the memory of a large, statically declared array. This ensures that all of our own data structures are contained (at least at the time that we dump out our data) inside the static initialized and uninitialized data segments. These segments are physically separated in memory from any data created by system libraries, and their starting and ending points are known and unchanging. (We know that all of these things are true because we require them to be so, as preconditions of being able to make use of this method of dumping.)
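The static-array heap described above can be sketched in C as follows. This is only a minimal illustration under stated assumptions: the names dump_heap, dump_alloc(), dump_heap_write() and dump_heap_read() are hypothetical and do not come from the XEmacs sources, and the allocator is a simple bump allocator with no free function. Pointers stored inside the heap stay valid after reloading only because the array sits at the same address, which across invocations is exactly assumption number three.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical names throughout; none of them come from the XEmacs sources. */
#define DUMP_HEAP_SIZE (64 * 1024)

/* The heap lives inside a statically declared array, so its start and end
   are known.  The union forces an alignment suitable for any object. */
static union {
  max_align_t align;
  unsigned char bytes[DUMP_HEAP_SIZE];
} dump_heap;

static size_t dump_heap_used;   /* number of bytes handed out so far */

/* Bump allocator for our own data only; system data keeps using malloc(). */
static void *dump_alloc(size_t size)
{
  size_t align = sizeof(max_align_t);
  size_t rounded = (size + align - 1) / align * align;

  assert(dump_heap_used <= DUMP_HEAP_SIZE);
  if (rounded < size || rounded > DUMP_HEAP_SIZE - dump_heap_used)
    return NULL;                /* overflow, or static heap exhausted */

  void *p = dump_heap.bytes + dump_heap_used;
  dump_heap_used += rounded;
  return p;
}

/* Write the used part of the heap to OUT, preceded by its length. */
static int dump_heap_write(FILE *out)
{
  if (fwrite(&dump_heap_used, sizeof dump_heap_used, 1, out) != 1)
    return -1;
  if (fwrite(dump_heap.bytes, 1, dump_heap_used, out) != dump_heap_used)
    return -1;
  return 0;
}

/* Read a heap image back into the same static array. */
static int dump_heap_read(FILE *in)
{
  size_t used;

  if (fread(&used, sizeof used, 1, in) != 1 || used > DUMP_HEAP_SIZE)
    return -1;
  if (fread(dump_heap.bytes, 1, used, in) != used)
    return -1;
  dump_heap_used = used;
  return 0;
}
```

A dumping run would allocate its Lisp data with dump_alloc() and finish with dump_heap_write(); the next invocation would call dump_heap_read() before touching any of that data.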

In order to implement this method of heap allocation, we change the memory allocation function that we use for our own data. (It’s extremely important that this function not be used to allocate system data. This means that we must not redefine the malloc function using the linker, but instead we need to achieve this using the C preprocessor, or by simply using a different name, such as xmalloc. It’s also very important that we use the correct free function when freeing dynamically-allocated data, depending on whether this data was allocated by us or by the

45.2 Old Future Work – Indirect Buffers

Author: Ben Wing

An indirect buffer is a buffer that shares its text with some other buffer, but has its own version of all of the buffer properties, including markers, extents, buffer local variables, etc. Indirect buffers are not currently implemented in XEmacs, but they are in GNU Emacs, and some people have asked for this feature. I consider this feature somewhat extent-related because much of the work required to implement this feature involves tracking extents properly.

In a world with indirect buffers, some buffers are direct, and some buffers are indirect. This only matters when there is more than one buffer sharing the same text. In such a case, one of the buffers can be considered the canonical buffer for the text in question. This buffer is a direct buffer, and all other buffers sharing the text are indirect buffers.

These two kinds of buffers are created differently. A direct buffer is created simply using the make_buffer() function (or perhaps the Fget_buffer_create() function), and an indirect buffer is created using the make_indirect_buffer() function, which takes as an argument another buffer that specifies the text of the indirect buffer being created. Every indirect buffer keeps track of the direct buffer that is its parent, and every direct buffer keeps a list of all of its indirect buffer children. This list is modified as buffers are created and deleted. Because buffers are permanent objects, there is no special garbage collection-related trickery involved in these parent and children pointers.

There should never be an indirect buffer whose parent is also an indirect buffer. If the user attempts to set up such a situation using make_indirect_buffer(), either an error should be signaled or the parent of the indirect buffer should automatically become the direct buffer that actually is responsible for the text. Deleting a direct buffer should perhaps cause all of the indirect buffer children to be deleted automatically. There should be Lisp functions for determining whether a buffer is direct or indirect, and other functions for retrieving the parent or the children of the buffer, depending on which is appropriate.

(The scheme being described here is similar to symbolic links. Another possible scheme would be analogous to hard links, and would make no distinction between direct and indirect buffers. In that case, the text of the buffer logically exists as an object separate from the buffer itself and only goes away when the last buffer pointing to this text is deleted.)

Other than keeping track of parent and child pointers, the only remaining thing required to implement indirect buffers is to ensure that changes to the text of the buffer trigger the same sorts of effects in all the buffers that share that text. Luckily, there are only three functions in XEmacs that actually make changes to the text of the buffer, and they are all located in the file insdel.c.

These three functions are called buffer_insert_string_1(), buffer_delete_range(), and buffer_replace_char(). All of the subfunctions called by these functions are also in insdel.c.

The first thing that each of these three functions needs to do is check whether its buffer argument is an indirect buffer, and if so, replace it with the indirect buffer’s parent. Once that is done, the functions need to be modified so that everything they do other than actually changing the buffer’s text (such as calling before-change-functions and after-change-functions, and updating extents and markers) is done for the buffer itself and for all of its indirect children. Each step in the process needs to be iterated over all of the buffers in question before proceeding to the next step. For example, in buffer_insert_string_1(), prepare_to_modify_buffer() needs to be called in turn for all of the buffers sharing the text being modified. Then the text itself is modified, then insert_invalidate_line_number_cache() is called for all of the buffers, then record_insert() is called for all of the buffers, etc. Essentially, the operation is being done on all of the buffers in parallel, rather than each buffer being processed in series. This is necessary because many of the steps can quit or call Lisp code, each step depends on the previous step, and some steps are done only once rather than on each buffer. I imagine it would be significantly easier to implement this if a macro were created for iterating over a buffer and then all of the indirect children of that buffer.
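The iteration macro suggested above might look roughly like the following sketch. It assumes a simplified, hypothetical struct buffer with parent, children and next_sibling fields; the real XEmacs buffer structure and the real per-step functions are not reproduced here, and simple counters stand in for steps such as prepare_to_modify_buffer() and record_insert().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the real buffer structure. */
struct buffer {
  struct buffer *parent;        /* NULL for a direct buffer */
  struct buffer *children;      /* first indirect child (direct buffers only) */
  struct buffer *next_sibling;  /* next indirect child of the same parent */
  int text_length;              /* shared text, reduced to a length; only
                                   meaningful in the direct buffer */
  int prepared;                 /* counters standing in for per-buffer steps */
  int recorded;
};

/* Map an indirect buffer to the direct buffer that owns the text. */
static struct buffer *buffer_text_owner(struct buffer *b)
{
  return b->parent != NULL ? b->parent : b;
}

/* Visit the direct buffer BASE first, then each of its indirect children. */
#define FOR_EACH_SHARING_BUFFER(var, base)                              \
  for ((var) = (base); (var) != NULL;                                   \
       (var) = ((var) == (base)) ? (base)->children : (var)->next_sibling)

/* Shape of an insertion: each step runs over all buffers before the next. */
static void sketch_insert(struct buffer *buf, int nchars)
{
  struct buffer *base = buffer_text_owner(buf);
  struct buffer *b;

  assert(base->parent == NULL);   /* a parent is never itself indirect */

  FOR_EACH_SHARING_BUFFER (b, base)
    b->prepared++;                /* stands in for prepare_to_modify_buffer() */

  base->text_length += nchars;    /* the text itself is modified only once */

  FOR_EACH_SHARING_BUFFER (b, base)
    b->recorded++;                /* stands in for record_insert() */
}
```

Running each step across every buffer before moving on mirrors the "in parallel, not in series" ordering described in the text.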

45.3 Old Future Work – Improvements in support for non-ASCII (European) keysyms under X

Author: Martin Buchholz

If a user has a keyboard with known standard non-ASCII character equivalents, typically for European users, then Emacs’ default binding should be self-insert-command, with the obvious character inserted. For example, if a user has a keyboard with

xmodmap -e "keycode 54 = scaron"

then pressing that key on the keyboard will insert the (Latin-2) character corresponding to "scaron" into the buffer.

Note: Emacs 20.6 does NOTHING when pressing such a key (not even an error), i.e. even (read-event) ignores this key, which means it can’t even be bound to anything by a user trying to customize it.

This is implemented by maintaining a table of translations between all the known X keysym names and the corresponding (charset, octet) pairs.

For every key on the keyboard that has a known character correspondence, we define the character-of-keysym property of the keysym, and make the default binding for the key be self-insert-command.

This table is basically intimate knowledge of X11/keysymdef.h. The keysym mappings defined by X11 are based on the ISO 8859 standards, except for Cyrillic and Greek.
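As an illustration of such a translation table, here is a minimal C sketch with a handful of entries. The type and function names are hypothetical and are not those of the actual implementation. The octets shown are the ISO 8859-1 and ISO 8859-2 code points of the named characters, which are also the low bytes of the corresponding keysym values in X11/keysymdef.h.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical charset tags; a real table would cover more charsets. */
enum sketch_charset { SKETCH_LATIN_1, SKETCH_LATIN_2 };

/* One translation: X keysym name -> (charset, octet) pair. */
struct keysym_char {
  const char *keysym;
  enum sketch_charset charset;
  unsigned char octet;
};

/* A few sample entries only, not the full table. */
static const struct keysym_char keysym_table[] = {
  { "adiaeresis", SKETCH_LATIN_1, 0xE4 },
  { "eacute",     SKETCH_LATIN_1, 0xE9 },
  { "lstroke",    SKETCH_LATIN_2, 0xB3 },
  { "scaron",     SKETCH_LATIN_2, 0xB9 },
};

/* Return the table entry for KEYSYM, or NULL if it has no known character. */
static const struct keysym_char *sketch_character_of_keysym(const char *keysym)
{
  assert(keysym != NULL);
  for (size_t i = 0; i < sizeof keysym_table / sizeof keysym_table[0]; i++)
    if (strcmp(keysym_table[i].keysym, keysym) == 0)
      return &keysym_table[i];
  return NULL;
}
```

A key whose keysym is found in the table would get self-insert-command as its default binding; a key that is not found would keep whatever binding it had.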

In a non-Mule world, a user can still have a multi-lingual editor, by doing (set-face-font "...-iso8859-2" (current-buffer)) for all their Latin-2 buffers, etc.

45.4 Old Future Work – RTF Clipboard Support

Author: Ben Wing

in fact, i merged the windows stuff with the already-existing generic code.

what i’d like to see is something like this:

The current function

(defun own-selection (data &optional type append)

should become

(defun own-selection (data &optional type how-to-add data-type)

where data-type is the mswindows format, and how-to-add is

'replace-all or nil -- remove data for all formats
'replace-existing -- remove data for DATA-TYPE, but leave other formats alone
'append or t -- append data to existing data in DATA-TYPE, and leave other formats alone
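
The three how-to-add modes can be modelled with a small C sketch. It is not the XEmacs selection code: the fixed-size store, the struct layout and the sketch_own_selection() name are all assumptions made for illustration, and the data is treated as plain C strings.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names; this only models the three proposed modes. */
enum how_to_add { HOW_REPLACE_ALL, HOW_REPLACE_EXISTING, HOW_APPEND };

#define MAX_FORMATS 8
#define MAX_DATA 256

struct selection_entry {
  int in_use;
  char data_type[32];           /* e.g. "CF_TEXT" or "rtf" */
  char data[MAX_DATA];
};

static struct selection_entry selection[MAX_FORMATS];

/* Find the entry for DATA_TYPE; optionally create an empty one. */
static struct selection_entry *find_entry(const char *data_type, int create)
{
  struct selection_entry *free_slot = NULL;

  for (size_t i = 0; i < MAX_FORMATS; i++) {
    if (selection[i].in_use && strcmp(selection[i].data_type, data_type) == 0)
      return &selection[i];
    if (!selection[i].in_use && free_slot == NULL)
      free_slot = &selection[i];
  }
  if (!create || free_slot == NULL
      || strlen(data_type) >= sizeof free_slot->data_type)
    return NULL;
  strcpy(free_slot->data_type, data_type);
  free_slot->data[0] = '\0';
  free_slot->in_use = 1;
  return free_slot;
}

/* Returns 0 on success, -1 if the store or the entry is full. */
static int sketch_own_selection(const char *data, const char *data_type,
                                enum how_to_add how)
{
  assert(data != NULL && data_type != NULL);

  if (how == HOW_REPLACE_ALL)
    memset(selection, 0, sizeof selection);  /* drop data for all formats */

  struct selection_entry *e = find_entry(data_type, 1);
  if (e == NULL)
    return -1;
  if (how != HOW_APPEND)
    e->data[0] = '\0';          /* drop old data for this format only */
  if (strlen(e->data) + strlen(data) >= sizeof e->data)
    return -1;
  strcat(e->data, data);        /* append to whatever is left */
  return 0;
}
```

In this model 'replace-existing and 'append differ only in whether the old data for DATA-TYPE is cleared first, while 'replace-all also clears every other format.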

the function
(get-selection &optional TYPE DATA-TYPE)
already has a data-type so you don’t need to change it.

the existing function

(selection-exists-p &optional SELECTION DEVICE)

should become

(selection-exists-p &optional SELECTION DEVICE DATA-TYPE)

a new function

(register-selection-data-type DATA-TYPE)

like your mswindows-register-clipboard-format.

there’s already a selection-converter-alist, but that’s only for data out. you should alias it to selection-conversion-out-alist, and create selection-conversion-in-alist. these alists contain entries for CF_TEXT, which handles CR/LF conversion, and rtf, which does rtf in/out conversion – no need for separate functions to do this.

this may seem daunting, but it’s much less hard to add stuff like this than it seems, and i and others will certainly give you lots of support if you run into problems. it would be way cool to have a more powerful clipboard mechanism in XEmacs.

45.5 Old Future Work – xemacs.org Mailing Address Changes

Author: Ben Wing

Personal addresses

Everyone who is contributing or has ever contributed code to the XEmacs core, or to any of the packages archived at xemacs.org, should have a personal address at xemacs.org, even if they don’t actually have an account on any machine at xemacs.org. In fact, all of these people should have two mailing addresses at xemacs.org, one of which is their actual login name (or potential login name if they were ever to have an account), and the other one is in the form of first name/last name, similar to the way things are done at Sun. For example, Martin would have two addresses at xemacs.org, martin@xemacs.org, and martin.buchholz@xemacs.org, with the latter one simply being an alias for the former. The idea is that in all cases, if you simply know the name of any past or present contributor to XEmacs, and you want to mail them, you will know immediately how to do this without having to do any complicated searching on the Web or in XEmacs documentation.
Furthermore, I think that all of the email addresses mentioned anywhere in the XEmacs source code or documentation should be changed to be the corresponding ones at xemacs.org, instead of any other email addresses that any contributors might have.
All the places in the source code where a contributor’s name is mentioned, but no email address is attached, should be found, and the correct xemacs.org address should be attached.
The alias file mapping people’s addresses at xemacs.org to their actual addresses elsewhere (in the case, as will be true for the majority of addresses, where the contributor does not actually have an account at xemacs.org, but simply a forwarding pointer), should be viewable on the xemacs.org web site through a CGI script that reads the alias file and turns it into an HTML table.

Package addresses

I also think that for every package archived at xemacs.org, there should be three corresponding email addresses at xemacs.org. For example, consider a package such as lazy-shot. The addresses associated with this package would be:

lazy-shot@xemacs.org: This is a discussion mailing list about the lazy-shot package, and it should be controlled by Majordomo in the standard fashion.
lazy-shot-patches@xemacs.org: This is where patches to the lazy-shot package are sent. This should go to various people who are interested in such patches: for example, the maintainer of lazy-shot, perhaps the maintainer of XEmacs itself, and probably other people who have volunteered to do code review for this package, or for a larger group of packages that this package is in. Perhaps this list should also be maintained by Majordomo.
lazy-shot-maintainer@xemacs.org: This address is for mailing the maintainer directly. It is possible that this will go to more than one person. This would particularly be the case, for example, if the maintainer is dormant or does not appear very responsive to patches. In this case, the address would also point to someone like Steve, who is acting in the maintainer’s stead, and who will himself apply patches or make other changes to the package as maintained in the CVS archive on xemacs.org.

It may take a bit of work to track down the current addresses for the various package maintainers, and may in general seem like a lot of work to set up all of these mail addresses, but I think it’s very important to make it as easy as possible for random XEmacs users to be able to submit patches and report bugs in an orderly fashion. The general idea that I’m striving for is to create as much momentum as possible in the XEmacs development community, and I think having the system of mail addresses set up will make it much easier for this momentum to be built up and to remain.

Ben Wing

45.6 Old Future Work – Lisp callbacks from critical areas of the C code

Author: Ben Wing

There are many places in the XEmacs C code where Lisp functions are called, usually because the Lisp function is acting as a callback, hook, process filter, or the like. The Lisp code is often called in places where some Lisp operations are dangerous. Currently there are a lot of ad hoc schemes implemented to try to prevent these dangerous operations from causing problems. I’ve added a lot of them myself, for example, the call*_trapping_errors() functions. Other places, such as the pre-gc- and post-gc-hooks, do their own ad hoc processing. I’m proposing a scheme that would generalize all of this ad hoc code and allow Lisp code to be called in all sorts of sensitive areas of the C code, including even within redisplay.

Basically, we define a set of operations that are disallowable because they are dangerous. We essentially assign a bit flag to each of these operations. Whenever any sensitive C code wants to call Lisp code, instead of using the standard call* functions, it uses a new set of functions, call*_critical, which take an extra parameter: a bit mask specifying the set of operations which are disallowed. The basic operation of these functions is simply to set a global variable corresponding to the bit mask (more specifically, the functions store the previous value of this global variable in an unwind_protect, and use bitwise-or to combine the previous value with the new bit mask that was passed in). (Actually, we should first implement a slightly lower-level function called enter_sensitive_code_section(), which simply sets up the global variable and the unwind_protect(), and returns a specbind() value, but doesn’t actually call any Lisp code. There is a corresponding function exit_sensitive_code_section(), which takes the specbind value as an argument, and unwinds the unwind_protect. The call*_critical functions are trivially implemented in terms of these lower-level functions.)
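The enter/exit pairing and the bitwise-or combination of masks can be sketched in plain C as follows. This is a simplified model under stated assumptions: the sketch_* names are hypothetical, the saved previous mask plays the role of the specbind value, and there is no unwind_protect, so unlike the proposed functions this version does not restore the mask on a non-local exit.

```c
#include <assert.h>
#include <stddef.h>

/* Global mask of currently prohibited operations (hypothetical name). */
static unsigned int prohibited_operations;

/* Add MASK to the prohibited set and return the previous mask, which
   stands in for the specbind value in the proposal. */
static unsigned int sketch_enter_sensitive_section(unsigned int mask)
{
  unsigned int previous = prohibited_operations;
  prohibited_operations = previous | mask;   /* nested sections accumulate */
  return previous;
}

/* Restore the mask that was in effect before the matching enter call. */
static void sketch_exit_sensitive_section(unsigned int previous)
{
  prohibited_operations = previous;
}

/* Stand-in for a Lisp callback: a C function taking one argument. */
typedef int (*sketch_callback)(void *arg);

/* Analogue of a call*_critical function, built on the two helpers above. */
static int sketch_call_critical(sketch_callback fn, void *arg,
                                unsigned int mask)
{
  assert(fn != NULL);
  unsigned int saved = sketch_enter_sensitive_section(mask);
  int result = fn(arg);
  sketch_exit_sensitive_section(saved);
  return result;
}

/* Example callback: record the mask that is in effect while it runs. */
static int sketch_report_mask(void *arg)
{
  *(unsigned int *) arg = prohibited_operations;
  return 0;
}
```

Because each enter call only adds bits and each exit call restores the saved value, a nested critical call can tighten the restrictions but never loosen those imposed by an outer section.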

The sets of dangerous operations which can be prohibited are listed below. Each entry begins with the C name of the corresponding bit flag:

OPERATION_GC_PROHIBITED: garbage collection. When this flag is set, and the garbage collection threshold is reached, garbage collection simply doesn’t happen. It will happen at the next opportunity that it is allowed. Similarly, explicitly calling the Lisp function garbage-collect simply does nothing.
OPERATION_CATCH_ERRORS: signalling an error. When this bit flag is passed to enter_sensitive_code_section(), a catch is set up which catches all errors, signals a warning with warn_when_safe(), and then simply continues. This is exactly the same behavior you now get with the call*_trapping_errors() functions. (There should also be some way of specifying a warning level and class here, similar to the call*_trapping_errors() functions. This is not crucial, however, because a standard warning level and class could simply be chosen.)
OPERATION_NO_UNSAFE_OBJECT_DELETION: This flag prohibits deletion of any permanent object (i.e. any object that does not automatically disappear when created, such as buffers, frames, devices, windows, etc...) unless they were created after this bit flag was set. This would be implemented using a list which stores all of the permanent objects created after this bit flag was set. This list is reset to its previous value when the call to exit_sensitive_code_section() occurs. The motivation here is to allow Lisp callbacks to create their own temporary buffers or frames, and later delete them, but not allow any other permanent objects to be deleted, because C code might be working with them, and not expect them to change.
OPERATION_NO_BUFFER_MODIFICATION: This flag disallows modifications to the text, extent or any other properties of any buffers except those created after this flag was set, just like in the previous entry.
OPERATION_NO_REDISPLAY: This bit flag inhibits any redisplay-related operations from happening, more specifically, any entry into the redisplay-related code. This includes, for example, the Lisp functions sit-for, force-redisplay, force-cursor-redisplay, window-end with certain arguments to it, and various other functions. When this flag is set, instead of entering the redisplay code, the calling function should simply make sure not to enter the redisplay code, (for example, in the case of window-end), or postpone the redisplay until such a time when it’s safe (for example, with sit-for and force-redisplay).
OPERATION_NO_REDISPLAY_SETTINGS_CHANGE: This flag prohibits any modifications to faces, glyphs, specifiers, extents, or any other settings that will affect the way that any window is displayed.
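
For illustration, the flags listed above could be declared as an enumeration of single bits, and the deferred behavior of OPERATION_GC_PROHIBITED modelled with a pending flag. The flag names come from the list; the bit positions, OPERATION_ALL, and the sketch_* helpers are assumptions made for this sketch, and a counter stands in for an actual garbage collection.

```c
#include <assert.h>

/* Flag names are from the list above; the bit values are assumptions. */
enum {
  OPERATION_GC_PROHIBITED                = 1 << 0,
  OPERATION_CATCH_ERRORS                 = 1 << 1,
  OPERATION_NO_UNSAFE_OBJECT_DELETION    = 1 << 2,
  OPERATION_NO_BUFFER_MODIFICATION       = 1 << 3,
  OPERATION_NO_REDISPLAY                 = 1 << 4,
  OPERATION_NO_REDISPLAY_SETTINGS_CHANGE = 1 << 5,
  OPERATION_ALL                          = (1 << 6) - 1   /* hypothetical */
};

static unsigned int prohibited_operations;  /* current mask */
static int gc_pending;                      /* a collection was skipped */
static int gc_runs;                         /* stands in for real GC work */

/* Called when the GC threshold is reached or garbage-collect is invoked. */
static void sketch_maybe_gc(void)
{
  if (prohibited_operations & OPERATION_GC_PROHIBITED) {
    gc_pending = 1;             /* do nothing now; retry when allowed */
    return;
  }
  gc_pending = 0;
  gc_runs++;
}

/* Install a new mask; run any deferred collection once GC is allowed. */
static void sketch_set_prohibited(unsigned int mask)
{
  assert((mask & ~(unsigned int) OPERATION_ALL) == 0);
  prohibited_operations = mask;
  if (gc_pending && !(mask & OPERATION_GC_PROHIBITED))
    sketch_maybe_gc();
}
```

Calling Lisp from within redisplay would correspond to passing OPERATION_ALL here, since in that case every flag needs to be set.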

The idea here is that it will finally be safe to call Lisp code from nearly any part of the C code, simply by setting the appropriate combination of restricted-operation bit flags. This even includes calls from within redisplay (in which case all of the bit flags need to be set). The reason that I thought of this is that some coding system translations might cause Lisp code to be invoked, and C code often invokes these translations in sensitive places.

This document was generated by Aidan Kehoe on December 27, 2016 using texi2html 1.82.