Extent Related Changes

Owner: ???

Effort: ???

Dependencies: ???

Abstract: This page describes various changes that could be made that are at least somewhat related to extents. I don't consider any of these projects terribly important, but I'm putting my ideas down in case someone is really interested in one of them.

Indirect Buffers

An indirect buffer is a buffer that shares its text with some other buffer, but has its own version of all of the buffer properties, including markers, extents, buffer local variables, etc. Indirect buffers are not currently implemented in XEmacs, but they are in GNU Emacs, and some people have asked for this feature. I consider this feature somewhat extent-related because much of the work required to implement this feature involves tracking extents properly.

In a world with indirect buffers, some buffers are direct and some are indirect. The distinction only matters when more than one buffer shares the same text. In that case, one of the buffers is considered the canonical buffer for the text in question. This buffer is the direct buffer, and all other buffers sharing the text are indirect buffers. The two kinds of buffers are created differently. A direct buffer is created simply using the make_buffer() function (or perhaps the Fget_buffer_create() function), and an indirect buffer is created using the make_indirect_buffer() function, which takes as an argument another buffer that specifies the text of the indirect buffer being created.

Every indirect buffer keeps track of the direct buffer that is its parent, and every direct buffer keeps a list of all of its indirect buffer children. This list is modified as buffers are created and deleted. Because buffers are permanent objects, there is no special garbage collection-related trickery involved in these parent and children pointers. There should never be an indirect buffer whose parent is also an indirect buffer. If the user attempts to set up such a situation using make_indirect_buffer(), either an error should be signaled or the parent of the new indirect buffer should automatically become the direct buffer that is actually responsible for the text. Deleting a direct buffer should perhaps cause all of its indirect buffer children to be deleted automatically. There should be Lisp functions for determining whether a buffer is direct or indirect, and other functions for retrieving the parent or the children of a buffer, whichever is appropriate.

(The scheme described here is similar to symbolic links. Another possible scheme would be analogous to hard links, and would make no distinction between direct and indirect buffers. In that case, the text of the buffer logically exists as an object separate from the buffer itself and only goes away when the last buffer pointing to this text is deleted.)
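The parent and child bookkeeping for the symbolic-link-like scheme can be sketched in a few lines of C. This is only an illustration under stated assumptions: struct toy_buffer and every toy_* function are hypothetical names invented for the example, not existing XEmacs code, and the fixed-size child array stands in for whatever list representation a real implementation would choose. The sketch takes the second of the two options for an indirect parent, silently reparenting to the direct buffer instead of signaling an error.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_CHILDREN 8   /* arbitrary limit, for this sketch only */

/* Hypothetical stand-in for a buffer object; not the real struct buffer. */
struct toy_buffer {
    const char *name;
    struct toy_buffer *parent;                     /* NULL for a direct buffer */
    struct toy_buffer *children[TOY_MAX_CHILDREN]; /* used by direct buffers only */
    int nchildren;
};

/* Initialize B as a direct buffer: it has no parent and no children yet. */
void toy_make_buffer(struct toy_buffer *b, const char *name)
{
    b->name = name;
    b->parent = NULL;
    b->nchildren = 0;
}

int toy_buffer_is_indirect(const struct toy_buffer *b)
{
    return b->parent != NULL;
}

/* Return the direct buffer responsible for B's text (B itself if direct). */
struct toy_buffer *toy_buffer_base(struct toy_buffer *b)
{
    return b->parent ? b->parent : b;
}

/* Initialize B as an indirect buffer sharing the text of BASE.  If BASE is
   itself indirect, B is attached to BASE's direct buffer, so an indirect
   buffer never has an indirect parent.  Returns 0 on success, or -1 if the
   direct buffer's child list is full. */
int toy_make_indirect_buffer(struct toy_buffer *b, const char *name,
                             struct toy_buffer *base)
{
    struct toy_buffer *direct = toy_buffer_base(base);

    assert(direct->parent == NULL);
    if (direct->nchildren >= TOY_MAX_CHILDREN)
        return -1;
    b->name = name;
    b->parent = direct;
    b->nchildren = 0;
    direct->children[direct->nchildren++] = b;
    return 0;
}

/* Detach an indirect buffer from its parent's child list when it is deleted.
   Does nothing for a direct buffer. */
void toy_delete_indirect_buffer(struct toy_buffer *b)
{
    struct toy_buffer *p = b->parent;
    int i;

    if (p == NULL)
        return;
    for (i = 0; i < p->nchildren; i++) {
        if (p->children[i] == b) {
            /* Order of children is not preserved; the last one fills the gap. */
            p->children[i] = p->children[--p->nchildren];
            break;
        }
    }
    b->parent = NULL;
}
```

Creating an indirect buffer from another indirect buffer then yields a second child of the same direct buffer, which keeps every parent chain exactly one link long.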

Other than keeping track of the parent and child pointers, the only remaining thing required to implement indirect buffers is to ensure that changes to the text of a buffer trigger the same sorts of effects in all the buffers that share that text. Luckily, there are only three functions in XEmacs that actually make changes to the text of a buffer, and they are all located in the file insdel.c.

These three functions are called buffer_insert_string_1(), buffer_delete_range(), and buffer_replace_char(). All of the subfunctions called by these functions are also in insdel.c.

The first thing that each of these three functions needs to do is check whether its buffer argument is an indirect buffer and, if so, convert it to the indirect buffer's parent. Once that is done, the functions need to be modified so that everything they do other than actually changing the buffer's text (calling before-change-functions and after-change-functions, updating extents and markers, and so on) is done for the buffer being modified and for all of its indirect children. Each step in the process needs to be iterated over all of the buffers in question before proceeding to the next step. For example, in buffer_insert_string_1(), prepare_to_modify_buffer() needs to be called in turn for all of the buffers sharing the text being modified. Then the text itself is modified, then insert_invalidate_line_number_cache() is called for all of the buffers, then record_insert() is called for all of the buffers, etc. Essentially, the operation is done on all of the buffers in parallel, rather than each buffer being processed in series. This is necessary because many of the steps can quit or call Lisp code, each step depends on the previous step, and some steps are done only once rather than once per buffer. I imagine this would be significantly easier to implement if a macro were created for iterating over a buffer and then all of the indirect children of that buffer.
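One possible shape for such an iteration macro, and for the step-by-step ordering it supports, is sketched below. Everything here is an assumption made for illustration: FOR_BUFFER_AND_CHILDREN, struct toy_buffer, toy_log, and toy_insert() are hypothetical names, the children are kept as a simple sibling-linked list, and writing to a log stands in for the real calls to prepare_to_modify_buffer(), the text modification, and record_insert().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a buffer; children form a sibling-linked list. */
struct toy_buffer {
    int id;                           /* single digit, used only for logging */
    struct toy_buffer *parent;        /* NULL for a direct buffer */
    struct toy_buffer *first_child;   /* meaningful for direct buffers only */
    struct toy_buffer *next_sibling;  /* next indirect child of the same parent */
};

/* Bind VAR to BASE first, then to each indirect child of BASE in turn.
   BASE must be a direct buffer. */
#define FOR_BUFFER_AND_CHILDREN(var, base)                           \
    for ((var) = (base); (var) != NULL;                              \
         (var) = ((var) == (base) ? (base)->first_child              \
                                  : (var)->next_sibling))

/* Records the order in which the steps ran, e.g. "P0 P1 T0 R0 R1 ". */
char toy_log[128];

static void toy_log_step(char step, int id)
{
    size_t n = strlen(toy_log);

    if (n + 3 < sizeof toy_log) {
        toy_log[n] = step;
        toy_log[n + 1] = (char) ('0' + id);
        toy_log[n + 2] = ' ';
        toy_log[n + 3] = '\0';
    }
}

/* Sketch of an insertion: each step runs over every buffer sharing the text
   before the next step starts, and the text itself changes exactly once. */
void toy_insert(struct toy_buffer *buf)
{
    /* Convert an indirect buffer to its parent before doing anything else. */
    struct toy_buffer *base = buf->parent ? buf->parent : buf;
    struct toy_buffer *b;

    FOR_BUFFER_AND_CHILDREN(b, base)
        toy_log_step('P', b->id);   /* stands in for prepare_to_modify_buffer() */

    toy_log_step('T', base->id);    /* the shared text is modified once */

    FOR_BUFFER_AND_CHILDREN(b, base)
        toy_log_step('R', b->id);   /* stands in for record_insert() */
}
```

Calling toy_insert() on any buffer in a sharing group runs the same sequence, because the argument is first converted to the direct buffer. A real version would also have to cope with a step quitting or running Lisp code partway through the loop, which this sketch does not model.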

Everything should obey duplicable extents

A lot of functions don't properly track duplicable extents. For example, the concat function does, but the format function does not, and extents in keymap prompts are not displayed either. All of the functions that generate strings or string-like entities should track the extents that are associated with the strings. Currently this is difficult because there is no general mechanism implemented for doing this. I propose such a general mechanism, which would not be hard to implement, and would be easy to use in other functions that build up strings.

The basic idea is that we create a C structure that is analogous to a Lisp string in that it contains string data and lists of extents for that data. Unlike standard Lisp strings, however, this structure (let's call it lisp_string_struct) can be incrementally updated, and its allocation is handled explicitly so that no garbage is generated. (This is important, for example, in the event-handling code, which would want to use this structure but must not generate any garbage, for efficiency reasons.) Both the string data and the list of extents in this string are handled using dynarrs so that it is easy to incrementally update this structure. Functions should exist to create and destroy instances of lisp_string_struct, to generate a Lisp string from a lisp_string_struct and vice versa, to append a substring of a Lisp string to a lisp_string_struct, to just append characters to a lisp_string_struct, etc. The only possibly tricky part of implementing these functions is copying extents from a Lisp string into a lisp_string_struct. However, there is already a function, copy_string_extents(), that does basically this exact thing, and it should be easy to create a modified version of it.
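A rough sketch of what such a structure and its append operations might look like follows. It is an illustration under several assumptions, not a proposed implementation: only the name lisp_string_struct comes from the text above, while struct toy_extent, struct toy_string, and the lss_* functions are hypothetical. Plain realloc-grown arrays stand in for dynarrs, a toy string type stands in for a real Lisp string, extents are reduced to a half-open character range plus an id, and an extent that does not overlap the copied range (including a zero-length one) is simply dropped.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Simplified extent: covers characters [start, end) and carries an id. */
struct toy_extent { size_t start, end; int id; };

/* Hypothetical stand-in for a Lisp string together with its extents. */
struct toy_string {
    const char *data;
    size_t len;
    const struct toy_extent *extents;
    size_t nextents;
};

/* Incrementally built string; growable arrays stand in for dynarrs. */
struct lisp_string_struct {
    char *data;
    size_t len, cap;
    struct toy_extent *extents;
    size_t nextents, extcap;
};

void lss_init(struct lisp_string_struct *s)
{
    memset(s, 0, sizeof *s);
}

/* Storage is released explicitly, so nothing is left for a collector. */
void lss_free(struct lisp_string_struct *s)
{
    free(s->data);
    free(s->extents);
    lss_init(s);
}

/* Grow an array to hold at least NEED (> 0) elements.  Returns the possibly
   moved array, or NULL on failure, in which case P is still valid. */
static void *lss_grow(void *p, size_t *cap, size_t need, size_t elsize)
{
    size_t newcap = *cap ? *cap : 16;
    void *q;

    if (need <= *cap)
        return p;
    while (newcap < need)
        newcap *= 2;
    q = realloc(p, newcap * elsize);
    if (q != NULL)
        *cap = newcap;
    return q;
}

/* Append N raw characters.  Returns 0 on success, -1 on allocation failure. */
int lss_append_chars(struct lisp_string_struct *s, const char *p, size_t n)
{
    char *q;

    if (n == 0)
        return 0;
    q = lss_grow(s->data, &s->cap, s->len + n, 1);
    if (q == NULL)
        return -1;
    s->data = q;
    memcpy(s->data + s->len, p, n);
    s->len += n;
    return 0;
}

/* Append characters [from, to) of SRC, carrying along each extent that
   overlaps the range, clipped to the range and shifted to its new position. */
int lss_append_substring(struct lisp_string_struct *s,
                         const struct toy_string *src, size_t from, size_t to)
{
    size_t base = s->len;   /* where the copied text starts in S */
    size_t i;

    assert(from <= to && to <= src->len);
    if (lss_append_chars(s, src->data + from, to - from) != 0)
        return -1;
    for (i = 0; i < src->nextents; i++) {
        struct toy_extent e = src->extents[i];
        struct toy_extent *q;

        if (e.start < from) e.start = from;
        if (e.end > to) e.end = to;
        if (e.start >= e.end)
            continue;   /* no overlap with the copied range */
        q = lss_grow(s->extents, &s->extcap, s->nextents + 1, sizeof *q);
        if (q == NULL)
            return -1;
        s->extents = q;
        e.start = e.start - from + base;
        e.end = e.end - from + base;
        s->extents[s->nextents++] = e;
    }
    return 0;
}
```

A function such as format could then build its result by alternating lss_append_chars() for literal text with lss_append_substring() for each string argument, and convert the finished structure to a Lisp string in one step at the end.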

Ben Wing
