A Portable Unexec Replacement

Owner: ???
Effort: ???
Dependencies: ???

Abstract: Currently, during the build stage of XEmacs, a
              bare version of the program (called temacs) is run, which
              loads up a bunch of Lisp data and then writes out a modified
              executable file.  This process is very tricky to implement and highly
              system-dependent.  It can be replaced by a simple, mostly portable,
              and easy to implement scheme where the Lisp data is written out to a
separate data file.

The scheme makes only three assumptions about the memory layout of
              a running XEmacs process, which, as far as I know, are met by all
              current implementations of XEmacs (and they're also requirements of the
              existing unexec scheme): 
1. The initialized data segments of the various XEmacs modules are
   all laid out contiguously in memory and are separated from the
   initialized data segments of libraries that are linked with XEmacs;
   likewise for uninitialized data segments.
2. The beginning and end of the XEmacs portion of the combined
   initialized data segment can be programmatically determined; likewise
   for the uninitialized data segment.
3. The XEmacs portion of the initialized and uninitialized data
   segments is always loaded at the same place in memory.

Assumption number three means that this scheme is non-relocatable,
              which is a disadvantage as compared to other, relocatable schemes that
              have been proposed.  However, the advantage of this scheme over them
              is that it is much easier to implement and requires minimal changes to
the XEmacs code base.

First, let's go over the theory behind the dumping mechanism.  The
              principles that we would like to follow are: 
1. We write out to disk all of the data structures and all of their
   sub-structures that we have created ourselves, except for data that is
   expected to change from invocation to invocation (in particular, data
   that is extracted from the external environment at run time).
2. We don't write out to disk any data structures created or
   initialized by system libraries, by the kernel or by any other code
   that we didn't create ourselves, because we can't count on that code
   working in the way that we want it to.
3. At the beginning of the next invocation of our program, we read in
   all those data structures that we have written out to disk, and then
   continue as if we had just created and initialized all of that data
   ourselves.
4. We make sure that our own data structures don't have any pointers
   to system data, or if they do, that we note all of these pointers so
   that we can re-create the system data and set up pointers to the data
   again in the next invocation.
5. During the next invocation of our program, we re-create all of our
   own data structures that are derived from the external environment.

XEmacs, of course, is already set up to adhere to most of these
principles.  In fact, the current dumping process that we are replacing
handles a few of these principles slightly differently and adds a few
extra of its own:
1. All data structures of all sorts, including system data, are
   written out.  This is the cause of no end of problems, and it is
   avoidable, because we can ensure that our own data and the system data
   are physically separated in memory.
2. Our own data structures that we derive from the external
   environment are in fact written out and read in, but then are simply
   overwritten during the next invocation with new data.  Before dumping,
   we make sure to free any such data structure that would cause memory
   leaks.
3. XEmacs carefully arranges things so that all static variables in
   the initialized data are never written to after the dumping
   stage has completed.  This allows for an additional optimization in
   which we can make static initialized data segments in pre-dumped
   invocations of XEmacs be read-only and shared among all XEmacs
   processes on a single machine.

The difficult part in this process is figuring out where our data
              structures lie in memory so that we can correctly write them out and
              read them back in.  The trick that we use to make this problem
              solvable is to ensure that the heap that is used for all dynamically
              allocated data structures that are created during the dumping process
              is located inside the memory of a large, statically declared array.
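
The idea of a heap that lives inside a statically declared array can be sketched in C as follows.  This is only a minimal illustration under stated assumptions: it uses a trivial bump allocator that never frees, not a full allocator, and the names static_heap, static_heap_alloc and in_static_heap, as well as the 64 KB size, are hypothetical and not taken from the XEmacs sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define STATIC_HEAP_SIZE (64 * 1024)   /* illustrative size only */
#define HEAP_ALIGN 16                  /* assumed to satisfy any object type */

/* The statically declared array that backs the dump-time heap.  Being a
   static object, it is part of the program's own data segments. */
static _Alignas (HEAP_ALIGN) unsigned char static_heap[STATIC_HEAP_SIZE];
static size_t static_heap_used;

/* Hand out the next free chunk of the array, or return NULL once the
   array is exhausted.  (Hypothetical helper; a bump allocator only.) */
static void *
static_heap_alloc (size_t size)
{
  size_t rounded = (size + HEAP_ALIGN - 1) & ~(size_t) (HEAP_ALIGN - 1);

  /* Reject requests that overflowed while rounding or that do not fit. */
  if (rounded < size || rounded > STATIC_HEAP_SIZE - static_heap_used)
    return NULL;

  void *p = static_heap + static_heap_used;
  static_heap_used += rounded;
  return p;
}

/* Return nonzero if P points into the static heap.  The comparison is
   done on integers so that unrelated pointers can be tested safely. */
static int
in_static_heap (const void *p)
{
  uintptr_t addr = (uintptr_t) p;
  uintptr_t base = (uintptr_t) static_heap;
  return addr >= base && addr - base < STATIC_HEAP_SIZE;
}
```

Everything returned by static_heap_alloc lies between static_heap and static_heap + STATIC_HEAP_SIZE, so a single range test is enough to tell our own allocations apart from memory obtained from the system allocator.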
Placing the heap inside a static array ensures that all of our own
data structures are contained (at least at the time that we dump out
our data) inside the static initialized and uninitialized data
segments, which are physically separated in memory from any data
created by system libraries and whose starting and ending points are
known and unchanging.  (We know that all of these things are true
because we require them as preconditions of using this method of
dumping.)

In order to implement this method of heap allocation, we change the
memory allocation function that we use for our own data.  (It's
extremely important that this function not be used to allocate system
data.  This means that we must not redefine the malloc function using
the linker, but instead we need to achieve this using the C
preprocessor, or by simply using a different name, such as xmalloc.
It's also very important that we use the correct free function when
freeing dynamically-allocated data, depending on whether this data was
allocated by us or by the system.  If we don't keep this straight, we
are likely to corrupt memory and cause XEmacs to crash.)  What our own
memory allocation function does is, depending on the circumstances,
either call our own memory allocation subfunction (probably based on
the routines in gmalloc.c), which allocates memory out of a virtual
heap that we have set up using a large statically-declared array, or
simply call the standard malloc function to do the memory allocation.
Similarly, the free function that we use either calls our own free
subfunction or calls the standard one.  (In this case, it's clear which
of the two subfunctions we use: we just look at the pointer that was
given to us, and see if it's within our large static array or not.)
The rules governing which of the two allocation subfunctions is used
are as follows:
1. We always use our own allocation subfunction until the first time
   that it fails.
2. If this failure occurs during the dumping stage, we abort with an
   error that we need to increase the size of our static heap.  (The
   static heap needs to be large enough to hold all of the data that we
   allocate during the dumping phase, but not much larger, so that we
   don't waste memory or disk space.  A static heap is currently used in
   the Cygwin version of XEmacs, and we can probably adapt many of the
   routines that are used for this.)
3. Otherwise, after the first failure of our own allocation
   subfunction, we switch to using the standard malloc function from
   then on.  (Alternatively, we could always call our own allocation
   subfunction and then call the standard one whenever our own one
   fails.  This would use memory more efficiently, but would be slower.
   Another alternative that avoids this trade-off but constricts the
   choice of allocation methods that we can use is to scrap this
   two-mode allocation scheme entirely and simply provide an allocation
   function that can cope with having its heap be in two non-contiguous
   areas of memory.  I think that the routines in gmalloc.c can deal
   with this, for example.)

When it's time to dump out our data, we don't have to do anything
complicated involving creating a new executable file as we do
currently.  All we have to do is write out the data contained in our
uninitialized and initialized data segments to a data file.  At the
beginning of main, the first thing we do is check to see whether we
are running as temacs or as xemacs.  If we are running as xemacs, then
the first thing we do is locate our data file, which should probably
be named xemacs.dat, and be located in the same directory as the
xemacs executable.  Then we load in the data from this data file,
overwriting our initialized and uninitialized data segments, and
continue with XEmacs as normal.  (There is no danger in overwriting
things like this because this is the first or almost the very first
thing that we do, and we're not going to be overwriting any system
data that might have been created or initialized before main was
called.  We have to be careful, however, with the small number of
variables that we initialized in the process of determining whether we
should load our data file and then loading this data file.)

I think that the way we determine whether we are running as temacs or
xemacs is:
1. If our executable name begins with temacs, we are running as
   temacs.
2. If our data file doesn't exist, we are running as temacs.
3. If the first command line option is something like -no-data-file,
   we are running as temacs.

In all of the other circumstances, we load the data file normally and
proceed as if this were a normal xemacs invocation.

We can do a further optimization because of the clever way that
XEmacs arranges to never write to any variables that exist in the
initialized static data segment after the dump phase.  When we read in
the initialized data segment, instead of reading it in normally using
the read system call, we use mmap if it is available.  In the call to
mmap, we specify the start of the initialized data segment as the
first argument, and then we specify the flags MAP_FIXED and
MAP_SHARED.
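
The mmap call just described can be sketched as follows, assuming a POSIX system.  This is an illustration rather than XEmacs code: the helper names map_segment and demo_map_segment are hypothetical, the demo maps a temporary file over a page reserved with an anonymous mapping instead of over a real data segment, and it assumes the target address is page-aligned, which MAP_FIXED requires on most systems.  The read-only behaviour comes from requesting PROT_READ; the same helper can request a private, copy-on-write mapping instead.

```c
#define _DEFAULT_SOURCE  /* expose MAP_ANONYMOUS, mkstemp, pread on glibc */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Map LEN bytes from the start of the file FD at the fixed address ADDR.
   With SHARED_READONLY set, the mapping is read-only and MAP_SHARED;
   otherwise it is writable and MAP_PRIVATE (copy-on-write).
   Returns 0 on success, -1 on failure.  (Hypothetical helper.) */
static int
map_segment (int fd, void *addr, size_t len, int shared_readonly)
{
  int prot = shared_readonly ? PROT_READ : (PROT_READ | PROT_WRITE);
  int flags = MAP_FIXED | (shared_readonly ? MAP_SHARED : MAP_PRIVATE);
  void *p = mmap (addr, len, prot, flags, fd, 0);
  return p == addr ? 0 : -1;
}

/* Write a one-page "dump file", map it over a reserved page, and check
   that the dumped bytes are visible there.  Returns 0 on success. */
static int
demo_map_segment (int shared_readonly)
{
  size_t len = (size_t) sysconf (_SC_PAGESIZE);
  char path[] = "/tmp/xemacs-dat-XXXXXX";
  int fd = mkstemp (path);
  if (fd < 0)
    return -1;
  unlink (path);                     /* file vanishes when fd is closed */

  char *image = calloc (1, len);
  if (image == NULL)
    { close (fd); return -1; }
  strcpy (image, "dumped data");
  ssize_t written = write (fd, image, len);
  free (image);
  if (written != (ssize_t) len)
    { close (fd); return -1; }

  /* Reserve a page-aligned range that stands in for the data segment. */
  char *segment = mmap (NULL, len, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (segment == MAP_FAILED)
    { close (fd); return -1; }

  int ok = map_segment (fd, segment, len, shared_readonly) == 0
           && strcmp (segment, "dumped data") == 0;
  if (ok && !shared_readonly)
    {
      char c = 0;
      segment[0] = 'D';              /* private write: copy-on-write */
      ok = pread (fd, &c, 1, 0) == 1 && c == 'd';  /* file is unchanged */
    }

  munmap (segment, len);
  close (fd);
  return ok ? 0 : -1;
}
```

Calling demo_map_segment (1) exercises the shared read-only case and demo_map_segment (0) the private copy-on-write case.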
Mapped this way, the initialized data segment is shared among all
XEmacs processes on the same machine, and it is read-only provided
that the mapping is requested with read-only protection (PROT_READ).
(When reading in the uninitialized data segment, we should probably do
a similar thing involving mmap, but use the MAP_PRIVATE flag instead
of MAP_SHARED so that this data segment essentially becomes
copy-on-write.)  Memory mapping like this can also be done on Windows;
the function is different from mmap, but as far as I know the
semantics are equivalent.

Ben Wing
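
As a closing illustration, the three rules given earlier for deciding whether the process is running as temacs can be sketched as a single C predicate.  The function name running_as_temacs and its parameters are hypothetical, the option name -no-data-file is the tentative one suggested in the text, and stripping a leading directory at the last '/' is an added assumption that only fits Unix-style paths.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if we should behave as temacs (skip loading the data file),
   0 if we should load the data file and behave as xemacs.

   ARGV0 is the program name as invoked, FIRST_ARG is the first
   command-line option or NULL if there is none, and DATA_FILE_EXISTS
   says whether the data file was found.  (Hypothetical helper.) */
static int
running_as_temacs (const char *argv0, const char *first_arg,
                   int data_file_exists)
{
  /* Assumption: compare only the file name, not the directory part. */
  const char *base = strrchr (argv0, '/');
  base = base ? base + 1 : argv0;

  if (strncmp (base, "temacs", 6) == 0)
    return 1;                        /* rule 1: name begins with temacs */
  if (!data_file_exists)
    return 1;                        /* rule 2: no data file to load */
  if (first_arg != NULL && strcmp (first_arg, "-no-data-file") == 0)
    return 1;                        /* rule 3: explicit request */
  return 0;                          /* otherwise: normal xemacs run */
}
```

Keeping the decision in a function of its inputs, with no global state, fits the text's warning to be careful with variables that are initialized before the data file is loaded.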