
40. Interface to MS Windows



40.1 Different kinds of Windows environments

(a) operating system (OS) vs. window system vs. Win32 API vs. C runtime library (CRT) vs. compiler

There are various Windows operating systems (Windows NT, 2000, XP, 95, 98, ME, etc.), which come in two basic classes: Windows NT (NT, 2000, XP, and all future versions) and 9x (95, 98, ME). 9x-class operating systems are a kind of hodgepodge of a 32-bit upper layer on top of a 16-bit MS-DOS-compatible lower layer. NT-class operating systems are written from the ground up as 32-bit (there are also 64-bit versions available now), and provide many more features and much greater stability, since there is full memory protection between all processes and between processes and the system. NT-class operating systems also provide emulation for DOS programs inside of a "sandbox" (i.e. a walled-off environment in which one DOS program can screw up another one, but there is theoretically no way for a DOS program to screw up the OS itself). From the perspective of XEmacs, the difference between NT and 9x is very important in Unicode support (not really provided under 9x – see ‘intl-win32.c’) and subprocess creation, among other things.

The operating system provides the framework for accessing files and devices and running programs. From the perspective of a program, the operating system provides a set of services. At the lowest level, the way to call these services is dependent on the processor the OS is running on, but a portable interface is provided to C programs through functions called "system calls". Under Windows, this interface is called the Win32 API, and includes file-manipulation calls such as CreateFile() and ReadFile(), process-creation calls such as CreateProcess(), etc.

This concept of system calls goes back to Unix, where similar services are available but through routines with different, simpler names, such as open(), read(), fork(), execve(), etc. In addition, Unix provides a higher layer of routines, called the C Runtime Library (CRT), which provide higher-level, more convenient versions of the same services (e.g. "stream-oriented" file routines such as fopen() and fread()) as well as various other utility functions, such as string-manipulation routines (e.g. strcpy() and strcmp()).
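As a rough illustration of the two layers, the sketch below reads the same file once through the Unix system-call interface (open(), read(), close()) and once through the CRT's stream interface (fopen(), fread(), fclose()). It assumes a POSIX system; the function names are made up for this example, and error handling is kept minimal.

```c
#include <fcntl.h>     /* open(), O_RDONLY */
#include <stdio.h>     /* fopen(), fread(), ferror(), fclose() */
#include <sys/types.h> /* ssize_t */
#include <unistd.h>    /* read(), close() */

/* System-call layer: read up to BUFSIZE bytes from PATH into BUF.
   Returns the number of bytes read, or -1 on error. */
static long read_with_syscalls(const char *path, char *buf, size_t bufsize)
{
  int fd = open(path, O_RDONLY);
  if (fd < 0)
    return -1;

  size_t total = 0;
  while (total < bufsize)
    {
      ssize_t n = read(fd, buf + total, bufsize - total);
      if (n < 0)           /* I/O error */
        {
          close(fd);
          return -1;
        }
      if (n == 0)          /* end of file */
        break;
      total += (size_t) n;
    }
  close(fd);
  return (long) total;
}

/* CRT layer: the same job through the buffered, stream-oriented routines. */
static long read_with_crt(const char *path, char *buf, size_t bufsize)
{
  FILE *fp = fopen(path, "rb");
  if (fp == NULL)
    return -1;

  size_t n = fread(buf, 1, bufsize, fp);
  int failed = ferror(fp);
  fclose(fp);
  return failed ? -1 : (long) n;
}
```

The CRT version is shorter because fread() already loops over the underlying read() calls and buffers the data for you.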

For compatibility, a C Runtime Library (CRT) is also provided under Windows, which provides a partial implementation of both the Unix CRT and the Unix system-call API, implemented using the Win32 API. The CRT sources come with Visual C++ (VC++). For example, under VC++ 6, look in the CRT/SRC directory, e.g. for me (ben): /Program Files/Microsoft Visual Studio/VC98/CRT/SRC. The CRT is provided using either MSVCRT (dynamically linked) or ‘LIBC.LIB’ (statically linked).

The window system provides the framework for creating overlapped windows and unifying signals provided by various devices (input devices such as the keyboard and mouse, timers, etc.) into a single event queue (or "message queue", under Windows). Like the operating system, the window system can be viewed from the perspective of a program as a set of services provided by an API of function calls. Under Windows, window-system services are also available through the Win32 API, while under UNIX the window system is typically a separate component (e.g. the X Window System, aka X Windows or X11). The term "GUI" ("graphical user interface") is often used to refer to the services provided by the window system, or to a windowing interface provided by a program.

The Win32 API is implemented by various dynamic libraries, or DLL’s. The most important are KERNEL32, USER32, and GDI32. KERNEL32 implements the basic file-system and process services. USER32 implements the fundamental window-system services such as creating windows and handling messages. GDI32 implements higher-level drawing capabilities – fonts, colors, lines, etc.

C programs are compiled into executables using a compiler. Under Unix, a compiler usually comes as part of the operating system, but not under Windows, where the compiler is a separate product. Even under Unix, people often install their own compilers, such as gcc. Under Windows, the Microsoft-standard compiler is Visual C++ (VC++).

It is possible to provide an emulation of any API using any other, as long as the underlying API provides the suitable functionality. This is what Cygwin (www.cygwin.com) does. It provides a fairly complete POSIX emulation layer (POSIX is an IEEE standard for Unix behavior) on top of MS Windows – in particular, providing the file-system, process, tty, and signal semantics that are part of a modern, standard Unix operating system. Cygwin does this using its own DLL, ‘cygwin1.dll’, which makes calls to the Win32 API services in ‘kernel32.dll’. Cygwin also provides its own implementation of the C runtime library, called newlib (‘libcygwin.a’; ‘libc.a’ and ‘libm.a’ are symlinked to it), which is implemented on top of the Unix system calls provided in ‘cygwin1.dll’. In addition, Cygwin provides static import libraries that give you direct access to the Win32 API – XEmacs uses this to provide GUI support under Cygwin. Cygwin provides a version of GCC (the GNU Project C compiler) that is set up to automatically link with the appropriate Cygwin libraries. Cygwin also provides, as optional components, pre-compiled binaries for a great number of open-source programs compiled under the Cygwin environment. This includes all of the standard Unix file-system, text-manipulation, development, networking, database, etc. utilities, a version of X Windows that runs on top of the Win32 API (see below), and compilations of nearly all other common open-source packages (Apache, TeX, [X]Emacs, Ghostscript, GTK, ImageMagick, etc.).

Similarly, you can emulate the functionality of X Windows using the window-system component of the Win32 API. Cygwin provides a package to do this, from the XFree86 project. Other versions of X under Windows also exist, such as the MicroImages MI/X server. Each version can potentially come with its own header and library files, allowing you to compile X-Windows programs.

All of these different operating system and emulation layers can make for a fair amount of confusion, so:

(b) CRT is not the same as VC++

Note that the CRT is NOT (completely) part of VC++. True, if you link statically, the CRT (in the form of ‘LIBC.LIB’, which comes with VC++) will be inserted into the executable (.EXE), but otherwise the CRT will be separate. The dynamic version of the CRT is provided by ‘MSVCRT.DLL’ (or ‘MSVCRTD.DLL’, for debugging), which comes with Windows. Hence, it’s possible to use a different compiler and still link with MSVCRT – which is exactly what MinGW does.

(c) CRT is not the same as the Win32 API

Note also that the CRT is totally separate from the Win32 API. They provide different functions and are implemented in different DLL’s. They are also different levels – the CRT is implemented on top of Win32. Sometimes the CRT and Win32 both have their own versions of similar concepts, such as locales. These are typically maintained separately, and can get out of sync. Do not assume that changing a setting in the CRT will have any effect on Win32 API routines using a similar concept unless the CRT docs specifically say so. Do not assume that behavior described for CRT functions applies to Win32 API or vice-versa. Note also that the CRT knows about and is implemented on top of the Win32 API, while the Win32 API knows nothing about the CRT.

(d) MinGW is not the same as Cygwin

As described in (b), Microsoft’s version of the CRT (‘MSVCRT.DLL’) is provided as part of Windows, separate from VC++, which must be purchased. Hence, it is possible to write programs that use MSVCRT for CRT services without using VC++. This is what MinGW (www.mingw.org) does – it is a port of GCC that will use MSVCRT. The reasons one might want to do this are that (a) it is free, and (b) it does not require a separately installed DLL, as Cygwin does. (#### Maybe MinGW targets CRTDLL, not MSVCRT? If so, what is CRTDLL, and how does it differ from MSVCRT and ‘LIBC.LIB’?) Primarily, what MinGW provides is patches to GCC (now integrated into the standard distribution) and its own header files and import libraries that are compatible with MSVCRT. The best way to think of MinGW is simply as another Windows compiler, just as there used to be Microsoft and Borland compilers. Because MinGW programs use all the same libraries as VC++ programs, and hence the same services are available, programs that compile under VC++ should compile under MinGW with very little change, whereas programs that compile under Cygwin will look quite different.

The confusion between MinGW and Cygwin is the confusion between the environment that a compiler runs under and the target environment of a program, i.e. the environment that a program is compiled to run under. It’s theoretically possible, for example, to compile a program under Windows and generate a binary that can only be run under Linux, or vice-versa – or, for that matter, to use Windows running on an Intel machine to write and compile a program that will run on the Mac OS running on a PowerPC machine. This is called cross-compiling, and while it may seem rather esoteric, it is quite normal when you want to generate a program for a machine that you cannot develop on – for example, a program that will run on a Palm Pilot. Originally, this is how MinGW worked – you needed to run GCC under a Cygwin environment and give it appropriate flags, telling it to use the MinGW headers and target ‘MSVCRT.DLL’ rather than ‘CYGWIN1.DLL’. (In fact, Cygwin standardly comes with MinGW’s header files.) This was because GCC was written with Unix in mind and relied on a large amount of Unix-specific functionality. To port GCC to Windows without using a POSIX emulation layer would mean a lot of rewriting of GCC. Eventually, however, this was done, and GCC was itself compiled using MinGW. The result is that currently you can develop MinGW applications either under Cygwin or under native Windows.
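The host/target distinction is visible directly in C source: the macros a compiler predefines describe the environment it is generating code for, not the one it happens to be running under. A minimal sketch, assuming the usual predefined macros (__CYGWIN__ for Cygwin GCC, __MINGW32__ for MinGW GCC, _MSC_VER for VC++); the function name is hypothetical.

```c
/* Report the target environment this file is being compiled for.
   The result depends only on the compiler's predefined macros, i.e.
   on the target, not on the machine the compiler runs on. */
static const char *target_environment(void)
{
#if defined(__CYGWIN__)
  return "Cygwin (POSIX emulation via cygwin1.dll)";
#elif defined(__MINGW32__)
  return "MinGW (GCC linking against the Microsoft CRT)";
#elif defined(_MSC_VER)
  return "Visual C++ (native Win32)";
#else
  return "not a Windows target";
#endif
}
```

Cygwin is tested first so that its answer is not shadowed by any Windows-related macro a particular GCC configuration might also define.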

(e) Operating system is not the same as window system

As per the above discussion, we can use either Native Windows (the OS part of Win32 provided by ‘KERNEL32.DLL’ and the Windows CRT as provided by MSVCRT or CLL) or Cygwin to provide operating-system functionality, and we can use either Native Windows (the windowing part of Win32 as provided by ‘USER32.DLL’ and ‘GDI32.DLL’) or X11 to provide window-system functionality. This gives us four possible build environments. It’s currently possible to build XEmacs with at least three of these combinations – as far as I know native + X11 is no longer supported, although it used to be (support used to exist in ‘xemacs.mak’ for linking with some X11 libraries available from somewhere, but it was bit-rotting and you could always use Cygwin; #### what happens if we try to compile with MinGW, native OS + X11?). This may still seem confusing, so:

Native OS + native windowing

We call CreateProcess() to run subprocesses (‘process-nt.c’), and CreateWindowEx() to create a top-level window (‘frame-msw.c’). We use ‘nt/xemacs.mak’ to compile with VC++, linking with the Windows CRT (‘MSVCRT.DLL’ or ‘LIBC.LIB’) and with the various Win32 DLL’s (‘KERNEL32.DLL’, ‘USER32.DLL’, ‘GDI32.DLL’); or we use ‘src/Makefile[.in.in]’ to compile with GCC, telling it (e.g. -mno-cygwin, see ‘s/mingw32.h’) to use MinGW (which will end up linking with ‘MSVCRT.DLL’), and linking GCC with -lshell32 -lgdi32 -luser32 etc. (see ‘configure.in’).

Cygwin + native windowing

We call fork()/execve() to run subprocesses (‘process-unix.c’), and CreateWindowEx() to create a top-level window (‘frame-msw.c’). We use ‘src/Makefile[.in.in]’ to compile with GCC (it will end up linking with ‘CYGWIN1.DLL’) and link GCC with -lshell32 -lgdi32 -luser32 etc. (see ‘configure.in’).

Cygwin + X11

We call fork()/execve() to run subprocesses (‘process-unix.c’), and XtCreatePopupShell() to create a top-level window (‘frame-x.c’). We use ‘src/Makefile[.in.in]’ to compile with GCC (it will end up linking with ‘CYGWIN1.DLL’) and link GCC with -lXt, -lX11, etc. (see ‘configure.in’).

Finally, if native OS + X11 were possible, it might look something like

[Native OS + X11]

We call CreateProcess() to run subprocesses (‘process-nt.c’), and XtCreatePopupShell() to create a top-level window (‘frame-x.c’). We use ‘nt/xemacs.mak’ to compile with VC++, linking with the Windows CRT (‘MSVCRT.DLL’ or ‘LIBC.LIB’) and with the various X11 DLL’s (‘XT.DLL’, ‘XLIB.DLL’, etc.); or we use ‘src/Makefile[.in.in]’ to compile with GCC, telling it (e.g. -mno-cygwin, see ‘s/mingw32.h’) to use MinGW (which will end up linking with ‘MSVCRT.DLL’), and linking GCC with -lXt, -lX11, etc. (see ‘configure.in’).

One of the reasons that we maintain the ability to build under Cygwin and X11 on Windows, when we have native support, is that it allows developers working on Windows to test under a Unix-like environment.



40.2 Windows Build Flags

CYGWIN

for Cygwin-only stuff.

WIN32_NATIVE

Win32 native OS-level stuff (files, process, etc.). Applies whenever linking against the native C libraries – i.e. all compilations with VC++ and with MINGW, but never Cygwin.

HAVE_X_WINDOWS

for X Windows (regardless of whether under MS Win)

HAVE_MS_WINDOWS

MS Windows native windowing system (anything related to the appearance of the graphical screen). May or may not apply to any of VC++, MINGW, Cygwin.

Finally, there’s also the MINGW build environment, which uses GCC (similar to Cygwin) but native MS Windows libraries rather than a POSIX emulation layer (the Cygwin approach). This environment defines WIN32_NATIVE, but also defines MINGW, which is used mostly because MinGW uses its own include files (related to Cygwin’s), which have a few things messed up.
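To make the division of labor concrete, here is a hypothetical sketch of code bracketed by these flags. The function names are invented for illustration and this is not actual XEmacs code; it just reports which subprocess module and which frame module a given set of flags would select, following the build combinations described in the previous section. Note that the OS-level test and the window-system test are independent of each other.

```c
/* Hypothetical illustration of flag usage; not actual XEmacs code. */

/* OS level: WIN32_NATIVE covers VC++ and MinGW builds.  Cygwin builds
   do not define it, so they take the Unix code path. */
static const char *subprocess_module(void)
{
#ifdef WIN32_NATIVE
  return "process-nt.c";    /* CreateProcess() */
#else
  return "process-unix.c";  /* fork()/execve(), including under CYGWIN */
#endif
}

/* Window-system level, decided separately from the OS level. */
static const char *window_system_module(void)
{
#if defined(HAVE_MS_WINDOWS)
  return "frame-msw.c";     /* CreateWindowEx() */
#elif defined(HAVE_X_WINDOWS)
  return "frame-x.c";       /* XtCreatePopupShell() */
#else
  return "none";            /* no GUI window system compiled in */
#endif
}
```

For example, compiling with -DCYGWIN -DHAVE_MS_WINDOWS selects ‘process-unix.c’ and ‘frame-msw.c’, which is the Cygwin + native windowing combination. CYGWIN itself is not tested here; it is reserved for code that is genuinely Cygwin-specific.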

Formerly, we had a whole host of flags. Here’s the conversion, for porting code from GNU Emacs and such:

Old Constant      New Constant
------------      ------------
WINDOWSNT         WIN32_NATIVE
WIN32             WIN32_NATIVE
_WIN32            WIN32_NATIVE
HAVE_WIN32        WIN32_NATIVE
DOS_NT            WIN32_NATIVE
HAVE_NTGUI        WIN32_NATIVE, unless it ends up already bracketed by this
HAVE_FACES        always true
MSDOS             determine whether this code is really specific to MS-DOS (and not Windows – e.g. DJGPP code); if so, delete the code; otherwise, convert to WIN32_NATIVE (we do not support MS-DOS w/DOS Extender under XEmacs)
__CYGWIN__        CYGWIN
__CYGWIN32__      CYGWIN
__MINGW32__       MINGW


40.3 Windows I18N Introduction

Abstract: This page provides an overview of the aspects of the Win32 internationalization API that are relevant to XEmacs, including the basic distinction between multibyte and Unicode encodings. Also included are pointers to how XEmacs should make use of this API.

The Win32 API is quite well-designed in its handling of strings encoded for various character sets. The API is geared around the idea that two different methods of encoding strings should be supported. These methods are called multibyte and Unicode, respectively. The multibyte encoding is compatible with ASCII strings and is a more efficient representation when dealing with strings containing primarily ASCII characters, but it has a great number of serious deficiencies and limitations, including that it is very difficult and error-prone to work with strings in this encoding, and any particular string in a multibyte encoding can only contain characters from a very limited number of character sets. The Unicode encoding rectifies all of these deficiencies, but it is not compatible with ASCII strings (in other words, an existing program will not be able to handle the encoded strings unless it is explicitly modified to do so), and it takes up twice as much memory space as multibyte encodings when encoding a purely ASCII string.

Multibyte encodings use a variable number of bytes (either one or two) to represent characters. ASCII characters are represented by a single byte with its high bit not set, and non-ASCII characters are represented by one or two bytes, the first of which always has its high bit set. (The second byte, when it exists, may or may not have its high bit set.) There is no single multibyte encoding. Instead, there is generally one encoding per non-ASCII character set. Such an encoding is capable of representing (besides ASCII characters, of course) only characters from one (or possibly two) particular character sets.

Multibyte encoding makes processing of strings very difficult. For example, given a pointer to the beginning of a character within a string, finding the pointer to the beginning of the previous character may require backing up all the way to the beginning of the string, and then moving forward. Also, an operation such as separating out the components of a path by searching for backslashes will fail if it’s implemented in the simplest (but not multibyte-aware) fashion, because it may find what appears to be a backslash, but which is actually the second byte of a two-byte character. Also, the limited number of character sets that any particular multibyte encoding can represent means that loss of data is likely if a string is converted from the XEmacs internal format into a multibyte format.
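Both pitfalls can be shown with Shift-JIS (code page 932), whose lead bytes lie in 0x81-0x9F and 0xE0-0xFC and whose trail bytes may be 0x5C, the ASCII backslash; for instance, the character 表 is encoded as 0x95 0x5C. The helpers below are a hypothetical sketch, not XEmacs code and not Win32 API functions (Win32 offers IsDBCSLeadByteEx() for the lead-byte test), and they assume well-formed Shift-JIS input.

```c
#include <stddef.h>
#include <string.h>

/* Shift-JIS lead bytes: 0x81-0x9F and 0xE0-0xFC.  Any other byte
   (ASCII or half-width katakana) is a complete one-byte character. */
static int sjis_is_lead_byte(unsigned char c)
{
  return (c >= 0x81 && c <= 0x9F) || (c >= 0xE0 && c <= 0xFC);
}

/* Naive search, unaware of multibyte characters: WRONG for Shift-JIS,
   since it can stop on a 0x5C trail byte. */
static long naive_find_backslash(const char *s)
{
  const char *p = strchr(s, '\\');
  return p ? (long) (p - s) : -1;
}

/* Multibyte-aware search: step over whole characters so a trail byte
   is never mistaken for a backslash.  Returns -1 if none is found. */
static long sjis_find_backslash(const char *s)
{
  size_t i = 0;
  while (s[i] != '\0')
    {
      unsigned char c = (unsigned char) s[i];
      if (sjis_is_lead_byte(c) && s[i + 1] != '\0')
        i += 2;                 /* skip lead byte and its trail byte */
      else if (c == '\\')
        return (long) i;
      else
        i += 1;
    }
  return -1;
}

/* Byte offset of the character preceding the character boundary POS.
   Trail bytes overlap both ASCII and the lead-byte range, so looking
   backward cannot tell where a character starts; we rescan from the
   beginning of the string instead. */
static size_t sjis_prev_char(const char *s, size_t pos)
{
  size_t i = 0, prev = 0;
  while (i < pos)
    {
      prev = i;
      i += (sjis_is_lead_byte((unsigned char) s[i]) && s[i + 1] != '\0')
           ? 2 : 1;
    }
  return prev;
}
```

For the bytes 0x95 0x5C followed by `dir\file`, the naive search reports a backslash at offset 1 (inside 表), while the multibyte-aware search correctly reports offset 5. Note that the rescan in sjis_prev_char() makes backing up one character cost time proportional to the position, which is exactly the inefficiency described above.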

For these reasons, the C code in XEmacs should never do any sort of work with multibyte encoded strings (or with strings in any external encoding for that matter). Strings should always be maintained in the internal encoding, which is predictable, and converted to an external encoding only at the point where the string moves from the XEmacs C code and enters a system library function. Similarly, when a string is returned from a system library function, it should be immediately converted into the internal coding before any operations are done on it.

Unicode, unlike multibyte encodings, was designed as a fixed-width encoding in which every character is represented using 16 bits (UCS-2, as used by early versions of Windows NT). Later versions of Windows use UTF-16, in which characters outside the Basic Multilingual Plane take two 16-bit units (a surrogate pair), although most commonly used characters still fit in one. Unicode is also capable of encoding all the characters from all the character sets in common use in the world. The predictability and completeness of the Unicode encoding make it a very good encoding for strings that may contain characters from many character sets mixed up with each other. At the same time, of course, it is incompatible with routines that expect ASCII characters and also incompatible with general string manipulation routines, which will encounter a great number of what would appear to be embedded nulls in the string. It also takes twice as much room to encode strings containing primarily ASCII characters. This is why XEmacs does not use Unicode or similar encoding internally for buffers.
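The embedded-null and size problems are easy to see by encoding a plain ASCII string as UTF-16LE, the byte order Windows uses for its wide strings. The helper below is a hypothetical sketch that handles ASCII input only.

```c
#include <stddef.h>
#include <string.h>

/* Encode an ASCII string as UTF-16LE (low byte first), followed by a
   two-byte terminator.  OUT must hold 2 * (strlen(s) + 1) bytes.
   Returns the number of bytes written, excluding the terminator. */
static size_t ascii_to_utf16le(const char *s, unsigned char *out)
{
  size_t n = strlen(s);
  for (size_t i = 0; i < n; i++)
    {
      out[2 * i] = (unsigned char) s[i];  /* low byte: the ASCII code */
      out[2 * i + 1] = 0;                 /* high byte: always zero */
    }
  out[2 * n] = out[2 * n + 1] = 0;
  return 2 * n;
}
```

For "Hi" this writes the 4 bytes 0x48 0x00 0x69 0x00 instead of 2, and strlen() applied to the result returns 1, because it stops at the zero high byte of the first character.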

The Win32 API cleverly deals with the issue of 8 bit vs. 16 bit characters by declaring a type called TCHAR which specifies a generic character, either 8 bits or 16 bits. Generally TCHAR is defined to be the same as the simple C type char, unless the preprocessor constant UNICODE is defined, in which case TCHAR is defined to be WCHAR, which is a 16 bit type. Nearly all functions in the Win32 API that take strings are defined to take strings that are actually arrays of TCHARs. There is a type LPTSTR which is defined to be a string of TCHARs and another type LPCTSTR which is a const string of TCHARs. The theory is that any program that uses TCHARs exclusively to represent characters and does not make assumptions about the size of a TCHAR or the way that the characters are encoded should work transparently regardless of whether the UNICODE preprocessor constant is defined, which is to say, regardless of whether 8 bit multibyte or 16 bit Unicode characters are being used. The way that this is actually implemented is that every Win32 API function that takes a string as an argument actually maps to one of two functions which are suffixed with an A (which stands for ANSI, and means multibyte strings) or W (which stands for wide, and means Unicode strings). The mapping is, of course, controlled by the same UNICODE preprocessor constant. Generally all structures containing strings in them actually map to one of two different kinds of structures, with either an A or a W suffix after the structure name.
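The A/W mapping is done entirely with the preprocessor. The sketch below mimics the pattern with hypothetical names (MY_TCHAR, MY_TEXT, NameLengthA, NameLengthW, NameLength) rather than the real <windows.h> definitions, so that it compiles anywhere. In the real headers, CreateFile is similarly #defined to CreateFileW or CreateFileA depending on UNICODE. Note that wchar_t is 16 bits under Windows but typically 32 bits on Unix systems.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Generic character type and literal macro, chosen at compile time,
   in the style of TCHAR and TEXT(). */
#ifdef UNICODE
typedef wchar_t MY_TCHAR;
#define MY_TEXT(s) L##s
#else
typedef char MY_TCHAR;
#define MY_TEXT(s) s
#endif

/* The two real implementations, suffixed A (multibyte) and W (wide).
   Each returns the length of NAME in code units. */
static size_t NameLengthA(const char *name)
{
  return strlen(name);
}

static size_t NameLengthW(const wchar_t *name)
{
  return wcslen(name);
}

/* The generic name resolves to one of them, the same way <windows.h>
   maps CreateFile to CreateFileW or CreateFileA. */
#ifdef UNICODE
#define NameLength NameLengthW
#else
#define NameLength NameLengthA
#endif
```

Code written as `NameLength(MY_TEXT("xemacs"))` compiles either way, calling NameLengthA on a char string or NameLengthW on a wide string, which is the transparency the TCHAR scheme aims for.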

Unfortunately, not all of the implementations of the Win32 API implement all of the functionality described above. In particular, Windows 95 does not implement very much Unicode functionality. It does implement functions to convert multibyte-encoded strings to and from Unicode strings, and provides Unicode versions of certain low-level functions like ExtTextOut(). In fact, all of the rest of the Unicode versions of API functions are just stubs that return an error. Conversely, all versions of Windows NT completely implement all the Unicode functionality, but some versions (especially versions before Windows NT 4.0) don’t implement much of the multibyte functionality. For this reason, as well as for general code cleanliness, XEmacs needs to be written in such a way that it works with or without the UNICODE preprocessor constant being defined.

Getting XEmacs to run when all strings are Unicode primarily involves removing any assumptions made about the size of characters. As noted earlier, conversion between internally and externally encoded strings should occur at the point of entry into or exit from a library function. With this in mind, an externally encoded string in XEmacs can be treated simply as an arbitrary sequence of bytes of some length which has no particular relationship to the length of the string in the internal encoding.

#### The rest of this is out-of-date and needs to be rewritten to reference the actual coding systems or aliases that we currently use.

[[ To facilitate this, the enum external_data_format, which is declared in ‘lisp.h’, is expanded to contain three new formats, which are FORMAT_LOCALE, FORMAT_UNICODE and FORMAT_TSTR. FORMAT_LOCALE always causes encoding into a multibyte string consistent with the encoding of the current locale. The functions to handle locales are different under Unix and Windows and locales are a process property under Unix and a thread property under Windows, but the concepts are basically the same. FORMAT_UNICODE of course causes encoding into Unicode and FORMAT_TSTR logically maps to either FORMAT_LOCALE or FORMAT_UNICODE depending on the UNICODE preprocessor constant.

Under Unix, the behavior of FORMAT_TSTR is undefined, and this particular format should not be used. Under Windows, however, FORMAT_TSTR should be used for pretty much all of the Win32 API calls. The other two formats should only be used in particular APIs that specifically call for a multibyte or Unicode encoded string regardless of the UNICODE preprocessor constant. String constants that are to be passed directly to Win32 API functions, such as the names of window classes, need to be bracketed in their definition with a call to the macro TEXT. This awfully named macro, which comes out of the Win32 API, appropriately makes a string of either regular or wide chars, which is to say this string may be prepended with an L (causing it to be a wide string) depending on the UNICODE preprocessor constant.

By the way, if you’re wondering what happened to FORMAT_OS, I think that this format should go away entirely because it is too vague and should be replaced by more specific formats as they are defined. ]]

Use Qnative for Unix conversion, Qmswindows_tstr for Windows ...

String constants that are to be passed directly to Win32 API functions, such as the names of window classes, need to be bracketed in their definition with a call to the macro XETEXT. This appropriately makes a string of either regular or wide chars, which is to say this string may be prepended with an L (causing it to be a wide string) depending on XEUNICODE_P.
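If XEUNICODE_P is a run-time test rather than a preprocessor constant (so that one binary can call the W functions on NT and the A functions on 9x), such a macro cannot simply paste an L onto the literal at compile time; both the narrow and the wide version must be compiled in and one chosen when the expression is evaluated. A hypothetical sketch of that idea follows. SKETCH_XETEXT, sketch_unicode_p and sketch_tstr_len are invented names, and this is not the actual XEmacs definition of XETEXT.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical run-time flag standing in for XEUNICODE_P:
   nonzero when the wide ("W") API is to be used. */
static int sketch_unicode_p = 0;

/* Hypothetical stand-in for XETEXT: both a narrow and a wide copy of
   the literal are compiled in, and the flag picks one at run time.
   The result is passed around as an opaque byte pointer, to be handed
   to the matching A or W function. */
#define SKETCH_XETEXT(x) \
  (sketch_unicode_p ? (const char *) L##x : (const char *) x)

/* Length in characters of a string produced by SKETCH_XETEXT,
   interpreted according to the current value of the flag. */
static size_t sketch_tstr_len(const char *p)
{
  return sketch_unicode_p ? wcslen((const wchar_t *) p) : strlen(p);
}
```

The cost of this design is that every such constant is stored twice in the binary; the benefit is that the choice between the A and W APIs can follow the operating system actually running the program.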



40.4 Modules for Interfacing with MS Windows

There are two different general Windows-related include files in src.

Uses are approximately:

syswindows.h

Wrapper around ‘<windows.h>’, including missing defines as necessary. Includes stuff needed on both Cygwin and native Windows, regardless of window system chosen. Includes definitions needed for Unicode conversion/encapsulation, and other Mule-related stuff, plus various other prototypes and Windows-specific, but not GUI-specific, stuff.

console-msw.h

Used on both Cygwin and native Windows, but only when native window system (as opposed to X) chosen. Includes ‘syswindows.h’.

Summary of files:

console-msw.h

include file for native windowing (otherwise, ‘console-x.h’, etc.)

console-msw.c, frame-msw.c, etc.

native windowing, as above

process-nt.c

subprocess support for native OS (otherwise, ‘process-unix.c’)

nt.c

support routines used under native OS

win32.c

support routines used under both OS environments

syswindows.h

support header for both environments

nt/xemacs.mak

Makefile for VC++ (otherwise, ‘src/Makefile.in.in’)

s/windowsnt.h

s header for basic native-OS defines, VC++ compiler

s/mingw32.h

s header for basic native-OS defines, GCC/MinGW compiler

s/cygwin.h

s header for basic Cygwin defines

s/win32-native.h

s header for basic native-OS defines, all compilers

s/win32-common.h

s header for defines for both OS environments

intl-win32.c

internationalization functions for both OS environments

intl-encap-win32.c

Unicode encapsulation functions for both OS environments

intl-auto-encap-win32.c

Auto-generated Unicode encapsulation functions

intl-auto-encap-win32.h

Auto-generated Unicode encapsulation headers



40.5 CHANGES from 21.4-windows branch (probably obsolete)

This node contains the ‘CHANGES-msw’ log that Andy Piper kept while he was maintaining the Windows branch of 21.4. These changes have (presumably) long since been merged to both 21.4 and 21.5, but let’s not throw the list away yet.

CHANGES-msw

This file briefly describes all mswindows-specific changes to XEmacs in the OXYMORON series of releases. The mswindows release branch contains additional changes on top of the mainline XEmacs release. These changes are deemed necessary for XEmacs to be fully functional under mswindows. It is not intended that these changes cause problems on UNIX systems, but they have not been tested on UNIX platforms. Caveat Emptor.

See the file ‘CHANGES-release’ for a full list of mainline changes.

to XEmacs 21.4.9 "Informed Management (Windows)"

to XEmacs 21.4.8 "Honest Recruiter (Windows)"

to XEmacs 21.4.7 "Economic Science (Windows)"

to XEmacs 21.4.6 "Common Lisp (Windows)"

to XEmacs 21.4.5 "Civil Service (Windows)"



This document was generated by Aidan Kehoe on December 27, 2016 using texi2html 1.82.