Fun and Bugs in Microsoft Word 1.1a

The Microsoft company made a present to all programmers eager to dig into some interesting stuff: they opened the source codes of MS-DOS v 1.1, v 2.0 and Word for Windows 1.1a. The MS-DOS operating system is written in an assembler, so the analyzer cannot be applied to it. But Word is written in C. Word 1.1a’s source codes are almost 25 years old, but we still managed to analyze it. There’s no practical use of it, of course. Just for fun.


Where to find the source files

One can download the source codes of MS-DOS v 1.1, v 2.0 and Word for Windows 1.1a. Those interested in digging the source files on their own should check the original source.

The press release: Computer History Museum Makes Historic MS-DOS and Word for Windows Source Code Available to the Public.

Checking Word 1.1a


Figure 1. Word for Windows 1.1a.

Word for Windows 1.1a was released in 1990. Its source code was made publicly available on March 25, 2014. Word has always been a flagship product of Microsoft, and I, as well as many other programmers, was very eager to peek into the inside of the software product that so much contributed to Microsoft’s commercial success.

I decided to check Word 1.1a’s code with our tool PVS-Studio. It is a static analyzer for C/C++/C# code. That task was not that easy, of course, as the analyzer is designed to work with projects developed at least in Visual Studio 2005. And now I had C source codes that are more than 20 years old. We can fairly call them a finding from the prehistoric times. At least, the C language standard didn’t exist then and every compiler had to be by itself. Fortunately, Word 1.1a’s source codes appeared to be free of any specific nuances and non-standard compiler extensions.

Before you can perform code analysis, you need to get preprocessed files (*.i). Once you have generated them, you can use the PVS-Studio Standalone tool to run the analysis and examine the diagnostic messages. Of course, the analyzer is not designed to check 16-bit programs, but the results I got were quite enough to satisfy my curiosity. After all, a meticulous analysis of a 24 year old project just wouldn’t make any sense.

Therefore, the basic obstacle was in obtaining the preprocessed files for the source codes. I asked my colleague to find some solution, and he approached the task with much creativity: he chose to use GCC 4.8.1 to get the preprocessed files. I guess no one has ever mocked at Word 1.1’s source codes in such a cruel way. How could it have occurred to him at all to use GCC?

What’s most interesting, it all turned out pretty fine. He wrote a small utility to run preprocessing by GCC 4.8.1 of each file from the folder it was stored in. As it displayed error messages regarding troubles with locating and including header files, we added -I switches into the launch parameters to specify the paths to the required files. A couple of header files which we failed to find were created empty. All the other troubles with #include expanding were related to including resources, so we commented them out. The WIN macro was defined for preprocessing as the code contained branches both for WIN and MAC.

After that, PVS-Studio Standalone and I came into play. I noted down a few suspicious code fragments I want to show you. But let’s first speak a bit more about the project itself.

A few words about Word 1.1a’s code

The most complex functions

The following functions showed the highest cyclomatic complexity:

  1. CursUpDown – 219;
  2. FIdle – 192;
  3. CmdDrCurs1 – 142.

#ifdef WIN23

While looking through the source code, I came across “#ifdef WIN23” and couldn’t help smiling, so I noted that fragment down. I thought it was a typo and the correct code was #ifdef WIN32.

When I saw WIN23 for the second time, I grew somewhat doubtful. Just then it struck me that I was viewing source files as old as 24 years by the moment. WIN23 stood for Windows 2.3.

Stern times

In some code fragment, I stumbled upon the following interesting line.

Assert((1 > 0) == 1);

It seems incredible that this condition can ever be false. But since there is such a check, there must be the reason for it. There was no language standard at that time. As far as I get it, it was a good style to check that the compiler’s work met programmers’ expectations.

Well, if we agree to treat K&R as a standard, the ((1 > 0) == 1) condition is always true, of course. But K&R was just a de facto standard. So it’s just a check of the compiler’s adequacy.

Analysis results

Now let’s discuss the suspicious fragments I have found in the code. I guess it’s the main reason why you are reading this article. So here we go.

Infinite loop

void GetNameElk(elk, stOut)
ELK elk;
unsigned char *stOut;
  unsigned char *stElk = &rgchElkNames[mpelkichName[elk]];
  unsigned cch = stElk[0] + 1;

  while (--cch >= 0)
    *stOut++ = *stElk++;

PVS-Studio’s diagnostic message: V547 Expression ‘– cch >= 0’ is always true. Unsigned type value is always >= 0. mergeelx.c 1188

The “while (–cch >= 0)” loop will never terminate. The ‘cch’ variable is unsigned, which means that it will always be >= 0, however long you may decrease it.

A typo leading to an array overrun

uns rgwSpare0 [5];

  printUns ("rgwSpare0[0]   = ", Fib.rgwSpare0[5], 0, 0, fTrue);
  printUns ("rgwSpare0[1]   = ", Fib.rgwSpare0[1], 1, 1, fTrue);
  printUns ("rgwSpare0[2]   = ", Fib.rgwSpare0[2], 0, 0, fTrue);
  printUns ("rgwSpare0[3]   = ", Fib.rgwSpare0[3], 1, 1, fTrue);
  printUns ("rgwSpare0[4]   = ", Fib.rgwSpare0[4], 2, 2, fTrue);

PVS-Studio’s diagnostic message: V557 Array overrun is possible. The ‘5’ index is pointing beyond array bound. dnatfile.c 444

It turned out that the first line for some reason contains the text Fib.rgwSpare0[5]. That’s incorrect: there are just 5 items in the array, therefore the largest index should be 4. The value ‘5’ is just a typo. A zero index should have most likely been used in the first string:

printUns ("rgwSpare0[0]   = ", Fib.rgwSpare0[0], 0, 0, fTrue);

Uninitialized variable

FPrintSummaryInfo(doc, cpFirst, cpLim)
int doc;
CP cpFirst, cpLim;
  int fRet = fFalse;
  int pgnFirst = vpgnFirst;
  int pgnLast = vpgnLast;
  int sectFirst = vsectFirst;
  int sectLast = sectLast;

PVS-Studio’s diagnostic message: V573 Uninitialized variable ‘sectLast’ was used. The variable was used to initialize itself. print2.c 599

The ‘sectLast’ variable is assigned to itself:

int sectLast = sectLast;

I suspect that it should have been initialized to the ‘vsectLast’ variable instead:

int sectLast = vsectLast;

I found one more error of that kind – must be a consequence of using the Copy-Paste method:

V573 Uninitialized variable ‘sectLast’ was used. The variable was used to initialize itself. print2.c 719

Undefined behavior

  static int  iBitmap = 0;
  iBitmap = ++iBitmap % MAXBITMAP;

PVS-Studio’s diagnostic message: V567 Undefined behavior. The ‘iBitmap’ variable is modified while being used twice between sequence points. ddedit.c 107

I don’t know how people used to treat such code 20 years ago, but in our times it is treated as hooliganism as it leads to undefined behavior.

Other fragments with similar issues:

  • V567 Undefined behavior. The ‘iIcon’ variable is modified while being used twice between sequence points. ddedit.c 132
  • V567 Undefined behavior. The ‘iCursor’ variable is modified while being used twice between sequence points. ddedit.c 150

Unsuccessful call of the printf() function

  int     cb;
  int     err;
  printf("\n - %d strings were read, "
         "%d were expected (decimal numbers) -\n");

PVS-Studio’s diagnostic message: V576 Incorrect format. A different number of actual arguments is expected while calling ‘printf’ function. Expected: 3. Present: 1. dini.c 498

The printf() function is a variadic function. Passing or not passing arguments to it are both legal. In this case, the programmer forgot about the arguments, and it resulted in printing garbage all the time.

Uninitialized pointers

One of the auxiliary utilities included into the package of Word source files contains a very strange piece of code.

main(argc, argv)
int argc;
char * argv [];
  FILE * pfl;
  for (argi = 1; argi < argc; ++argi)
    if (FWild(argv[argi]))
      FEnumWild(argv[argi], FEWild, 0);
      FEWild(argv[argi], 0);


PVS-Studio’s diagnostic message: V614 Uninitialized pointer ‘pfl’ used. Consider checking the first actual argument of the ‘fclose’ function. eldes.c 87

The ‘pfl’ variable is initialized neither before the loop nor inside it, while the fclose(pfl) function is called multiple times. It all, however, may have worked pretty well. The function would return an error status and the program would go on running.

And here’s another dangerous function which will most likely cause a program crash.

FPathSpawn( rgsz )
char *rgsz[];
{ /* puts the correct path at the beginning of rgsz[0]
     and calls FSpawnRgsz */
  char *rgsz0;

  strcpy(rgsz0, szToolsDir);
  strcat(rgsz0, "\\");
  strcat(rgsz0, rgsz[0]);
  return FSpawnRgsz(rgsz0, rgsz);

PVS-Studio’s diagnostic message: V614 Uninitialized pointer ‘rgsz0’ used. Consider checking the first actual argument of the ‘strcpy’ function. makeopus.c 961

The ‘ rgsz0’ pointer is not initialized to anything. However, it doesn’t prevent copying a string into it.

Typo in a condition

#define wkHdr    0x4000
#define wkFtn    0x2000
#define wkAtn    0x0008
#define wkSDoc    (wkAtn+wkFtn+wkHdr)

CMD CmdGoto (pcmb)
CMB * pcmb;
  int wk = PwwdWw(wwCur)->wk;
    if (wk | wkSDoc)
      NewCurWw((*hmwdCur)->wwUpper, fTrue);

PVS-Studio’s diagnostic message: V617 Consider inspecting the condition. The ‘(0x0008 + 0x2000 + 0x4000)’ argument of the ‘|’ bitwise operation contains a non-zero value. dlgmisc.c 409

The (wk | wkSDoc) condition is always true. The programmer must have actually meant to write the following code instead:

if (wk & wkSDoc)

That is, the | and & operators are swapped by mistake.

And finally a lengthy but simple sample

int TmcCharacterLooks(pcmb)
CMB * pcmb;
  if (qps wCharQpsSpacing = -qps;
    pcab->iCharIS = 2;
  else  if (qps > 0)
    pcab->iCharIS = 1;
    pcab->iCharIS = 0;
  if (hps wCharHpsPos = -hps;
    pcab->iCharPos = 2;
  else  if (hps > 0)
    pcab->iCharPos = 1;
    pcab->iCharPos = 1;

PVS-Studio’s diagnostic message: V523 The ‘then’ statement is equivalent to the ‘else’ statement. dlglook1.c 873

When working with the ‘qps’ variable, the following values are written into ‘pcab->iCharIS’: 2, 1, 0.

The ‘hps’ variable is handled in a similar way, but in this case, some suspicious values are saved into the variable ‘pcab->iCharPos’: 2, 1, 1.

It must be a typo: a zero was most likely meant to be used at the very end.



I have found very few strange fragments. There are two reasons for that. Firstly, I found the code to be skillfully and clearly written. Secondly, the analysis had to be incomplete, while teaching the analyzer the specifics of the old C language wouldn’t be of any use.

By Andrey Karpov

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s