Celebrating the 30th anniversary of the first C++ compiler: let’s find bugs in it

Cfront is a C++ compiler which came into existence in 1983 and was developed by Bjarne Stroustrup. At that time the language itself was known as “C with Classes”. Cfront had a complete parser and symbol tables, and built a tree for each class, function, etc. Cfront was based on CPre. Cfront defined the language until circa 1990. Many of the obscure corner cases in C++ are related to Cfront’s implementation limitations, because Cfront translated C++ into C. In short, Cfront is a sacred artifact for any C++ programmer, so I just couldn’t resist checking it.


Introduction

The idea to check Cfront occurred to me after reading an article devoted to the 30th anniversary of the first release version of this compiler: “30 YEARS OF C++”. I contacted Bjarne Stroustrup to get the source code of Cfront. For some reason I thought getting the code would be a great hassle, but it turned out to be quite easy. The source code is open, available to everybody, and can be found here: http://www.softwarepreservation.org/projects/c_plus_plus/


I decided to check the first commercial version of Cfront, released in October 1985, as it is this version that turned 30 this year.

Bjarne warned me that checking Cfront could be troublesome:

Please remember this is *very* old software designed to run on a 1MB 1MHz machine, and also used on original PCs (640KB). It was also done by one person (me) as only part of my full time job.

Indeed, it was impossible to check such a project as is. At that time, for instance, a simple dot (.) rather than a double colon (::) was used to separate a class name from a member function name. For example:

inline Pptr type.addrof() { return new ptr(PTR,this,0); }

Our PVS-Studio analyzer wasn’t ready for this, so I had to ask one of our colleagues to look through the code and correct such spots manually. That really helped, although some problems remained: on certain fragments the analyzer got confused and refused to analyze them. Nevertheless, I did manage to check the project.
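For comparison, the definition from the example above would be written like this in modern C++. The `TOK`, `type` and `ptr` declarations are hypothetical stand-ins, added only so that the line compiles; they do not reproduce Cfront’s real classes.

```cpp
// Hypothetical stand-ins, added only so that the member definition compiles;
// Cfront's real `type` and `ptr` classes look quite different.
enum TOK { PTR = 1 };  // token value chosen arbitrarily for this sketch

struct ptr;
typedef ptr* Pptr;

struct type {
    Pptr addrof();     // creates a "pointer to this type" node
};

struct ptr {
    TOK base;          // node kind
    type* typ;         // the type being pointed to
    int extra;         // stand-in for the third constructor argument
    ptr(TOK b, type* t, int e) : base(b), typ(t), extra(e) {}
};

// Cfront's spelling:  inline Pptr type.addrof() { return new ptr(PTR,this,0); }
// Modern C++ uses the scope resolution operator instead of the dot:
inline Pptr type::addrof() { return new ptr(PTR, this, 0); }
```

Only the separator changes, which is why a mechanical search-and-replace over `Class.member` definitions was enough to make the code digestible for a modern parser.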

I should say right away that I didn’t find anything crucial. I think there are 3 reasons why PVS-Studio found no serious bugs:

  1. The project size is small. It’s just 100 KLOC in 143 files.
  2. The code is of high quality.
  3. The PVS-Studio analyzer didn’t understand some fragments of the code.

“Talk is cheap. Show me the code” (c) Linus Torvalds

So, enough talking. I guess readers are here to see at least one error made by THE Stroustrup. Let’s have a look at the code.

Fragment 1.

typedef class classdef * Pclass;

#define PERM(p) p->permanent=1

Pexpr expr.typ(Ptable tbl)
{
  ....
  Pclass cl;
  ....
  cl = (Pclass) nn->tp;
  PERM(cl);
  if (cl == 0) error('i',"%k %s'sT missing",CLASS,s);
  ....
}

PVS-Studio warning: V595 The ‘cl’ pointer was utilized before it was verified against nullptr. Check lines: 927, 928. expr.c 927

The ‘cl’ pointer can be NULL, as the if (cl == 0) check indicates. What’s worse, the pointer is dereferenced before this check, inside the PERM macro.

If we expand the macro, we get:

cl = (Pclass) nn->tp;
cl->permanent=1;
if (cl == 0) error('i',"%k %s'sT missing",CLASS,s);
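One way to fix it is simply to move the check above the macro. The sketch below assumes simplified stand-in types and a hypothetical `mark_permanent()` function that returns false instead of calling Cfront’s `error()`; the same reordering applies to the ‘b’ pointer in the next fragment.

```cpp
#include <cstdio>

// Hypothetical stand-ins for Cfront's types; only the field used here is modeled.
struct classdef { int permanent = 0; };
typedef classdef* Pclass;

#define PERM(p) p->permanent=1

// Safe order: test for null first, dereference (via PERM) only afterwards.
// Returns false instead of calling error() so the behavior is easy to observe.
bool mark_permanent(Pclass cl)
{
    if (cl == 0) {
        std::fprintf(stderr, "class type missing\n");
        return false;
    }
    PERM(cl);  // expands to cl->permanent=1; and cl is known to be non-null here
    return true;
}
```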

Fragment 2.

The same problem here: the pointer is dereferenced first and only then checked:

Pname name.normalize(Pbase b, Pblock bl, bit cast)
{
  ....
  Pname n;
  Pname nn;
  TOK stc = b->b_sto;
  bit tpdf = b->b_typedef;
  bit inli = b->b_inline;
  bit virt = b->b_virtual;
  Pfct f;
  Pname nx;
  if (b == 0) error('i',"%d->N.normalize(0)",this);
  ....
}

PVS-Studio warning: V595 The ‘b’ pointer was utilized before it was verified against nullptr. Check lines: 608, 615. norm.c 608

Fragment 3.

int error(int t, loc* lc, char* s ...)
{
  ....
  if (in_error++)
    if (t!='t' || 4<in_error) {
      fprintf(stderr,"\nUPS!, error while handling error\n");
      ext(13);
    }
  else if (t == 't')
    t = 'i';
  ....
}

PVS-Studio warning: V563 It is possible that this ‘else’ branch must apply to the previous ‘if’ statement. error.c 164

I am not sure whether there is an error here or not, but the code is formatted misleadingly. An ‘else’ belongs to the closest ‘if’, so the code doesn’t execute the way its indentation suggests. Formatted correctly, it looks like this:

if (in_error++)
  if (t!='t' || 4<in_error) {
    fprintf(stderr,"\nUPS!, error while handling error\n");
    ext(13);
  } else if (t == 't')
    t = 'i';
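To see that the two readings really behave differently, here is a small sketch that strips the fragment down to its control flow. `ErrorState`, `as_written()` and `as_indented()` are hypothetical names, and a flag stands in for the call to `ext(13)`.

```cpp
// Hypothetical stand-ins: `in_error` counts nested errors; `aborted` replaces ext(13).
struct ErrorState { int in_error = 0; bool aborted = false; };

// As written in Cfront: the `else` binds to the nearest (inner) `if`.
char as_written(ErrorState& st, char t)
{
    if (st.in_error++)
        if (t != 't' || 4 < st.in_error) {
            st.aborted = true;
        } else if (t == 't')
            t = 'i';
    return t;
}

// As the indentation suggests: braces make the `else` belong to the outer `if`.
char as_indented(ErrorState& st, char t)
{
    if (st.in_error++) {
        if (t != 't' || 4 < st.in_error) {
            st.aborted = true;
        }
    } else if (t == 't') {
        t = 'i';
    }
    return t;
}
```

On the very first error with t == 't', the version as written leaves t unchanged while the indented version turns it into 'i'; on a nested error the outcome is reversed. Both versions still abort in the same cases, which fits the observation that only the message differs.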

Fragment 4.

extern
genericerror(int n, char* s)
{
  fprintf(stderr,"%s\n",
          s?s:"error in generic library function",n);
  abort(111);
  return 0;
};

PVS-Studio warning: V576 Incorrect format. A different number of actual arguments is expected while calling ‘fprintf’ function. Expected: 3. Present: 4. generic.c 8

Note the format string: it contains only one specifier, “%s”. The string will be printed, but the ‘n’ variable won’t be used.
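A minimal sketch of one possible fix, assuming the intent was to print the code ‘n’ next to the message: add a matching “%d” specifier. The helper name `generic_error_message()` and the exact message layout are assumptions; `snprintf()` is used instead of writing to `stderr` only so the result can be inspected.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: one conversion specifier per argument,
// "%s" for the text and "%d" for the error code n.
std::string generic_error_message(int n, const char* s)
{
    char buf[128];
    std::snprintf(buf, sizeof buf, "%s (%d)\n",
                  s ? s : "error in generic library function", n);
    return buf;
}
```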

Miscellaneous:

Unfortunately (or maybe not), I can’t show you anything else that looks like a real error. The analyzer issued some other warnings that may be worth a look, but they are not really serious. For example, the analyzer didn’t like some global variable names:

extern int Nspy, Nn, Nbt, Nt, Ne, Ns, Nstr, Nc, Nl;

PVS-Studio warning: V707 Giving short names to global variables is considered to be bad practice. It is suggested to rename ‘Nn’ variable. cfront.h 50

Another example: to print pointer values with the fprintf() function, Cfront uses the “%i” specifier. Modern C and C++ have “%p” for this. But as far as I understand, there was no “%p” 30 years ago, so the code was correct for its time.
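In modern C and C++, passing a pointer where “%i” expects an int is undefined behavior, and on 64-bit platforms a pointer usually doesn’t even fit in an int. A sketch of the modern way, using a hypothetical helper `pointer_to_string()`: “%p” expects a `void*`, so the pointer is converted explicitly.

```cpp
#include <cstdio>
#include <string>

// Formats an object pointer with the standard "%p" specifier.
// The exact text produced (e.g. "0x7ffd..." vs "7FFD...") is implementation-defined.
std::string pointer_to_string(const void* p)
{
    char buf[64];
    // "%p" takes a pointer to void, so the const qualifier is dropped explicitly.
    std::snprintf(buf, sizeof buf, "%p", const_cast<void*>(p));
    return buf;
}
```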

Thought-provoking observations

This pointer

I noticed that the ‘this’ pointer used to be handled differently. A couple of examples:

expr.expr(TOK ba, Pexpr a, Pexpr b)
{
  register Pexpr p;

  if (this) goto ret;
  ....
  this = p;
  ....
}

inline toknode.~toknode()
{
  next = free_toks;
  free_toks = this;
  this = 0;
}

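As you can see, changing the value of ‘this’ wasn’t forbidden back then. Today, not only is assigning to ‘this’ prohibited, but comparing ‘this’ to null has also lost any sense: calling a member function through a null pointer is undefined behavior, so a compiler may assume that ‘this’ is never null and drop such a check. (See “Still Comparing “this” Pointer to Null?”)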
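The toknode destructor above recycles nodes onto the ‘free_toks’ list and then sets ‘this’ to 0. As far as we understand, assignment to ‘this’ was the early way of taking over a class’s memory management; in later C++ the same job is done by class-specific operator new and operator delete. Below is a sketch of the same recycling idea in that style. `TokNode` and `FreeBlock` are hypothetical names, and the sketch is single-threaded and never returns cached blocks to the system.

```cpp
#include <cstddef>
#include <new>

// Hypothetical modern counterpart of Cfront's toknode recycling: the class
// supplies its own operator new/operator delete, which keep released storage
// on a singly linked free list (like free_toks).
class TokNode {
public:
    int tok = 0;
    TokNode* next = nullptr;   // ordinary list link, as in Cfront's toknode

    static void* operator new(std::size_t size);
    static void operator delete(void* p, std::size_t size) noexcept;

private:
    struct FreeBlock { FreeBlock* next; };  // overlays a released node's storage
    static FreeBlock* free_list;
};

TokNode::FreeBlock* TokNode::free_list = nullptr;

void* TokNode::operator new(std::size_t size)
{
    if (size == sizeof(TokNode) && free_list != nullptr) {
        FreeBlock* b = free_list;          // pop the most recently released block
        free_list = b->next;
        return b;
    }
    return ::operator new(size);           // otherwise use the global allocator
}

void TokNode::operator delete(void* p, std::size_t size) noexcept
{
    static_assert(sizeof(TokNode) >= sizeof(FreeBlock), "node too small for free list");
    if (p == nullptr) return;
    if (size != sizeof(TokNode)) { ::operator delete(p); return; }
    free_list = ::new (p) FreeBlock{free_list};  // push the block onto the free list
}
```

Unlike the Cfront version, the destructor no longer has to know about recycling: `delete` runs the destructor first and only then hands the raw storage to the class’s operator delete.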

This is the place for paranoia

I also came across a fragment I really liked. Apparently, nothing could be taken for granted:

/* this is the place for paranoia */
if (this == 0) error('i',"0->Cdef.dcl(%d)",tbl);
if (base != CLASS) error('i',"Cdef.dcl(%d)",base);
if (cname == 0) error('i',"unNdC");
if (cname->tp != this) error('i',"badCdef");
if (tbl == 0) error('i',"Cdef.dcl(%n,0)",cname);
if (tbl->base != TABLE) error('i',"Cdef.dcl(%n,tbl=%d)",
                              cname,tbl->base);

Bjarne Stroustrup’s commentaries

  • Cfront was bootstrapped from Cpre, but it was a complete rewrite. There wasn’t a line of Cpre code in Cfront.
  • The use-before-test-of-0 bug is of course bad, but curiously, the machine and OS I mostly used (DEC and research Unix) had page zero write protected, so that bug could not have been triggered without being caught.
  • The if-then-else bug (or not) is odd. I read the source; it’s not just misformatted, it’s incorrect. But curiously, that doesn’t matter: the only difference is a slight difference in the error message used before terminating. No wonder I did not spot it.
  • Yes, I should have used more readable names. I hadn’t counted on having other people maintain this program for years (and I’m a poor typist).
  • Yes, there was no %p then.
  • Yes, the rules for “this” changed.
  • The paranoia test is in the compiler’s main loop. My thought was that if anything went wrong with the software or hardware, one of those tests was likely to fail. At least once, it caught the effect of a bug in the code generator used to build Cfront. I think all significant programs should have a “paranoia test” against “impossible” errors.

Conclusion:

It’s really hard to overestimate the significance of Cfront. It influenced the development of a whole sphere of programming and gave the world the enduring C++ language, which continues to evolve. I am really grateful to Bjarne for all the work he has done in creating and developing C++. Thank you. For my part, I was really glad to dig into the code of this wonderful compiler.

I thank all our readers for their attention and wish you fewer bugs.

By Andrey Karpov, Bjarne Stroustrup

One thought on “Celebrating the 30th anniversary of the first C++ compiler: let’s find bugs in it”

  1. Release E survived only as a 420+ page printout (which is also on the historical archive site you linked to) … until we transcribed it in 2016. To fully verify the transcription, however, would require compiling the program and then running it on demos (like itself!) to ensure it produces consistent results. To that end, Release 1 – which is what you describe – could be used to compile Release E, since it also has the bootstrap C code.

    The problem with the syntax T.X in place of T::X is a non-issue. It’s just a simple matter of doing a grep for all instances of identifier-dot-identifier into a file, run an editor script to filter these, run a duplicate removal on the resulting file, and then convert the file to an editor script that makes the needed change to the program. A few minutes by an experienced hand. Mere seconds even, if you type as fast as Lola Astanova plays the piano.

    The real issue, if going with the archaic-C++ version, is what you described: the inconsistent handling of the “this” pointer. The simplest way to characterize this problem is to just say that the people who created and used the early versions of C++ weren’t yet experienced C++ programmers! (Because nobody was.) So they didn’t have enough experience and understanding of the still-future-C++, yet, to make proper use of the “this” pointer. A resolution and fix of the issue might be found by cross-checking it against Release 3 and using that as a guide to fix what’s in Release 1. The library of malloc() and free() in the source of Release 1, along with the constructors and destructors for the class objects, were mostly encapsulated in the memory-management file alloc.c, by the time we get to Release 3.

    In fact, though, it’s actually easier to take the bootstrap C code of Release 1 and work with that, instead. In that case, however, special measures must be taken to reinsert some of the macros (namely for getc(), putc(), isalpha(), isdigit()), to pun away the dependencies on , (and the implied dependencies on and ), to reinsert explicit name/type references of the original archaic-C++ code in all the sizeof’s that folded into numbers in the C code, to more carefully handle or replace the 0-initializations that take place in the constructors (for stmt, expr, name), perhaps using memset(). Macro-reinsertion can be done efficiently with the aid of editor scripts (and grep) so that’s a non-issue as well. But it does require getting rid of the #line directives in the bootstrap code (because they badly chop up the source lines) and re-layout-ing the program (e.g. using “indent” and more editor scripts). All of this is fairly straightforward and – with the changes – it will compile as an ordinary C program under GCC.

    At the time of writing, however, validation still remains an unresolved question. The simplest test is to reproduce the original bootstrap code for Release 1 from the C version of Release 1 and then to run it on Release E to fix any problems in the transcription, compile the resulting C code and run the C version of Release E to ensure it reproduces what the Release 1 version of the C code produced on it.

    This will give you working versions of the early release – albeit with the NULL-checking bugs you mentioned still in place. But “accurate reproduction” means “reproducing it, flaws and all”, so that’s fine. In any case, these utilities are of historical interest only and they were superseded by Release 3, which repaired most or all the problems you described.
