[Gs-devel] Re: compilers as the warnings yardstick

Nelson H. F. Beebe beebe@math.utah.edu
Wed, 17 Oct 2001 06:56:28 -0600 (MDT)


The Gs-devel digest, Vol 1 #232 posted on Tue, 16 Oct 2001 22:03:06 -0700
carried this note from L. Peter Deutsch:

>> ...
>>         * Since different compilers care about different things, pick 2-3
>>           compilers as the warnings yardstick.  (Proposed: gcc, MSVC, maybe
>>           Code Warrior or Intel P4 or IA-64 compiler.)  Use the automated
>>           regression test to detect the appearance of new unsanctioned
>>           warnings (by diff'ing the warning messages stripped of their line
>>           numbers), and treat these as real problems.
>> ...

This response is lengthy (about 410 lines), so I've separated the
major sections with a horizontal line of dashes.

------------------------------------------------------------------------

gcc is too forgiving to be the major yardstick.  Also, it tends to
give the same diagnostics on each platform, so it is not a good test
of portability.  And, because it offers C language extensions (and
recognizes C++ style // comments), sloppy programmers often use those
extensions without realizing they have compromised portability by
doing so.

gcc is the most portable C/C++ compiler in the world today, but it
certainly does not produce the fastest code: vendor native compilers
can sometimes produce a 2x, or better, speedup.  In recent extensive
compiler optimization tests of gzip, which is a bottleneck in our
nightly file system backups and consumes many hours of CPU time, I got
a 1.6x speedup from native compilers on three platforms.

I now routinely build code on 17 different UNIX platforms, using both
gcc and g++ (2.95.x and 3.0.x), and native C and C++ compilers (Sun
Solaris 2.[78], SGI IRIX 6.5, Compaq/DEC OSF/1 [45].0, IBM AIX
4.[23]).  I also use the Portland Group compilers, pgcc, pgCC, pgf77,
pgf90, and pghpf on GNU/Linux on Intel x86 (see http://www.pgroup.com).

I often find that the native compilers pick up problems that gcc
misses.  SGI's compilers (C, C++, Fortran) are particularly
outstanding, and also tend to compile extremely fast.  Just this
morning, for another package that I'm beta testing, I sent its author
a long list of warnings that Sun's C++ compiler generated for problems
that not a single other C or C++ compiler complained about.

I have for several years made a point in my own code development of
writing to that subset of 1989 Standard C that is also conformant with
the emerging (and 1998 final) Standard C++, and I routinely use C++
compilers in development testing of my C code.

C++ compilers, because of their much stricter type checking, are very
good at catching problems that would ultimately cause failure on some
platform.

It takes a bit of discipline to keep C code compiling under C++.  New
C++ reserved words

	catch		  false		    new		      try
	class		  friend	    operator	      typeid
	const_cast	  inline	    overload	      typeof
	delete		  mutable	    private	      using
	dynamic_cast	  namespace	    protected	      virtual

must be avoided as identifiers in C code.  "const char*" becomes a
very important type.  The results of functions returning void*
(malloc(), for example) must be explicitly cast. ...  There
is a very good description of the areas where C and C++ conflict at

	http://home.flash.net/~dtribble/text/cdiffs.htm#intro

------------------------------------------------------------------------

The second recommendation that I would give is that some of the main
UNIX vendor C/C++ compilers should be part of the yardstick.  In order
of decreasing pickiness, these are SGI, Sun, Compaq/DEC, [HP], and
IBM.  I bracketed HP there, because at present I have access to only
one HP-UX system, and that one has only the default old K&R
compiler; c89 is available, but costs extra.  Our own HP systems were
retired several months ago, but I routinely used c89 and CC (C++) on
them.

------------------------------------------------------------------------

My third recommendation is frequent application of lclint; I've
recently been testing the 3.0.0.17 release

	http://lclint.cs.virginia.edu/lclint-3.0.0.17.src.tgz

and have so far sent 41 messages to the developers about build
problems and errors that I detected (code that was warned about but
should not have been, and the reverse), and have always received
prompt responses.  The next release should fix all of the problems I
found.
I have succeeded in building lclint-3.0.0.17 on all 17 of my UNIX
development platforms.

lclint is a truly remarkable program, but like all lint programs, it
can drown the user in warnings that obscure the real problems.  It
took me several days of experimentation to find a suitable set of
default options that reduce the warnings to just the ones that I think
are important, and as a result, I found quite a few errors in one of
my packages that has been widely ported and tested, and in heavy local
use for 15 years.  For reference, here is what my Makefile records:

LCLINT		= /usr/local/bin/lclint

### lclint (available at http://lclint.cs.virginia.edu/) is a powerful
### extended lint implementation with a great many options.  The trick
### is to find a set that produces useful output, without hiding
### important complaints in a mass of less important messages.  The set
### below seems to work reasonably for the XXX package on Sun
### Solaris 2.7, and is probably usable on other systems as well.  Any
### of these options can be turned back on at make time by supplying
### them with a plus prefix, e.g.,
###
### 	make lclint XLCLINTFLAGS='+varuse'
###
### NB: For use with other programs, these options could be installed in
### $HOME/.lclintrc, one per line (removing the backslashes), and
### optionally commented from sharp (#) to end of line.  The -nof option
### below suppresses the loading of user customization files, so that we
### do not provide any other options than the ones shown here.

LCLINTDEFS	= -DOPEN_MAX=20 \
		  -D__STDC__ \
		  -D__sparc \
		  -D'fileno(f)=0' \
		  $(XLCLINTDEFS)

LCLINTFLAGS	= $(LCLINTDEFS) \
		  -boolops \
		  -booltype BOOLEAN \
		  -branchstate \
		  -compdef \
		  -compmempass \
		  -dependenttrans \
		  -exitarg \
		  -exportlocal \
		  -fileextensions \
		  -fixedformalarray \
		  -formatconst \
		  -globstate \
		  -ifempty \
		  -immediatetrans \
		  -incondefs \
		  -macrovarprefixexclude \
		  -mayaliasunique \
		  -mustfree \
		  -mutrep \
		  -nof \
		  -nullassign \
		  -nullpass \
		  -nullret \
		  -nullstate \
		  -observertrans \
		  -onlytrans \
		  -paramuse \
		  -predboolint \
		  -predboolothers \
		  -realcompare \
		  -shiftsigned \
		  -statictrans \
		  -sysunrecog \
		  -temptrans \
		  -type \
		  -unqualifiedtrans \
		  -unreachable \
		  -unsignedcompare \
		  -usedef \
		  -varuse \
		  -warnposix \
		  +voidabstract \
		  $(LCLINTOTHERFLAGS) \
		  $(XLCLINTFLAGS)

### These additional lclint flags may be only temporarily desirable:
LCLINTOTHERFLAGS = \
		  -formattype \
		  -redef

LCLINTSRCS	= $(LIBSRCS) $(EXTRALIBSRCS) $(OTHERSRCS)

### lclint gives lots of duplicate-definition warnings if it is given
### all of the source files in one pass, so we normally run it once per
### file.  However, to permit cross-module checking, we also provide a
### target to do it in one pass.
lclint:
	@-for f in $(LCLINTSRCS) ; \
	do \
		echo ==================== $$f ; \
		$(LCLINT) $(LCLINTFLAGS) $$f ; \
	done

lclint-one-pass:
	$(LCLINT) $(LCLINTFLAGS) $(LCLINTSRCS)


Here are the corresponding settings that I use for lint; the Sun
Solaris lint has particularly good diagnostics, but takes rather
different options, so they are recorded in the Makefile, but are not
the default ones.

LINT		= /opt/SUNWspro/bin/lint

LINTDEFS	= -Dinline= $(XLINTDEFS)

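# These values are for Sun Solaris 2.6 lint: the first set checks the
# code with __cplusplus defined, the second as plain C.  Because make
# uses the last assignment to a variable, both are overridden by the
# generic setting at the end, which is therefore the default.
LINTFLAGS       = -errchk=%all -errhdr=. -errtags -fd -I. -Ncheck=%all \
                  -Nlevel=4 -p -Xtransition=yes -D__cplusplus=1 $(XLINTFLAGS)

LINTFLAGS       = -errchk=%all -errhdr=. -errtags -fd -I. -Ncheck=%all \
                  -Nlevel=4 -p -Xtransition=yes $(XLINTFLAGS)

# Generic lint options (the default, since this assignment comes last)
LINTFLAGS	= -bchx -I. $(LINTDEFS) $(XLINTFLAGS)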

LINTSRCS	= $(LIBSRCS) $(EXTRALIBSRCS) $(OTHERSRCS)



### Although it usually is not necessary to run lint once per file,
### for consistency with the lclint and lclint-one-pass targets, we
### do the same for lint and lint-one-pass:

lint:
	@-for f in $(LINTSRCS) ; \
	do \
		echo ==================== $$f ; \
		$(LINT) $(LINTFLAGS) $$f ; \
	done

lint-one-pass:
	$(LINT) $(LINTFLAGS) $(LINTSRCS)


------------------------------------------------------------------------

My fourth recommendation concerns code robustness, notably, avoidance
of unsafe Standard C string routines.  In the gs-7.02 tree, I found
these usage counts in the *.c files:

	sprintf		644
	strcat		55
	strcpy		90
	strncat		1
	strncpy		25

In the package I'm currently working on, I've replaced all uses of the
unsafe str{cat,cpy,ncat,ncpy} functions by the OpenBSD ones, described in
this paper:

@InProceedings{Miller:1999:SSC,
  author =       "Todd C. Miller and Theo de Raadt",
  title =        "{\tt strlcpy} and {\tt strlcat} --- consistent, safe,
                 string copy and concatenation",
  crossref =     "USENIX:1999:UAT",
  pages =        "??--??",
  year =         "1999",
  bibdate =      "Thu Feb 24 11:35:57 2000",
  URL =          "http://www.openbsd.org/papers/strlcpy-paper.ps",
  acknowledgement = ack-nhfb,
}

@String{pub-USENIX              = "USENIX Association"}

@String{pub-USENIX:adr          = "Berkeley, CA, USA"}

@Proceedings{USENIX:1999:UAT,
  editor =       "{USENIX}",
  booktitle =    "Usenix Annual Technical Conference. June 6--11, 1999.
                 Monterey, California, USA",
  title =        "Usenix Annual Technical Conference. June 6--11, 1999.
                 Monterey, California, {USA}",
  publisher =    pub-USENIX,
  address =      pub-USENIX:adr,
  pages =        "????",
  year =         "1999",
  ISBN =         "",
  LCCN =         "",
  bibdate =      "Thu Feb 24 11:34:22 2000",
  acknowledgement = ack-nhfb,
}

The paper is available at the indicated URL, so you need not run to
the library to hunt it down.

The OpenBSD functions take a third argument: a size_t value giving the
size of the target buffer (the first argument), including room for the
terminating NUL.  Their return value is the length of the string that
they tried to create, that is, the length the result would have had if
the target were sufficiently large: when that return value is >= the
third argument, there was insufficient space in the target buffer and
the result was truncated.

The OpenBSD code described in the above paper is freely distributable,
and is available at:

	ftp://ftp.openbsd.org/pub/OpenBSD/src/lib/libc/string/strlcat.c
	ftp://ftp.openbsd.org/pub/OpenBSD/src/lib/libc/string/strlcpy.3
	ftp://ftp.openbsd.org/pub/OpenBSD/src/lib/libc/string/strlcpy.c

The safe alternative to sprintf is snprintf.  It is available on every
one of my 17 UNIX platforms, and has been in Microsoft C since version
7.0 in October 1992. It is also part of 1999 Standard C (ISO/IEC
9899:1999(E) Programming languages -- C (aka C99)).

In gs-7.02, I find this use count for the safe alternative to sprintf():

	snprintf	1

That single instance is in ./src/gdevhpij.c.

I too have a large number of uses of sprintf() in my own code.  I have
not yet committed to using snprintf(), because of possible portability
problems to systems that I don't have access to for testing.  It does
seem to be widely available, however, and its inclusion in C99
guarantees future availability.  I may want to implement a portable
private version myself before converting my code to use it.

Based on my experience going back to the mid 1960s, historically it
has taken 6 to 8 years after an ANSI/ISO language standard is
published for all vendors to reach conformance, so I cannot yet rely
on C99 features.  Indeed, it is now almost 2 years since the 1999 ISO
C standard appeared, and 3 since the 1998 C++ ISO Standard, yet on my
17 platforms, with more than 50 C compilers, and 20 C++ compilers, NOT
ONE of them conforms to these newer ISO Standards.

------------------------------------------------------------------------

I use GNU autoconf, and recommend it highly.  The work in
autoconfiguring a package gets easier for me all the time, because I
can borrow tests from previous packages.  The first package that I
adapted for autoconf took me a week.  Now, the work is usually just a
few hours. The payoff is that the GNU standard incantation

	[env CC=your-favorite] ./configure  && make all check install

works flawlessly everywhere, even on Uwin or Cygwin on Microsoft
Windows.  

When you build code on as many platforms as I do, it HAS to be this
easy: I just type "gnu-build-all.sh package-x.y.z", and the build
script finds package-x.y.z.{tar.gz,tar,tgz} somewhere in an
extensive search path, distributes it to all the target platforms that
don't have it available already via an NFS mount, creates a
backgrounded ssh connection to each platform, unpacks the distribution
in a standard build location, and runs the builds and checks. This is
done in parallel, logging the output in
/var/tmp/package-x.y.z.`hostname`.log for later examination from my
login platform.  Once I'm satisfied with the builds and checks, I then
manually visit each platform and do "make install".

------------------------------------------------------------------------

My own code NEVER includes any standard header file directly; instead,
it uses a private interface header file that can contain fixups for
code errors in system header files.  Thus, I write

#include "xstdio.h"
#include "xstring.h"
...

For the str*() functions, I went a step further, and wrapped the
OpenBSD functions like this:

extern int xstrmsg ARGS((const char *file_, int line_, size_t len_));
/*@unused@*/ static size_t xstrlen___;

#define strlcat(dest_,src_,n_) \
	(((xstrlen___ = (strlcat)(dest_,src_,n_)) >= (n_)) ? \
			xstrmsg(__FILE__,__LINE__,xstrlen___) : xstrlen___)

#define strlcpy(dest_,src_,n_) \
	(((xstrlen___ = (strlcpy)(dest_,src_,n_)) >= (n_)) ? \
			xstrmsg(__FILE__,__LINE__,xstrlen___) : xstrlen___)

Virtually all of my calls to these functions are cast to (void), so
the return value is discarded. The wrapping ensures that a check will
actually be made, and in the event of a too-small target buffer,
xstrmsg() is called to log the error.  This turned out to be useful,
since in the 153 calls to those two functions in the package that I
recently converted, I'd made mistakes in two places during the
conversion from the old Standard C functions, and those mistakes were
caught by my wrapper code.

To prevent the Standard C functions ever creeping back into my code,
my interface to <string.h>, "xstring.h", does this:

#undef strcat
#undef strcpy
#undef strncat
#undef strncpy
#define strcat(dest_,src_)	__ERROR__you_MUST_replace_strcat_by_strlcat__(dest_,src_)
#define strcpy(dest_,src_)	__ERROR__you_MUST_replace_strcpy_by_strlcpy__(dest_,src_)
#define strncat(dest_,src_,n_)	__ERROR__you_MUST_replace_strncat_by_strlcat__(dest_,src_,n_)
#define strncpy(dest_,src_,n_)	__ERROR__you_MUST_replace_strncpy_by_strlcpy__(dest_,src_,n_)

This idea is shamelessly borrowed from the Samba sources, which do
something similar.

The requirement for the target buffer length as the third argument to
the OpenBSD functions exposed a few places in my code where it was not
available, because the target was passed into a function without an
associated length.  I repaired all such instances by adding a target
length argument to those deficient functions.
------------------------------------------------------------------------

-------------------------------------------------------------------------------
- Nelson H. F. Beebe                    Tel: +1 801 581 5254                  -
- Center for Scientific Computing       FAX: +1 801 585 1640, +1 801 581 4148 -
- University of Utah                    Internet e-mail: beebe@math.utah.edu  -
- Department of Mathematics, 322 INSCC      beebe@acm.org  beebe@computer.org -
- 155 S 1400 E RM 233                       beebe@ieee.org                    -
- Salt Lake City, UT 84112-0090, USA    URL: http://www.math.utah.edu/~beebe  -
-------------------------------------------------------------------------------