DECUS C LANGUAGE SYSTEM

DECUS C Compiler Reference Manual

by David G. Conroy

Edited by Martin Minow and John D. Morton

This document describes the CC compiler itself (including implementational quirks and known bugs), along with procedures for compiling and executing programs under a wide variety of Digital operating systems.

DECUS Structured Languages SIG

Version of 1-Aug-80

Copyright (C) 1980, DECUS

General permission to copy or modify, but not for profit, is hereby granted, provided that the above copyright notice is included and reference made to the fact that reproduction privileges were granted by DECUS.

The information in this document is subject to change without notice and should not be construed as a commitment by Digital Equipment Corporation or by DECUS.

Neither Digital Equipment Corporation, DECUS, nor the authors assume any responsibility for the use or reliability of this document or the described software.

This software is made available without any support whatsoever. The person responsible for an implementation of this system should expect to have to understand and modify the source code if any problems are encountered in implementing or maintaining the compiler or its run-time library. The DECUS `Structured Languages Special Interest Group' is the primary focus for communication among users of this software.

UNIX is a trademark of Bell Telephone Laboratories. RSX, RSTS/E, RT11 and VMS are trademarks of Digital Equipment Corporation.

CC Reference Manual Page 3 Usage

1.0 Introduction

CC is a multipass C compiler for the PDP-11 that runs under the RSX-11, VMS (compatibility mode), RSTS/E, and/or RT11 operating systems. Except for the restrictions noted in a later section, it compiles programs as per the description of C in the Unix Seventh Edition memo or the book `The C Programming Language' by Brian Kernighan and Dennis Ritchie, (Englewood Cliffs, NJ: Prentice-Hall, 1978).
2.0 Usage

Since the C compiler runs on so many operating systems, command information is presented in individual sections for the various operating system families, followed by a common section describing usage and the switches needed to control compilation.

2.1 VMS, RSX-11, and RSTS/E RSX emulation mode

After the appropriate setup sequence (described in a later section) has been executed, the compiler may be invoked as follows:

        XCC [-switches] file

or

        RUN C:XCC
        CC> [type command line here]

The specified file is compiled and the resulting assembly code is placed in a file having the same name as the source file but with a filetype of `S'. The default filetype for source files is `C'. The file will be written to the user's current default account. On RSTS, this is the account under which the user is logged in.

Diagnostics are written to the standard output. The diagnostic stream may be redirected by means of the `>' or `>>' conventions: `>filename' writes diagnostics to the named file, while `>>filename' appends diagnostics to the named file. This is compatible with Unix usage.

Only a single file may be compiled at one time. Wildcards are not legal in names.

The resulting assembly language is assembled with AS as follows:

        XAS -d file

The generated code should never have any assembly errors. The `-d' switch deletes the input file (`file.s') unless an error is detected. Note that it is not possible to RUN XAS.

Object files are linked into executable images by using the RSX11-M task builder. The simplest command sequence possible after invoking the task builder would be:

        TKB> prog,map=objects,C:C/LB
        TKB> //

If a program uses large amounts of automatically-allocated storage, the `STACK = number' option should be specified to the task builder.
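To make the remark about automatically-allocated storage concrete, the following is a minimal sketch of the kind of function that claims a large block of stack on every call, next to a variant that keeps the same buffer in static storage. The names `fill_auto`, `fill_static` and the size `BUFSIZE` are hypothetical and chosen only for illustration. The sketch uses present-day function syntax so that it can be compiled with a current C compiler; under CC the definitions would need the old-style form.

```c
#include <string.h>

#define BUFSIZE 2048            /* hypothetical size, for illustration only */

/*
 * The buffer is an automatic variable, so each call claims BUFSIZE
 * bytes of stack for as long as the function is active.
 */
int fill_auto(void)
{
    char buffer[BUFSIZE];

    memset(buffer, 'x', BUFSIZE);
    return buffer[BUFSIZE - 1] == 'x';
}

/*
 * The same buffer declared static lives in the data area instead of
 * on the stack, at the price of being shared between calls.
 */
int fill_static(void)
{
    static char buffer[BUFSIZE];

    memset(buffer, 'x', BUFSIZE);
    return buffer[BUFSIZE - 1] == 'x';
}
```

A program built from functions like `fill_auto` is the case in which a larger stack allocation is worth requesting at link time; moving the buffer to static storage is an alternative when the function is never re-entered.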
2.1.1 RT11 or RSTS/E RT11 emulation mode - After the setup sequence described in a later section has been executed, the compiler may be run as follows:

        CC file [/switches]

or

        RUN C:CC
        CC> file [/switches]

or

        CC file.s,file.tm1,file.tmp=file.c/switches

The last form explicitly creates and saves the intermediate code (.tm1) and expanded source (.tmp) files. Normally, these are needed only when debugging the compiler. Note that if you do not specify extensions for the intermediate files, they will be given the default of `.tmp' for the expanded source and `.tm1' for the intermediate code file.

The resulting assembly language is assembled with AS as follows:

        RSTS/E          AS file/d

        RT11            RUN AS
                        AS> file/d

The generated code should never have any assembly errors. The `/d' switch deletes the input file (`file.s') unless an error is detected.

Object modules are linked into executable images by using the RT11 linker:

        LINK save,map=objects,c:suport,c:clib/b:2000

The two library files contain the actual main program (in SUPORT) and the RT11 run-time support library. The start address must be at least 2000 to allow for dynamic storage by subroutines. If the `/b' switch is omitted, executing printf() may cause the program to abort with an `M-trap to 4' message.

2.1.2 Compilation notes - MACRO-11 may NOT be used to assemble the output of CC. CC expects that its assembler can perform certain optimisations (most notably branch adjustment) not performed by MACRO-11.

The title of the object file will be set to the first six characters of the source file name. This is of interest only to people who load overlaid programs off libraries.

The compiler writes on files `file.TMP' and `file.TM1'. It is, therefore, unwise to keep important things in files with these filetypes.

The `.TMP' file contains the C source with #include and #define statements processed. This is the input to the compiler proper.
The `.TM1' file contains the intermediate code generated by the compiler parser. This is the input to the code generator.

2.2 Switches

Under RSX modes, switches are given as single letters preceded by a minus sign:

        XCC -v -s test

Under RSTS/E or RT11, switches are given as single letters preceded by a slash:

        XCC test/v/s

Case is not significant. The following switches are defined:

   d    This argument causes the compiler to execute a breakpoint trap when entering each overlay segment. It is used only for debugging the compiler.

   e    This optional argument causes in-line code to be generated for multiply, divide, xor, and shift operations. NOTE: the current compiler recognises this switch, but does not generate in-line code.

   f    This optional argument causes in-line code to be generated for floating-point operations. NOTE: the current compiler recognises this switch, but does not support floating-point. Any attempt to compile floating-point operations will result in a fatal compilation error.

   i    This optional argument causes the compiler to retain the intermediate file (phase 1 to phase 2). This file is normally deleted. This option is for compiler maintenance.

   l    This optional argument causes internal code trees to be written (as comments) to the .S output file. This option is for compiler maintenance.

   m    This optional argument causes timings of each pass to be printed. This option is only operative on RSX11-modes. It requires hardware EIS.

   p    This optional argument causes profiling code to be compiled (see the section on profiling).

   s    This optional argument causes the compiler to retain the expanded source file (phase 0 to phase 1). This file is normally deleted. This option is for compiler maintenance.

   v    This optional argument causes the compiler to echo the current line of the source onto the error stream whenever an error is detected.
        In most cases, this is not the line containing the error, because the parser usually has to read the next symbol of the source to determine that an error exists. It will usually be within 1 line, which should be close enough to locate the error.

2.3 Setup of the compiler

Before using the C compiler, it must be made known to the operating system. This differs slightly for the various systems.

2.3.1 Setup under VMS - The following setup (or something much like it) should be added to your LOGIN.COM file:

        $ ASSIGN DBA0:[PUBLIC] C
        $ XCC :== $C:CC.EXE CC
        $ XAS :== $C:AS.EXE AS

This enables use of the command sequences described earlier. If your compiled C program is to make use of the (Unix-compatible) startup sequence, you must proceed as follows:

        $ XCC foo
        $ XAS -d foo
        $ MCR TKB foo,foo=foo,c:c/lb

Then, you must type:

        $ FOOBAR :== "$DISK:[ACCOUNT]FOO.EXE "
        $ FOOBAR Unix-style parameters

The `$' tells the VMS command interpreter that a command is being defined. Note that a dummy parameter must be specified. This will become the `task name' (argv[0]) when the program starts.

2.3.2 Setup under RSTS/E RSX emulation mode - Under RSTS/E, the system manager must define the XCC and XAS CCL commands and the C: system-wide logical in a start control file such as the following (the account may be chosen to meet the system manager's needs):

        RUN $UTILTY
        ? ADD LOGICAL SY:[5,2]C
        ? CCL XAS-=C:AS.TSK;0
        ? CCL XCC-=C:CC.TSK;0
        ? CCL MCR-=C:MCR.*;30000
        ? EXIT

2.3.3 Setup under RSX11M - You should define C: as a logical device. If this is painful, edit SYSINC (in CC0RT.MAC) or globally patch it to redefine the C library as LB: or whatever. Then, either install CC and AS as defined tasks or execute them via MCR. The RSX11-M build procedure provided causes SYSINC to be redefined as `LB[1,1]'.
2.3.4 Setup under RT11 and RSTS/E RT11 mode - Under RSTS/E, the system manager must execute a startup control file such as the following:

        RUN $UTILTY
        ? ADD LOGICAL SY:[5,2]C
        ? CCL AS-=C:AS.SAV;8192
        ? CCL CC-=C:CC.SAV;8220
        ? CCL MCR-=C:MCR.*;30000
        ? EXIT

Under RT11, the startup sequence is limited to definition of the C: logical device. This compiler has been used on PDT150 systems. MCR.BAS is not used on native RT11.

Parameters may be passed to the running program by appending them to the run command:

        RUN PROG.SAV pmtr1 pmtr2 "p m t r 3"

2.4 Invoking compiled C programs

When your program is entered and the start module believes that a command has been typed, a Unix C setup sequence is emulated, including I/O redirection and command argument processing. The startup module does not expand wild-card filenames, however.

On native RT11, if no command line has been passed, the module prompts `Argv: ' and accepts a single line which is then parsed into commands. This can be disabled by defining the $$narg global symbol as described in the library documentation.

If you include an argument of the form `>file', standard output will be written to the indicated file. If you include an argument of the form `>>file', standard output will be appended to the file (creating it if necessary). (Append does not work on RT11-modes.) If you include an argument of the form `<file', standard input will be read from the indicated file.

The first argument, argv[0], is the task name:

   o On VMS, this will be the dummy parameter in the command definition as shown above.

   o On RSTS/E, this will be the CCL name or the program name as passed to the MCR program.

   o On RT11 (or, by default, if no name can be found), this will be the string `Argv: '.
For example:

        /*
         * Echo arguments
         */
        main(argc, argv)
        int argc;
        char *argv[];
        {
                register int i;

                printf("Program \"%s\" has %d parameters\n",
                        argv[0], argc);
                for (i = 0; i < argc; i++)
                        printf("Argument %d = \"%s\"\n", i, argv[i]);
        }

The above program is executed as follows on VMS:

        $ ECHO abc "def ghi"
        Program "ECHO" has 3 parameters
        Argument 0 = "ECHO"
        Argument 1 = "ABC"
        Argument 2 = "def ghi"

Notice that unquoted arguments are converted to upper case by the operating system.

Under RSTS/E, a C program may be installed as a CCL command or the program may be started using the MCR CCL command which emulates a CCL invocation for C programs.

2.5 Predefined variables

Before reading the program source file, the C compiler defines several variables (which may then be tested with `#ifdef' statements):

        decus           This is the Decus compiler.

        nofpu           This version does not support floating-point.

        nomacarg        This version does not allow macros with arguments.

        pdp11           Generate code for the PDP-11.

        rsx             The RSX compiler (or)

        rt11            The RT11 compiler

2.6 Profiling

The profiler permits the accumulation of function call statistics during the execution of a program. If any of the files comprising a program were compiled with the profile option (and at least one of them has been called) then a call profile, listing the function name and the number of calls, will be written to file `profil.out' when the program terminates.

Also, if the program terminates because of a fatal error (such as an illegal memory reference), a register dump and call trace will be printed on the command terminal.

The run-time library contains several functions that can be called to dynamically print flow trace information.

2.7 Diagnostics

There are two general classes of diagnostics: those that relate to compiler conditions, and those that relate to errors in the user's program.

The only type of compiler condition messages the user should see are those of the form `Cannot open .... file'.
These mean exactly what they say.

Other compiler condition messages are `Abort in phase x', `Abort loading phase x' and `Trap type x', where `x' is replaced by some small constant. These are most likely attempts to use floating-point operations. If not, you are the proud owner of a compiler bug. Report your find to a guru. Remember the register dump and save your source file and both temporary files. They are important.

If you blunder into a missing code table the compiler aborts with an error message.

Errors in the user's programs are reported in English, tagged by the line number (which may be off by 1). Because of the nature of the language, errors sometimes snowball. If you are greeted by thousands of error messages, try fixing up the first few. You may be pleasantly surprised.

The following are common sources of `thousands of errors':

   o If there is a missing right brace within a function, all succeeding functions will miscompile. The error message will include a tag of the form `within function xxxxx', where `xxxxx' is the function with the missing brace.

   o If there is a missing right parenthesis in an if or while statement which is followed by a left brace, the syntax analyser will `lose' the brace, causing many messages:

        if ((foo = fopen("abc.def", "w") == NULL) {
                ...

   o In general, if the error message is `illegal expression', that is (probably) the current line. If the message is `illegal statement', you should look at the previous statement.

3.0 Runtime Environment

This description of the C runtime environment is sketchy. The best reference is compiler generated code, and any question regarding `how does it ....' can usually be answered by compiling a suitably contrived program.

3.1 Program Sections

The C compiler uses 5 program sections. The `.PROG.' psection is used for all code. The `.DATA.' psection is used for all external and static data. The `.STRN.'
psection is used for the bodies of all strings. The `.PROF.' psection is used to hold the names of functions for the profiler. The `.MWCN.' psection is used to hold multi-word (long and floating-point) constants.

All code is `pure'. However, the assembler is not able to generate all the varieties of .PSECTs. Thus, everything is read-write. This should be changed. Also, the compiler does not write a symbol table as such, making debugging a chore.

3.2 Register Usage

R5 is used as an environment frame pointer. It points to the highest address of the stack frame of the current function.

In MACRO programs, symbols C$PMTR and C$AUTO may be used to refer to the first parameter and first automatic variable, respectively. Thus, when writing a MACRO subroutine, you should write

        MOV     C$PMTR+(R5), Dst

to access parameters (the first parameter_number is 0). (This cannot be done when using the AS assembler.) To access automatic variables, you should write

        MOV     C$AUTO-(R5), Dst

where the first variable_number is numbered 1. (This cannot be done when using the AS assembler.)

Registers R2, R3 and R4 are used as register variables. The first register variable goes in R4, the second in R3 and the third in R2. Any register not used as a register variable is available as a temporary.

Registers R0 and R1 are always scratch registers.

3.3 Calling Sequence

The first instructions in a C function are a `JSR R5,CSV$' and a subtract to claim stack space. The `CSV$' routine points R5 at the new stack frame and pushes registers R4, R3, and R2 onto the stack. (Note that the character `$' in the CC/MACRO environment is represented by `~' in the AS environment.)

R0, R1 and the floating point registers are NOT saved. This means that if a C function is called asynchronously (i.e. from an AST routine) the caller must arrange to save these registers or be prepared to face the music.

Functions return via a `JMP CRET$'.
The return value is in R0 (for ints, chars and pointers), R0-R1 (for longs, high part in R0) or AC0 (floats and doubles).

The caller passes control to a function by first pushing the arguments (from right to left) onto the stack, calling the function via a `JSR PC,FUNCTION', and popping the arguments off of the stack.

All arguments are passed as ints, longs (push low part, then push high part) or doubles. Characters are passed as integers; floats are passed as doubles.

3.4 Profiler

When a program is compiled with the `p' option, the standard save is replaced by a `JSR R5,PCSV$'. Immediately following the call is a pointer to a zero word (for the counter) followed by the name of the function (in the `.PROF.' psection as a null terminated string). The `PCSV$' routine increments the zero word on every call:

                .psect  .prog.
        entry:  jsr     r5,pcsv$
                .word   prof
                .psect  .prof.
        prof:   .word   0               ; Incremented at each call
                .asciz  /entry/         ; Function name
                .even
                .psect  .prog.
                ...

The printing of the profile is arranged by having `PCSV$' stuff a global cell `$$PROF' with a pointer to the profile print routine. This routine (called automagically on exit) scans through core looking for `JSR R5,PCSV$' instructions, and printing the statistics to the file `profil.out' via `fprintf'.

The trace module has several other attributes:

   o If the program fails because of an unexpected trap to the operating system (and the profile collection code was executed at least once), a register dump will be printed on the command terminal and the program will exit by calling error().

   o If the function's execution would cause the stack pointer to go below 600 octal, the program will be aborted after printing an error message.

   o It is possible to obtain a dynamic trace of the flow of a program by assigning the file descriptor of an open file to global variable `$$flow'.
     For example:

        #include <stdio.h>
        extern FILE *$$flow;

        main ()
        {
                $$flow = fopen("trace.out", "w");
                process();
        }

     Note that the program may execute

        $$flow = stdout;

     to write the trace to the command terminal. To turn off tracing, close $$flow and set $$flow = NULL.

   o The caller() function may be used to obtain the name of a routine's caller:

        main ()
        {
                subr();
        }

        subr ()
        {
                printf("%s\n", caller());
        }

     When subr() is executed, it will print `main'.

   o The calltr() function may be used to print a trace of calls from main() to the function that called calltr():

        main ()
        {
                subr();
        }

        subr ()
        {
                calltr(stdout);
        }

     When subr() is executed, it will print:

        [ main subr ]

     on the standard output file. If some routine in the call trace was not compiled with profiling, the octal address of the routine's entry point is printed. If the routine gets confused (perhaps because the program is exiting due to a trap), it prints `'.

   o If the program exits by calling error() and the profile collection code was executed at least once, a call trace will be printed on the command terminal.

3.5 Example

A function max(a, b), which returns the maximum value of its two integer arguments may be written as follows:

        max(arga, argb)
        ...

                blt     .0
                mov     2(sp),r0
                br      .1
        .0:     mov     4(sp),r0
        .1:     jmp     cret$

4.0 Quirks and Bugs

The language accepted by the compiler is the language described in the Unix Seventh Edition memo (and Kernighan and Ritchie) with several exceptions. The file `C:CBUGS.DOC' contains a current list of bugs. These should be regarded as restrictions -- anything that was easy to fix has been fixed.

   o The AS assembler recognizes several pre-defined variables.
     Consequently, the following may not be used by a C program: `r0, r1, r2, r3, r4, r5, sp, and pc'.

   o Initialisation of automatic and local static variables is not supported.

   o Enumerations are not supported.

   o Bit fields do not work -- attempting to use bit fields will cause the compiler to abort with a `missing code table entry' error.

   o Symbols defined as global may not be redefined as local to a function.

   o Variables may only be declared at function entrance. The latest C language specification allows variable declaration at any block entrance.

   o Floating point is non-existent. If you attempt to compile a program that uses floating-point, the compiler will abort with a suitable message.

   o The compiler does not support "old-style" assigned binary operators. These will generally result in syntax errors. One exception (which started the whole mess) is "foo =- 6". This will be accepted by the compiler. Unfortunately, it will generate "foo = (-6)" when the program probably wanted "foo = foo - 6". You have been warned.

   o The include statement has two modes:

        #include "filename"

     Includes the fully-qualified file.

        #include <filename>

     Includes the library file, equivalent to:

        #include "C:filename"

   o Macros (#define statement with arguments) do not exist.

   o As noted in the library documentation, the following built-in function may be overridden by the C-program:

        wrapup()        Called when the program exits.

4.1 Incompatibilities

There are several incompatibilities between the current DECUS compiler and earlier versions which had been distributed by various DECUS special-interest groups. Those known (and the implications) are:

   o The RSX compiler's subroutine calling sequence has been changed to match the RT11 compiler's (and Unix's). This means that all user-written assembly-language code must be modified. The calling sequence appears to be compatible with the Unix and Whitesmith compilers, although library names are different.
     Also, the Whitesmith compiler has several optimizations in its subroutine calling sequence that are not present in this compiler.

   o The underscore character now generates a RAD50 dot, instead of a dollar-sign. The compiler allows dollar-signs in local and global variables. Thus, C programs can now access all PDP-11 global symbols. Because of the change of the meaning of underscore, all user-written assembly-language code must be modified.

   o I/O library conventions now generally follow the Unix V7 definitions. There are several implications. In general, however, all C-language I/O calls should be examined. The major problems are described below.

   o fopen("filename", "openmode") follows the RSX-library and Unix V7. This is incompatible with Unix V6 and the old RT11-library call.

   o fgets(buffer, sizeof buffer, fd) requires the second buffer size parameter, and does not remove the trailing newline. This follows Unix V7 I/O conventions. fgetss() is a new function, identical to fgets() except that it removes the trailing newline. fgetss() is compatible with the fgets() function in previous versions of the Decus compiler.

   o fputs(buffer, fd) does not append a newline to the record. This follows Unix V7 I/O conventions. fputss() is a new function, identical to fputs() except that it appends a trailing newline.

   o The "execute non-local goto" functions have been renamed. Unix V6 reset() and setexit() (Unix V7 longjmp() and setexit()) are called reset() and unwind() in this release. Two new functions, envsave() and envreset(), are also present for this purpose.

   o The ctime() function (return time of day in Ascii) ...

... programs will require no work whatsoever, most programs will require hand editing. Note the following:

   o Floating point is non-existent.
     Many floating-point variables can be recoded as long integers (large counter ...

   o ... "foo =- 6" will parse, generating incorrect code.

   o The Decus compiler has a 500 word expression stack. This means that many complex expressions (especially those with embedded conditional statements) will cause the compilation to abort. This requires rewriting.

   o The Decus compiler lacks macros with arguments. Many of these can be rewritten as function calls. If the program intentionally makes use of the fact that macros are expanded in-line, hand-editing will be needed. Note also that only one level of indirect (#include) file is supported.

   o Unix V6 I/O is not supported. Thus, any program using read(), write() ... straight-forward. Also, note that only a limited file random/access capability is present.

   o Very large programs (which depend on Unix's ability to generate programs with separate instruction and data space) must be r...

   o ... allocated on function entrance) must be linked with enough stack space. When testing a program, it is highly recommended that the program be compiled with profiling as this enables a stack overflow check on function entrance.

In general, the programmer should be alert to such minor incompatibilities that do exist.
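Several of the conversion points above can be illustrated in one place. The following is only a sketch under stated assumptions: the helper names `imax`, `chomp`, `minus_six` and `count_up` are hypothetical and are not part of the DECUS library. It uses present-day function syntax so that it can be checked with a current C compiler; under CC itself the definitions would need old-style parameter declarations. In keeping with the restrictions listed earlier, variables are declared only at function entrance and automatic variables are not initialised in their declarations.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical stand-in for a macro with arguments such as
 *     #define MAX(a, b) ((a) > (b) ? (a) : (b))
 * rewritten as an ordinary function call.
 */
int imax(int a, int b)
{
    return (a > b) ? a : b;
}

/*
 * Hypothetical helper: remove one trailing newline from a line read
 * by a V7-style fgets(), which keeps the newline in the buffer.
 */
char *chomp(char *s)
{
    int n;

    assert(s != NULL);
    n = (int) strlen(s);
    if (n > 0 && s[n - 1] == '\n')
        s[n - 1] = '\0';
    return s;
}

/*
 * Subtract six with the current operator spelling. The old form
 * "foo =- 6" would be taken as "foo = (-6)".
 */
int minus_six(int foo)
{
    foo -= 6;
    return foo;
}

/*
 * A large counter kept in a long instead of a floating-point variable.
 */
long count_up(long start, int steps)
{
    long total;
    int i;

    total = start;
    for (i = 0; i < steps; i++)
        total++;
    return total;
}
```

Writing the subtraction as `foo -= 6` (or `foo = foo - 6`) sidesteps the `=-` ambiguity entirely, and a function such as `imax` evaluates each argument exactly once, which differs from a macro when the arguments have side effects.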