

 

 

 

 

STANDARD C TO IMPLEMENTATION SPECIFIC: SOME STORY

 

 

 

 

The second edition of the C standard, ISO/IEC 9899:1999 (C99), cancels and replaces the first edition, ISO/IEC 9899:1990, as amended and corrected by ISO/IEC 9899/COR1:1994, ISO/IEC 9899/AMD1:1995, and ISO/IEC 9899/COR2:1996. The standard contains many requirements and recommendations that vendors are supposed to implement. In this module we look at some Standard C definitions and at how Microsoft implements them in VC++ (VC++ 2005 in this case).

 

 

 

 

 

 

 

 

 

 

The C Constants

 

A "constant" is a number, character, or character string that can be used as a value in a program. Use constants to represent floating-point, integer, enumeration, or character values that cannot be modified.

 

  1. floating-point-constant

  2. integer-constant

  3. enumeration-constant

  4. character-constant

 

Constants are characterized by having a value and a type. Floating-point, integer, and character constants are discussed in the next three sections. Enumeration constants are described in Enumeration Declarations.

 

An Integer-constant

 

An "integer constant" is a decimal (base 10), octal (base 8), or hexadecimal (base 16) number that represents an integral value. Use integer constants to represent integer values that cannot be changed. The following syntax is implemented by Microsoft and is also defined by ISO/IEC 9899:1999, except for the i64 suffix, which is a Microsoft extension.

 

The Syntax

integer-constant:

decimal-constant integer-suffix opt

octal-constant integer-suffix opt

hexadecimal-constant integer-suffix opt

decimal-constant:

nonzero-digit

decimal-constant digit

octal-constant:

0

octal-constant octal-digit

hexadecimal-constant:

0x hexadecimal-digit

0X hexadecimal-digit

hexadecimal-constant hexadecimal-digit

nonzero-digit: one of

1 2 3 4 5 6 7 8 9

octal-digit: one of

0 1 2 3 4 5 6 7

hexadecimal-digit: one of

0 1 2 3 4 5 6 7 8 9

a b c d e f

A B C D E F

integer-suffix:

unsigned-suffix long-suffix opt

long-suffix unsigned-suffix opt

unsigned-suffix: one of

u          U

long-suffix: one of

l           L

64-bit integer-suffix:

i64

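The following table lists the types that the ISO standard defines for each suffix; Microsoft implements them as well. The type of an integer constant is the first type in the corresponding list in which its value can be represented. You will find these suffixes used in variable declarations and with the standard input functions (scanf()/scanf_s()) and standard output functions (printf()/printf_s()). Examples are in the C lab worksheet practice.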

 

Suffix | Decimal Constant | Octal or Hexadecimal Constant
none | int, long int, long long int | int, unsigned int, long int, unsigned long int, long long int, unsigned long long int
u or U | unsigned int, unsigned long int, unsigned long long int | unsigned int, unsigned long int, unsigned long long int
l or L | long int, long long int | long int, unsigned long int, long long int, unsigned long long int
Both u or U and l or L | unsigned long int, unsigned long long int | unsigned long int, unsigned long long int
ll or LL | long long int | long long int, unsigned long long int
Both u or U and ll or LL | unsigned long long int | unsigned long long int

 

Integer constants are positive unless they are preceded by a minus sign (–). The minus sign is interpreted as the unary arithmetic negation operator. An integer constant begins with a digit, but has no period or exponent part. It may have a prefix that specifies its base and a suffix that specifies its type. If an integer constant begins with 0x or 0X, it is hexadecimal. If it begins with the digit 0, it is octal. Otherwise, it is assumed to be decimal. The following lines are equivalent:

 

0x1C   /* = Hexadecimal representation for decimal 28 */

034     /* = Octal representation for decimal 28 */

 

No white-space characters can separate the digits of an integer constant. These examples show valid decimal, octal, and hexadecimal constants.

 

/* Decimal Constants */

10

132

32179

 

/* Octal Constants */

012

0204

076663

 

/* Hexadecimal Constants */

0xa or 0xA

0x84

0x7dB3 or 0X7DB3

 

Numerical limits

 

An implementation is required to document all the limits specified in this subclause, which are specified in the headers <limits.h> and <float.h>. Additional limits are specified in <stdint.h>.

 

Sizes of integer types <limits.h> (ISO/IEC 9899:1999)

 

The values listed below are the minimum magnitudes that the standard requires. An implementation's own values shall be equal to or greater in magnitude (absolute value) than those shown, with the same sign.

 

Constant | Meaning | Value
CHAR_BIT | number of bits for smallest object that is not a bit-field (byte) | 8
SCHAR_MIN | minimum value for an object of type signed char | -127 // -(2^7 - 1)
SCHAR_MAX | maximum value for an object of type signed char | +127 // 2^7 - 1
UCHAR_MAX | maximum value for an object of type unsigned char | 255 // 2^8 - 1
CHAR_MIN | minimum value for an object of type char | a, see below
CHAR_MAX | maximum value for an object of type char | a, see below
MB_LEN_MAX | maximum number of bytes in a multibyte character, for any supported locale | 1
SHRT_MIN | minimum value for an object of type short int | -32767 // -(2^15 - 1)
SHRT_MAX | maximum value for an object of type short int | +32767 // 2^15 - 1
USHRT_MAX | maximum value for an object of type unsigned short int | 65535 // 2^16 - 1
INT_MIN | minimum value for an object of type int | -32767 // -(2^15 - 1)
INT_MAX | maximum value for an object of type int | +32767 // 2^15 - 1
UINT_MAX | maximum value for an object of type unsigned int | 65535 // 2^16 - 1
LONG_MIN | minimum value for an object of type long int | -2147483647 // -(2^31 - 1)
LONG_MAX | maximum value for an object of type long int | +2147483647 // 2^31 - 1
ULONG_MAX | maximum value for an object of type unsigned long int | 4294967295 // 2^32 - 1
LLONG_MIN | minimum value for an object of type long long int | -9223372036854775807 // -(2^63 - 1)
LLONG_MAX | maximum value for an object of type long long int | +9223372036854775807 // 2^63 - 1
ULLONG_MAX | maximum value for an object of type unsigned long long int | 18446744073709551615 // 2^64 - 1

 

a - If the value of an object of type char is treated as a signed integer when used in an expression, the value of CHAR_MIN shall be the same as that of SCHAR_MIN and the value of CHAR_MAX shall be the same as that of SCHAR_MAX. Otherwise, the value of CHAR_MIN shall be 0 and the value of CHAR_MAX shall be the same as that of UCHAR_MAX. The value UCHAR_MAX shall equal 2^CHAR_BIT - 1.

 

Microsoft Implementation

 

The limits for integer types are listed in the following table. These limits are defined in the standard header file LIMITS.H. Microsoft C also permits the declaration of sized integer variables, which are integral types of size 8, 16, 32, or 64 bits.

 

Limits on Integer Constants

 

Constant | Meaning | Value
CHAR_BIT | Number of bits in the smallest variable that is not a bit field. | 8
SCHAR_MIN | Minimum value for a variable of type signed char. | -128
SCHAR_MAX | Maximum value for a variable of type signed char. | 127
UCHAR_MAX | Maximum value for a variable of type unsigned char. | 255 (0xff)
CHAR_MIN | Minimum value for a variable of type char. | -128; 0 if /J option used
CHAR_MAX | Maximum value for a variable of type char. | 127; 255 if /J option used
MB_LEN_MAX | Maximum number of bytes in a multibyte character. | 5
SHRT_MIN | Minimum value for a variable of type short. | -32768
SHRT_MAX | Maximum value for a variable of type short. | 32767
USHRT_MAX | Maximum value for a variable of type unsigned short. | 65535 (0xffff)
INT_MIN | Minimum value for a variable of type int. | -2147483647 - 1
INT_MAX | Maximum value for a variable of type int. | 2147483647
UINT_MAX | Maximum value for a variable of type unsigned int. | 4294967295 (0xffffffff)
LONG_MIN | Minimum value for a variable of type long. | -2147483647 - 1
LONG_MAX | Maximum value for a variable of type long. | 2147483647
ULONG_MAX | Maximum value for a variable of type unsigned long. | 4294967295 (0xffffffff)

 

If a value exceeds the largest integer representation, the Microsoft compiler generates an error.

 

Sizes of Fundamental Types

 

The following table summarizes the storage associated with each basic type.

 

Type | Storage
char, unsigned char, signed char | 1 byte
short, unsigned short | 2 bytes
int, unsigned int | 4 bytes
long, unsigned long | 4 bytes
float | 4 bytes
double | 8 bytes
long double | 8 bytes

 

The C data types fall into general categories. The "integral types" include char, int, short, long, signed, unsigned, and enum. The "floating types" include float, double, and long double. The "arithmetic types" include all floating and integral types.

 

C Sized Integer Types: Microsoft Implementation

 

Microsoft C features support for sized integer types. You can declare 8-, 16-, 32-, or 64-bit integer variables by using the __intn type specifier, where n is the size, in bits, of the integer variable. The value of n can be 8, 16, 32, or 64. The following example declares one variable of each of the four types of sized integers:

 

__int8 nSmall;         // declares 8-bit integer

__int16 nMedium;  // declares 16-bit integer

__int32 nLarge;     // declares 32-bit integer

__int64 nHuge;      // declares 64-bit integer

 

The first three types of sized integers are synonyms for the ANSI types that have the same size, and are useful for writing portable code that behaves identically across multiple platforms. Note that the __int8 data type is synonymous with type char, __int16 is synonymous with type short, and __int32 is synonymous with type int. The __int64 type has no equivalent ANSI counterpart.

 

C Floating-Point Constants

 

A "floating-point constant" is a decimal number that represents a signed real number. The representation of a signed real number includes an integer portion, a fractional portion, and an exponent. Use floating-point constants to represent floating-point values that cannot be changed. The following syntax is implemented by Microsoft and is also defined by ISO/IEC 9899:1999.

 

Syntax

floating-point-constant:

fractional-constant exponent-part opt floating-suffix opt

digit-sequence exponent-part floating-suffix opt

fractional-constant:

digit-sequence opt . digit-sequence

digit-sequence.

exponent-part:

e sign opt digit-sequence

E sign opt digit-sequence

sign : one of

+ –

digit-sequence:

digit

digit-sequence digit

floating-suffix : one of

f  l  F  L

The ISO standard states that a floating constant has a significand part that may be followed by an exponent part and a suffix that specifies its type. The components of the significand part may include a digit sequence representing the whole-number part, followed by a period (.), followed by a digit sequence representing the fraction part. The components of the exponent part are an e, E, p, or P (p and P introduce the binary exponent part of a hexadecimal floating constant) followed by an exponent consisting of an optionally signed digit sequence. Either the whole-number part or the fraction part has to be present; for decimal floating constants, either the period or the exponent part has to be present.

An unsuffixed floating constant has type double. If suffixed by the letter f or F, it has type float. If suffixed by the letter l or L, it has type long double.

You can omit either the digits before the decimal point (the integer portion of the value) or the digits after the decimal point (the fractional portion), but not both. You can leave out the decimal point only if you include an exponent. No white-space characters can separate the digits or characters of the constant. The following examples illustrate some forms of floating-point constants and expressions:

 

15.75

1.575E1   /* = 15.75   */

1575e-2   /* = 15.75   */

-2.5e-3     /* = -0.0025 */

25E-4      /* =  0.0025 */

 

Floating-point constants are positive unless they are preceded by a minus sign (–). In this case, the minus sign is treated as a unary arithmetic negation operator. Floating-point constants have type float, double, or long double.

A floating-point constant without an f, F, l, or L suffix has type double. If the letter f or F is the suffix, the constant has type float. If suffixed by the letter l or L, it has type long double. For example:

 

100.0   /* has type double      */

100.0L  /* has type long double */

100.0F  /* has type float       */

 

Note that the Microsoft C compiler maps long double to type double. You can omit the integer portion of the floating-point constant, as shown in the following examples. The number .75 can be expressed in many ways, including the following:

 

.0075e2

0.075e1

.075e1

75e-2

 

Characteristics of floating types <float.h>

 

The Limits on Floating-Point Constants (ISO/IEC 9899:1999)

 

The characteristics of floating types are defined in terms of a model that describes a representation of floating-point numbers and values that provide information about an implementation's floating-point arithmetic. The following parameters are used to define the model for each floating-point type:

 

Symbol | Description
s | Sign (±1)
b | Base or radix of exponent representation (an integer > 1)
e | Exponent (an integer between a minimum e_min and a maximum e_max)
p | Precision (the number of base-b digits in the significand)
f_k | Nonnegative integers less than b (the significand digits)

 

A floating-point number (x) is defined by the following model:

$$x = s \, b^{e} \sum_{k=1}^{p} f_k \, b^{-k}, \qquad e_{\min} \le e \le e_{\max}$$

 

 

Constant | Meaning | Value
FLT_RADIX | radix of exponent representation, b | 2
FLT_MANT_DIG, DBL_MANT_DIG, LDBL_MANT_DIG | number of base-FLT_RADIX digits in the floating-point significand, p | -
DECIMAL_DIG | number of decimal digits, n, such that any floating-point number in the widest supported floating type with p_max radix b digits can be rounded to a floating-point number with n decimal digits and back again without change to the value | 10
FLT_DIG, DBL_DIG, LDBL_DIG | number of decimal digits, q, such that any floating-point number with q decimal digits can be rounded into a floating-point number with p radix b digits and back again without change to the q decimal digits | 6, 10, 10
FLT_MIN_EXP, DBL_MIN_EXP, LDBL_MIN_EXP | minimum negative integer such that FLT_RADIX raised to one less than that power is a normalized floating-point number, e_min | -
FLT_MIN_10_EXP, DBL_MIN_10_EXP, LDBL_MIN_10_EXP | minimum negative integer such that 10 raised to that power is in the range of normalized floating-point numbers | -37, -37, -37
FLT_MAX_EXP, DBL_MAX_EXP, LDBL_MAX_EXP | maximum integer such that FLT_RADIX raised to one less than that power is a representable finite floating-point number, e_max | -
FLT_MAX_10_EXP, DBL_MAX_10_EXP, LDBL_MAX_10_EXP | maximum integer such that 10 raised to that power is in the range of representable finite floating-point numbers | +37, +37, +37

The values given in the following list shall be replaced by constant expressions with implementation-defined values that are greater than or equal to those shown:

FLT_MAX, DBL_MAX, LDBL_MAX | maximum representable finite floating-point number | 1E+37, 1E+37, 1E+37

The values given in the following list shall be replaced by constant expressions with implementation-defined (positive) values that are less than or equal to those shown:

FLT_EPSILON, DBL_EPSILON, LDBL_EPSILON | the difference between 1 and the least value greater than 1 that is representable in the given floating point type | 1E-5, 1E-9, 1E-9
FLT_MIN, DBL_MIN, LDBL_MIN | minimum normalized positive floating-point number | 1E-37, 1E-37, 1E-37

 

Microsoft Implementation

 

Limits on the values of floating-point constants are given in the following table. The header file FLOAT.H contains this information.

 

Limits on Floating-Point Constants

 

Constant | Meaning | Value
FLT_DIG, DBL_DIG, LDBL_DIG | Number of digits, q, such that a floating-point number with q decimal digits can be rounded into a floating-point representation and back without loss of precision. | 6, 15, 15
FLT_EPSILON, DBL_EPSILON, LDBL_EPSILON | Smallest positive number x, such that x + 1.0 is not equal to 1.0 | 1.192092896e-07F, 2.2204460492503131e-016, 2.2204460492503131e-016
FLT_GUARD |  | 0
FLT_MANT_DIG, DBL_MANT_DIG, LDBL_MANT_DIG | Number of digits in the radix specified by FLT_RADIX in the floating-point significand. The radix is 2; hence these values specify bits. | 24, 53, 53
FLT_MAX, DBL_MAX, LDBL_MAX | Maximum representable floating-point number. | 3.402823466e+38F, 1.7976931348623158e+308, 1.7976931348623158e+308
FLT_MAX_10_EXP, DBL_MAX_10_EXP, LDBL_MAX_10_EXP | Maximum integer such that 10 raised to that number is a representable floating-point number. | 38, 308, 308
FLT_MAX_EXP, DBL_MAX_EXP, LDBL_MAX_EXP | Maximum integer such that FLT_RADIX raised to that number is a representable floating-point number. | 128, 1024, 1024
FLT_MIN, DBL_MIN, LDBL_MIN | Minimum positive value. | 1.175494351e-38F, 2.2250738585072014e-308, 2.2250738585072014e-308
FLT_MIN_10_EXP, DBL_MIN_10_EXP, LDBL_MIN_10_EXP | Minimum negative integer such that 10 raised to that number is a representable floating-point number. | -37, -307, -307
FLT_MIN_EXP, DBL_MIN_EXP, LDBL_MIN_EXP | Minimum negative integer such that FLT_RADIX raised to that number is a representable floating-point number. | -125, -1021, -1021
FLT_NORMALIZE |  | 0
FLT_RADIX, _DBL_RADIX, _LDBL_RADIX | Radix of exponent representation. | 2, 2, 2
FLT_ROUNDS, _DBL_ROUNDS, _LDBL_ROUNDS | Rounding mode for floating-point addition. | 1 (near), 1 (near), 1 (near)

 

Note that the information in the above table may differ in future implementations.

 

C Character Constants

 

A "character constant" is formed by enclosing a single character from the representable character set within single quotation marks (' '). Character constants are used to represent characters in the execution character set. The following syntax is implemented by Microsoft and is also defined by ISO/IEC 9899:1999.

 

Syntax

character-constant:

'c-char-sequence'

L'c-char-sequence'

c-char-sequence:

c-char

c-char-sequence c-char

c-char:

Any member of the source character set except the single quotation mark ('), backslash (\), or newline character

escape-sequence

escape-sequence:

simple-escape-sequence

octal-escape-sequence

hexadecimal-escape-sequence

simple-escape-sequence: one of

\a \b \f \n \r \t \v

\' \" \\ \?

octal-escape-sequence:

\ octal-digit

\ octal-digit octal-digit

\ octal-digit octal-digit octal-digit

hexadecimal-escape-sequence:

\x hexadecimal-digit

hexadecimal-escape-sequence hexadecimal-digit

The ISO standard states that an integer character constant is a sequence of one or more multibyte characters enclosed in single-quotes, as in 'x'. A wide character constant is the same, except prefixed by the letter L. With a few exceptions detailed later, the elements of the sequence are any members of the source character set; they are mapped in an implementation-defined manner to members of the execution character set. An integer character constant has type int.

The single-quote ', the double-quote ", the question-mark ?, the backslash \, and arbitrary integer values are representable according to the following table of escape sequences:

  1. Single quote ': \'

  2. Double quote ": \"

  3. Question mark ?: \?

  4. Backslash \: \\

  5. Octal character: \octal digits

  6. Hexadecimal character: \x hexadecimal digits

 

The double-quote " and question-mark ? are representable either by themselves or by the escape sequences \" and \?, respectively, but the single-quote ' and the backslash \ shall be represented, respectively, by the escape sequences \' and \\.

The octal digits that follow the backslash in an octal escape sequence are taken to be part of the construction of a single character for an integer character constant or of a single wide character for a wide character constant. The numerical value of the octal integer so formed specifies the value of the desired character or wide character.

The hexadecimal digits that follow the backslash and the letter x in a hexadecimal escape sequence are taken to be part of the construction of a single character for an integer character constant or of a single wide character for a wide character constant. The numerical value of the hexadecimal integer so formed specifies the value of the desired character or wide character.

Each octal or hexadecimal escape sequence is the longest sequence of characters that can constitute the escape sequence.

In addition, characters not in the basic character set are representable by universal character names and certain non-graphic characters are representable by escape sequences consisting of the backslash \ followed by a lowercase letter: \a, \b, \f, \n, \r, \t, and \v.

A wide character constant has type wchar_t, an integer type defined in the <stddef.h> header. The value of a wide character constant containing a single multibyte character that maps to a member of the extended execution character set is the wide character corresponding to that multibyte character, as defined by the mbtowc function, with an implementation-defined current locale. The value of a wide character constant containing more than one multibyte character, or containing a multibyte character or escape sequence not represented in the extended execution character set, is implementation-defined.

 

Enumeration constants (more in Module11)

 

An enumeration consists of a set of named integer constants. An enumeration type declaration gives the name of the (optional) enumeration tag and defines the set of named integer identifiers (called the "enumeration set," "enumerator constants," "enumerators," or "members"). A variable with enumeration type stores one of the values of the enumeration set defined by that type. Variables of enum type can be used in indexing expressions and as operands of all arithmetic and relational operators. Enumerations provide an alternative to the #define preprocessor directive with the advantages that the values can be generated for you and obey normal scoping rules.

In ANSI C, the expressions that define the value of an enumerator constant always have int type; thus, the storage associated with an enumeration variable is the storage required for a single int value. An enumeration constant or a value of enumerated type can be used anywhere the C language permits an integer expression.

 

The Syntax

enum-specifier:

enum identifier opt { enumerator-list }

enum identifier

The optional identifier names the enumeration type defined by enumerator-list. This identifier is often called the "tag" of the enumeration specified by the list. A type specifier of the form

 

enum identifier { enumerator-list }

 

declares identifier to be the tag of the enumeration specified by the enumerator-list nonterminal. The enumerator-list defines the "enumerator content." The enumerator-list is described in detail below.

If the declaration of a tag is visible, subsequent declarations that use the tag but omit enumerator-list specify the previously declared enumerated type. The tag must refer to a defined enumeration type, and that enumeration type must be in current scope. Since the enumeration type is defined elsewhere, the enumerator-list does not appear in this declaration. Declarations of types derived from enumerations and typedef declarations for enumeration types can use the enumeration tag before the enumeration type is defined.

 

The Syntax

enumerator-list:

enumerator

enumerator-list, enumerator

enumerator:

enumeration-constant

enumeration-constant = constant-expression

enumeration-constant:

identifier

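Each enumeration-constant in an enumerator-list names a value of the enumeration set. By default, the first enumeration-constant is associated with the value 0. Each subsequent enumeration-constant in the list is associated with the value of the previous enumeration-constant plus 1, unless you explicitly associate it with another value. The name of an enumeration-constant is equivalent to its value. You can use enumeration-constant = constant-expression to override the default sequence of values. Thus, if enumeration-constant = constant-expression appears in the enumerator-list, the enumeration-constant is associated with the value given by constant-expression. The constant-expression must have int type and can be negative. The following rules apply to the members of an enumeration set:

  1. An enumeration set can contain duplicate constant values. For example, two different identifiers in the same set can both be associated with the value 0.

  2. The identifiers in the enumerator-list must be distinct from other identifiers in the same scope with the same visibility, including ordinary variable names and identifiers in other enumerator lists.

  3. Enumeration tags obey the normal scoping rules. They must be distinct from other enumeration, structure, and union tags with the same visibility.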

 

 

Examples

 

These examples illustrate enumeration declarations:

 

enum DAY            /* defines an enumeration type    */

{

    saturday,          /* names day and declares a       */

    sunday = 0,      /* variable named workday with    */

    monday,           /* that type                      */

    tuesday,

    wednesday,      /* wednesday is associated with 3 */

    thursday,

    friday

} workday;

 

The value 0 is associated with saturday by default. The identifier sunday is explicitly set to 0. The remaining identifiers are given the values 1 through 5 by default.

In this example, a value from the set DAY is assigned to the variable today.

 

enum DAY today = wednesday;

 

Note that the name of the enumeration constant is used to assign the value. Since the DAY enumeration type was previously declared, only the enumeration tag DAY is necessary.

To explicitly assign an integer value to a variable of an enumerated data type, use a type cast:

 

workday = ( enum DAY ) ( day_value - 1 );

 

This cast is recommended in C but is not required.

 

enum BOOLEAN  /* Declares an enumeration data type called BOOLEAN */

{

    false,     /* false = 0, true = 1 */

    true

};

 

enum BOOLEAN end_flag, match_flag; /* Two variables of type BOOLEAN */

 

This declaration can also be specified as

 

enum BOOLEAN { false, true } end_flag, match_flag;

 

or as

 

enum BOOLEAN { false, true } end_flag;

enum BOOLEAN match_flag;

 

An example that uses these variables might look like this:

 

if ( match_flag == false )

{

    /* statement */

}

end_flag = true;

 

Unnamed enumerator data types can also be declared. The name of the data type is omitted, but variables can be declared. The variable response is a variable of the type defined:

 

enum { yes, no } response;

 

Trigraphs

 

The source character set of C source programs is contained within the 7-bit ASCII character set but is a superset of the ISO 646-1983 Invariant Code Set. Trigraph sequences allow C programs to be written using only the ISO (International Organization for Standardization) Invariant Code Set. Trigraphs are sequences of three characters (introduced by two consecutive question marks) that the compiler replaces with their corresponding punctuation characters. You can use trigraphs in C source files with a character set that does not contain convenient graphic representations for some punctuation characters.

The following table shows the nine trigraph sequences. All occurrences in a source file of the trigraph sequences in the first column are replaced with the corresponding punctuation character in the second column. Trigraphs are also defined by ISO/IEC 9899:1999.

 

Trigraph Sequences

Trigraph | Punctuation Character
??= | #
??( | [
??/ | \
??) | ]
??' | ^
??< | {
??! | | (vertical bar)
??> | }
??- | ~

 

A trigraph is always treated as a single source character. The translation of trigraphs takes place in the first translation phase, before the recognition of escape characters in string literals and character constants. Only the nine trigraphs shown in the above table are recognized. All other character sequences are left un-translated. The character escape sequence, \?, prevents the misinterpretation of trigraph-like character sequences. For example, if you attempt to print the string What??! with this printf statement

 

printf( "What??!\n" );

 

the string printed is What| because ??! is a trigraph sequence that is replaced with the | character. Write the statement as follows to correctly print the string:

 

printf( "What?\?!\n" );

 

In this printf statement, a backslash escape character in front of the second question mark prevents the misinterpretation of ??! as a trigraph.

 

Phases of Translation

 

C and C++ programs consist of one or more source files, each of which contains some of the text of the program. A source file, together with its include files (files that are included using the #include preprocessor directive) but not including sections of code removed by conditional-compilation directives such as #if, is called a "translation unit."

Source files can be translated at different times — in fact, it is common to translate only out-of-date files. The translated translation units can be processed into separate object files or object-code libraries. These separate, translated translation units are then linked to form an executable program or a dynamic-link library (DLL). For more information about files that can be used as input to the linker, see LINK Input Files. Translation units can communicate using:

 

  1. Calls to functions that have external linkage.

  2. Calls to class member functions that have external linkage.

  3. Direct modification of objects that have external linkage.

  4. Direct modification of files.

  5. Interprocess communication (for Microsoft Windows-based applications only).

 

The following list describes the phases in which the compiler translates files:

 

Phase | Description
Character mapping | Characters in the source file are mapped to the internal source representation. Trigraph sequences are converted to single-character internal representation in this phase.
Line splicing | All lines ending in a backslash (\) and immediately followed by a newline character are joined with the next line in the source file forming logical lines from the physical lines. Unless it is empty, a source file must end in a newline character that is not preceded by a backslash.
Tokenization | The source file is broken into preprocessing tokens and white-space characters. Comments in the source file are replaced with one space character each. Newline characters are retained.
Preprocessing | Preprocessing directives are executed and macros are expanded into the source file. The #include statement invokes translation starting with the preceding three translation steps on any included text.
Character-set mapping | All source character set members and escape sequences are converted to their equivalents in the execution character set. For Microsoft C and C++, both the source and the execution character sets are ASCII.
String concatenation | All adjacent string and wide-string literals are concatenated. For example, "String " "concatenation" becomes "String concatenation".
Translation | All tokens are analyzed syntactically and semantically; these tokens are converted into object code.
Linkage | All external references are resolved to create an executable program or a dynamic-link library.

 

The compiler issues warnings or errors during phases of translation in which it encounters syntax errors. The linker resolves all external references and creates an executable program or DLL by combining one or more separately processed translation units along with standard libraries.

 

Translation limits

 

The implementation shall be able to translate and execute at least one program that contains at least one instance of every one of the following limits (ISO/IEC 9899:1999 (C99)):

 

  1. 127 nesting levels of blocks.

  2. 63 nesting levels of conditional inclusion.

  3. 12 pointer, array, and function declarators (in any combinations) modifying an arithmetic, structure, union, or incomplete type in a declaration.

  4. 63 nesting levels of parenthesized declarators within a full declarator.

  5. 63 nesting levels of parenthesized expressions within a full expression.

  6. 63 significant initial characters in an internal identifier or a macro name (each universal character name or extended source character is considered a single character).

  7. 31 significant initial characters in an external identifier (each universal character name specifying a short identifier of 0000FFFF or less is considered 6 characters, each universal character name specifying a short identifier of 00010000 or more is considered 10 characters, and each extended source character is considered the same number of characters as the corresponding universal character name, if any).

  8. 4095 external identifiers in one translation unit.

  9. 511 identifiers with block scope declared in one block.

  10. 4095 macro identifiers simultaneously defined in one preprocessing translation unit.

  11. 127 parameters in one function definition.

  12. 127 arguments in one function call.

  13. 127 parameters in one macro definition.

  14. 127 arguments in one macro invocation.

  15. 4095 characters in a logical source line.

  16. 4095 characters in a character string literal or wide string literal (after concatenation).

  17. 65535 bytes in an object (in a hosted environment only).

  18. 15 nesting levels for #included files.

  19. 1023 case labels for a switch statement (excluding those for any nested switch statements).

  20. 1023 members in a single structure or union.

  21. 1023 enumeration constants in a single enumeration.

  22. 63 levels of nested structure or union definitions in a single struct-declaration-list.
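
To make two of these limits concrete, here is a minimal sketch that sits exactly on limit 3 (12 pointer declarators) and limit 17 (an object of 65535 bytes). The names ptr12_t, big_object and big_object_size are invented for this illustration. Note that the standard only requires an implementation to handle at least one program that reaches every limit, so this is not a guarantee about any particular program or compiler.

```c
#include <stddef.h>

/* Limit 3: 12 pointer declarators modifying an arithmetic type.
   The typedef below uses exactly 12 asterisks. */
typedef int ************ptr12_t;

/* Limit 17: an object of exactly 65535 bytes. */
static unsigned char big_object[65535];

/* Reports the size of the object so that it can be checked. */
size_t big_object_size(void)
{
    return sizeof big_object;
}
```

Most current compilers, VC++ included, accept far more than these minimums in practice, but portable code should not depend on that.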

 

LINK (Microsoft VC++ linker) Input Files

 

You provide the linker with files that contain objects, import and standard libraries, resources, module definitions, and command input. LINK does not use file extensions to make assumptions about the contents of a file. Instead, LINK examines each input file to determine what kind of file it is.

Object files on the command line are processed in the order they appear on the command line. Libraries are searched in command line order as well, with the following caveat: symbols that are unresolved when bringing in an object file from a library are searched for in that library first, then in the following libraries from the command line and /DEFAULTLIB (Specify Default Library) directives, and finally in any libraries at the beginning of the command line.

LINK no longer accepts a semicolon (or any other character) as the start of a comment in response files and order files. Semicolons are recognized only as the start of comments in module-definition files (.def). LINK uses the following types of input files:

 

Input File

Description

.obj files

LINK accepts .obj files that are either Common Object File Format (COFF) or 32-bit Object Module Format (OMF). Microsoft's Visual C++ compiler creates COFF .obj files. LINK automatically converts 32-bit OMF objects to COFF. However, there are limitations to OMF to COFF conversions. OMF can represent some things that cannot be represented in COFF. If there are errors when the linker converts from OMF to COFF, then you will need to use COFF .obj files instead of OMF .obj files as input to the linker. In some circumstances, .obj files can be used instead of .netmodule files.

.netmodule files

link.exe now accepts MSIL .obj and .netmodules as input. The output file produced by the linker will be an assembly or a .netmodule with no run-time dependency on any of the .obj or .netmodules that were input to the linker.

.netmodules are created by the Visual C++ compiler with /LN (Create MSIL Module) or by the linker with /NOASSEMBLY (Create a MSIL Module). .objs are always created in a Visual C++ compilation. For other Visual Studio compilers, use the /target:module compiler option.

In most cases, you will need to pass to the linker the .obj file from the Visual C++ compilation that created the .netmodule, unless the .netmodule was created with /clr (Common Language Runtime Compilation). MSIL .netmodules used as input to the linker must be pure MSIL, which can be produced by the Visual C++ compiler using /clr:safe. Other Visual Studio compilers produce pure MSIL modules by default.

Passing a .netmodule or .dll file to the linker that was compiled by the Visual C++ compiler with /clr or with /clr:pure can result in a linker error.

The linker accepts native .obj files as well as MSIL .obj files compiled with /clr, /clr:pure, or /clr:safe. When passing mixed .objs in the same build, the verifiability of the resulting output file will, by default, be equal to the lowest level of verifiability of the input modules. For example, if you pass a safe and pure .obj to the linker, the output file will be pure. /CLRIMAGETYPE (Specify Type of CLR Image) lets you specify a lower level of verifiability, if that is what you need.

If you currently have an application that is composed of two or more assemblies and you want the application to be contained in one assembly, you must recompile the assemblies and then link the .objs or .netmodules to produce a single assembly.

You must specify an entry point using /ENTRY (Entry-Point Symbol) when creating an executable image.

When linking with an MSIL .obj or .netmodule file, use /LTCG (Link-time Code Generation); otherwise, when the linker encounters the MSIL .obj or .netmodule, it will restart the link with /LTCG.

MSIL .obj or .netmodule files can also be passed to cl.exe.

Input MSIL .obj or .netmodule files cannot have embedded resources. A resource is embedded in an output file (module or assembly) with /ASSEMBLYRESOURCE (Embed a Managed Resource) linker option or with the /resource compiler option in other Visual Studio compilers.

When performing MSIL linking, and if you do not also specify /LTCG (Link-time Code Generation), you will see an informational message reporting that the link is restarting. This message can be ignored, but to improve linker performance with MSIL linking, explicitly specify /LTCG.

 

Example

 

In C++ code, the catch block of a corresponding try is invoked for a non-System exception. However, by default, the CLR wraps non-System exceptions with RuntimeWrappedException. When an assembly is created from Visual C++ and non-Visual C++ modules and you want a catch block in C++ code to be invoked from its corresponding try clause when the try block throws a non-System exception, you must add the following attribute to the source code for the non-C++ modules:

[assembly:System::Runtime::CompilerServices::RuntimeCompatibility(WrapNonExceptionThrows=false)]

 

// MSIL_linking.cpp (C++/CLI source)

// compile with: /c /clr

value struct V { };

 

ref struct MCPP {

   static void Test() {

      try {

         throw (gcnew V);

      }

      catch (V ^) {

         System::Console::WriteLine("caught non System exception in C++ source code file");

      }

   }

};

 

/*

int main() {

   MCPP::Test();

}

*/

 

By changing the Boolean value of the WrapNonExceptionThrows attribute, you control whether the Visual C++ code can catch a non-System exception. The following C# module is linked with the object file produced above:

 

// MSIL_linking_2.cs (C# source)

// compile with: /target:module /addmodule:MSIL_linking.obj

// post-build command: link /LTCG MSIL_linking.obj MSIL_linking_2.netmodule /entry:MLinkTest.Main /out:MSIL_linking_2.exe /subsystem:console

using System.Runtime.CompilerServices;

 

// enable non System exceptions

[assembly:RuntimeCompatibility(WrapNonExceptionThrows=false)]

 

class MLinkTest {

   public static void Main() {

      try { MCPP.Test(); }

      catch (RuntimeWrappedException) {

         System.Console.WriteLine("caught a wrapped exception in C#");

      }

   }

}

 

Output

 

caught non System exception in C++ source code file

 

.lib files

LINK accepts COFF standard libraries and COFF import libraries, both of which usually have the extension .lib. Standard libraries contain objects and are created by the LIB tool. Import libraries contain information about exports in other programs and are created either by LINK when it builds a program that contains exports or by the LIB tool. A library is specified to LINK as either a file name argument or a default library. LINK resolves external references by searching first in libraries specified on the command line, then in default libraries specified with the /DEFAULTLIB option, and then in default libraries named in .obj files. If a path is specified with the library name, LINK looks for the library in that directory. If no path is specified, LINK looks first in the directory that LINK is running from, and then in any directories specified in the LIB environment variable.

 

To add .lib files as linker input in the development environment

 

  1. Open the project's Property Pages dialog box. For details, see Setting Visual C++ Project Properties.

  2. Click the Linker folder.

  3. Click the Input property page.

  4. Modify the Additional Dependencies property.

 

To programmatically add .lib files as linker input.

 

Example

 

The following sample shows how to build and use a .lib file:

 

// lib_link_input_1.cpp

// compile with: /LD

__declspec(dllexport) int Test() {

   return 213;

}

 

And then:

 

// compile with: /EHsc lib_link_input_1.lib

__declspec(dllimport) int Test();

#include <iostream>

int main() {

   std::cout << Test() << std::endl;

}

 

Output

 

213

 

.exp files

Export (.exp) files contain information about exported functions and data items. When LIB creates an import library, it also creates an .exp file. You use the .exp file when you link a program that both exports to and imports from another program, either directly or indirectly. If you link with an .exp file, LINK does not produce an import library, because it assumes that LIB already created one.

.def files

Module-definition (.def) files provide the linker with information about exports, attributes, and other aspects of the program to be linked. A .def file is most useful when building a DLL. Because linker options can be used instead of module-definition statements, and __declspec(dllexport) can be used to specify exported functions, .def files are generally not necessary. To pass a .def file to the linker, name it with the /DEF (Specify Module-Definition File) linker option. If you are building an .exe file that has no exports, using a .def file will make your output file larger and slower loading.

.pdb files

Object (.obj) files compiled using the /Zi option contain the name of a program database (PDB). You do not specify the object's PDB file name to the linker; LINK uses the embedded name to find the PDB if it is needed. This also applies to debuggable objects contained in a library; the PDB for a debuggable library must be available to the linker along with the library. LINK also uses a PDB to hold debugging information for the .exe file or the .dll file. The program's PDB is both an output file and an input file, because LINK updates the PDB when it rebuilds the program.

.res files

You can specify a .res file when linking a program. The .res file is created by the resource compiler (RC). LINK automatically converts .res files to COFF. The CVTRES.exe tool must be in the same directory as LINK.exe or in a directory specified in the PATH environment variable.

.exe files

The MS-DOS Stub File Name (/STUB) option specifies the name of an .exe file that runs with MS-DOS. LINK examines the specified file to be sure that it is a valid MS-DOS program.

 

/STUB:filename

 

where:

 

filename - An MS-DOS application.

 

Remarks

 

The /STUB option attaches an MS-DOS stub program to a Win32 program.

A stub program is invoked if the file is executed in MS-DOS. It usually displays an appropriate message; however, any valid MS-DOS application can be a stub program.

Specify a filename for the stub program after a colon (:) on the command line. The linker checks filename and issues an error message if the file is not an executable. The program must be an .exe file; a .com file is invalid for a stub program.

If this option is not used, the linker attaches a default stub program that issues the following message:

 

This program cannot be run in MS-DOS mode.

 

When building a virtual device driver, filename allows the user to specify a file name that contains an IMAGE_DOS_HEADER structure (defined in WINNT.H) to be used in the VXD, rather than the default header.
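
The check that the linker performs on the stub file is not described here, but one thing is certain about the file format: an MS-DOS .exe file begins with the two bytes 'M' and 'Z' (the e_magic field of IMAGE_DOS_HEADER), while a .com file has no header at all. The sketch below tests only for that signature. The function name has_mz_signature is hypothetical, and the real validation done by LINK is assumed to be more thorough than this.

```c
#include <stddef.h>

/* Returns 1 if the buffer starts with the "MZ" signature that opens
   every MS-DOS .exe header, 0 otherwise.
   This is only a first sanity check, not a full validation. */
int has_mz_signature(const unsigned char *data, size_t len)
{
    return data != NULL
        && len >= 2
        && data[0] == 'M'
        && data[1] == 'Z';
}
```

A caller would read the first bytes of the candidate stub file into a buffer and pass them to this function before handing the file name to /STUB.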

.txt files

LINK expects various text files as additional input. The command-file specifier (@) and the Base Address (/BASE), /DEF, and /ORDER options all specify text files. These files can have any extension, not just .txt.

.ilk files

When linking incrementally, LINK updates the .ilk status file that it created during the first incremental link. This file has the same base name as the .exe file or the .dll file, and it has the extension .ilk. If the .ilk file is missing, LINK performs a full link and creates a new .ilk file. If the .ilk file is unusable, LINK performs a non-incremental link.

 

All of these input files can be passed to the linker as arguments or options on the command line. In the VC++ IDE, you can set them through the project settings (Project > Project_name Properties).

 

VC++ project properties menu

 

 

VC++ IDE linker settings

 

 

VC++ Linker setting as command line options

 

For the command line, to run LINK.EXE, use the following command syntax:

 

LINK arguments

 

The arguments include options and filenames and can be specified in any order. Options are processed first, then files. Use one or more spaces or tabs to separate arguments.

On the command line, an option consists of an option specifier, either a dash (-) or a forward slash (/), followed by the name of the option. Option names cannot be abbreviated. Some options take an argument, specified after a colon (:). No spaces or tabs are allowed within an option specification, except within a quoted string in the /COMMENT option. Specify numeric arguments in decimal or C-language notation. Option names and their keyword or filename arguments are not case sensitive, but identifiers as arguments are case sensitive.

To pass a file to the linker, specify the filename on the command line after the LINK command. You can specify an absolute or relative path with the filename, and you can use wildcards in the filename. If you omit the dot (.) and filename extension, LINK assumes .obj for the purpose of finding the file. LINK does not use filename extensions or the lack of them to make assumptions about the contents of files; it determines the type of file by examining it, and processes it accordingly.

link.exe returns zero for success (no errors). Otherwise, the linker returns the error number that stopped the link. For example, if the linker generates LNK1104, the linker returns 1104. Accordingly, the lowest error number returned on an error by the linker is 1000. A return value of 128 represents a configuration problem with either the operating system or a .config file; the loader didn't load either link.exe or c2.dll.
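
The return value rules above can be sketched as a small helper that a build script written in C might use. The enumeration and function names below are hypothetical and are not part of any Microsoft tool; the only facts taken from the text are that 0 means success, 128 means a configuration problem, and values of 1000 or more are LNK error numbers.

```c
#include <stdio.h>

/* Hypothetical classification of a link.exe return value. */
typedef enum {
    LINK_OK,              /* 0: the link succeeded                  */
    LINK_CONFIG_PROBLEM,  /* 128: link.exe or c2.dll was not loaded */
    LINK_ERROR,           /* 1000 or more: an LNKnnnn error number  */
    LINK_UNKNOWN          /* anything else is not covered above     */
} link_status;

link_status classify_link_exit(int code)
{
    if (code == 0)
        return LINK_OK;
    if (code == 128)
        return LINK_CONFIG_PROBLEM;
    if (code >= 1000)
        return LINK_ERROR;
    return LINK_UNKNOWN;
}

/* Writes a name such as "LNK1104" into buf.
   Returns 1 on success, 0 if code is not a linker error number
   or the buffer is too small. */
int format_link_error(int code, char *buf, size_t size)
{
    int written;

    if (classify_link_exit(code) != LINK_ERROR)
        return 0;
    written = snprintf(buf, size, "LNK%d", code);
    return written > 0 && (size_t)written < size;
}
```

For the example in the text, a return value of 1104 is classified as LINK_ERROR and formatted as LNK1104, which is the identifier to look up in the linker error documentation.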

 

 

 

 



 

Standard C and Microsoft Implementation Specific:  Just Part 1