The C programming language is notorious for its type declarations. The language was designed more than 50 years ago, and its designers apparently paid little attention to making declarations easy to read. Consider the following declaration.
int *p[4];
How should we read it? Is the above statement declaring p to be an array of four elements with each element pointing to an integer, or is it a pointer to an array of four elements each of which is an integer?
The above example is simple. We know that p is an array of four elements, each of which is a pointer to an integer. Therefore, in the figure above, the graphical representation on the left is correct. Once we learn to decode C declarations, we will write the declaration for the graphical representation on the right.
The following declarations are more complicated.
char *(*(*a)())[10];
int *(* const *b[8]) (void);
char * const * (*c)(void);
char *(*(*p[4])(char *))[];
void (*s(int, void (*)(int)))(int);
void *(*f(int))(int);
struct IMAGE *(*(*(*fp)[5]))(const char *, int);
char ** const * volatile x;
char *(*(**f[][4])())[];
In this blog, we will learn how to read C declarations and apply that knowledge to convert the above declarations into simple English. We will first define some terminology and then outline the rules which will enable us to convert any declaration into a simple English sentence.
A declarator is a simple identifier (also called a variable name), an array identifier (also called an array variable name), a function name, or a pointer to any of the above, optionally followed by an equal sign and an initial value or values. For example, first = 4, second[4] = {1, 1, 2, 3}, third(), *fourth, *fifth[4], and *sixth() are all valid declarators in the following declarations.
int first = 4;
int second[4] = {1, 1, 2, 3};
int third();
int *fourth;
int *fifth[4];
int *sixth();
There may be any number of pointers, such as ***seventh, and any number of array dimensions, such as eighth[4][5][6], but function parentheses are restricted: a function cannot return a function or an array, and an array cannot hold functions. The declarator ninth()() is therefore invalid. The declarators (*p)()[] and (*p)[]() are also invalid.
An identifier, an identifier with array square brackets, or an identifier with function parentheses is also called a direct declarator. In the above examples, first, second[4], third(), fourth, fifth[4], and sixth() are direct declarators.
Type specifiers include void, char, short, int, long, float, double, signed, unsigned, enum, struct, and union. The keywords enum, struct, and union are usually followed by what is called a tag. The keywords struct and union declare complex types.
The storage class of a variable tells a compiler how to allocate memory for that variable. There are five classic storage-class specifiers: auto, extern, register, static, and typedef (C11 adds _Thread_local). The typedef specifier doesn't tell a compiler about memory allocation; it is grouped with the storage classes for syntactic convenience only and simply defines a new name for a data type.
As of this writing, there are four type qualifiers: const, restrict, volatile, and _Atomic. The type qualifiers restrict and _Atomic were introduced in the C99 and C11 standards, respectively. _Atomic is not only a type qualifier; it also acts as a type specifier when written with a parenthesized type name. For example, _Atomic(int) is a type specifier and not a type qualifier. We will discuss _Atomic in detail in another blog.
If a type qualifier or qualifiers appear next to a type specifier (int, char, float, double, etc.), it applies to that type specifier. Otherwise, it applies to the pointer asterisk to its immediate left. The type qualifier restrict applies only to pointers.
We will apply this rule several times in decoding C declarations so that it becomes clear.
Consider the following declaration.
int const *p;
const int *q;
The const keyword is next to a type specifier (int) in both declarations; therefore, it applies to the type and not to the pointer asterisk. In the following declaration, the const keyword is not next to the type specifier, hence it applies to the pointer asterisk to its immediate left.
char * const r;
We locate the first identifier reading from the left and then follow the precedence rules.
Rule 1. Read the postfix operators (square brackets indicating an array and parentheses indicating a function) from left to right, till the semicolon or the closing unmatched parenthesis is reached.
Rule 2. Read the prefix asterisk operators indicating a pointer, till the beginning of the declaration or the opening parenthesis, corresponding to the closing parenthesis of Rule 1, is reached.
Rule 3. If a type qualifier or qualifiers appear next to a type specifier (int, char, float, double, etc.), it applies to that type specifier. Otherwise, it applies to the pointer asterisk to its immediate left. The type qualifier restrict applies only to pointers.
Let's apply the above rules to understand the very first declaration we talked about, i.e., int *p[4];
In the following illustrations, the red arrow indicates starting position and the green arrow indicates the ending position. The rule under consideration is applicable to the text between the two arrows. The purple color is used to indicate the text that has already been processed.
The first identifier in the above declaration is p.
We apply Rule 1 and read the postfix operator (in this case square brackets indicating an array) till we reach the semicolon, "p is an array of 4 . . .".
Since a semicolon marks the end of a declaration, we stop the application of Rule 1 and apply Rule 2. We read the prefix operator (asterisk indicating a pointer), preceded by the type specifier int, till we reach the beginning of the declaration, "p is an array of 4 pointers to integers."
Before moving on to other complex declarations, let's see which C declaration would correspond to the graphical representation shown below.
It shows p to be a pointer to an array of 4 integers. Since the pointer asterisk has lower precedence than array brackets and function parentheses, we have to enclose *p within parentheses to elevate its precedence as shown below.
int (*p)[4];
Let's start with p and read the declaration.
We attempt to apply Rule 1 but do not find any postfix operators. We find an unmatched closing parenthesis.
We apply Rule 2 to the prefix asterisk operator (pointer) till we reach the opening parenthesis and read, "p is a pointer to . . .".
We have read whatever we found inside the parentheses and apply Rule 1 to the part of the declaration outside the parentheses. We find a postfix operator (in this case the square brackets indicating an array), followed by the semicolon indicating the end of the declaration, and read, "p is a pointer to an array of 4 . . .".
We have reached the end of the declaration but still have a part of the declaration to read. We apply Rule 2 but do not see any prefix operators. Instead, we find the type specifier int before reaching the beginning of the declaration line, and read, "p is a pointer to an array of 4 integers."
Let us apply what we have learned to convert the following declaration into simple English.
char *(*(*a)())[10];
The first identifier in the declaration is a. This is where we start.
We find an unmatched closing parenthesis to the right of a. We cannot apply Rule 1 as there are no postfix operators.
We apply Rule 2 to the part up to the opening parenthesis, which includes a prefix asterisk operator indicating a pointer, and read, "a is a pointer to . . .".
We have taken care of the innermost parentheses. We apply Rule 1 to the part of the declaration up to the next unmatched closing parenthesis. The postfix operator we find is a pair of parentheses indicating a function. Strictly speaking, an empty parameter list leaves the parameters unspecified before C23 and means no parameters from C23 onward; we read it as no parameters throughout. We read, "a is a pointer to a function that has no parameters . . .".
We apply Rule 2 to the part of the declaration up to the opening parenthesis as shown, and read, "a is a pointer to a function that has no parameters and returns a pointer to . . .".
We have read the part of the declaration in the outermost pair of parentheses. We apply Rule 1 to the remaining part of the declaration. We find a postfix operator (square brackets in this case) indicating an array, followed by the semicolon. We read, "a is a pointer to a function that has no parameters and returns a pointer to an array of 10 . . .".
We reached the end of the declaration while applying Rule 1. We now apply Rule 2 to the remaining part of the declaration. We find a prefix asterisk operator indicating a pointer, preceded by the type specifier char. This takes us to the beginning of the declaration. We read, "a is a pointer to a function that has no parameters and returns a pointer to an array of 10 pointers to characters."
This is a complicated function pointer declaration. The following program shows how this declaration could be used in a C program.
#include <stdio.h>
#include <stdlib.h>

/* myfunc is a function returning a pointer to an array of 10 pointers to char */
char *(*(myfunc)())[10]
{
    char *(*p)[10];
    p = malloc(sizeof *p);   /* room for 10 pointers to char */
    /* process the allocated 80-byte block as required */
    return p;
}

int main(void)
{
    char *(*q)[10];
    char *(*(*a)())[10] = myfunc;
    printf("Size of pointer on this machine: %zu bytes\n", sizeof(char *));
    q = a();
    if (q == NULL)
        return 1;
    /* %p expects a void *, so the pointers are cast before printing */
    printf("q: %p\t q+1: %p\n", (void *)q, (void *)(q + 1));
    free(q);
    return 0;
}
On typical 64-bit machines, all object pointers (char *, char **, char ***, and so on) are 8 bytes long. Compiling and executing the above program produces output like the following.
Size of pointer on this machine: 8 bytes
q: 0x600001e90d20 q+1: 0x600001e90d70
We observe that even though q is an 8-byte pointer, advancing it by 1 changes the address by 0x50, or 80, bytes. This confirms that q is indeed a pointer to an array of 10 pointers to characters, exactly as we found by decoding it.
Let us decode one more complex C declaration, which will require applying Rule 3.
char ** const * volatile x;
We find the first identifier in the declaration, which is x.
There is a semicolon to the immediate right of x, hence we cannot apply Rule 1. To the immediate left of x is the type qualifier volatile, which means we have to apply Rule 3. Since volatile is not next to a type specifier, it applies to the asterisk (pointer) to its immediate left.
Since the type qualifier applies to the asterisk to its immediate left, we stop here temporarily and read till this point, "x is a volatile pointer to . . .".
We find const to the left of the volatile pointer. According to Rule 3, it applies to the asterisk to its immediate left, and we read, "x is a volatile pointer to a constant pointer . . .".
We apply Rule 2 to the remaining asterisk, which is preceded by the type specifier char. We read, "x is a volatile pointer to a constant pointer to a pointer to a character."
A variable may be initialized in the declaration. Let us consider such a declaration.
int ( * cmp ) ( const void *, const void * ) = ascending ;
The identifier cmp is our starting point.
We try to apply Rule 1 but find an unmatched closing parenthesis to the right of cmp. We read, "cmp is . . .".
We apply Rule 2 to the asterisk to the left of cmp, preceded by the opening parenthesis. We read, "cmp is a pointer to . . .".
We apply Rule 1 to the part outside the parentheses and find an opening parenthesis after (*cmp) indicating a function. We continue till we reach the corresponding closing parenthesis, and read, "cmp is a pointer to a function (which has two parameters, both are pointers to constant void)".
We still haven't reached a semicolon or an unmatched parenthesis, so we continue applying Rule 1. We find an equal sign (=) indicating an initializer. Let's handle it at the end.
We apply Rule 2 and find int to the immediate left of (*cmp), which is a type specifier. We read, "cmp is a pointer to a function (which has two parameters, both are pointers to constant void) and returns an integer."
The initialization part of the declaration stores the address of ascending (which must be a function of the appropriate type, as mentioned in the declaration) in the variable cmp.
Finally, let's look at the most complicated declaration in the list given at the beginning.
struct IMAGE *(*(*(*fp)[5]))(const char *, int);
On applying the rules, we obtain the following simple English representation:
" fp
is a pointer to an array of 5 pointers to pointers to functions (whose first parameter is a pointer to a constant character and whose second parameter is an integer) that return a pointer to struct IMAGE
."
The figure below shows the sequence in which this complex declaration is handled, by numbering its various parts.
Every C declaration begins with a type specifier, such as char, int, double, etc., a type qualifier (const or volatile), or a storage-class specifier such as static. The type qualifier restrict cannot begin a declaration as it applies to pointers only. The type specifier could be one keyword, such as int, or multiple keywords, such as unsigned long int or long double. A type specifier may also use the struct, union, or enum keyword.
We start with the first identifier from left, applying Rule 1 (postfix operators) till we encounter an unmatched closing parenthesis or a semicolon indicating the end of the declaration. Then we apply Rule 2 (prefix operators) till we encounter an opening parenthesis or reach the beginning of the declaration.
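We alternate between Rule 1 and Rule 2, working outward from the identifier, and apply Rule 3 whenever we meet a type qualifier, until the whole declaration has been read.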
With this knowledge, we can decode any valid complex C declaration into simple English.
Please decode the following declarations for more practice. Answers are provided to verify your work.
int *(* const *b[8]) (void);
char * const * (*c)(void);
char *(*(*p[4])(char *))[];
void (*s(int, void (*)(int)))(int);
void *(*f(int))(int);
char ** const * volatile x;
char *(*(**f[][4])())[];
b is an array of 8 pointers to const pointers to functions that take no parameters and return a pointer to int.
c is a pointer to a function with no parameters returning a pointer to a const pointer to char.
p is an array of 4 pointers to functions that take a pointer to char parameter and return a pointer to an array of pointers to char.
s is a function that takes two parameters (the first is an int and the second is a pointer to a function that takes an int and returns void) and returns a pointer to a function that has an int parameter and returns void.
f is a function that takes an int parameter and returns a pointer to a function that takes an int parameter and returns a pointer to void.
x is a volatile pointer to a const pointer to a pointer to char.
f is a two-dimensional array (second dimension is 4) of pointers to pointers to functions that take no parameters and return a pointer to an array of pointers to char.
The following books are good resources for learning more about the C programming language.
C Programming Language, 2nd Edition, Brian W. Kernighan, Dennis M. Ritchie
C: A Reference Manual, 5th Edition, Samuel Harbison, Guy Steele Jr.
Expert C Programming: Deep C Secrets, Peter van der Linden