Mark is the CTO of CodeSourcery. He can be contacted at mark@codesourcery.com.
Optimizing compilers try to transform source code into the most efficient machine-code equivalent. Techniques such as alias analysis and type-based alias analysis help accomplish that goal. In this article, I'll examine these techniques and explain how you can write source code that is easier for the compiler to optimize. Although C++ is often criticized as being too slow for high-performance applications, I'll show how C++ can actually enable compilers to create code that is even faster than the C equivalent.
CodeSourcery (my company) contributed an implementation of these optimization techniques to the GNU C and C++ compilers (gcc and g++, respectively). These techniques are also used by other optimizing compilers, including the MIPS compilers produced by SGI. Thus, you can take the techniques presented here and use them to improve your code today.
Aliasing: The Problem
Two pointers alias one another if they point to the same location in memory. For instance, in Example 1, ip1 and ip2 are both aliases for i. The existence of aliases makes optimization much more difficult because it is hard to tell whether a particular value is being changed. Take a look at the function in Example 2(a). If the compiler could deduce that ip1 and ip2 cannot point to the same array, then the compiler could compile this function to assembly code that looks like Example 2(b). Note that there would be only one LOAD per loop iteration, except for the very first iteration. That's because the value of ip2[i+1] from that iteration is the same as ip2[i] on the next iteration, so there's no reason to reload the value.
Unfortunately, there's no way to know, just from looking at the function definition, that ip1 and ip2 do not alias. For example, the use of f in Example 3 is completely legal, even though both arguments to f are identical.
In that case, the assembly code shown in Example 2(b) is invalid because the value of ip2[i+1] is changed by the store into ip1[i+1]. That happens because ip1 and ip2 are both aliases for ia. Because of this possibility, a C compiler must generate an extra LOAD instruction on each loop iteration, ending up with code like that shown in Example 2(c).
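The effect can be reproduced by hand. The following is a minimal sketch, not the article's Example 2: the function names, the parameter n, and the exact loop body are assumptions chosen to match the description above. f_reload reads both operands from memory on every iteration; f_cached models the one-LOAD-per-iteration code by carrying ip2[i+1] over in a local variable.

```cpp
#include <cassert>

// Reloads both operands on every iteration, so it gives the expected
// answer even when ip1 and ip2 point into the same array.
void f_reload(int* ip1, const int* ip2, int n) {
    assert(n >= 1);
    for (int i = 0; i + 1 < n; ++i)
        ip1[i + 1] = ip2[i] + ip2[i + 1];
}

// Hand-written model of the faster code: the value loaded as ip2[i+1]
// is kept in a local and reused as ip2[i] on the next iteration.
// That reuse is only valid if the store to ip1[i+1] cannot change ip2[i+1].
void f_cached(int* ip1, const int* ip2, int n) {
    assert(n >= 1);
    int cur = ip2[0];                 // the only load of ip2[i] itself
    for (int i = 0; i + 1 < n; ++i) {
        int next = ip2[i + 1];        // one load per iteration
        ip1[i + 1] = cur + next;
        cur = next;                   // stale if ip1 aliases ip2
    }
}
```

With separate source and destination arrays the two functions agree. If the same four-element array {1, 1, 1, 1} is passed as both arguments, f_reload leaves {1, 2, 3, 4} while f_cached leaves {1, 2, 2, 2}, which is why a compiler may not make this transformation unless it can rule out aliasing.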
Examples like these are part of the reason that Fortran is often espoused as being faster than C. In Fortran, a program may not pass aliased arguments to a procedure that modifies them, so the compiler is allowed to assume that the equivalents of ip1 and ip2 do not alias. So, while the faster code in Example 2(b) would be generated by a good Fortran compiler, a C or C++ compiler must generate the slower code in Example 2(c).
Alias Analysis: The Solution
A compiler uses alias analysis to try to determine when two pointers do not alias. If the compiler can determine that two pointers are not aliases of one another, then the compiler can generate more efficient code in cases like the aforementioned example.
There are many techniques for doing alias analysis. One approach is the use of special keywords, such as restrict. The next version of C, currently called "C9X," provides this new keyword. You can use restrict to tell the compiler that one pointer does not alias another. This technique shifts the burden of deciding what might alias what from the compiler to you. The advantages to this approach are that it makes the compiler easier to implement and it lets you provide the compiler with information that no amount of automated analysis might be able to provide. CodeSourcery contributed an implementation of restrict to the GNU C and C++ compilers, so current versions of gcc and g++ support restrict. Several other commercial compilers also support restrict.
However, restrict is no panacea for the alias ailment. If you declare that two variables cannot alias, but then write code that causes them to alias, the program will not work correctly. A bug like this can take days to track down. It is cumbersome to use restrict everywhere. Because restrict is not supported by many C and C++ compilers, it is not portable. Finally, certain aspects of the way in which C9X defines restrict mean that there are situations where automated analysis can show that two things do not alias, but where there is no way to specify this with restrict.
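As a rough illustration of what such an annotation looks like, here is a sketch in C++. Standard C++ has no restrict keyword; GCC and Clang accept the spelling __restrict__ as an extension, and the NOALIAS macro below is a hypothetical wrapper introduced only for this example. The function name and loop are likewise assumptions, not the article's listing.

```cpp
#include <cassert>

// NOALIAS is a hypothetical helper macro for this sketch.
// On an unrecognized compiler it expands to nothing, so the code
// still compiles, just without the no-alias promise.
#if defined(__GNUC__) || defined(__clang__)
#define NOALIAS __restrict__
#elif defined(_MSC_VER)
#define NOALIAS __restrict
#else
#define NOALIAS
#endif

// The qualifiers are a promise by the caller that, during this call,
// the memory written through ip1 is not also accessed through ip2.
// The compiler is then free to keep ip2[i+1] in a register across
// iterations instead of reloading it.
void add_neighbors(int* NOALIAS ip1, const int* NOALIAS ip2, int n) {
    assert(n >= 1);
    for (int i = 0; i + 1 < n; ++i)
        ip1[i + 1] = ip2[i] + ip2[i + 1];
}
```

Calling add_neighbors with the same array for both arguments breaks the promise, and the result is then undefined; nothing in the language checks this for you.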
In addition to using special keywords, such as restrict, compilers use lots of other techniques to establish when a pointer cannot alias some object. For instance, in Example 4 the compiler does not have to worry about ip aliasing j because j's address is never taken. If the address of a variable is never taken, then it's impossible for any pointer to point to it. In C++, there are ways to take the address of a variable without using the & operator. In particular, you can pass the variable by reference, or call one of its member functions. But if none of those possibilities occur, then the variable is never addressed.
Since the compiler can tell that j is never addressed, it does not need to worry that g might modify j. That means that j can be kept in a callee-saved register (one that is guaranteed to be preserved by called functions); it does not need to be read from or written to memory on each iteration through the loop.
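A small sketch of this situation follows. It is not the article's Example 4; the names run and g_total and the loop bounds are assumptions, and g is defined in the same file only so that the example is complete. Imagine it living in another source file where the compiler cannot see its body.

```cpp
#include <cassert>

static int g_total = 0;

// Stand-in for an opaque function. It may read or write anything
// reachable through ip or through global variables.
void g(int* ip) { g_total += *ip; }

int run(int* ip) {
    int j = 0;  // j's address is never taken and j is never passed
                // by reference, so neither ip nor g can reach it
    for (int k = 0; k < 10; ++k) {
        g(ip);   // cannot modify j
        j += 2;  // j can stay in a register for the whole loop
    }
    return j;
}
```

If the body contained something like g(&j), the compiler would have to assume that every later call to g might change j, and j would have to live in memory.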
As a programmer, you can benefit from knowledge about how the compiler performs alias analysis. In particular, you can write programs that run faster if you know that aliasing issues are important. In the first version of f in Example 5, the compiler has to worry that g might modify *p. Therefore, *p will be reloaded on each pass through the loop as it is passed to h. In the second version, x is never addressed, so the compiler can keep it in a register, thereby improving performance. (Of course, if g really might modify *p, you shouldn't use this trick.) The downside to programming like this is that you may make your code harder to read. It's not a good idea to use this technique everywhere; use it only where performance is really critical.
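The two styles can be sketched as follows. This is an illustration under assumptions, not the article's Example 5: the bodies of g and h, the globals, and the names f_reload and f_local are all made up so that the example is complete.

```cpp
#include <cassert>

static long total = 0;
static int g_calls = 0;

// Stand-ins for functions defined elsewhere. In this sketch g does
// not touch *p, which is the precondition for the second version.
void g() { ++g_calls; }
void h(int v) { total += v; }

// First style: the compiler must assume g might change *p, so *p is
// read from memory again each time it is passed to h.
void f_reload(const int* p) {
    for (int i = 0; i < 10; ++i) {
        g();
        h(*p);
    }
}

// Second style: x is a local whose address is never taken, so it can
// stay in a register across the calls to g.
void f_local(const int* p) {
    const int x = *p;
    for (int i = 0; i < 10; ++i) {
        g();
        h(x);
    }
}
```

The two functions behave identically only because g leaves *p alone. If g did modify *p, f_local would keep passing the old value to h.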
Type-Based Alias Analysis
A more complicated kind of alias analysis is type-based alias analysis. It is used in top-of-the-line compilers, including the GNU C and C++ compilers. This form of alias analysis uses the types of variables to determine what things might alias what. Although useful in C as well, type-based alias analysis is a particularly powerful weapon in C++, sometimes allowing C++ code to run faster than the obvious C equivalents.
The idea behind type-based alias analysis techniques is to use the aliasing rules in the C and C++ standards to assist the compiler. For instance, the code in the first half of Example 6 is not legal ANSI/ISO C or C++ since it treats the address of d as if it were a pointer to an int, even though it is really a pointer to a double. The technical rule in the Standards, with a few exceptions, is as follows:
"If a program attempts to access the stored value of an object through an lvalue of [a type other than the dynamic type of the object] then the behavior is undefined."
That just means that you have to use the type an object really has when accessing it. The bit about "dynamic types" means that you can use a base-class pointer to access a derived-class object.
For those who are curious, the exceptions are that you may use an lvalue (roughly speaking, a pointer or reference) to a signed type to access an object of unsigned type, or vice versa, that you can use an lvalue with different const-ness or volatile-ness than the object, and that you can use an lvalue of type char or unsigned char to access any object whatsoever. That last clause implies that the code in the second part of Example 6 is legal because the modification of d happens through a pointer of type char*. (One reason for the exception for character types is so that you can write a portable version of memcpy; you can just iterate through the object copying it character by character.)
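The character-type exception can be seen in a hand-rolled copy routine. This is a minimal sketch; copy_bytes and copy_double are hypothetical names, and the routine is only a simplified stand-in for memcpy.

```cpp
#include <cassert>
#include <cstddef>

// Copies n bytes one character at a time. Accessing the objects
// through unsigned char lvalues is permitted whatever their real type.
void copy_bytes(void* dst, const void* src, std::size_t n) {
    unsigned char* d = static_cast<unsigned char*>(dst);
    const unsigned char* s = static_cast<const unsigned char*>(src);
    for (std::size_t i = 0; i < n; ++i)
        d[i] = s[i];
}

// Copies a double through its bytes. By contrast, something like
//     *(int*)&value = 0;
// would access a double through an int lvalue and is undefined.
double copy_double(double value) {
    double out;
    copy_bytes(&out, &value, sizeof out);
    return out;
}
```

The price of the exception is that a store through a char pointer may alias anything, so the compiler has to be conservative around such stores.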
These rules make it possible for the compiler to do additional alias analysis. The numerical code in Example 7 is representative of code that appears in many scientific applications. A few instructions here or there can make a big difference in a loop this small. There is an assignment to s->x_m[i] on each iteration of the loop. Therefore, a compiler that doesn't know about type-based alias analysis would have to reload s->a_m, s->b_m, s->x_m, and s->n_m on each iteration, for fear that the assignment to s->x_m[i] might have altered s->a_m. However, a compiler that uses type-based alias analysis can recognize that the store to s->x_m[i] is modifying a double, while s->a_m is a double*. Therefore, it would be illegal for s->x_m[i] to be at the same address as s->a_m. Consequently, there's no need to reload s->a_m or s->b_m as the program iterates through the loop. Similar considerations apply to s->n_m; it need not be reloaded either.
Concretely, the second part of Example 7 shows the code that GCC would generate for a MIPS processor, without type-based alias analysis, but with all other optimizations for this loop. (If you don't read MIPS assembly code, don't worry. The code is annotated, and you'll see the basic idea.) In this code, the register $f3 contains u_m, and $f2 contains v_m. The register $5 holds i.
The third part of Example 7 shows the code when type-based alias analysis is used. The compiler counts backward in this loop, because it knows that s->n_m cannot change. In particular, $2 starts out as s->n_m and is decremented until it reaches 0. In this version of the loop, there are only 11 instructions instead of 17, a reduction of approximately 35 percent. Because there are fewer references to memory, the speedup could be even greater than the instruction count suggests. The key is that the compiler was able to avoid reloading s->a_m, s->b_m, s->x_m, and s->n_m on each loop iteration.
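In source terms, the optimization amounts to hoisting the pointer and count loads out of the loop. The sketch below is not the article's Example 7: the struct name, the arithmetic in the loop body, and the parameters u and v are assumptions. Only the member names a_m, b_m, x_m, and n_m come from the discussion above.

```cpp
#include <cassert>

// Hypothetical layout: three arrays of double and an element count.
struct solver_state {
    double* a_m;
    double* b_m;
    double* x_m;
    int n_m;
};

// As written by the programmer. Each store writes a double, while
// a_m, b_m, and x_m are double* and n_m is an int, so under the
// typing rules the store cannot modify any of those members.
void update(solver_state* s, double u, double v) {
    for (int i = 0; i < s->n_m; ++i)
        s->x_m[i] = u * s->a_m[i] + v * s->b_m[i];
}

// What type-based alias analysis lets the compiler do on its own:
// load the members once, before the loop.
void update_hoisted(solver_state* s, double u, double v) {
    double* const a = s->a_m;
    double* const b = s->b_m;
    double* const x = s->x_m;
    const int n = s->n_m;
    for (int i = 0; i < n; ++i)
        x[i] = u * a[i] + v * b[i];
}
```

The analysis removes the reloads of the members, not the loads of a[i] and b[i], since x, a, and b all point to double and might still overlap.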
Because some compilers now do type-based alias analysis, you should be careful not to violate the typing rules given in the language specification. If you write code that uses an int* to modify a double, you may well find that, when you enable optimizations, your compiler does not do what you expect.
For example, the compiler might decide that it can treat the code in the first half of Example 8 as identical to the code in the second half, even though the last two lines have been reordered. Since the compiler knows that ip cannot point to a double, it does not have to worry that the assignment to *ip modifies d. Thus, the code generated won't behave as you expect. You should think of breaking the type system in this way as equivalent to using uninitialized memory, accessing data after you have passed it to free, or some other equally heinous activity.
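When you genuinely need to look at the representation of a double as an integer, one well-defined alternative is to copy the bytes with std::memcpy rather than cast the pointer. This is a sketch of that alternative, not something taken from the article's examples; bits_of and from_bits are hypothetical names, and the code assumes a 64-bit double.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(double) == sizeof(std::uint64_t),
              "this sketch assumes a 64-bit double");

// Returns the object representation of d as an integer. The copy is
// a byte-wise access, so no double is read through an integer lvalue.
std::uint64_t bits_of(double d) {
    std::uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);
    return bits;
}

// The reverse direction, again without any pointer cast.
double from_bits(std::uint64_t bits) {
    double d;
    std::memcpy(&d, &bits, sizeof d);
    return d;
}
```

Because the compiler understands what memcpy does, it cannot reorder these accesses the way it may reorder a store through an int* and a load of a double.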
Benefits of Type-Based Alias Analysis in C++
Type-based alias analysis can be of particular benefit to C++ programmers when container classes are in use. In C, a reusable list of abstract data types would look something like the definition in the first part of Example 9.
Using this data structure, there would be no difference (in type) between a list of ints and a list of doubles; both would be just pointers to a list_node. For example, assume that x is a list of integers and y is a list of doubles, and take a glance at the code in the second part of Example 9.
One drawback to the use of C is the ugly casts. However, there's an aliasing issue as well. The assignment to y->next changes a list_node*. So, on the third line, the compiler cannot be sure that x->next still points to the same thing it did before. Therefore, x->next will be reloaded from memory. However, since x is really a list of ints, and y a list of doubles, the programmer knows these things could never alias. Thus, the compiler will not generate code that is as good as possible, even using type-based alias analysis.
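The C-style structure and the problem with it might look like the following, written here so that it also compiles as C++. This is a sketch and not the article's Example 9; the void* payload, the function name, and the parameter z are assumptions.

```cpp
#include <cassert>

// One node type for every kind of list; the payload is untyped.
struct list_node {
    list_node* next;
    void* data;
};

// x is a list of ints, y and z are nodes of a list of doubles, but
// the types do not say so.
int sum_around_store(list_node* x, list_node* y, list_node* z) {
    int before = *static_cast<int*>(x->next->data);
    y->next = z;  // stores a list_node*; judging by type alone, this
                  // could be the same location as x->next
    int after = *static_cast<int*>(x->next->data);  // x->next reloaded
    return before + after;
}
```

Both next members have the same type, so type-based alias analysis gives the compiler nothing to work with here.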
In C++, however, the situation is different. Of course, by using templates, ugly casts can be avoided. In addition, however, templates can make it possible for the compiler to generate tighter code. Assume a list_node is defined by a template such as that in Example 10. Then, the sample code using the data structure would be almost unchanged. (Just remove the casts, as in the second part of Example 10.) But now the compiler knows that the assignment to y->next stores a list_node<double>*, whereas x->next is a list_node<int>*. Therefore, there's no way that x->next can change, and it doesn't have to be reloaded from memory. The stronger type system in C++ permits the compiler to generate better code.
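A template along these lines might be written as follows. This is a sketch rather than the article's Example 10; the member names, the function name, and the parameter z are assumptions, and whether a particular compiler actually exploits the distinction between the two pointer types depends on its alias analysis.

```cpp
#include <cassert>

// Each instantiation is a distinct type, so list_node<int>* and
// list_node<double>* are unrelated pointer types.
template <typename T>
struct list_node {
    list_node<T>* next;
    T data;
};

int sum_around_store(list_node<int>* x,
                     list_node<double>* y,
                     list_node<double>* z) {
    int before = x->next->data;
    y->next = z;  // stores a list_node<double>*, which cannot legally
                  // occupy the same location as the list_node<int>*
                  // held in x->next
    int after = x->next->data;  // x->next need not be reloaded
    return before + after;
}
```

The casts are gone, and the store and the later load now involve different types, which is exactly the information type-based alias analysis needs.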
Of course, you could make two list structures in C. For example, you could have a list_node_int and a list_node_double. Then, you would still reap the benefits of type-based alias analysis. However, this approach is not nearly as convenient as using templates. If you use a C++-to-C translator (such as CodeSourcery's mmCC) to compile your code, the translator will automatically generate list_node_int and list_node_double. Then, even if you have only a C compiler available, you can have all the benefits of C++, including type-based alias analysis.
Conclusion
I've stressed here the ability of the compiler to avoid redundant loads and stores. That's probably the most important benefit of type-based alias analysis, since it reduces the instruction count in a sequence of code. There are other benefits as well, however. For example, the fact that two memory references do not alias gives the instruction scheduler more flexibility. On today's deeply pipelined machines, that translates to fewer pipeline stalls and faster code.
When writing code, you should remember that aliasing issues make it difficult for compilers to generate code that runs as fast as you might hope. In performance-critical code, you should try to help the compiler figure out what pointers cannot alias what objects. There are several ways to do that. Avoid taking the addresses of objects. Use temporary variables, rather than expressions involving pointers or references, to tell the compiler that a particular variable cannot change during a section of code. Use the type system to help make clear what can and what cannot alias.
More globally, the next time someone starts talking about the overhead of C++, and how performance-critical code should always be written in C or Fortran, remember that C++'s strong type system can allow for better optimization opportunities. Features such as templates can be used not only for type safety and readability, but also to allow the compiler to generate faster code.
DDJ