Pointer aliasing
In computer programming, aliasing refers to the situation where the same memory location can be accessed using different names.
For instance, if a function takes two pointers A
and B
which have the same value, then the name A[0]
aliases the name B[0]
. In this case we say the pointers A
and B
alias each other. Note that the concept of pointer aliasing is not very well-defined – two pointers A
and B
may or may not alias each other, depending on what operations are performed in the function using A
and B
.
Aliasing and re-ordering
Aliasing introduces strong constraints on program execution order. If two write accesses which alias occur in sequence in a program text, they must occur in sequence in the final machine code. Re-ordering the accesses will produce an incorrect program result (in the general case). The same is true for a write access and a read access.
However, if two read accesses which alias occur in sequence in a program text, they need not occur in the same sequence in the machine code - aliasing read accesses are safe to re-order. This is because the underlying data is not changed by the read operation, so the same value is always read.
It is vitally important that a compiler can detect which accesses may alias each other, so that re-ordering optimizations can be performed correctly.
Several strategies are possible:
In C or C++, as mandated by the strict aliasing rule, pointer arguments in a function are assumed not to alias if they point to fundamentally different types, except for char*
and void*
, which may alias to any other type. Some compilers allow the strict aliasing rule to be turned off, so that any pointer argument may alias any other pointer arguments. In this case, the compiler must assume that any accesses through these pointers can alias. This can prevent some optimizations.
In C99, the restrict
keyword was added, which specifies that a pointer argument does not alias any other pointer argument.
In Fortran, procedure arguments and other variables may not alias each other (unless they are pointers or have the target attribute), and the compiler assumes they do not. This enables excellent optimization, and is one major reason for Fortran's reputation as a fast language. (Note that aliasing may still occur within a Fortran function. For instance, if A
is an array and i
and j
are indices which happen to have the same value, then A[i]
and A[j]
are two different names for the same memory location. Fortunately, since the base array must have the same name, index analysis can be done to determine cases where A[i]
and A[j]
cannot alias.)
In pure functional languages, function arguments may usually alias each other, but all pointers are read-only. Thus, no alias analysis needs to be done.
Aliasing and multi-threading
If multiple threads have names which alias the same memory location, two problems arise.
First, unless the memory location is protected in some way, then read-modify-write operations which are non-atomic might be interleaved with similar operations on another thread, producing incorrect results. In this case, some form of synchronization primitive must be used (e.g. a critical section).
Second, even with synchronization, if two aliasing accesses occur in different threads in a non-ordered way, the program behaviour may become non-deterministic - different runs of the same program on the same data may produce different results. Thus, it is often important to impose an explicit ordering between threads which have aliases.