This excerpt from the IBM PowerPC Compiler Writer's Guide gives coding examples of how a compiler can avoid branches in the application code that it generates. The document is written for experienced C-language compiler developers. The complete document, written by Warthman Associates, is on IBM's Web site.
The following PowerPC instructions are used in this excerpt:
3.1.5 Avoiding Branches
Branching, both conditional and unconditional, slows most implementations. Even an unconditional branch or a correctly predicted taken branch may cause a delay if the target instruction is not in the fetch buffer or the cache. It is therefore best to use branch instructions carefully and to devise algorithms that reduce branching. Many operations that normally use branches may be performed either with fewer or no branches.
Reducing the number of branches:
22.214.171.124 Computing Predicates
Predicates utilize a boolean expression as an integer value. For example, in C:
Figure 3-15 shows how to calculate this expression using an ordinary control flow translation.
Figure 3-15. Predicate Calculation: Branching Code Sequence
You can avoid the branch by using Condition Register logical instructions, as shown in Figure 3-16. In this case, the result of the comparison is directly transferred from the Condition Register to a General-Purpose Register, from which the bit is extracted and then flipped.
Figure 3-16. Predicate Calculation: Condition Register Logical Sequence
Some implementations have delays associated with accessing the Condition Register using the mfcr instruction. An alternative that uses only fixed-point operations is shown in Figure 3-17.
Figure 3-17. Predicate Calculation: Fixed-Point Operation Code Sequence
You can generate all boolean predicates with straight-line code that does not use the Condition Register. Figure 3-18 shows arithmetic expressions that yield a sign-bit reflecting the appropriate result.
Figure 3-18. Arithmetic Expressions for Boolean Predicates
Shorter sequences exist for many of these operations. The GNU superoptimizer is a program that exhaustively searches a subset of machine instructions to find the shortest branch-free combinations that perform a specified operation. Appendix D lists the PowerPC GNU superoptimizer results for a number of functions.
126.96.36.199 Conditionally Incrementing a Value by 1
Figure 3-19 shows the C code fragment for conditionally incrementing a value by 1 and equivalent branching and non-branching assembly code sequences. For simple conditions, branch-free equivalents can be formed using computed predicates.
Figure 3-19. Conditionally Incrementing a Value by 1 Code Example
188.8.131.52 Condition Register Logical
The Condition Register logical instructions can be used to combine several branch conditions, thus reducing the number of branches. For example, Figure 3-20 shows the C code fragment for a complex conditional expression and two equivalent assembly code sequences: one that has a comparison and branch for each side of the OR, and another that uses a Condition Register logical OR to combine the results of the compare and recording operations for a single test by a conditional branch. This form may present itself in situations where a common sub-expression exists between this and other code, thus offering opportunities for multiple compare and recording instructions within a single basic block.
Figure 3-20. Complex Condition Register Logical Code Example
Figure 3-21 shows code sequences for a C switch for which the optimal implementation of the multi-way branch is simply a sequence of compare-branch tests. Because the tests all have a common target address, they can be combined using Condition Register logical instructions, reducing the total number of branches from four to one. For a specific implementation, compare the timing of sequences using Condition Register logical instructions to the equivalent multiple-branch sequences because the relative performance may vary.
Figure 3-21. C Switch: Condition Register Logical Code Example