Interprocedural Analysis

Itamar Kahn

Modularity Issues

Almost all the optimizations we have considered so far have been intraprocedural: they are applied within a single procedure, without regard to the procedures it calls or to the calling contexts in which it is used. Modularity, i.e., the use of smaller procedures, often procedures that implement general algorithms, and increasingly the use of procedures as black boxes behind an interface (object-oriented programming), is becoming the norm. The problem with modularity is that it inhibits optimization. For example, for a general algorithm implemented in a procedure, the compiler must assume that the procedure's callers may supply arbitrary values as parameters.

How, then, can we achieve the performance of a single procedure in complex, modular software?

Interprocedural optimization is the general approach we will use to achieve better results. We present two main directions. The first is procedure integration, inlining, and tail-call elimination: rather than calling a procedure (an expensive operation), we integrate its code at the call site and thus avoid paying the time/space cost of the procedure call. The second is interprocedural analysis.

Interprocedural Optimizations

Interprocedural optimization can be used for procedure integration (an example is given below), for constant propagation as a means of optimizing procedure bodies and driving procedure cloning, for transforming call-by-value parameters into call-by-reference parameters, for improving register allocation, and for improving intraprocedural information.
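For example, interprocedural constant propagation can drive procedure cloning: when some call sites are known to pass a particular constant, a specialized copy of the procedure can be compiled for that value. A minimal sketch; the names and the constant are illustrative, not taken from the lecture:

/* General routine, called from many places. */
int scale(int x, int factor) { return x * factor; }

/* Clone specialized for the call sites where factor == 2;
   the multiplication can now be strength-reduced to a shift. */
int scale_by_2(int x) { return x << 1; }

int caller(int n) {
  /* interprocedural constant propagation shows factor == 2 here */
  return scale_by_2(n);   /* was: scale(n, 2) */
}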

Procedure Integration

char *Red = "red";
char *Yellow = "yellow";
char *Orange = "orange";

char *color(FRUIT *currentFruit)
{ switch (currentFruit->variety) {
    case APPLE:  return Red;
                 break;
    case BANANA: return Yellow;
                 break;
    case ORANGE: return Orange; } }

main()
{ FRUIT snack;
  snack.variety = APPLE;
  snack.shape = ROUND;
  printf("%s\n", color(&snack)); }

After performing procedure integration on the above, the body of the color() procedure is inserted into main():

char *Red = "red";
char *Yellow = "yellow";
char *Orange = "orange";

main()
{ FRUIT snack;
  VARIETY t1; SHAPE t2; COLOR t3;
  t1 = APPLE;
  t2 = ROUND;
  switch (t1) {
    case APPLE:  t3 = Red;
                 break;
    case BANANA: t3 = Yellow;
                 break;
    case ORANGE: t3 = Orange; }
  printf("%s\n", t3); }
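Once color() has been integrated, ordinary intraprocedural optimizations become applicable: t1 holds the constant APPLE, so constant propagation can fold the switch, and dead-code elimination can remove the unused temporaries. A plausible final form (a sketch of these follow-on optimizations, not shown in the lecture) is:

main()
{ printf("%s\n", Red); }   /* the switch folds to t3 = Red; snack, t1, t2, t3 become dead */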

Pascal Example with value parameters

type vector = array[1..1000] of integer;

procedure p(v: vector);
begin
  (* uses v but does not modify it *)
end;

procedure q;
var a: vector;
begin
  p(a)
end;

If interprocedural analysis shows that p never modifies its parameter v, we can avoid copying the vector on each call to p by passing it by reference instead of by value.
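The same idea can be expressed in C terms: if analysis shows that a large by-value parameter is never written, it can be passed by pointer instead, avoiding the copy. A minimal sketch, with illustrative type and function names of my own:

typedef struct { int elem[1000]; } vector;

/* Original: the entire struct is copied on every call. */
int sum_by_value(vector v) {
  int i, s = 0;
  for (i = 0; i < 1000; i++) s += v.elem[i];
  return s;
}

/* After the transformation: analysis shows v is never modified,
   so passing a pointer is safe and avoids the copy.            */
int sum_by_ref(const vector *v) {
  int i, s = 0;
  for (i = 0; i < 1000; i++) s += v->elem[i];
  return s;
}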

C Example for Constant Propagation

int g;

void p() {
  /* does not read or write g */
}

void q() {
  int y;
  g = 100;
  p();
  y = g;   /* can y be replaced by the constant 100? */
}

Interprocedural analysis lets us decide whether the value of g can be propagated across the call to p: if p (and everything it calls) is known not to modify g, then after the call y must equal 100 and the constant can be propagated.

Challenges in Interprocedural Analysis

Interprocedural analysis raises several obstacles:

1.  Handling recursion

2.  Parameter passing mechanisms

3.  Virtual methods / function pointers / procedural parameters / higher-order functions

For the first three we already have good solutions.

4.  Scalability

5.  Supporting separate compilation mode

For the last two we do not yet have good solutions.

The Call Graph

To address interprocedural control-flow analysis we construct the program's call graph, which is in general a multigraph (a graph that may have multiple directed edges between the same pair of nodes, or multiple labels on an edge). Its nodes are the program's procedures and its edges are the call sites, with one edge or label per call site. The call graph can be built incrementally in order to support separate compilation.

An example of a program skeleton, its call graph, and partial call graphs (obtained from separate compilation of its parts) is given below (Figures 19.1, 19.2, and 19.4 in Chapter 19):

procedure f()
begin
  call g()
  call g()
  call h()
end

procedure g()
begin
  call h()
  call i()
end

procedure h()
begin
end

procedure i()
  procedure j()
  begin
  end
begin
  call g()
  call j()
end

[Figures 19.1, 19.2, and 19.4 (not reproduced here): the call graph of this skeleton and the partial call graphs obtained from separate compilation of its parts.]
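Since the figures themselves are not reproduced, the complete call graph can also be written out as a list of labeled edges, one per call site; the C encoding below is only an illustrative sketch of that multigraph, not taken from the figures:

#include <stdio.h>

/* One edge per call site, so f -> g appears twice (a multigraph). */
struct edge { const char *caller, *callee; };

static const struct edge call_graph[] = {
  {"f", "g"}, {"f", "g"}, {"f", "h"},   /* call sites in f */
  {"g", "h"}, {"g", "i"},               /* call sites in g */
  {"i", "g"}, {"i", "j"},               /* call sites in i; h and j make no calls */
};

int main(void) {
  unsigned k;
  for (k = 0; k < sizeof call_graph / sizeof call_graph[0]; k++)
    printf("%s -> %s\n", call_graph[k].caller, call_graph[k].callee);
  return 0;
}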

Obstacles

First, it is difficult to construct the call graph in the presence of virtual functions and function pointers, for which we may not be able to determine unequivocally which edges/labels to add.

An additional obstacle arises when dealing with procedure-valued variables: constructing the call graph of a recursive program with procedure-valued variables has been shown to be PSPACE-hard, i.e. it can be very expensive in both space and time. Further obstacles arise when addressing higher-order functions and virtual methods.

The solutions to these obstacles are based on conservative approximation, on using data-flow information, and on specialization.
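For instance, conservative approximation is what a call-graph builder applies at an indirect call through a function pointer: it adds an edge to every procedure whose address may reach the pointer (or, more coarsely, to every type-compatible procedure). The example below is an illustrative sketch of my own:

void draw_circle(void) { /* ... */ }
void draw_square(void) { /* ... */ }

void render(void (*draw)(void)) {
  /* Indirect call site: a conservative call graph gets both edges
     render -> draw_circle and render -> draw_square, since either
     address may flow into the parameter draw.                     */
  draw();
}

void scene(int round) {
  render(round ? draw_circle : draw_square);
}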

Interprocedural Data-Flow Analysis

The objective of interprocedural data-flow analysis is to determine a useful (conservative) approximation of how a program manipulates data at the level of its call graph. This information includes which variables might be modified as side effects of a procedure call, and which parameters of a called procedure have constant values when it is called from a specific call site. Such information is important when optimizing around a call to a procedure, for example: knowing which variables might be affected by the called procedure, knowing that some parameters have particular constant values (enabling procedure cloning), and driving procedure-integration decisions.

Flow-Insensitive Side-Effect Analysis

The goal of interprocedural side-effect analysis is to determine, for each call site, a safe approximation of the side effects that the procedure invoked at that call site may have.

Side effects are characterized by the following sets:

·  DEF – the set of variables that are definitely assigned values

·  USE – the set of variables that may be referenced

·  MOD – the set of variables that may be modified

These sets are easy to compute for individual statements that do not involve procedure calls (as long as we disregard aliasing, which can be handled by a separate, flow-sensitive analysis). Cooper and Kennedy showed that the flow-insensitive interprocedural versions can be computed efficiently, in time essentially linear in the size of the program.

Note that aliasing complicates interprocedural analysis just as it does intraprocedural problems.

As an illustration, we apply MOD and USE to a simple program:
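The program used in the lecture is not reproduced here; the following small program is an illustrative stand-in of my own, with the resulting sets written as comments according to the definitions above:

int g, x;

void p(void) {
  x = g + 1;           /* DEF(p) = {x},  USE(p) = {g},  MOD(p) = {x} */
}

void q(void) {
  g = 5;               /* DEF = {g}, MOD = {g} */
  p();                 /* the call site inherits p's side effects:
                          MOD = {x}, USE = {g}                       */
  g = x + 1;           /* DEF = {g}, USE = {x}, MOD = {g}            */
}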

Flow-Sensitive Data-Flow Information

Myers showed that computing flow-sensitive side effects is co-NP-complete once aliasing is taken into account. He also introduced the supergraph, a program summary structure that integrates the procedures' control-flow graphs with the call graph. Note that with call-by-reference parameter passing, even bit-vector problems become hard (NP-hard). Two approaches for coping with flow sensitivity are the call-strings approach and the functional approach.

Non-trivial constants:

int x;

void p(int a) {
  int c;
  scanf("%d", &c);
  if (c > 0) {
    a = a - 2;
    p(a);
    a = a + 2; }
  x = -2 * a + 5;
  printf("%d\n", x); }

void main() {
  p(7);
  printf("%d\n", x); }

In this example, a is decremented before the recursive call and incremented back afterwards, so by the time x is assigned, a holds the same value it had on entry; in the outermost activation that is the constant 7 passed by main (and when c is negative, a is of course never changed at all). Discovering such non-trivial constants stresses both the need for interprocedural information and the ability to analyze a procedure's side effects.

Conclusions

We have seen the importance of interprocedural analysis for program optimization. The fact that modularity, and its main embodiment, object-oriented programming, is becoming the main trend in programming languages will lead to its integration into compilers. The analysis can also be performed at link time, when complete information about the program is available. We saw that both flow-sensitive and flow-insensitive data-flow information can improve our knowledge of the program, although scalability becomes a problem, especially for flow-sensitive analysis.

In conclusion, interprocedural analysis leads to simpler programming, both by supporting the optimization of modular code and by freeing the programmer from concerns about the final translation of the code into efficient machine language.