UNIT - 6

June 2010

1) What do you understand by Definition use pair?Draw the Control flow graph for GCD?

Soln:

DU pair: a pair of definition and use for some variable, such that at least one DU path exists from the definition to the use. public class GCD { public intgcd(int x, int y) { inttmp; // A: def x, y, tmp while (y != 0) { // B: use y tmp = x % y; // C: deftmp; use x, y x = y; // D: def x; use y y = tmp; // E: def y; use tmp

}

return x; // F: use x

}

Reach(E) = ReachOut(D)

ReachOut(E) = (Reach(E) \ {yA}) È {yE}

Reach(E) = ReachOut(D)

The first (simple) equation says that values can reach E by going through D.

ReachOut(E) = (Reach(E) \ {yA}) È {yE}

This describes what happens at D. The previous definition of y at node A is overwritten. A new value of y is defined here at line E.

We can associate two data flow equations of this form (what reached here, and what this node changes) for every line in the program or node in the control flow graph.

2) Explain Data flow analysis and classic analysis?

Soln:

Data flow analysis with arrays and pointers

Arrays and pointers introduce uncertainty: Do different expressions access the same storage? a[i] same as a[k] when i = k a[i] same as b[i] when a = b (aliasing)

The uncertainty is accomodated depending to the kind of analysisAny-path: gen sets should include all potential aliases and kill set should include only what is definitely modified All-path: vice versa

One way to iterate to a fixed point solution. General idea: Initially all nodes are on the work list,and have default values -Default for “any-path” problem is the empty set, default for “allpath” problem is the set of all possibilities (union of all gen sets) While the work list is not empty- Pick any node n on work list; remove it from the list -Apply the data flow equations for that node to get new values

-If the new value is changed (from the old value at that node), then Add successors (for forward analysis) or predecessors (for backward analysis) on the work list Eventually the work list will be empty (because new computed values = old values for each node) and the algorithm stops.

This is a classic iteration to a fixed point. Perhaps the only subtlety is how we initialize values.

3) Explain in detail Dataflow testing Criteria?

Soln:

Adequacy criteria

The All DU pairs adequacy criterion requires each DU pair is exercised by at least one test case

All DU paths: Each simple (non looping) DU path is exercised by at least one test case

CDUpath = # of exercised simple DU paths

# of simple DU paths

All definitions: For each definition, there is at least one test case which exercises a DU pair containing it (Every computed value is used somewhere) Corresponding coverage fractions can also be defined

CDU pairs = # of exercised DU pairs

------

# of Du pairs

Difficult cases

x[i] = ... ; ... ; y = x[j] DU pair (only) if i==j p = &x ; ... ; *p = 99 ; ... ; q = x

*p is an alias of x

m.putFoo(...); ... ; y=n.getFoo(...); Are m and n the same object?

Do m and n share a “foo” field?

Problem of aliases: Which references are (always or sometimes) the same?

Data flow coverage with complex structures

Arrays and pointers are critical for data flow analysis Under-estimation of aliases may fail to include some DU pairs Over-estimation, on the other hand, may introduce unfeasible test obligations For testing, it may be preferrable to accept under-estimation of alias set rather than over-estimation or expensive analysis

•Controversial: In other applications (e.g., compilers), a conservative over-estimation of aliases is usually required Alias analysis may rely on external guidance or other global analysis to calculate good estimates.

•Undisciplined use of dynamic storage, pointer arithmetic, etc. may make the whole analysis infeasible

Infeasibility

The path-oriented nature of data flow analysis makes the infeasibility problem especially relevant Combinations of elements matter! Impossible to (infallibly) distinguish feasible from infeasible paths. More paths = more work to check manually. In practice, reasonable coverage is

(often, not always) achievable Number of paths is exponential in worst case, but often linear

All DU paths is more often impractical

•Not all elements of a profram are executable

•The path –oriented nature of data flow aggravates the problem

•Infeasibility creates test obligation not only for isolated unexecutable elements but also for infeasible combinations of feasible elements.

•Complex data strucutre may amplify the infeasibility problem by adding infeasible paths as a result of alias computation

Eg:x[i] alias of x[j] when i=j cant determine when i = j in prg exe.

•But problem of infeasible path modest in DU pair adequacy criteria.

December 2010

Explain Definition use pair with an example?

DU pair: a pair of definition and use for some variable, such that at least one DU path exists from the definition to the use.

public class GCD { public intgcd(int x, int y) { inttmp; // A: def x, y, tmp while (y != 0) { // B: use y tmp = x % y; // C: deftmp; use x, y x = y; // D: def x; use y y = tmp; // E: def y; use tmp

}

return x; // F: use x

}

Reach(E) = ReachOut(D)

ReachOut(E) = (Reach(E) \ {yA}) È {yE}

Reach(E) = ReachOut(D)

The first (simple) equation says that values can reach E by going through D.

ReachOut(E) = (Reach(E) \ {yA}) È {yE}

b) What is reaching Definition? Illustrate with an algorithm by applying flow equation?

Soln:

An efficient algorithm for computing reaching definitions (and several other properties) is based on the way reaching definitions at one node are related to the reaching definitions at an adjacent node. Suppose we are calculating the reaching definitions of node n, and there is an edge (p,n) from an immediate predecessor node p.

If the predecessor node p can assign a value to variable v, then the definition vp reaches n. We say the definition vp is generated at p.

If a definition vp of variable v reaches a predecessor node p, and if v is not redefined at that node

(in which case we say the vp is killed at that point), then the definition is propagated on from p to n.

public class GCD {

publicintgcd(int x, int y) {

inttmp; // A: def x, y, tmp

while (y != 0) { // B: use y

tmp = x % y; // C: deftmp; use x, y

x= y; // D: def x; use y

y= tmp; // E: def y; use tmp

}

return x; // F: use x

}

Reach(E) = ReachOut(D)

ReachOut(E) = (Reach(E) \ {yA}) È {yE}

Reach(E) = ReachOut(D)

The first (simple) equation says that values can reach E by going through D.

ReachOut(E) = (Reach(E) \ {yA}) È {yE}

This describes what happens at D. The previous definition of y at node A is overwritten. A new value of y is defined here at line E.

We can associate two data flow equations of this form (what reached here, and what this node changes) for every line in the program or node in the control flow graph.

General equation for reaching analysis

Each node of the control flow graph (or each line of the program) can be associated with set equations of the same general form, with “gen” and “kill”

For reaching definitions, the “gen” set is just the variable that gets a new value there. It could be the empty set if no variable is changed in that line.

The “kill” set is the set of values that will be replaced.

The subscripts here are the lines or nodes where the assignment takes place, so that we can keep track of different assignments to the same variable.

For example, v99 would represent a value that v received at node (or line) 99. Reach(n) = È ReachOut(m) mÎpred(n)

ReachOut(n) = (Reach(n) \ kill (n)) È gen(n) gen(n) = { vn | v is defined or modified at n } kill(n) = { vx | v is defined or modified at x, x_n }

Avail equations

Available expressions” is another classic data flow analysis. An expression becomes available where it is computed, and remains available until one of the variables in the expression is changed. Unlike “reaching definitions”, though, it is not enough for an expression to be available on ANY path to a node. Rather, it has to be available on ALL paths to the node.Avail (n) = Ç AvailOut(m) mÎpred(n)

AvailOut(n) = (Avail (n) \ kill (n)) È gen(n) gen(n) = { exp | exp is computed at n } kill(n) = { exp | exp has variables assigned at n }

•The gen and kill sets are different, reflecting the definition

•We use intersection instead of union when there is more than one path to a node, because this is an “all paths” analysis rather than an “any paths” analysis

c) Define Various Data flow testing Criteria?

Soln:

Intraprocedural :Within a single method or procedure as described so far,Interprocedural:Across several methods (and classes) or procedures Cost/Precision trade-offs for interprocedural analysis are critical, and difficult context sensitivity

A context-sensitive (interprocedural) analysis distinguishes sub() called from foo() from sub() called from bar();

July 2011

4) Describe the algorithms for available expressions classic data flow analysis

with an example using Control Flow graph?

Soln:

Avail equations

Available expressions” is another classic data flow analysis. An expression becomes available where it is computed, and remains available until one of the variables in the expression is changed. Unlike “reaching definitions”, though, it is not enough for an expression to be available on ANY path to a node. Rather, it has to be available on ALL paths to the node. Avail (n) = Ç AvailOut(m)

mÎpred(n)

AvailOut(n) = (Avail (n) \ kill (n)) È gen(n)

gen(n) = { exp | exp is computed at n }

kill(n) = { exp | exp has variables assigned at n }

•The gen and kill sets are different, reflecting the definition

•We use intersection instead of union when there is more than one path to a node, because this is an “all paths” analysis rather than an “any paths” analysis

Live variable equations

Live variables is a third, classic data flow analysis. A variable is “live” at a node if the value that is currently in the variable may be used later (before being replaced by another value). It is an any-paths property like reaching definitions Note what is different here: We look at successors to the current node, rather than predecessors. We say “live” is a backward analysis, while reach and avail were forward analyses. Live(n) = È LiveOut(m)

mÎsucc(n)

LiveOut(n) = (Live(n) \ kill (n)) È gen(n)

gen(n) = { v | v is used at n }

kill(n) = { v | v is modified at n }

classification analysis

Forward/backward: a node’s set depends on that of its predecessors/successors

Any-path/all-path: a node’s set contains a value iff it is coming from any/all of its inputs

Example: Taintedness (in web form processing)

“Taint”: a user-supplied value (e.g., from web form) that has not been validated

Gen: we get this value from an untrusted source here

Kill: we validated to make sure the value is proper

6b) Explain Data fllow testing criteria & Data flow testing coverage with Complex structures?

Adequacy criteria

The All DU pairs adequacy criterion requires each DU pair is exercised by at least one test case

All DU paths: Each simple (non looping) DU path is exercised by at least one test case

CDUpath = # of exercised simple DU paths

# of simple DU paths

All definitions: For each definition, there is at least one test case which exercises a DU pair containing it (Every computed value is used somewhere) Corresponding coverage fractions can also be defined

CDU pairs = # of exercised DU pairs

------

# of Du pairs

CDU pairs = # of exercised DU pairs

------

# of Du pairs

Difficult cases

x[i] = ... ; ... ; y = x[j] DU pair (only) if i==j

p = &x ; ... ; *p = 99 ; ... ; q = x

*p is an alias of x

m.putFoo(...); ... ; y=n.getFoo(...); Are m and n the same object?

Do m and n share a “foo” field?

Problem of aliases: Which references are (always or sometimes) the same?

Data flow coverage with complex structures

December 2011

5) Explain data flow analysis with arrays and pointers?

Soln:

Data flow analysis with arrays and pointers

Arrays and pointers introduce uncertainty: Do different expressions access the same storage?

a[i] same as a[k] when i = k

a[i] same as b[i] when a = b (aliasing)

The uncertainty is accomodated depending to the kind of analysisAny-path: gen sets should include all potential aliases and kill set should include only what is definitely modified All-path: vice versa

Scope of Data Flow Analysis

Intraprocedural :Within a single method or procedure as described so far,Interprocedural:Across several methods (and classes) or procedures Cost/Precision trade-offs for interprocedural analysis are critical, and difficult context sensitivity

A context-sensitive (interprocedural) analysis distinguishes sub() called from foo() from sub() called from bar();