Context-Free Grammars and Languages
Derivations and Derivation Trees
A recursive definition of derivation trees from XN in a CFG G=(N,T,P,S)
(also called syntax trees) The yields of the trees will be defined at the same time.
1)Basis: X is a d.t. (a tree with a single node whose label is X) from X in G. The corresponding derivation in G is X (a derivation of length 0). X is the yield of the tree.
2)Inductive step: If X is a d.t. from X in G, where
Y
i) X is the label of the root,
ii) Y is the label of a leaf,
iii) the string obtained by taking the labels of the leaves from left to right is Y, i.e.,
the yield of the tree is Y,
iv) the corresponding derivation is X*Y, and
v) YY1…Yk is a production in P with each Yi in NT ,
then X is a d.t. from X in G, and a corresponding derivation is
Y
…
Y1 Yk
X*YY1…Yk. Y1…Yk is the yield of the resulting tree.
In particular, in case of k=0, i.e., Y1…Yk=, then X is a d.t. from X in G,
Y
and a corresponding derivation is X*Y. is the yield of the resulting tree.
Examples
Leftmost Derivations and Ambiguity
Let G=(N,T,P,S) be a CFG.
01…n is called a leftmost derivation if for each i, i=xiXii and i+1=xiii for some xi in T*, i in V*, and Xii in P. For leftmost derivations, we sometimes use lm instead of . Rightmost derivations are defined similarly.
THEOREM Let G be a CFG and X be a nonterminal. Then
(1)X* holds in G iff there exists a d.t. from X in G whose yield is .
(2)There is a 1:1 correspondence between the leftmost derivations in G and the d.t.’s in G.
G is said to be ambiguous if there exists x in L(G) that is generated by two distinct leftmost derivations from S, in other words, if there are two different derivation trees for x.
Examples
- Identifiers: Gid=(Nid,Tid,Pid,id) Unambiguous
Nid={id,A,L,N}, Tid={a,…,z,A,…Z}{0,1,…9},
Pid={idL|idA, AL|N, La|b|…|z|A|…|Z, N0|1|…|9}.
id idlmid Almid AAlmL AA
id A lmAAAlmALAlmAbA
id A N lmAbNlmAb1
L L 1
A b
- Arithmetic expressions: Gexp1=({E},{x,y,z},Pexp1,E)
Pexp1={Ex|y|z|E+E|E-E|E*E|E/E}
Ambiguous There are two different derivation trees for x*y+z:
E E
E E E E
E E E E
x * y + z x * y + z
(x*y)+z x*(y+z)
CFG s= BNFs
BNF (Backus-Naur form or Backus’ notation)
Expressions to describe the syntax of programming languages
… metasymbol to represent grammatical element …
::= is defined to be of the form
either or
Examples
1. Identifiers
<identifier>::=<letter>|<identifier<alphanumeral>
<alphanumeral>::=<letter>|<numeral>
<letter>::=a|b|…|z|A|B|…|Z
<numeral>::=0|1|…|9
2. Arith. Exps. (an ambiguous grammar)
<arith exp 1>::=x|y|z|<arith exp 1>+<arith exp 1>|
<arith exp 1><arith exp 1>|<arith exp 1>*<arith exp 1>|
<arith exp 1>/<arith exp 1>
3. Another definition of arith. exps. (unambiguous one!)
<arith exp 2>::=<arith exp 2>+<term>|<arit exp 2><term>|<term>
<term>::=<term>*<factor>|<term>/<factor>|<factor>
<factor>::=<factor>|(<arith exp 2>)|<variable>|<constant>
<variable>::=<identifier>
<constant>::=<numeral>
4. Statements in a Pascal-like language
<statement>::=<for statement>|<if statement>|<while statement>|…
for statement>::=for <variable><arith exp 2> to <arith exp 2> do <statement>
if statement>::=if <condition> then <statement>|
if <condition> then <statement> else <statement>
The dangling else problem
Which does “if p thenif q then a else b” mean,
“if p then begin if q then a else b end” or “if p then begin if q then a end else b” ?
The Equivalence of BNFs and CFGs
metavariables nonterminals
::=
basic symbols terminals
Exercises
- Define the following in terms of BNFs:
a)“Numbers” in language C
b)“while statements” in language C
c)Language AnBn
d)Even integers as strings over alphabet {0,1,…,9}
e)
f)
- Draw the syntax trees for the following:
a)Two different syntax trees for “if p then if q then a else b” according to the definition of if statements in the Pascal-like language given above.
b)Arith. exps. x+y*z/x, x+y*z, etc.
- Show an equivalent CFG for each of the following BNFs:
a)
b)
- Prove the following:
a)If L and L’ are CFLs, then so is LL’. In fact, ℒ2 is closed under , , *, R, substitution, homomorphism, intersection with regular sets, and so on.
b)
c)
- Give CFGs for the following languages:
a)
b)
The Pumping Lemma for CFLs
A CFG G=(N,T,P,S) is in Chomsky normal form if every production in P is either of the form XYZ or of the form Xa, where X,Y,Z are in N and a in T.
Lemma For each context-free language L there exists a CFG in Chomsky normal form generating L{}.
THEOREM For any CFL L, there exists an integer n satisfying the following condition: If zL and |z|n, then there exist u, v, w, and x which satisfy the following properties:
1)x=uvwxy,
2)|vwx|n,
3)vx,
4)uviwxiyL for any integer i0.
Proof. Let G=(N,T,P,S) be a CFG in Chomsky normal form generating L{}. Let k=|N|, the number of nonterminals of G, and put n=2k+1.
Assume that zL and |z|n and consider any derivation tree, say T, in G from S generating z. T is a binary tree, since G is in Chomsky normal form. Thus the height of T is at least k+1, which means that there is a path, say p, from the root to a leaf whose length k+1. Then p must have two identical nonterminals, say X, on it:
Choose a pair of X’s such that the dotted segment of path p contains no identical nonterminals except for the two X’s. Then properties 1) to 4) hold.
u v w x y
Applications of the Pumping Lemma
- AnBnCn={anbncn | n0} is not a CFL.
Assume the contrary. Then there exists an integer n satisfying the P.L.
Consider z=anbncnAnBnCn. Since |z|=3nn, z can be decomposed into z=uvwxy such that properties 1) to 4) of the P.L. hold. Since |vwx|n, vx contains at most two letters from a,b,c, that is, vxa*b*b*c*.
Case 1: If vxa+b* (note that vx) then #a(uv2wx2y)>#c(uv2wx2y), which contradicts the fact that uv2wx2y is in AnBnCn (Property 4)). A similar contradiction can be derived from any one of the following possible assumptions: vxa*b+, vxb+c*, vxb*c+.
Case2: If vxa+b+, then uv2wx2y is not of the form anbncn, a contradiction to property 4). Similar contradictions can be derived from vxb+c+.
In any case we obatined a contradiction. Thus AnBnCn is not a CFL.
{aibjaibj | i,j0} can be shown not to be CF in a similar way.
- {ap | p is prime} is not a CFL.
Assume the contrary and consider z=an for the n in the P.L. For z=vuwxy,
|uviwxiy|=n(1+|vx|) for i=n+1, a contradiction, since n(1+|vx|) is not prime (note that vx).
- L={aibj | j=i2} is not CF.
If it were, there exists n satisfying the P.L. Consider z=anbmL, where m=n2. Consider any decomposition z=uvwxy satisfying the P.L. v cannot contain both a and b; because, otherwise we have a contradiction that uv2wx2y is not of the form aibj. A similar argument shows that x does not contain both a and b.
Case 1: If vwx is in a+, then uv2wx2y=an+|vx|bm and this word is in L by the P.L. However m=n2(n+|vx|)2, since |vx|1, a contradiction.
Case 2: If vwx is in b+, then uv2wx2y=anbm+|vx|and it is in L. However, m+|vx|=n2+|vx|n2, a contradiction.
Case 3: If v is in a+ and x in b+, then z’=uvi+1wxi+1y=an+i|v|bm+i|x| is in L for any i. Note that |v|>0 and |x|>0. Taking i=|x|, we have (n+i|v|)2=(n+|x||v|)2=n2+2n|x||v|+|x|2|v|2n2+|x|2=m+i|x|, a contradiction.
1
#007