UNIT-IV

PROPERTIES OF CONTEXT-FREE LANGUAGES

Objectives:

To learn about Properties of Context Free Languages.

To learn about Normal Forms (CNF, GNF).

To learn about Pumping lemma for CFL’s.

To learn about Closure properties of CFL.

To learn about Turing Machine.

Introduction:

We simplify context-free grammars; these simplifications make it easier to prove facts about CFL's.

We then prove the “pumping lemma” for CFL’s.

We will prove the closure properties and the decision properties of CFL's.

We use the Turing machine to develop a theory of "undecidable" problems, i.e., problems that no computer can solve.

Properties of Context Free Languages:

Let L1 and L2 be CFL's.

(1) L1 ∪ L2 (the union) is a CFL.

(2) L1L2 (the concatenation) is a CFL.

(3) L1* (the Kleene star) and L1+ (the Kleene plus) are CFL's.

(4) L1^R (the reversed language) is a CFL.

(5) L1 ∩ L2 (the intersection) is not necessarily a CFL.

(6) ¬L1 (the complement) is not necessarily a CFL.

Proof of the Context-free Language Properties (continued)

Let G1 = (VN1, VT1, P1, S1) and G2 = (VN2, VT2, P2, S2) be CF grammars that generate L1 and L2, respectively. Without loss of generality, assume that VN1 and VN2 are disjoint, i.e., VN1 ∩ VN2 = ∅. (Otherwise, we can modify them.)

(1) Construct a CFG G by merging the rules of grammars G1 and G2 and adding new rules S → S1 | S2, where S is a new start symbol. (This is the same technique as for regular languages.)

(2) Construct a CFG G by merging the rules of G1 and G2 and adding a new rule S → S1S2.

(3) For L1* add rules S → S1S | ε to grammar G1. For L1+ add rules S → S1S | S1, where S is a new start symbol.

(4) Construct a CFG from G1 by changing each rule A → α to A → αR, i.e., reverse the right side of each production rule.
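The constructions in (1)-(4) can be sketched in code. Below is a minimal Python sketch, assuming a CFG is encoded as a dict mapping each nonterminal to a list of right-hand sides (each a list of symbols, with the empty list standing for ε); the encoding and function names are illustrative, not part of the text.

```python
def union(g1, start1, g2, start2, new_start="S"):
    """Property (1): merge the rule sets and add S -> S1 | S2."""
    g = {**g1, **g2}                        # nonterminal sets assumed disjoint
    g[new_start] = [[start1], [start2]]
    return g, new_start

def concat(g1, start1, g2, start2, new_start="S"):
    """Property (2): merge the rule sets and add S -> S1 S2."""
    g = {**g1, **g2}
    g[new_start] = [[start1, start2]]
    return g, new_start

def star(g1, start1, new_start="S"):
    """Property (3): add S -> S1 S | eps (eps = the empty list)."""
    g = dict(g1)
    g[new_start] = [[start1, new_start], []]
    return g, new_start

def reverse(g1):
    """Property (4): reverse the right side of every production."""
    return {a: [rhs[::-1] for rhs in alts] for a, alts in g1.items()}
```

For instance, with g1 = {"S1": [["a", "S1", "b"], []]} (which generates aⁿbⁿ), reverse(g1) yields the grammar of bⁿaⁿ.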

(5) We know that L1 = {a^i b^i c^j | i, j ≥ 0} and L2 = {a^k b^n c^n | k, n ≥ 0} are CFL's. But L1 ∩ L2 = {a^i b^i c^i | i ≥ 0} is not a CFL.

(6) Suppose that CFL's are closed under complementation. Since CFL's are closed under union (property (1)), and

L1 ∩ L2 = ¬(¬L1 ∪ ¬L2),

this would imply that CFL's are also closed under intersection, which contradicts the proven fact of property (5).

Minimizing the Number of ε-Production Rules

Theorem. Given an arbitrary CFG G, we can construct a CFG G′ such that

• L(G) = L(G′), and if ε is not in L(G), then G′ does not have an ε-production rule.

• If ε ∈ L(G), then S → ε is the only ε-production rule of G′.

Proof (an algorithm). Let G = (VT, VN, P, S), and let A, B ∈ VN. We construct a CFG G′ = (VT, VN, P′, S) from G by the following steps.

(1) Find the set W of all nonterminals of G which derive ε as follows:

W0 = {A | A ∈ VN and A → ε is in P};

do Wi+1 = Wi ∪ {A | A ∈ VN and A → α is in P, for some α ∈ Wi+}; until (Wi+1 = Wi);

W = Wi; // W contains all nonterminal symbols from which ε can be derived. (Here Wi+ denotes the set of nonempty strings over Wi.)

(2) Delete all ε-productions from P. Call this new set of productions P1.

(3) Modify P1 to P′ as follows: if a production A → α is in P1, then put the rules A → α and A → α′ into P′, for all α′ (≠ ε) which are obtained from α by deleting one or more nonterminals in the set W constructed in step (1).

(4) If S is in W, then add S → ε to P′.
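Steps (1)-(4) can be implemented directly. The following Python sketch assumes the grammar is given as a dict mapping each nonterminal to a list of right-hand sides, where each right-hand side is a tuple of symbols and the empty tuple stands for ε; this encoding is an assumption for illustration.

```python
from itertools import combinations

def eliminate_epsilon(productions, start):
    """Remove epsilon-productions from a CFG, following steps (1)-(4)."""
    # Step (1): fixpoint computation of W, the set of nullable nonterminals.
    w = set()
    changed = True
    while changed:
        changed = False
        for a, alts in productions.items():
            # A -> () puts A in W0; A -> alpha with alpha over W extends W.
            if a not in w and any(all(s in w for s in rhs) for rhs in alts):
                w.add(a)
                changed = True
    # Steps (2)-(3): drop epsilon rules; for every remaining rule, also add
    # each variant obtained by deleting one or more nullable symbols.
    new_p = {a: set() for a in productions}
    for a, alts in productions.items():
        for rhs in alts:
            nullable_pos = [i for i, s in enumerate(rhs) if s in w]
            for k in range(len(nullable_pos) + 1):
                for drop in combinations(nullable_pos, k):
                    variant = tuple(s for i, s in enumerate(rhs)
                                    if i not in drop)
                    if variant:            # never introduce a new epsilon rule
                        new_p[a].add(variant)
    # Step (4): keep S -> epsilon only if epsilon is in the language.
    if start in w:
        new_p[start].add(())
    return {a: sorted(alts) for a, alts in new_p.items()}, w
```

For example, for S → AB, A → a | ε, B → b | ε, the algorithm computes W = {S, A, B} and produces S → AB | A | B | ε, A → a, B → b.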

Minimizing the Number of ε-Production Rules (example)



Pumping Lemma for CFLs


Closure Properties of Context Free Languages

Turing Machines

We once again wish to generalize the languages we can analyse. Rather than starting on the syntactic side we look at more powerful automata, namely Turing machines. These have read/write memory which allows them to access items in an arbitrary order. With these machines there is a distinction to be made between languages which are recognized (accepted) and those which are decided. We finally give an overview of how the different kinds of languages covered so far fit together. This is the point where we lose connection with the other part of this course.

3.1 Automata with random-access memory: Turing machines

Clearly the question of recognizing the language {a^i b^i c^i | i ∈ N} (see page 40) is not in itself a difficult one. In most programming languages it would be easy to write a procedure which does this, and therefore pushdown automata are not as powerful as other computing devices. Before we go on to look at more expressive automata, let us review the progression of such devices so far.

We can think of a DFA as a computing device without dedicated memory. However, such automata can remember a limited amount of information by making use of their states. We can think of a pushdown automaton as a computing device that has a dedicated unlimited memory which can only be used in a very limited way, namely in the manner of a stack. The resulting machines are too weak even to be used to deal with programs on a level that a compiler has to do. Symbol tables are typically used to keep track of variables, and for those a random access memory of some form is required.

Turing machines were proposed as a theoretical device by Alan Turing. Again they are idealized in that they have an infinite memory, which comes in the form of an `infinite tape' (infinite on both sides). The idea is that this tape consists of cells each of which can take one symbol from some underlying alphabet Σ, and that there is a `read/write head' which moves over the tape one position at a time and can either read from or write to the tape. It is practical to assume that the input string is written on the tape (from left to right). When the machine starts, the read/write head is positioned on the cell holding the first symbol of the input string. We further assume that the tape contains no further symbols, that is, all remaining cells are empty. In order to describe what the machine does (that is, under which circumstances it moves the head one position to the left or the right, or it reads from the tape, or writes to the tape) we need to have a notion of state, just as we had with the other automata. As before there is a specific start state, and this time we need an action stop to tell us when the machine has finished its computation.

Languages accepted by Turing machines

It is easy to describe a machine which recognizes the language of all strings consisting of as whose length is even, that is {a^(2n) | n ∈ N}. It moves to the right until it finds a blank, switching between two states which code `odd' and `even'. It stops when it encounters a blank. The only accepting state is the one that codes for `even'. Here is the corresponding table.

state / a / blank
0 / (1, a, R) / stop
1 / (0, a, R) / stop
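The table above can be executed directly. Here is a small simulator in Python; the encoding of the table as a dict, the use of state 0 as both the start state and the `even' state, and the space character as the blank symbol are assumptions made for illustration.

```python
def run_tm(delta, accepting, tape, max_steps=10_000):
    """Simulate a one-tape Turing machine.  delta maps (state, symbol)
    to (new_state, written_symbol, move) with move in {"L", "R"}; a
    missing entry means the machine stops.  Returns (halted, accepted)."""
    BLANK = " "
    cells = dict(enumerate(tape))      # sparse tape, blank everywhere else
    state, head = 0, 0                 # start in state 0 on the first cell
    for _ in range(max_steps):
        action = delta.get((state, cells.get(head, BLANK)))
        if action is None:             # a 'stop' entry in the table
            return True, state in accepting
        state, write, move = action
        cells[head] = write
        head += 1 if move == "R" else -1
    return False, False                # gave up: possibly an endless loop

# The table above: state 0 codes 'even', state 1 codes 'odd';
# both states stop on a blank, and only state 0 accepts.
machine = {(0, "a"): (1, "a", "R"), (1, "a"): (0, "a", "R")}
```

For example, `run_tm(machine, {0}, "aaaa")` halts and accepts, while `run_tm(machine, {0}, "aaa")` halts in state 1 and rejects.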

But we can have another Turing machine which recognizes this language. Just like the first one, it moves to the right along the input string, switching between two states which code for `odd' and `even'. If it finds a blank it will do one of two things:

it will stop if the current state is the one which is for `even';

it will keep moving to the right forever if the current state is the one for `odd'. Once again here is a transition table for such a machine.

state / a / blank
0 / (1, a, R) / stop
1 / (0, a, R) / (1, blank, R)
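This second table makes the difference observable in code: on an odd-length input the machine never halts, so a simulator can only give up after some step budget. Below is a minimal standalone sketch; the dict encoding of the table and the space character as the blank symbol are assumptions for illustration.

```python
def run_with_budget(delta, tape, max_steps=1000):
    """Run the machine for at most max_steps; return the halting state,
    or None if the budget is exhausted (it may be looping forever)."""
    BLANK = " "
    cells = dict(enumerate(tape))
    state, head = 0, 0
    for _ in range(max_steps):
        action = delta.get((state, cells.get(head, BLANK)))
        if action is None:             # a 'stop' entry in the table
            return state
        state, write, move = action
        cells[head] = write
        head += 1 if move == "R" else -1
    return None                        # no answer within the budget

# The table above: on a blank, state 0 ('even') stops, while state 1
# ('odd') keeps moving right over blanks forever.
machine2 = {
    (0, "a"): (1, "a", "R"),
    (1, "a"): (0, "a", "R"),
    (1, " "): (1, " ", "R"),
}
```

Here `run_with_budget(machine2, "aa")` returns 0 (halt in the accepting state), but `run_with_budget(machine2, "a")` returns None: an outside observer cannot tell a slow machine from a looping one.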

Which of the two machines is more useful? If you cannot watch the machine work, but just observe whether or not it halts, and in which state, then clearly the second machine is not as useful as the first one. You might wait and wait, but you can never be sure whether that is because the input string is very long or whether the machine has slipped into this endless `loop'.

We will distinguish these two ideas by having two definitions of the language recognized by a Turing machine.

A language is Turing-recognizable (or recursively enumerable) if there is a Turing machine which recognizes it.

A language is Turing-decidable (or recursive) if there is a Turing machine which recognizes it which halts for all inputs.

Every Turing-decidable language is Turing-recognizable, but the converse is false. An example of a language that is Turing-recognizable but not Turing-decidable is provided in Theorem 4.1. Figure 30 shows how the two concepts relate. An example of a language which is not even Turing-recognizable is given on Page 67.

The main difference between the two concepts is this: to show that a language is Turing-decidable, we have to find a Turing machine which recognizes it and which halts for all inputs. For Turing-recognizability it is sufficient if the machine only halts for those inputs which it accepts, that is, which belong to the language.

If a language is Turing-decidable then we can find a Turing machine which, when given an input string, will give us a definite answer whether the string belongs to the language: `yes' (if it halts in an accepting state) or `no' (if it halts in a non-accepting state).

If a language is Turing-recognizable then we can find a Turing machine which, given an input string, will give us a definite answer only when the string does belong to the language. The problem is that we have no way of knowing how long to wait for the answer. If the language is very complicated (or the string very long), it might just take a very long time to work out that it does belong.

Summary

We eliminate useless symbols from a CFG, i.e., symbols that do not derive any string of terminals or are not reachable from the start symbol.

We eliminate the ε-productions and the unit productions.

Given a CFG, we find an equivalent one in Chomsky normal form.

The pumping lemma is used to prove that a language is not context-free.

We introduce the Turing machine, acceptance by a Turing machine, and the recursively enumerable languages.