Automata and Formal Languages

Automata and Formal Languages

... Cool Stuff ...

Mathematical Preliminaries

CS 373 –1/27/03, 9/12/03 – NMG

Sets

A set is a collection of elements without any structure except membership.

Notation:

S1 = {a, b, c} ... A finite set

S2 = {i: i > 0, i is odd} ... An infinite set of odd integers

S2 = {1, 3, 5, ...}

|S| is the size of a set = Number of elements in the set

Membership:

b  S1,4  S2

 is the empty set (no members)

U is the universal set = set of all possible elements

Set Operations

Union:S1  S2 = {x: x  S1 OR x  S2}

Intersection:S1  S2 = {x: x  S1 AND x  S2}

Difference:S1 - S2 = {x: x  S1 AND x  S2}

Ex: U = {a, b, c}, {a, b} – {b, c} = {a} … note: S1 - S2 = S1  (S2)’

Complement:S’ = {x: x U, x  S} ... book uses

Properties of :

S  = S -  = S

S  = 

’ = U

S’’ = S

CS 373 Fall 2003 Chapter 1 Page 1

Subsets:

S1 Sx S1, x  S

Further property of  :  S, S

The definition is satisfied vacuously, since there are no elements in .
... all people in an empty room of a building are in the building!

Proper Subset:

S1 Sx  S1, x  S, AND y  S such that y  S1

Disjoint Sets:

S1 and S2 are disjoint if, and only if, S1 S2 = 

Powerset:

The set (or collection) of all subsets of a set S is called the powerset of S and is denoted by 2S

Example:

If S = {a, b, c}

then

2S = {, {a}, {b}, {c}, {a, b}, {a, c}, {b, c}, {a, b, c} }

|S| = 3 and |2S| = 8

In general: |2S| = 2|S|

Note that although |2S| increases exponentially with the size of finite S, the powerset is always finite. Later we shall explore what happens for a countably infinite set S.

Demonstration by Binary Counting

Given S = {a,b,c} and Si  S , i = 0, 1, 2, ..., 7. Assign ‘1’ to x  Si otherwise assign ‘0’:

a b c

0 0 0 => = S0

0 0 1 => {c} = S1

0 1 0 => {b} = S2

0 1 1 => {b, c}

1 0 0 => {a}

1 0 1 => {a, c}

1 1 0 => {a, b}

1 1 1 => {a, b, c} = S7

There are 2N entries in an N variable truth table.

2N = 2|S| = |2S|

CS 373 Fall 2003 Chapter 1 Page 1

Cartesian Products:

Given two sets S1 and S2. The Cartesian Product ofS1 and S2 , S = S1 x S2, is the set of all ordered pairs, where the first element of each ordered pair is a member of S1 and the second element of ordered each pair is a member of S2:

S = S1 x S2 = {(x, y) : x  S1 and y  S2 }

Example 1.2 from text, p. 5:

S1 = {2, 4}, and S2 = {2, 3, 5, 6}

S1xS2 = { (2,2), (2,3), (2,5), (2,6),

(4,2), (4,3), (4,5), (4,6) }

Functions and Relations:

A function f maps elements from a set called a domain to unique elements in a set called the range:

f: S1 => S2

ex:S1 = S2 = S = reals, domain D = [0,2]  S

f(x) = x2  range R= [0,4]  S

What about sin(x) on the reals, and sin-1(x) on [-1,1]?

Domain is a subset of S1 and range is a subset of S2

If the domain is all of S1, then f is a total function, otherwise it is a partial function.

Q: How would you modify a partial function to make it total and still retain the original defined mappings? Give two ways. … reduce domain or expand range.

We may represent a function as a set of ordered pairs:

{(x1,y1), (x2,y2), ... }

where each xi is an element of the domain, and each yi is an element of the range.

Note:The uniqueness property in the definition guarantees that each xi can occur at most once as the first element of a pair - the mapping must not be one to many - although it may be many to one (ex: the constant function).

A mapping which does not satisfy the uniqueness property is called a relation - Example p. 7, equivalence relation.

Equivalence Relation - example:

x  y iff x mod 3 = y mod 3, ie., r(x) = y

r(x) = y (x  y) equivalence relation since:

x  x, since x mod 3 = x mod 3

x  y => y  x, Since x mod 3 = y mod 3 ==> y mod 3 = x mod 3

x  y and y  z ==> x  z, since the same holds for the mod 3 definitions

Relation mappings (ordered pairs):

0 => 0, 3, 6, 9, 12, ...

1 => 1, 4, 7, 10, 13, ...

2 => 2, 5, 8, 11, 14, ...

3 => 0, 3, 6, 9, 12, ...

4 => 1, 4, 7, 10, 13, ...

5 => 2, 5, 8, 11, 14, ...

r(x) is a multivalued mapping, hence a relation rather than a function

r(2) = 2, 5, 8, 11, 14, ...

CS 373 Fall 2003 Chapter 1 Page 1

Equivalence classes:

For a given equivalence relation, an equivalence class is the set of all elements from the domain set which are mutually equivalent. r(x) has three equivalence classes:

{0, 3, 6, 9, 12, ...}  0

{1, 4, 7, 10, 13, ...}  1

{2, 5, 8, 11, 14, ...}  2

Graphs

Consists of two finite sets:

Set V = {v1, v2, ..., vn} of “vertices”

and

Set E = {e1, e2, ..., em} of “edges”.

Each edge ei is a pair of vertices from V:

ei = (vj,vk)

Directed Graph:

Edge has implied direction from vj to vk

CS 373 Fall 2003 Chapter 1 Page 1

Example:

Walk:A sequence of “connected” (has common vertex) edges

… if an edge is repeated then 2 vertices will be repeated

Path:Walk in which no edge is repeated

Simple Path:Walk in which no vertex is repeated – all simple paths are paths

Cycle (base vi): A walk from vi to itself with no repeated edges

Or a path from vi to itself ... a non-simple path is possible.

Loop: an edge from a vertex to itself

Algorithm for finding all simple paths between two vertices. - see pp. 7-8

Trees (p. 8)

A directed graph with no cycles has one distinct vertex called the root, such that there is exactly one path from the root to every other vertex.

==>Root has no incoming edges.

some [PC1]vertices with no outgoing edges

==> called leaves of the tree.

If there is an edge from vi to vj

thenvi is the parent

vj is the child

The level associated with each vertex is the number of edges in the path from root to the vertex (level of root = 0).

The height of a tree is the largest level number of any vertex.

Ordered tree: Order the nodes at each level.

Two examples:

Given:

Example 1: A path but not a simple path: (1,2), (2,3), (3,4), (4,3), (3,5)

Example 2: A walk which is not a path:

has an edge which repeats (definition)
necessarily has at least two vertices repeating

(1,2), (2,3), (3,5), (5,2), (2,3)

Example of a tree:

CS 373 Fall 2003 Chapter 1 Page 1

Proof Techniques

Deductive proofs – “most common” - prove a premise by using axioms and previously proven results and the rules of classical logic. ... will be used but not discussed on the assumption that everyone is familiar with the technique.

Proof by Induction (p. 9)

Want to prove a sequence of statements:P1, P2, ..., Pn for all n  1

Two conceptual steps:

For some k  1

we know that P1, P2, ..., Pk are true.

[This is called the basis.]

If we can show that for any n  k

“IF P1, P2, ..., Pn is true, then Pn+1 is true”

ie., “Pn true implies that Pn+1 is true”

then Pn is true for all n.

Note: “Assuming that P1, P2, ..., Pn is true, nk” is the inductive assumption.

Remember, P1, P2, ..., Pk are known to be true.]

The mechanics of showing that if Pn is true then Pn+1 is true is called the inductive step.

See examples on p. 10.

Summary of Proof by Induction

P1, P2, ..., Pk| Pk+1, ..., P.n.| P.n+1

known to| if this is|

be true| true,| then this is true (must show)

Then it follows that Pn. is true for all n

Note:

If we can simply show that “if Pn (n  k) is true ==> Pn+1 true for any n”

then, obviously, if Pk+1 true ==> Pk+2 true ==> Pn ==> Pn+1

Notation: “==>” means “implies”

See examples 1.5 and 1.6 in the text - details given in class lecture, but not included here.

CS 373 Fall 2003 Chapter 1 Page 1

PROOF BY CONTRADICTION

Try this when a straight forward deductive proof or induction is not working.

Approach: Prove that the statement P is true

-Assume that P is false

-If this assumption leads to the conclusion that we know is incorrect, then we can conclude that our starting assumption is false, therefore P is true

Example 1.7, p. 11-12:

P: prove that 2 is not rational.

Assume the contrary: 2 = n/m (rational)

where n and m are integers, with no common factor.

Deducing from this “contrary” premise we have:

2m2 = n2 ==> n2 is even and n is even ==> n = 2k, k an integer.

Thus

2m2 = 4k2 ==> m2 = 2k2 ==> m2 is even ==> m is even

... note: even x even = even, odd x odd = odd ...

thus

both n and m are even and thus have a common factor

CONTRADICTION

The assumption that 2 is rational is not true

And we thus conclude that 2 is not rational.

QED

CS 373 Fall 2003 Chapter 1 Page 1

Languages and Automata

Concepts and Definitions

... A formalization of natural languages

Alphabet

A non-empty set of symbols, traditionally called 

Strings

Given an alphabet 

A string (over ) is a finite sequence of symbols from 

Example: = {a, b}

String w = aaabba

Concatenation of strings w and v

Append symbols of v to right end of w

Example:w = a1a2...an

v = b1b2...bm

Concatenation wv = a1a2a3...anb1b2b3...bm

Reverse of string w is

wR = anan-1...a2a1

Length of a string w

|w| = number of symbols in w

More formally,|a| = 1 for a 

|wa| = |w| + 1for w *

... note this our first example of a recursive or inductive definition

Empty string

|| = 0

w = w = wfor any w

Substring of w

Any string of consecutive characters from w.

If w = vu

then substrings v and u are called the prefix and suffix of w, respectively.

Note:|uv| = |u| + |v|See p. 16 for proof.

CS 373 Fall 2003 Chapter 1 Page 1

If w is a string,

then wn is a string obtained by repeating w n times.

By definition, w0 = 

Sets of Strings

If  is an alphabet,

then * is the set of strings obtained by concatenating zero or more symbols from .

* always contains 

Definition:+ = * - [+ is * with  removed.]

Note: is finite, but

* and + are infinite

Language over 

L is any subset of *

A sentence in a language L is any string in L

Example: = {a, b}

* = {, a, b, aa, ab, ba, bb, aaa, aab, ... }

L = {a, aa, aab} is a finite language on 

L = {anbn: n  0} is also a language (an infinite language)

Complement of Language L

L’= * - L

Reverse of a Language L

LR = {wR : w  L}

Concatenation of two languages: L1 and L2

L1L2 = {xy: x  L1, y  L2}

(all possible x and y)

Lndefined as concatenating L with itself n times

ie., Ln = LLL ...L (n times) = (...(((LL)L)L)...)L use associativity of concatenation

Definition:L0 = { Using brackets “}”, “{“ are important here ... why?

L1 = L

Properties of Concatenation of Languages:

Example:

Star-closure of language L *

L* = L0  L1  L2 ... Remember that L0 = {

... set of all strings by concatenating strings from L ... strings are “building blocks” rather than elements from 

Positive Closure

L+ = L1 L2 ... = L* - L0

Note:* the set of all strings on  can be interpreted as star closure if  (the alphabet) is interpreted as the language consisting of all strings of length one (distinguish between a symbol in , say a, and a string of length one, a. With this interpretation we may write:

* = 0 1 2 .....

Where for example, 2 is the set of all strings of length two over 

* = {} (all strings of length 1)  (all strings of length 2)  (All strings of length 3)  ...

Example:  = {a, b}

3 = ()  = {a, b}{a, b}{a, b} ... juxtaposition of {a,b} means concatenation

= {aa, ab, ba, bb}{a, b}

= {aaa, aba, baa, bba,

aab, abb, bab, bbb}

Note: A similiar interpretation may be applied to L* where the single element string members of  are replaced by strings of any lengths – see above.

Example:

CS 373 Fall 2003 Chapter 1 Page 1

Grammars

Describes the “structure” of a language. (See p. 19.)

Grammar G

G is a quadruple

G = (V, T, S, P)

where

V is a finite set of variables

T is a finite set of terminal symbols 

S  V is a special start variable

P is a finite set of productions

V and T are non-empty and disjoint

V must minimally contain S

Production Rule

A transformation of the form x  y

x  (V  T)+strings of variables and terminal symbols, excluding .

y  (V  T)*strings of variables, including .

Notes:If x T, then x  x, necessarily.

Applications of productions to a string *

If w = uxv, where u, x, v  (V  T)+

and the production x  y is applied, where y  (V  T)*

then the new string

z = uyv is produced.

We represent this by

w  z... “w derives z”

Successive strings are derived by applying the productions in arbitrary order.

A production can be used wherever it is applicable and as often as desired.

CS 373 Fall 2003 Chapter 1 Page 1

If w1w2 ... wnwn derived from w1

then w1* wnwn derived from w1

Thus w * w (including zero steps)

Language Generated by a Grammar

Let G = (V, T, S, P) be a grammar

ThenL(G) = { w T* : S * w }[note star-closure of T, not V]

is the language generated by G

L(G) = set of all strings in T* such that w is derived from S.

For G to be a grammar for L1, the following must be satisfied:
w  L1, w  L(G)
and
w  L(G), w  L1

To prove that G does not represent L1, look for counterexamples.

Recursive vs. Non-recursive productions:

A production is nonrecursive if the right side contains no variables from the left:

Ex: S  Ab

A production is recursive if a variable on the left side appears on the right

Ex: A aAb

where in both cases S and A are variables, and a and b are terminals.

Sentential forms vs. Sentences

in derivations we may encounter:

-a sentence is a string of terminal symbols - a sentence will terminate a derivation this is what the goal is - at least striving for some partiocular sentence.

-a sentential form is a string containing a mix of variables and terminal symbols or all variables. This is an intermediate form in doing a derivation.

Examples on pp. 21-23.

CS 373 Fall 2003 Chapter 1 Page 1

Automata (General Definition)

Abstract model of a digital computer

“RO”Read-only input - 1 symbol per cell

reads left to right, 1 symbol at a time detects EOF

“RW”Read-write temporary storage - cells hold symbol from alphabet

not necessary that input is from alphabet

A logical definition: The input and storage media may be combined as in a Turing Machine, separated as in a “Push Down Automata”, or the storage being absent as in a Deterministic/Nondeterministic Automata.

Special Case: An accepter - having no temporary storage

the only stored information is the state you are in

Y/N output (accept or non-accept the string)

transition function

next state = (current state, input, state of temporary storage)

Configuration - state of {control unit, input file, temporary storage}

Deterministic vs. non-deterministic (p. 26)

Transducer - output is a string

Move:transition of automaton from one configuration to next

CS 373 Fall 2003 Chapter 1 Page 1

[PC1]1