Olympia Louskou-Bozapalidou,George Rahonis 4

Immune-based transformations [1]

Symeon Bozapalidis[2], Antonis Kalampakas[3],

Olympia Louskou-Bozapalidou,George Rahonis[4]

Department of Mathematics

Aristotle University of Thessaloniki

54006 Thessaloniki

Greece

Abstract

We introduce two models of transducers, the former admitting as input words, and the later trees. With these machines, we try to describe formally, some of the most important operations of the human immune system. Moreover, the transducers will serve as models of intrusion detection systems.

Introduction

The known computer security systems are based on the idea that using an appropriate name and a password recognize each user. But the software technology progress admits nowadays to the computer system intruders to use suitable programs in order “to break” the security laws. On the other hand, the virus detection problem is very crucial, since all the anti-virus programs can recognize only known viruses, but they are not effective for unknown ones. This practically means that any anti-virus program cannot detect a new virus, until the program will be updated so that it can recognize it. Clearly, it would be very suitable, if a computer security system could recognize any foreign intruder as well as if it could detect any new virus. But this is exactly the ability of the human’s (or more general the vertebrates’) immune system. Indeed the human’s immune system is an intelligent protective mechanism that can recognize any foreign intruder called antigen even if it had not met this antigen before. Moreover, it kills this antigen producing appropriate antibodies and learns to detect the same antigen in a more fast way the next time it will meet it in the future. This happens since the immune system produces large amounts of special antibodies for the foreign intruder, the killers and the memory antibodies. A point which shows how sophisticated is the immune system is that the antibodies and the antigens have the same structural units which are twenty amino-acids. Thus the human’s security system in fact has the ability to discriminate between the self and the non-self cells.

The last years the computer scientists try to apply the operating strategy of the human’s immune system to the computer security systems. Many models have been constructed but still they do not seem suitable enough to be applied in practice. The reader will find much of this work from the group of Prof. St. Forrest but also other references at [2]. Still in this very promising research topic there is not a theory background that could lead to good applications. In this paper, we try a first approach to this problem from the viewpoint of Theoretical Computer Science. We construct two transducers in order to simulate the most important operations of the human’s immune system. The first one works on strings, which regarded to be the antibodies, the antigens, and the bound complexes, which are, bound antigens and antibodies. This binding operation between an antibody and an antigen takes place at certain sites of the two types of cells. These sites are called binding sites. The transducer has the ability to recognize if there is any binding site in a given input (a cell or a data in computer terminology) and to delete the corresponding to antigen part of the site. Moreover, it learns, that is, it obtains a memory to delete the same part in the remaining string.

The other model regards the bounded complexes as trees, so it is a tree transducer. If the transducer recognizes a binding site then starts to produce the corresponding killer and memory nodes of the input tree. Thus the output is a non-infected forest with large amounts of killers and memory nodes in its trees. We show that such a transducer can easily be simulated by an alphabetic tree transducer [6], and thus the transformation induced by such a machine is alphabetic [1]. By this result interesting properties are obtained for this new object.

Tarakanov and Dasgupta have done another approach to immune-based systems in [8], using the notions of formal proteins and networks of bindings.

We assume the reader to be familiar with notions from classical formal language theory [7] and tree language theory [3,4]. We describe only some notations that will be used in the sequel.

An alphabet is a finite set Σ, whereas Σ* denotes the free monoid generated by Σ, that is the set of all words over Σ. If Σ,Γ are two alphabets then ΣΓ={σγ / σΣ, γΓ}, and Σ*Γ*={wu / wΣ*, uΓ*}.

A finite ranked alphabet is a set Σsuch that to each σΣ, we associate a natural number denoted by rank(σ) and called the rank of σ. A letter might have more than one ranks. The set of all trees TΣ over the ranked alphabet Σ,is defined inductively:

- σ TΣ for σΣ0,

-if t1,,,tnTΣ and σΣn, then σ(t1,,,tn)TΣ.

A forest or tree language is a set F TΣ.

2.Immune-based transducers

2.1.A detector transducer on words

We consider the set Σof amino acids that is a set of twenty letters. Then each protein is a word over this alphabet. As it happens in nature, we assume that two proteins (e.g. one antibody and one antigen) can bind at specific sites called determinants or binding sites according to the binding specificity theory [5].

For simplicity sake, we consider the binding sites to be a subset A of ΣΓ where Γ is a copy alphabet of Σ. Γwill stand for the alphabet of structural units of the antigens. Let A={r1,r2,…,rn}, with ri=σiγi, i[n] ([n]={1,2,…,n}). For each ri=σiγi, we write left(ri)= σi and right(ri ) =γi. Then left(A)={σ1,σ2,…,σn}, and right(A)={γ1,γ2,…,γn}. We say that a word w(ΣΓ)* is a bound complex string if w=u1uu2, with u1,u2(ΣΓ)* and uA. Consider the finite set

Definition 1. A detector transducer is a five-tuple D=(Σ,Γ,Q,q0,R), with input and output alphabet the set ΣΓ. Q is the finite set of states and q0Q is the initial state. R is the finite set of rules, defined by:

if and

if for each

if , for each

if for each

if , for each .

Intuitively, the detector takes as input a word (a data) over the alphabet ΣΓ and starts the process at the initial state q0. When the detector processes a letter α which is not a left letter of any binding site, i.e. αleft(Α), then leaves the state unchanged. If it processes a symbol σ such that σ=left(r),rA, then it goes to the state qσ. If the next symbol is γ such that r=σγ, then deletes γ and transfers the control of the system to the state qr. From this time the detector has learnt to recognize the symbol γ and it deletes each appearance of this letter in the rest of the word. If the letter after σ was such that then the transducer should return to the state q0, that is no bound complex should had been recognized. Then, the transducer processes the next letter, which if it is a left letter σiof A, then it transfers the control to . If the next letter is γι, such that σiγi = then the transducer deletes the antigen symbol γi and goes to the state . Thus, each time the detector recognizes a binding site ri=σiγi, deletes (kills) the antigen symbol γi. Moreover, it learns (that is it obtains a memory) to do the same, each time the same antigen symbol appears in the rest of the sequence, even if this antigen symbol does not belong to a binding site. The learning ability is described from the movement to a state, which contains as index the corresponding symbol ri.

It is clear that the above model does not contain empty moves and so it is real-time. With this transducer, we simulate the self and non-self recognition process of the vertebrates' immune system [5].

Alternatively, we can construct the detector transducer in such way, that the binding sites are words from the set Σ*Γ*.In this case we can simulate the maturity of the antibodies after the detection of an antigen. Moreover,such a model can be equipped with a control so that it can recognize a binding site under certain conditions, for example, if we consider the determinants with a weight. In this case, we assume that binding happens only if the binding energy is suitable (e.g. less than a threshold) [5]. With this condition we can simulate the fault tolerance that is needed in a virus-detecting system. It is known [5] that the human’s immune system is also fault tolerant in order to avoid autoimmunity. The investigation of such models is the scope of our future research.

2.2.A detector transducer on trees

We consider as above the same alphabets and binding sites. Now, the proteins are trees over the alphabet ΣΓ{α} where α is a new symbol of rank 2 and the symbols from ΣΓ play the role of leaves. We define a detector tree transducer, to be a seven-tuple D=(ΣΓ,Q,Σm,Σk,d,q0,R), where Σm, Σkare two copies of Σ and stand for the set of memory and killer cells, respectively. The inputs are trees with leaves in ΣΓ. If the detector processes a binding site, then it deletes (kills) the antigen and produces with the help of the symbol d (which simulates the ability of the human immune system to produce the antibodies), the killer and memory cells corresponding to this binding site. These memory cells have the specificity to bind with the same antigen in the future, that is the detector learns to recognize the foreign intruders.

From the biochemistry point of view, with this model we try to simulate the clonal selectionability of the immune system [5].

Let us describe the above in a more formal way. We denote byΣ0 the set of all amino acids and let Γ0be a copy of it, serving as the set of structural units of the antigens. Consider now the finite ranked alphabets Σand Γ, with Σ=Σ2=Σ0 and Γ=Γ2=Γ0. A set A Σ0Γ0 is the set of binding sites. We define the ranked alphabet ΣΓsuch that (ΣΓ)0=(ΣΓ) and (ΣΓ)2=(ΣΓ)0{α}.

An element of TΣΓ is called a bound complex tree if it contains at least one symbol from A.

Proposition 2. The set of all bound complexes is a recognizable forest.

Proof. We can use standard constructions from tree automata theory. Indeed, a bottom-up tree automaton can recognize the set of all bound complexes [3,4].

Definition 3. A detector tree transducer is a scheme T=(ΣΓ{α},Σm,ΣkQ,d,q0,R), with input alphabet ΣΓ{α}, output alphabet ΣΣmΣk where Q is the finite set of states defined as , q0 is the initial state, and d is a new symbol. R is the finite set of rules:

σm Σm

σk Σk

σk Σk.

The detector tree transducer works in the following way: it admits as input a tree, where all its nodes with rank 2, are labeled by the symbol α, and its leaves are symbols in ΣΓ. The transducer leaves unchanged the symbol α. If a leaf that does not belong into A is being processed, then the detector produces the left part of the leaf. If the leaf in process is an element r=<σ,γ> in A, then the output of the transducer is the tree . From this time the transducer obtains the ability to produce memorysymbols of σ, with the help of the node , and killer symbols of σ, with the help of the node .

With this model, we simulate the ability of the human immune system to produce killer and memory cells, after the recognition of an antigen.

The tree transformation induced by a detector tree transducer is defined classically [3,4,6] and is denoted by D.

We denote by DETT the class of the tree relations induced by all detector tree transducers.

It is not difficult to see that for a detector tree transducer D, there exists an alphabetic tree transducer T [6], such that such that D=T.Thus

Proposition 4. It holds DETTAT where AT denotes the class of alphabetic relations.

By the above proposition and the corresponding results in [6], we obtain the next:

Corollary 5. The domain and the range of each detector tree transducer are recognizable forests.

Corollary 6. The detector tree transducers preserve the recognizable, algebraic, one counter, restricted one counter and generalized synchronized forests.

Conclusion. With the two detector transducers, we try a first approach to model the human’s immune system from the Theoretical Computer Science point of view. The machines are real-time, that is they do not contain empty moves which means that they do not get into infinite useless loops. Thus they can serve as models for computer security systems. Of course in this case where we need smaller alphabets e.g. the binary alphabet, the constructions get simpler.

In a forthcoming paper, we compare the computational power of these models with those constructed by Tarakanov and Dasgupta [8], and we try to apply them as a theoretical basis to the work of St. Forrest and others [2].

References

1)Bozapalidis S., Alphabetic tree relations, TCS 99, 1992, 177-211.

2)Forrest S.,

3)Gecseg F. and Steinby M., Tree Automata, Akademiai Kiado, Budapest, 1984.

4)Gecseg F. and Steinby M., Tree languages, in: Rozenberg G., Salomaa A. (Eds.), Handbook of Formal Languages, vol. III, Springer, Berlin, pp. 1-68.

5)Lehninger A., Principles of Biochemistry, Worth Publishers, Inc. 1982.

6)Rahonis G., Alpabetic and synchronized tree transducers, TCS 255, 2001, 377-399.

7)Salomaa A., Formal Languages, Academic Press, New York, 1973.

8)Tarakanov A. and Dasgupta D., A formal model of an artificial immune system, Biosystems, 55(1-3), 2000, 151-158.

[1]Supported by the EC Project IST-2000-26016

[2]Email:

[3]Email:

[4]Email: (Corresponding author)