DECISION BASED INTRUSION DETECTION SYSTEM USING GENETIC ALGORITHM
Ms. Priyanka Kisan Ghodekar
Ms. Sneha Prabhakar Chaskar
Ms. Shaila Bhausaheb Dere
Mr. Harish Tanaji Indalkar
Student, Department Of Computer Engineering, Sharadchandra Pawar College of Engineering, Pune, Maharashtra, India
Abstract- Security is critical in today's rapidly growing computer networks, which makes intrusion detection an essential part of everyday practice. Various approaches are used for intrusion detection. We present an Intrusion Detection System (IDS) that uses a Genetic Algorithm (GA) and a Decision Tree (DT) to identify various types of intrusions or attacks efficiently and effectively. The project develops an intelligent system that combines these two computing techniques in an ensemble to detect and prevent intrusions. The system identifies the intruder and blocks the intruder's data to protect the system from attack. Its major component creates new sets of rules at run time. The GA component performs feature subset selection through a suitably framed fitness function, and a decision tree is used to detect the subtype of attack. The KDDCUP99 training and testing datasets are used to generate effective new rules with a reasonable detection rate.
Keywords- Computer and Network Security, Intrusion Detection System, Genetic Algorithm, KDD CUP 1999 Dataset, Decision Tree.
1. INTRODUCTION
An intrusion detection system (IDS) is a device or application that monitors network and/or system activities for malicious activities or policy violations and produces reports. Intrusion detection is the process of monitoring the events occurring in a computer system or network and analyzing them for signs of possible incidents, that is, violations of computer security practices.
Intrusion Detection Systems have undergone rapid growth in power, scope and complexity in their short history. In recent years, intrusion detection has been one of the most sought-after research topics in the field of Information Security, with wide application in the corporate world, where data integrity and security are complex issues.
An intrusion occurs when an intruder attempts to break into an information system. Intruders may be external or internal, depending on their authorization level. Intrusion techniques include exploiting software bugs or system configurations, and password cracking. An Intrusion Detection System (IDS) is a system for detecting intrusions and reporting them accurately to the proper authority. IDSs are usually specific to the operating system they run on. They are an important tool in implementing an organization's information security policy, which defines the rules and practices to provide security, handle intrusions, and recover from damage caused by security breaches.
2. RELATED WORK
GAs have been used for network intrusion detection in several ways. Some approaches use a Genetic Algorithm directly to obtain the classification rules, while others use different AI methods to acquire the rules and use GAs to select appropriate features or to determine the optimal parameters of some functions.
Li: This work presents a technique that uses a GA to detect anomalous network intrusions. The approach obtains classification rules from both quantitative and discrete features of network data. An implementation of rule generation for an IDS is given, but no experimental results are reported.
Bridges [4]: This method combines fuzzy data mining techniques with a Genetic Algorithm to detect network anomalies and misuse. In many existing GA-based IDSs, most features are not selected properly. This method uses the Genetic Algorithm to find the optimal parameters of the fuzzy functions and to select the most relevant network features.
Lu: In this method, classification rules are generated by Genetic Programming. The detection or classification of network intrusions is fine-tuned with the help of the fitness function. The time required to train the system on large volumes of data makes a Genetic Programming implementation difficult.
Crosbie [3]: Multiple agent techniques and Genetic Programming can be combined to detect network intrusions. Each agent monitors one parameter of the network audit data, and Genetic Programming is used to find the set of agents that together determine the network behavior. The advantage of this method is that many small autonomous agents can be used; the drawback is the communication required among the agents.
Selvakani: This system identifies attacks using a rule set produced by a Genetic Algorithm, but generates rules only for R2L and DoS attacks. Only one attack from each of these two categories is selected. The overall performance of the system is less than 60%.
3. INTRUSION DETECTION SYSTEM OVERVIEW
The following sections give an overview of the components of an Intrusion Detection System, its classifications, and networking attacks.
3.1 Components of Intrusion Detection System
An intrusion detection system normally consists of three functional components. The first component is the event generator or data source. The second component is the analysis engine, which takes information from the data source and examines the data for symptoms of attacks. The analysis engine uses one of the following analysis approaches:
- Misuse/Signature-Based Detection: This type of detection engine detects intrusions that follow well-known patterns of attacks (or signatures) that exploit known software vulnerabilities. The main limitation of this approach is that it looks only for known weaknesses and may not detect unknown future intrusions.
- Anomaly/Statistical Detection: An anomaly-based detection engine searches for something rare or unusual. The drawbacks of this approach are that it is highly expensive and that it can mistake intrusive behavior for normal behavior when it has insufficient data.
The third component of an intrusion detection system is the response manager, which acts only when irregularities (possible intrusion attacks) are found.
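The two analysis approaches can be contrasted with a minimal sketch. The signature set, the z-score test, and the threshold of 3.0 are assumptions made for illustration only; they are not taken from the system described in this paper.

```python
from statistics import mean, stdev

# Hypothetical signature database: (protocol, service, flag) patterns
# standing in for known attacks. Illustrative values only.
KNOWN_SIGNATURES = {("tcp", "private", "S0"), ("icmp", "ecr_i", "SF")}

def misuse_detect(protocol, service, flag):
    """Signature-based: report only what matches a known pattern."""
    return (protocol, service, flag) in KNOWN_SIGNATURES

def anomaly_detect(value, baseline, threshold=3.0):
    """Anomaly-based: report values far from the baseline mean (z-score)."""
    mu, sigma = mean(baseline), stdev(baseline)
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > threshold

baseline_bytes = [100, 120, 110, 90, 105]   # assumed "normal" traffic sizes
print(misuse_detect("tcp", "private", "S0"))
print(anomaly_detect(5000, baseline_bytes))
```

The sketch also shows the limitation noted above: `misuse_detect` can never report a pattern that is absent from `KNOWN_SIGNATURES`, while `anomaly_detect` depends entirely on how representative the baseline is.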
3.2 Classification of Intrusion Detection
Intrusion detection can be classified into the following two categories.
- Host Based Intrusion Detection: HIDSs evaluate information found on a single host or multiple host systems, including the contents of operating systems, system files, and application files.
- Network Based Intrusion Detection: NIDSs evaluate information captured from network communications, analyzing the stream of packets that travel across the network.
3.3 Networking Attacks
This section gives an overview of the four major categories of networking attacks. Each attack falls into one of the following groups.
- Denial of Service (DoS): A DoS attack is a type of attack in which the hacker makes computing or memory resources too busy or too full to serve legitimate networking requests, thereby denying users access to those resources. Examples are smurf, apache, mail bomb, and Neptune.
- Remote to User Attacks (R2L): A remote to user attack is an attack in which a user sends packets over the internet to a machine to which he or she does not have access, in order to expose the machine's vulnerabilities and exploit privileges that a local user would have on the machine. Examples are phf, guest, xlock, and xsnoop.
- User to Root Attacks (U2R): These attacks are exploitations in which the hacker starts off on the system with a normal user account and attempts to abuse vulnerabilities in the system in order to gain superuser rights. Examples are xterm and perl.
- Probing: Probing is an attack in which the hacker scans a machine or a networking device in order to determine weaknesses or vulnerabilities that may later be exploited. This technique is also used in data mining. Examples are portsweep and nmap.
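The grouping above can be expressed as a lookup table. The following is a minimal sketch that uses only some of the attack names mentioned in this section; the function name `categorize` and the "unknown" fallback are assumptions, and the handling of a trailing period reflects the label format used in the KDDCUP99 files.

```python
# Attack name -> category, restricted to examples named in Section 3.3.
ATTACK_CATEGORY = {
    "smurf": "dos", "neptune": "dos",
    "phf": "r2l", "xlock": "r2l",
    "xterm": "u2r", "perl": "u2r",
    "portsweep": "probe", "nmap": "probe",
}

def categorize(label):
    """Map a record label such as 'smurf.' to one of the four categories."""
    name = label.strip().rstrip(".").lower()
    if name == "normal":
        return "normal"
    # Names outside the table are reported as 'unknown' (an assumption).
    return ATTACK_CATEGORY.get(name, "unknown")

print(categorize("smurf."))
print(categorize("portsweep."))
```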
4. PROPOSED SYSTEM
- This system generates its own rules using a GA.
- The system is implemented using the KDDCUP 1999 training and testing datasets.
- It dynamically adds rules to the rule base according to the packets flowing in the network, which also increases the reliability of the system.
- Our system classifies attacks using a Decision Tree.
- The major objective of this system is to improve the detection rate.
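To make the Decision Tree step concrete, the following is a minimal sketch of how a tree routes a record to an attack category. The tree itself is hand-written and entirely hypothetical: the tested attributes, the branch values, and the leaf labels are assumptions for illustration, not the tree learned by our system.

```python
# Internal nodes test one attribute; leaves carry an attack category.
# Every attribute, branch and leaf below is hypothetical.
TREE = {
    "attr": "protocol",
    "branches": {
        "icmp": {"leaf": "dos"},
        "tcp": {
            "attr": "flag",
            "branches": {"S0": {"leaf": "dos"}, "REJ": {"leaf": "probe"}},
            "default": {"leaf": "r2l"},
        },
    },
    "default": {"leaf": "probe"},
}

def classify(record, node=TREE):
    """Walk from the root to a leaf, following the record's attribute values."""
    while "leaf" not in node:
        value = record.get(node["attr"])
        node = node["branches"].get(value, node["default"])
    return node["leaf"]

print(classify({"protocol": "tcp", "flag": "REJ"}))
```

In practice the tree would be learned from labeled records rather than written by hand; the traversal logic stays the same.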
5. WORKING OF OUR IDS IN REAL SYSTEM
The following diagram shows the components of our system. An attack is detected before the packet arrives at the machine over the network connection. For this we use an IDS that combines the KDD dataset, a GA, and a Decision Tree.
Figure 5.1 Applying Intrusion Detection System
Figure 5.1 shows where the Intrusion Detection System is applied. A network connection is a sequence of TCP packets starting from a source IP address and ending at a target IP address. Each connection is described by 41 attributes and 1 manually assigned record type. Before a packet reaches the machine, its network connection is analyzed by the Intrusion Detection System. If the connection is normal, it is permitted. If the connection is an attack, the type of attack is detected and the system generates an alert.
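The permit-or-alert decision described above can be sketched as a small gate placed in front of the machine. The function name `screen_connection` and the stand-in classifier are hypothetical; any classifier that returns "normal" or an attack type could be plugged in.

```python
def screen_connection(features, classify):
    """Return ('permit', None) for normal traffic, else ('alert', attack_type)."""
    verdict = classify(features)
    if verdict == "normal":
        return ("permit", None)
    return ("alert", verdict)

# Stand-in classifier used only to exercise the gate (an assumption).
def toy_classifier(features):
    return "dos" if features.get("flag") == "S0" else "normal"

print(screen_connection({"flag": "SF"}, toy_classifier))
print(screen_connection({"flag": "S0"}, toy_classifier))
```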
6. FLOW OF SYSTEM
Figure 6.1 Flow of System
Figure 6.1 shows the flow of the system. The KDD CUP 1999 dataset contains records in which each record has 41 attributes and 1 manually assigned record type. The record type indicates whether a record is a normal or an abnormal network connection. We use only 6 of the attributes, which we extract with the Weka tool. These 6 attributes are given to the genetic algorithm as input. The genetic algorithm then generates a run-time rule set, which is a single chromosome. The chromosome may be the signature of a normal connection or of an attack. This signature is matched against the signatures predefined in the test dataset. If it matches the signature of a particular attack, that attack is detected. The rule set or detected attack is then given to the decision tree as input, and the decision tree classifies the attack into a particular attack type. The generated rule set is stored in the rule base.
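The signature-matching step of this flow can be sketched as follows. The rule representation (a condition over six attributes with a wildcard), the attribute names, and the default verdict of "normal" when no rule matches are assumptions made for this sketch.

```python
# Assumed names for the six selected attributes.
FEATURES = ("duration", "protocol", "service", "flag", "src_bytes", "dst_bytes")
WILDCARD = None  # a rule position that matches any value

def matches(rule, record):
    """A rule matches when every non-wildcard position equals the record."""
    return all(want is WILDCARD or want == record[name]
               for name, want in zip(FEATURES, rule["condition"]))

def detect(record, rule_base):
    """Return the label of the first matching rule, or 'normal' if none match."""
    for rule in rule_base:
        if matches(rule, record):
            return rule["label"]
    return "normal"

# Hypothetical rule base containing a single signature.
rule_base = [{"condition": (0, "tcp", "private", "S0", WILDCARD, WILDCARD),
              "label": "neptune"}]
record = {"duration": 0, "protocol": "tcp", "service": "private",
          "flag": "S0", "src_bytes": 0, "dst_bytes": 0}
print(detect(record, rule_base))
```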
7. GENETIC ALGORITHM
A Genetic Algorithm (GA) is a programming technique that mimics biological evolution as a problem-solving strategy. It works on the mechanics of natural selection and is based on Darwin's theory of survival of the fittest.
The GA process begins with a set of potential solutions, or chromosomes, which are randomly generated or selected. These chromosomes are normally encoded in binary form, but other encodings are also used. The entire set of chromosomes makes up a population. In every generation, the fitness of each chromosome is evaluated with a fitness function, and the selection operator then chooses the fittest chromosomes using tournament selection. Chromosomes with poor fitness values are discarded.
A GA applies evolution and natural selection to a chromosome-like data structure and evolves the chromosomes using selection, recombination (crossover), and mutation operators. The process generally begins with a randomly generated population of chromosomes, which represent the potential solutions of a problem and are treated as candidate solutions. The positions of each chromosome are encoded as bits, characters or numbers, which are referred to as genes. An evaluation function, known as the "Fitness Function", computes the goodness of each chromosome with respect to the desired solution. During evolution, the two basic operators, crossover and mutation, imitate natural reproduction and mutation. The selection of chromosomes for survival and combination is biased towards the best-fit chromosomes.
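The three operators named above can be sketched for binary chromosomes as follows. Single-point crossover, bit-flip mutation, and a tournament size of 2 are common textbook choices and are assumptions here; the function names are illustrative.

```python
import random

def tournament_select(population, fitness, k=2, rng=random):
    """Pick k random individuals and return the fittest of them."""
    contenders = rng.sample(range(len(population)), k)
    best = max(contenders, key=lambda i: fitness[i])
    return population[best]

def crossover(a, b, point):
    """Single-point crossover: swap the tails of two parents at `point`."""
    return a[:point] + b[point:], b[:point] + a[point:]

def mutate(chromosome, pm, rng=random):
    """Flip each bit independently with probability pm."""
    return [1 - gene if rng.random() < pm else gene for gene in chromosome]

child_a, child_b = crossover([0, 0, 0, 0], [1, 1, 1, 1], point=2)
print(child_a, child_b)
```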
7.1 FLOWCHART OF GA
The following flowchart shows the flow of a simple genetic algorithm. It starts with the random generation of an initial population, which is then evaluated and evolved through selection, recombination (crossover), and mutation. Finally, the best individual (chromosome) is picked as the final result once the optimization meets its target.
Figure 7.1.1 Flowchart of GA
7.2 ALGORITHM OF GA
GA_Rule_Generation
Input: Encoded binary string of length n (where n is the number of features being passed), number of generations, population size, crossover probability (Pc), mutation probability (Pm).
Output: A rule set for the IDS.
1. Initialize the population randomly.
2. Initialize N (total number of records in the training set).
3. For each chromosome in the new population
4. Calculate fitness = Fx / Sum(Fx)
5. End for
6. Select the 50% best-fit chromosomes and remove the worse-fit chromosomes.
7. Apply crossover to the selected chromosomes.
8. Apply mutation to each chromosome to generate the new population. Go to step 3 until the number of generations is reached.
9. Stop
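The steps of GA_Rule_Generation can be sketched in runnable form as follows. This is a sketch under stated assumptions, not our implementation: the raw fitness Fx is supplied by the caller as `raw_fitness`, crossover is single-point, mutation is bit-flip, and step 2 (N) is omitted because the stand-in fitness does not use the training records.

```python
import random

def ga_rule_generation(n, generations, pop_size, pc, pm, raw_fitness, seed=0):
    """Evolve n-bit chromosomes; return the final population, fittest first."""
    rng = random.Random(seed)
    # Step 1: random initial population.
    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        # Steps 3-5: relative fitness Fx / Sum(Fx) for every chromosome.
        raw = [raw_fitness(c) for c in population]
        total = sum(raw) or 1  # guard against an all-zero population
        relative = [fx / total for fx in raw]
        # Step 6: keep the fitter half, drop the rest.
        order = sorted(range(pop_size), key=lambda i: relative[i], reverse=True)
        parents = [population[i] for i in order[: max(2, pop_size // 2)]]
        # Step 7: single-point crossover between random pairs of survivors.
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            if rng.random() < pc:
                point = rng.randint(1, n - 1)
                a, b = a[:point] + b[point:], b[:point] + a[point:]
            children += [a, b]
        # Step 8: bit-flip mutation on every chromosome of the new population.
        population = [[1 - g if rng.random() < pm else g for g in c]
                      for c in (parents + children)[:pop_size]]
    return sorted(population, key=raw_fitness, reverse=True)

# Stand-in raw fitness: the number of 1 bits (an assumption for the demo).
rules = ga_rule_generation(n=6, generations=5, pop_size=8, pc=0.8, pm=0.05,
                           raw_fitness=sum)
print(rules[0])
```

The sketch expects n of at least 2 and a population of at least 2, so that a crossover point and a pair of parents always exist.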
GA Parameters
A GA has some general elements and parameters that can be defined:
• GA Operators: Selection, mutation, and crossover are the most influential parts of the algorithm, as they contribute to the generation of each population.
• Selection is the phase in which individuals with superior fitness are selected from the population; the others are discarded.
• Crossover is a method in which randomly selected pairs of individuals exchange their parents' genes with each other, until an entirely new population has been generated.
• Mutation flips some of the bits in an individual; since any bit may be flipped, the resulting change is hard to predict.
• Fitness Function: The fitness function scales the value of an individual relative to the rest of the population. It is used to identify the best possible solutions among the candidates in the population.
In the preprocessing phase, the KDDCUP99 dataset is processed with the Weka tool to remove redundant data, which yields the dataset used for testing. Removing redundant records improves the detection rate and the performance of our system.
In the detection phase, the Genetic Algorithm is applied to the dataset of chosen features and computes the fitness of every rule with the following fitness function.
Fitness = Fx / sum(Fx)
where Fx is the fitness of individual x and sum(Fx) is the total fitness of all individuals.
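As a small worked example of this formula, the sketch below normalizes a list of raw fitness values so that they sum to 1. The function name and the handling of an all-zero population are assumptions.

```python
def relative_fitness(raw_scores):
    """Fitness = Fx / sum(Fx) for each individual x."""
    total = sum(raw_scores)
    if total == 0:
        # No individual has any fitness; return zeros rather than divide by 0.
        return [0.0 for _ in raw_scores]
    return [fx / total for fx in raw_scores]

# Three rules with raw fitness 2, 3 and 5 receive shares 0.2, 0.3 and 0.5.
print(relative_fitness([2, 3, 5]))
```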
8. KNOWLEDGE DISCOVERY DATASET (KDD)
KDDCUP99, which is based on DARPA data from MIT Lincoln Laboratory, is widely used to evaluate IDSs. In this study, we used the KDDCUP99 training and testing datasets.
Each record of the datasets consists of 41 network features and 1 manually assigned record type. Nine network features were used in the GA, including Duration, Protocol, Service, Flag, Source bytes, and Destination bytes. The record type indicates whether a record is a normal or an abnormal network connection.
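The record layout can be illustrated with a short parsing sketch. It assumes the comma-separated format of the KDDCUP99 files, in which the six features named above occupy the first six columns and the record type is the last field with a trailing period. The sample line and the function name `parse_record` are for illustration only.

```python
# Illustrative record: 41 feature values followed by the record type.
SAMPLE = (
    "0,tcp,http,SF,181,5450,"
    "0,0,0,0,0,1,"
    "0,0,0,0,0,0,0,0,0,0,"
    "8,8,0.00,0.00,0.00,0.00,1.00,0.00,0.00,"
    "9,9,1.00,0.00,0.11,0.00,0.00,0.00,0.00,0.00,"
    "normal."
)

# Column index of each feature named in the text.
SELECTED = {"duration": 0, "protocol": 1, "service": 2, "flag": 3,
            "src_bytes": 4, "dst_bytes": 5}

def parse_record(line):
    """Split one record into the selected features and its record type."""
    fields = line.strip().split(",")
    if len(fields) != 42:
        raise ValueError(f"expected 42 fields, got {len(fields)}")
    label = fields[-1].rstrip(".")
    features = {name: fields[i] for name, i in SELECTED.items()}
    for name in ("duration", "src_bytes", "dst_bytes"):
        features[name] = int(features[name])  # numeric columns
    return features, label

features, label = parse_record(SAMPLE)
print(features, label)
```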
The KDD 99 intrusion detection benchmark consists of the following components:
kddcup.data; kddcup.data_10_percent; kddcup.newtestdata_10_percent_unlabeled; kddcup.testdata.unlabeled; kddcup.testdata.unlabeled_10_percent; corrected.
We have used “kddcup.data_10_percent” as the training dataset and “corrected” as the testing dataset.
In this case the training set consists of 494,021 records, of which 97,280 are normal connection records, while the test set contains 311,029 records, of which 60,593 are normal connection records. Table 8.1 shows the distribution of each intrusion type in the training and the test set.
Table 8.1- Distribution of intrusion types in datasets
Type / Train (“kddcup.data_10_percent”) / Test (“corrected”)
normal / 97280 / 60593
probe / 4107 / 4166
dos / 391458 / 229853
u2r / 52 / 228
r2l / 1124 / 16189
Total / 494021 / 311029
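The counts in Table 8.1 can be turned into percentage shares with a few lines of code, which makes the class imbalance visible: DoS records make up roughly 79% of the training set, while U2R accounts for only 52 records. The helper name `shares` is an assumption.

```python
# Record counts copied from Table 8.1.
TRAIN = {"normal": 97280, "probe": 4107, "dos": 391458, "u2r": 52, "r2l": 1124}
TEST = {"normal": 60593, "probe": 4166, "dos": 229853, "u2r": 228, "r2l": 16189}

def shares(counts):
    """Percentage of the dataset taken by each record type."""
    total = sum(counts.values())
    return {name: 100.0 * n / total for name, n in counts.items()}

for name, pct in shares(TRAIN).items():
    print(f"{name:<7}{pct:6.2f} %")
```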
9. EXPERIMENTAL RESULTS AND ANALYSIS
We obtain a better detection rate for denial-of-service and user-to-root attacks, and a close detection rate for probe and remote-to-local attacks.
Table 9.1- Detection rate of intrusions
Type / Total no. of records / Correctly detected records / Detection rate
normal / 25640 / 22398 / 87.35 %
probe / 17890 / 14177 / 79.24 %
dos / 16872 / 14982 / 88.26 %
u2r / 8742 / 5230 / 59.82 %
r2l / 6412 / 4310 / 67.21 %
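The detection rate in Table 9.1 is read here as the share of correctly detected records among all records of a type, expressed as a percentage. This reading is an assumption, since the formula is not stated explicitly; the sketch below applies it to the first row of the table.

```python
def detection_rate(correct, total):
    """Correctly detected records as a percentage of all records of a type."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * correct / total

# First row of Table 9.1: 22398 of 25640 normal records detected correctly.
print(f"{detection_rate(22398, 25640):.2f} %")
```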
10. CONCLUSIONS
In this paper, we presented and implemented an Intrusion Detection System using a Genetic Algorithm and a Decision Tree to efficiently detect various types of network intrusions. The KDDCUP99 training and testing datasets are used to generate effective new rules with a reasonable detection rate.
The major advantage of the proposed detection system is that it can generate new rules as new intrusions become known. A GA is used to obtain a set of classification rules. Six features were used when encoding and obtaining the rules. A simple but effective and flexible fitness function is used to select the appropriate rules. Depending on their fitness, the generated rules are given to the Decision Tree to detect network intrusions and categorize the types of intrusions.
With the rule set provided, the Genetic Algorithm based Intrusion Detection System can detect several types of attacks at a high rate.
11. REFERENCES
[1] Mohammad Sazzadul Hoque, Md. Abu Naser Bikas, and Md. Abdul Mukit, “An Implementation of Intrusion Detection System Using Genetic Algorithm”, International Journal of Network Security and Its Applications (IJNSA), Vol. 4, No. 2, March 2012.
[2] Ch. Satya Keerthi N.V.L., B. Minny Priscilla, P. Lakshmi Prasanna, and M.V.B.T. Santhi, “Model Generation for an Intrusion Detection System using Genetic Algorithm”, International Journal of P2P Network Trends & Technology, Vol. 1, Issue 2, 2011.
[3] Crosbie, Mark, and Gene Spafford. 1995. “Applying Genetic Programming to Intrusion Detection”. In Proceedings of the 1995 AAAI Fall Symposium on Genetic Programming, pp. 1-8. Cambridge, Massachusetts.
[4] Bridges, Susan, and Rayford B. Vaughn. 2000. “Intrusion Detection via Fuzzy Data Mining”. In Proceedings of the 12th Annual Canadian IT Security Symposium, pp. 109-122. Ottawa, Canada.
[5] KDDcup 1999 data,