Analysis of Dynamic Neural Network Model Based on Self-Learning

Anil Kumar Yadav1, Dr. A. K. Sachan2

1. Ph.D. Scholar, Department of CSE, IFTM University, Moradabad, U.P.

2. Director, RITS, Bhopal.

ABSTRACT

A dynamic neural network (DNN) is an efficient tool for learning and classifying dynamic data sets and for solving decision-making problems in artificial intelligence. A neural network has the ability to continuously accept new data and to form clusters of similar patterns. Combining these methods extends the application of reinforcement learning and provides a new idea for efficient learning during real-time operation of an agent. The neural network takes states/actions as input for supervised learning, which helps to create a new idea about the classification of data sets for agent learning during real-time operation. Self-learning is a method by which a reinforcement-learning agent takes decisions itself through trial-and-error interaction with its environment, without prior knowledge of the system. It is widely used in different research fields such as intelligent control, robotics and neuroscience, and it provides possible solutions within unknown environments.

In this paper, we study and compare dynamic neural network methods and reinforcement learning techniques, especially Q learning and TDN. A dynamic neural network (DNN) can be applied to address performance issues in reinforcement learning in terms of training episodes, discount rate and learning time.

Keywords: Dynamic neural network (DNN), Machine intelligence, Reinforcement learning, Learning classifier, Decision making system

  1. Introduction

A dynamic neural network is essential for supervised learning because it can classify and predict from actual data input. There have been many successful applications of self-learning in which a system takes decisions itself: [1] the design of a dynamic neural network to forecast short-term railway passenger demand, [2] a dynamic neural network method for time series prediction using the KIII model, [3] a dynamic neural network for continual classification, [4] dynamic neural network partial least squares (DNNPLS) identification of multivariable processes, [5] neural networks and dynamics, and [6] stochastic learning methods for dynamic neural networks with simulated and real-data comparisons. These studies show that effective training models are fundamental to agent learning during real-time operation. Self-learning is also a vital issue for agents and robots, especially in query-based reinforcement learning, because the learner (agent) goes through a large number of training episodes. This takes a large amount of time, and traditional reinforcement learning techniques such as Q learning face the challenge that episodes containing many loops affect the best decision path in the original episodes. For example, [7] reinforcement learning is widely used in different research fields such as intelligent control, robotics and neuroscience. It provides possible solutions within unknown environments; at the same time, its decisions must be handled with care, because the agent learns independently, without prior knowledge or training, and takes decisions from experience gained through trial-and-error interaction with its environment. Multi-agent training, for instance, requires a large number of training inputs and execution cycles. Another important problem of self-learning is that, in order to make the best decision (or find the shortest path), the amount of data grows drastically with the number of states and actions of the system; consequently, a very large memory is required for the look-up table used to train the agent.

Many papers have compared different methods with respect to discount rate, training episodes, shortest path and efficiency. [8][9] concluded that, in Q learning, much of the knowledge acquired by the agent involves large repetitive loops in every episode, which can affect the discount rate and training time. In this study, we focus on classification models of artificial neural networks, namely a DNN applied to reinforcement learning algorithms, to evaluate the discount rate and improve learning time [10]. These studies apply only RL techniques; we suggest that combining a DNN with Q learning is a more efficient approach. Previous papers showed the performance of either dynamic neural networks or Q learning alone, and there is no consensus about which type of design is best for self decision making. In this study, we therefore consider a new combined model, DNN with Q learning, to train the agent. The DNN model is based on reinforcement learning together with supervised learning: during agent training it trains the agent using classification of data sets, so that the agent learns efficiently from actual data input, which reduces loops and fuzzy decisions.

2. Dynamic Neural Network Training Model

In this study, we use a new dynamic neural network training model (Fig. 2.0.1), which is used for agent training with a neural network classifier on the accepted data, while Fig. 2.0.2 describes the trained agent at work. A dynamic neural network is a modification of a static artificial neural network. The new DNN model is developed based on reinforcement learning and a supervised learning mechanism.

Fig.2.0.1 DNN through Agent training

Fig.2.0.2 Trained agent at work
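To make the training model of Fig. 2.0.1 concrete, the following is a minimal sketch, not the authors' implementation, of a small feed-forward classifier that maps grid-world states to preferred actions. The layer sizes, the one-hot state encoding and the action coding are illustrative assumptions.

```python
# Sketch only: a tiny feed-forward classifier standing in for the DNN block
# of Fig. 2.0.1. Sizes and encodings are assumptions, not values from the paper.
import numpy as np

rng = np.random.default_rng(0)

N_STATES = 100          # 10x10 grid world, states encoded as one-hot vectors
N_ACTIONS = 4           # up, down, left, right
HIDDEN = 32

# Randomly initialised weights of a one-hidden-layer network.
W1 = rng.normal(0, 0.1, (N_STATES, HIDDEN))
W2 = rng.normal(0, 0.1, (HIDDEN, N_ACTIONS))

def forward(x):
    """Return hidden activation and action probabilities for one-hot state x."""
    h = np.tanh(x @ W1)
    logits = h @ W2
    p = np.exp(logits - logits.max())
    return h, p / p.sum()

def train_step(x, action, lr=0.05):
    """One gradient step of softmax cross-entropy loss toward the target action."""
    global W1, W2
    h, p = forward(x)
    grad_logits = p.copy()
    grad_logits[action] -= 1.0                  # dL/dlogits for cross-entropy
    grad_h = (W2 @ grad_logits) * (1 - h ** 2)  # backpropagate through tanh
    W2 -= lr * np.outer(h, grad_logits)
    W1 -= lr * np.outer(x, grad_h)

# Example: teach the network that state 0 (top-left cell) should move right.
x = np.zeros(N_STATES); x[0] = 1.0
for _ in range(200):
    train_step(x, action=3)     # 3 = "right" (assumed action coding)
print(forward(x)[1])            # probability mass shifts toward action 3
```

In the combined scheme described later, such a classifier would be trained on state/action pairs harvested from reinforcement learning episodes.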

2.1 Q Learning Model

In this section, Fig.2.1 describes how the agent learns from its environment through trial-and-error interaction, with the help of reward and penalty, without prior knowledge of the system. Q learning (QL) is a part of machine learning that helps the agent to learn by itself. Usually it stores the learned values in the form of a look-up table [10].

Fig.2.1 Q Learning through Agent training
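As a reference for the look-up-table mechanism described above, here is a minimal sketch of the standard tabular Q-learning update; the learning rate, discount factor, table dimensions and example rewards are illustrative choices, not values taken from this section.

```python
# Sketch of the usual tabular Q-learning rule:
# Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
import numpy as np

N_STATES, N_ACTIONS = 100, 4
Q = np.zeros((N_STATES, N_ACTIONS))     # the look-up table mentioned in the text

def q_update(s, a, reward, s_next, alpha=0.1, gamma=0.9):
    """One trial-and-error update: reward (or penalty) corrects the stored value."""
    target = reward + gamma * Q[s_next].max()
    Q[s, a] += alpha * (target - Q[s, a])

q_update(s=0, a=3, reward=-1, s_next=1)      # an ordinary step with a small penalty
q_update(s=98, a=3, reward=100, s_next=99)   # reaching the goal yields a large reward
```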

2.2 Performance Measurement

In this study, we use a formula based on the learning rate and the number of episodes to measure efficient learning in terms of accuracy. EQ is the goal efficiency, i.e. the overall percentage of correctly evaluated learning:

EQ = [1 - (x1 - I) / T] x 100

where T is the total number of states, I is the minimum count of steps, and x1 is the total count of steps.
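A small sketch of how this measure could be computed, assuming the reconstruction of the formula given above; the example numbers are purely illustrative.

```python
# Goal efficiency EQ under the assumed form EQ = [1 - (x1 - I)/T] * 100.
def learning_efficiency(T, I, x1):
    """Fewer surplus steps (x1 - I) relative to the state count T means higher EQ."""
    return (1 - (x1 - I) / T) * 100

print(learning_efficiency(T=100, I=18, x1=30))  # -> 88.0
```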

3. Result Analysis and Comparison

In this section, we observe that the average accuracy in Table 3 is higher than in Tables 1 and 2. Table 4 shows a comparison between DNN, TDN and Q learning in terms of percentage accuracy.

First, Figure 3.1 shows how the agent learns in the grid world problem. The agent travels through a virtual world called a grid world, in which its aim is to reach the goal. We take the size of the grid world to be MxN, where the grid is square and M equals N equals 10. The agent starts at the top-left cell, and the goal is the bottom-right cell, i.e. (10, 10). The agent can move only one cell at a time to a neighbouring cell, that is, upwards, downwards, to the right or to the left, provided the action is possible. When the agent touches the border or a wall, the action that would make it cross the border is not performed; instead it remains where it is or decides on the next available action. No rollback condition applies here. This is repeated until the agent reaches the goal.

Fig.3.1: In the grid world of 10x10, the agent can move in four directions to find the goal.
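The movement rules described above can be summarised in a short sketch of the grid-world transition function; the 1-based coordinate convention and the stay-in-place handling of blocked moves follow the text, while the function and variable names are illustrative assumptions.

```python
# Sketch of the 10x10 grid world: a move that would cross the border leaves
# the agent where it is (no rollback), and reaching the goal ends the episode.
GRID = 10
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

def step(state, action, goal=(GRID, GRID)):
    """Apply one move; blocked moves keep the agent in place."""
    r, c = state
    dr, dc = ACTIONS[action]
    nr, nc = r + dr, c + dc
    if not (1 <= nr <= GRID and 1 <= nc <= GRID):   # border/wall: stay put
        nr, nc = r, c
    reached_goal = (nr, nc) == goal
    return (nr, nc), reached_goal

print(step((1, 1), "up"))      # blocked by the border -> ((1, 1), False)
print(step((10, 9), "right"))  # reaches the goal      -> ((10, 10), True)
```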

For example, if the agent is at (1, 1) and the action is down, the agent moves to (2, 1); if the next action is to the right, it moves to (6, 4), or when the next action is down it moves to (8, 3). The grid world that the agent explores using a decision path is like the grid shown in the figure above. From Figure 3.1, the agent finds a decision path from the start position to the goal position in the 10x10 grid world using Q-learning, selecting one of the four actions at random at each step.

The agent moves one step upwards, downwards, to the right or to the left, if the action is possible. If the movement is not possible because of the border of the grid world, it does nothing and decides the next action again at random. This is repeated until the agent reaches the goal. The maximum number of steps should be determined depending on the size of the grid world. We now look at the result of random moves by the agent as described above. See Figure 3.2, obtained using the DNN: when the agent reaches the goal cell, it gains a reward of 100, and the discount rate parameter is set to 0.9. This provides a comprehensive way to remove loops and find shortcuts within an episode in order to speed up convergence. The start cell (2, 8) is fixed, while the goal cell (9, 8) is determined at random. The agent perceives its own coordinates (x, y) and has four possible actions: moving up, moving down, moving left and moving right, that is, it can move into (x, y+1), (x, y-1), (x-1, y) and (x+1, y). Some cells have walls at the boundaries with their adjacent cells, and the movement of the agent is blocked by the walls and the edge of the grid world. In addition, no rollback condition occurs here.


Fig.3.2: In the grid world of 10x10, starting from (2, 8), an agent moves towards the goal at (9, 2), about which it had no a-priori information. Left (A): a path chosen from 50 trials of random moves to the goal. Right (B): the route of the shortest path to the goal after a trial of 200 episodes.
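Under the stated assumptions (random action selection, goal reward 100, discount rate 0.9), a training run such as the one behind Fig. 3.2 could be sketched as follows, reusing the step() and q_update() helpers sketched earlier; the start and goal cells follow the figure caption, while the step penalty and episode cap are illustrative choices.

```python
# Sketch of a Q-learning training run on the grid world of Fig. 3.2.
import random

def run_episodes(n_episodes=200, start=(2, 8), goal=(9, 2), max_steps=1000):
    for _ in range(n_episodes):
        state = start
        for _ in range(max_steps):
            action = random.choice(list(ACTIONS))          # random move, as in the text
            nxt, done = step(state, action, goal)
            s = (state[0] - 1) * GRID + (state[1] - 1)     # (row, col) -> table index
            s_next = (nxt[0] - 1) * GRID + (nxt[1] - 1)
            a = list(ACTIONS).index(action)
            q_update(s, a, 100 if done else -1, s_next)    # reward 100 at the goal
            state = nxt
            if done:
                break

run_episodes()
```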

Learning rate or discount rate | E=1000 | E=3000 | E=5000 | E=7000 | E=9000 | E=11000
E1      | 70     | 98.80  | 83     | 99.87  | 83     | 99.80
E2      | 70.90  | 70     | 63     | 83     | 98.81  | 98.82
E3      | 71.10  | 68.90  | 97.67  | 97.67  | 82.90  | 84.50
E4      | 83     | 74.50  | 98.76  | 83.90  | 99.80  | 98.80
E5      | 99.90  | 99.82  | 51.11  | 99.05  | 98.90  | 99.10
Average | 78.98% | 82.40% | 78.69% | 92.69% | 92.80% | 96.20%

Table 1: Comparison table between different episodes for Q Learning (QL) [10]

Learning rate or discount rate | E=1000 | E=3000 | E=5000 | E=7000 | E=9000 | E=11000
E1      | 97.90  | 71     | 76.60  | 98.90  | 99.20  | 99.88
E2      | 71     | 70     | 62     | 79.90  | 82.50  | 84
E3      | 69.90  | 72     | 93.54  | 76.60  | 85.54  | 96.67
E4      | 75     | 82.90  | 95     | 82     | 84.90  | 82.90
E5      | 99.72  | 98.80  | 53     | 95     | 96.40  | 98.90
Average | 82.70% | 78.94% | 76%    | 86.40% | 89.70% | 92.47%

Table 2: Comparison table between different episodes for TDN [10]

Learning rate or discount rate | E=1000 | E=3000 | E=5000 | E=7000 | E=9000 | E=11000
E1      | 71     | 98.90  | 84     | 99.89  | 84     | 99.80
E2      | 70.80  | 71     | 67     | 85     | 98.85  | 98.92
E3      | 71     | 68.80  | 97.75  | 97.70  | 83.95  | 85.50
E4      | 82     | 74.60  | 98.80  | 83.94  | 99.84  | 98.90
E5      | 99.98  | 99.92  | 52.10  | 99.10  | 98.91  | 99.10
Average | 80.98% | 82.60% | 83.69% | 93.78% | 94.80% | 96.36%

Table 3: Comparison table between different episodes for DNN

As can be seen from Tables 1, 2, 3 and 4, the average accuracy of the dynamic neural network over the different episode counts is better than that of TDN and QL. After analysing all the data, we find that the DNN is the more efficient model in terms of discount rate, learning time and memory usage. Finally, a performance comparison graph between DNN, TDN and Q learning is shown in Fig. 3.1.1.

Used model (accuracy in %) | E=1000 | E=3000 | E=5000 | E=7000 | E=9000 | E=11000
DNN | 80.98 | 82.60 | 83.69 | 93.78 | 94.80 | 96.36
QL  | 78.98 | 82.40 | 78.69 | 92.69 | 92.80 | 96.20
TDN | 82.70 | 78.95 | 76    | 86.40 | 89.70 | 92.47

Table 4: Comparison of DNN, TDN and Q learning in terms of percentage accuracy

Fig.3.1.1 Performance comparison graph for DNN, TDN and Q learning
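For reference, the comparison graph of Fig. 3.1.1 can be regenerated from the Table 4 averages with a few lines of plotting code; the use of matplotlib and the styling are illustrative choices, and the numbers are copied from Table 4.

```python
# Sketch: plot the average accuracy of each model against the number of episodes.
import matplotlib.pyplot as plt

episodes = [1000, 3000, 5000, 7000, 9000, 11000]
accuracy = {
    "DNN": [80.98, 82.60, 83.69, 93.78, 94.80, 96.36],
    "QL":  [78.98, 82.40, 78.69, 92.69, 92.80, 96.20],
    "TDN": [82.70, 78.95, 76.00, 86.40, 89.70, 92.47],
}

for model, values in accuracy.items():
    plt.plot(episodes, values, marker="o", label=model)
plt.xlabel("Training episodes")
plt.ylabel("Average accuracy (%)")
plt.legend()
plt.show()
```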

4. Discussion and Conclusion

In this research, we found that the dynamic neural network model is a good method to train the agent before learning. However, different learning techniques take many training inputs in each episode and therefore require a large memory to store the learned data in the form of a look-up table. We found that a dynamic neural network can be used for self-learning and for classifying dynamic data sets. A neural network has the ability to continuously accept new data and form clusters of similar patterns. Therefore, a DNN may act as an effective decision-making unit (an NN classifier) that takes states/actions as input for training the agent.

In this study, we conclude that the dynamic neural network model is more effective than the reinforcement learning techniques considered, especially TDN and Q learning. The DNN provides an important mechanism for evaluating machine learning problems such as control, robotics and weather forecasting, and it can also be used for sequential decision-making systems. Therefore, DNN and RL will remain an active area of research in the near future.

References:

[1] Tsung-Hsien Tsai and Chi-Kang Lee, "Design of Dynamic Neural Network to Forecast Short-Term Railway Passenger Demand", Journal of the Eastern Asia Society for Transportation Studies, vol. 6, pp. 1651-1666, 2005.

[2] Haizhong Li and Robert Kozma, "A Dynamic Neural Network Method for Time Series Prediction Using the KIII Model", IEEE, 2003, pp. 347-352.

[3] Lang and Warwick, "A Dynamic Neural Network for Continual Classification", IEEE, 2002, pp. 1-11.

[4] Olufemi and Armando, "Dynamic Neural Networks Partial Least Squares (DNNPLS) Identification of Multivariable Processes", Elsevier, 2003, pp. 143-155.

[5] Tim and Rajan, "Neural Networks and Dynamics", Neuroscience, vol. 28, 2005, pp. 326-357.

[6] Pathan and Parsani, "Stochastic Learning Methods for Dynamic Neural Networks: Simulated and Real-Data Comparisons", American Control Conference, 2002, pp. 2577-2582.

[7] Lucian Busoniu and Robert Babuska, "A Comprehensive Survey of Multi-Agent Reinforcement Learning", IEEE, 2008, pp. 156-169.

[8] Habib Karbasian and Maida N., "Improving Reinforcement Learning Using Temporal Difference Networks", EUROCON, IEEE, 2009, pp. 1716-1722.

[9] Hitoshi Iima and Yasuaki Kuroe, "Swarm Reinforcement Learning Algorithms Based on Sarsa Method", IEEE, 2008, pp. 2045-2049.

[10] Anil Kumar Yadav and Shailendra Kumar Shrivastav, "Evaluation of Reinforcement Learning Techniques", ACM, vol. 132, 2010, pp. 88-92 (ISBN: 978-1-4503-0408-5).

[11] Anil Kumar Yadav and Dr. Ajay Kumar Sachan, "Research and Application of Dynamic Neural Network Based on Reinforcement Learning", Springer, 2012, pp. 931-942 (ISBN: 978-3-642-27442-8).