CONDITION IDENTIFICATION OF MULTIFACTOR OBJECTS AND SYSTEMS
G.L. Khazan
Ural State Technical University, Ekaterinburg, Russia
Abstract
The identification problem, which is closely related to the diagnostics of objects of scientific observation in metallurgy and foundry, is discussed. The case is considered in which an observed situation is recognized (assigned to a certain class) on the basis of an analysis of external symptoms or the results of instrumental checks (chemical analysis of the product; temperature and pressure data for the conditions under which the reagents interact; the set of parameters that define phase equilibrium; and others). Given a computer database of different combinations of these features, it is possible to calculate the probability that each new event belongs to one or another class of situations available in the database. The computer then gives a list of control actions that should bring the analysed object to its optimal condition. The results of control are used for automatic adaptation and correction of the source database.
The identification problem is closely related to the diagnostics of objects of scientific observation in metallurgy and foundry. We therefore pose the problem of modeling the diagnosis of certain observed situations. Let a situation Z be diagnosed (assigned to some class) on the basis of an analysis of external symptoms Y ( ... ) and the results X ( ... ) of instrumental checks (chemical analysis of the product, temperature and pressure data for the conditions under which the reagents interact, the set of phase equilibrium parameters, and others). Presumably, between the diagnosis Z on the one hand and the elements of the vectors X and Y on the other there exist causal relations of the type: "If situation Z holds, the elements of the vectors X and Y take the values that correspond to this situation with sufficiently high probability". It is important to note that most of the information is carried by a specific combination of the whole set of features (symptoms), since each of them separately, or their simultaneous manifestation in small groups (of two or three), can also be observed in conditions other than Z. So, given a database of different combinations of X, Y and Z, it is possible to calculate the probability that each new event belongs to one or another class of situations (ZA, ZB, ZC, ...) available in the database.
The best recognition (the highest accuracy of diagnosis) is achieved when the number of elements of the vectors X and Y is sufficiently large. However, if some of these elements are features that are unrepresentative of the given situations, they create background noise that hinders recognition. Procedures must therefore be provided for the preliminary creation of a training diagnostic database (Table 1), cleaned of unrepresentative features and random rows.
№ / Instrumental checking data, X / External symptoms, Y / Situation
№ / x1 / x2 / x3 / x4 / y1 / y2 / y3 / y4 / y5 / Z
1 / x11 / x12 / x13 / x14 / y11 / y12 / y13 / y14 / y15 / ZC
2 / x21 / x22 / x23 / x24 / y21 / y22 / y23 / y24 / y25 / ZA
3 / x31 / x32 / x33 / x34 / y31 / y32 / y33 / y34 / y35 / ZA
... / ... / ... / ... / ... / ... / ... / ... / ... / ... / ...
k / xk1 / xk2 / xk3 / xk4 / yk1 / yk2 / yk3 / yk4 / yk5 / ZB
Table 1. Training sample
We shall call a situation in which the possible current condition of the observed object has to be recognized a situation to be recognized. Even before laborious and lengthy instrumental analyses, the external symptoms of the situation, Ys = ( ys1 ys2 ys3 ys4 ys5 ), will obviously be recorded. We determine how far this vector is from the vectors Yu stored in the rows of the right-hand part of the training sample. This is evaluated by calculating the distance between the corresponding vectors. The most probable diagnosis is the one corresponding to the row to which the distance is minimal.
In multivariate space the notion of the distance between vectors is not entirely unambiguous: different measures of proximity exist. The choice of the most suitable one depends mainly on the particularities of the object being identified, and the results of identification may turn out to be not wholly identical as a consequence of an unfortunate choice of the proximity criterion.
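The minimum-distance rule and its dependence on the proximity measure can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the training rows and the new symptom vector are hypothetical values, the function names are invented here, and the city-block (Manhattan) distance is used only as one example of an alternative measure that the text does not name.

```python
import math

def euclidean(u, v):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def manhattan(u, v):
    """City-block distance: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(u, v))

def diagnose(symptoms, training, metric=euclidean):
    """Return (diagnosis, distance) for the training row nearest to `symptoms`."""
    vec, label = min(training, key=lambda row: metric(symptoms, row[0]))
    return label, metric(symptoms, vec)

# Hypothetical training rows: (symptom vector Yu, known diagnosis Z).
training = [
    ((4.0, 4.0, 1.0, 1.0, 1.0), "ZA"),
    ((6.0, 1.0, 1.0, 1.0, 1.0), "ZB"),
    ((9.0, 9.0, 9.0, 9.0, 9.0), "ZC"),
]
ys = (1.0, 1.0, 1.0, 1.0, 1.0)  # hypothetical new symptom vector Ys

print(diagnose(ys, training, metric=euclidean)[0])  # ZA
print(diagnose(ys, training, metric=manhattan)[0])  # ZB
```

With these made-up numbers the two measures give different diagnoses for the same vector, which is the kind of ambiguity the paragraph above warns about.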
Consider this problem using a very simple example (Table 2).
№ / Characteristics of alloy (symptoms of situations) / Condition
№ / Y1 / Y2 / Y3 / Condition
1 / 7.2 / 7.5 / 12.7 / Z2
2 / 18.4 / 10.3 / 12.1 / Z1
3 / 19.2 / 10.5 / 7 / Z1
4 / 19.3 / 8.7 / 7.4 / Z1
5 / 13 / 10.3 / 21.4 / Z3
6 / 9.5 / 6.2 / 10 / Z2
7 / 7.3 / 6.4 / 13.2 / Z2
8 / 19.3 / 9.7 / 9.8 / Z1
9 / 16.8 / 11.8 / 19.8 / Z3
10 / 14.3 / 12.6 / 19.3 / Z3
11 / 7.9 / 7.9 / 11.7 / Z2
12 / 7.7 / 6 / 12.8 / Z2
13 / 18.8 / 8.9 / 7.7 / Z1
14 / 18.1 / 12.6 / 17 / Z3
15 / 14.3 / 9.3 / 18.4 / Z3
Table 2. Raw data
Treating the values of the three features Y1, Y2, Y3 as symptoms, we match them to three varieties of condition, Z1, Z2 and Z3, established after additional analysis. It is reasonable to assume that if the chosen list of three symptoms carries certain important information, a consistent correspondence should be expected between the sets of their quantitative measures and one or another condition. We first consider an even simpler (and more illustrative) case, in which symptom Y3 is ignored; see Figure 1.
It can be seen that the points corresponding to diagnoses Z1 and Z2 form disjoint compact groups, whereas the points corresponding to diagnosis Z3 form a less dense group that nearly adjoins the area of Z1 but is remote from the area of Z2.
Figure 1. Location of the points (Y1, Y2) according to diagnoses Z1, Z2 and Z3
Clearly, graphs like Fig. 1 can be used only in a space of two dimensions; in multivariate cases one has to resort to other methods of revealing compact groups.
We define a measure of the distance between any two points of the sets considered, for instance points a and b, joined by a straight line that forms the hypotenuse of the right triangle abc. In accordance with the rules of geometry, this distance is given by the formula
$$E_{ab} = \sqrt{(Y_{1a} - Y_{1b})^2 + (Y_{2a} - Y_{2b})^2}, \qquad (1)$$
which in the general case corresponds to the Euclidean distance between points:
$$E_{ij} = \sqrt{\sum_{k=1}^{n} (Y_{ki} - Y_{kj})^2}, \qquad (2)$$
where k = 1,…,n is the number of the coordinate of the multivariate space;
i, j are the serial numbers of the points of the multivariate space;
Y_{ki} is the k-th coordinate of point i;
E is the Euclidean distance between points i and j.
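As a quick check of the Euclidean distance (2), the following Python sketch evaluates it for a few rows of Table 2. The function and variable names are chosen here for illustration only.

```python
import math

def euclidean(p, q):
    """Euclidean distance: square root of the summed squared coordinate differences."""
    return math.sqrt(sum((pk - qk) ** 2 for pk, qk in zip(p, q)))

# Rows 1, 7, 4 and 13 of Table 2 as (Y1, Y2, Y3).
obj1 = (7.2, 7.5, 12.7)
obj7 = (7.3, 6.4, 13.2)
obj4 = (19.3, 8.7, 7.4)
obj13 = (18.8, 8.9, 7.7)

print(round(euclidean(obj1, obj7), 2))          # 1.21
print(round(euclidean(obj4, obj13), 2))         # 0.62
# Two-dimensional special case: only Y1 and Y2 are used.
print(round(euclidean(obj1[:2], obj7[:2]), 2))  # 1.1
```

The same function covers the planar case (1) and the general case (2), because the number of coordinates is taken from the length of the input vectors.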
The Euclidean distances between all pairs of points presented in Table 2 form the matrix of distances (Table 3).
In this matrix the column and row numbers correspond to the numbers of the situations in Table 2, and the cells contain the Euclidean distances between the points whose coordinates are the fifteen combinations of Y1, Y2 and Y3 taken from the same Table 2. Naturally, the distance of an object to itself (1_1, 2_2, ..., 15_15) is zero. It follows from the table that the Euclidean distance between different objects lies in the range from 0.62 to 15.68, and that the minimum distance within a column ranges from 0.62 to 3.42. Setting a comparatively small threshold p = 3, we determine from the data of column 1 which of the remaining 14 objects lie at a distance from object 1 not exceeding this threshold.
№ / 1 / 2 / 3 / 4 / 5 / 6 / 7 / 8 / 9 / 10 / 11 / 12 / 13 / 14 / 15
1 / 0,00 / 11,56 / 13,62 / 13,26 / 10,82 / 3,78 / 1,21 / 12,64 / 12,69 / 10,95 / 1,29 / 1,58 / 12,71 / 12,78 / 9,28
2 / 11,56 / 0,00 / 5,17 / 5,05 / 10,75 / 10,02 / 11,82 / 2,54 / 8,01 / 8,60 / 10,78 / 11,55 / 4,64 / 5,42 / 7,58
3 / 13,62 / 5,17 / 0,00 / 1,85 / 15,68 / 11,03 / 14,03 / 2,91 / 13,09 / 13,41 / 12,51 / 13,64 / 1,79 / 10,28 / 12,47
4 / 13,26 / 5,05 / 1,85 / 0,00 / 15,44 / 10,44 / 13,53 / 2,60 / 13,02 / 13,48 / 12,21 / 13,08 / 0,62 / 10,43 / 12,10
5 / 10,82 / 10,75 / 15,68 / 15,44 / 0,00 / 12,61 / 10,72 / 13,21 / 4,39 / 3,38 / 11,22 / 10,98 / 14,94 / 7,12 / 3,42
6 / 3,78 / 10,02 / 11,03 / 10,44 / 12,61 / 0,00 / 3,89 / 10,41 / 13,44 / 12,27 / 2,89 / 3,34 / 9,95 / 12,80 / 10,16
7 / 1,21 / 11,82 / 14,03 / 13,53 / 10,72 / 3,89 / 0,00 / 12,90 / 12,77 / 11,16 / 2,21 / 0,69 / 12,99 / 13,02 / 9,19
8 / 12,64 / 2,54 / 2,91 / 2,60 / 13,21 / 10,41 / 12,90 / 0,00 / 10,52 / 11,12 / 11,70 / 12,54 / 2,30 / 7,85 / 9,96
9 / 12,69 / 8,01 / 13,09 / 13,02 / 4,39 / 13,44 / 12,77 / 10,52 / 0,00 / 2,67 / 12,65 / 12,86 / 12,60 / 3,19 / 3,80
10 / 10,95 / 8,60 / 13,41 / 13,48 / 3,38 / 12,27 / 11,16 / 11,12 / 2,67 / 0,00 / 10,99 / 11,37 / 12,98 / 4,44 / 3,42
11 / 1,29 / 10,78 / 12,51 / 12,21 / 11,22 / 2,89 / 2,21 / 11,70 / 12,65 / 10,99 / 0,00 / 2,21 / 11,65 / 12,42 / 9,37
12 / 1,58 / 11,55 / 13,64 / 13,08 / 10,98 / 3,34 / 0,69 / 12,54 / 12,86 / 11,37 / 2,21 / 0,00 / 12,56 / 13,01 / 9,26
13 / 12,71 / 4,64 / 1,79 / 0,62 / 14,94 / 9,95 / 12,99 / 2,30 / 12,60 / 12,98 / 11,65 / 12,56 / 0,00 / 10,03 / 11,61
14 / 12,78 / 5,42 / 10,28 / 10,43 / 7,12 / 12,80 / 13,02 / 7,85 / 3,19 / 4,44 / 12,42 / 13,01 / 10,03 / 0,00 / 5,22
15 / 9,28 / 7,58 / 12,47 / 12,10 / 3,42 / 10,16 / 9,19 / 9,96 / 3,80 / 3,42 / 9,37 / 9,26 / 11,61 / 5,22 / 0,00
Table 3. Matrix of distances in three-dimensional space (features Y1, Y2 and Y3)
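The distance matrix can be recomputed from the raw data of Table 2 with a short Python sketch such as the one below. The variable names are illustrative assumptions, and values recomputed from the rounded raw data may differ from the printed table in the last digit.

```python
import math

# Table 2: object number -> (Y1, Y2, Y3).
points = {
    1: (7.2, 7.5, 12.7), 2: (18.4, 10.3, 12.1), 3: (19.2, 10.5, 7.0),
    4: (19.3, 8.7, 7.4), 5: (13.0, 10.3, 21.4), 6: (9.5, 6.2, 10.0),
    7: (7.3, 6.4, 13.2), 8: (19.3, 9.7, 9.8), 9: (16.8, 11.8, 19.8),
    10: (14.3, 12.6, 19.3), 11: (7.9, 7.9, 11.7), 12: (7.7, 6.0, 12.8),
    13: (18.8, 8.9, 7.7), 14: (18.1, 12.6, 17.0), 15: (14.3, 9.3, 18.4),
}

# Full symmetric matrix, keyed by (row, column).
dist = {(i, j): math.dist(points[i], points[j]) for i in points for j in points}

# Range of distances between different objects.
off_diag = [d for (i, j), d in dist.items() if i < j]
print(round(min(off_diag), 2), round(max(off_diag), 2))  # 0.62 15.68

# Smallest non-zero distance in each column.
col_min = {j: min(dist[i, j] for i in points if i != j) for j in points}
print(round(min(col_min.values()), 2), round(max(col_min.values()), 2))  # 0.62 3.42

# Objects lying within the threshold p = 3 of object 1.
p = 3.0
near_1 = [j for j in points if j != 1 and dist[1, j] <= p]
print(near_1)  # [7, 11, 12]
```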
These are objects 7 (E1-7 = 1.21), 11 (E1-11 = 1.29) and 12 (E1-12 = 1.58). Examining the columns for objects 7 and 12, we make sure that there are no nearest neighbors other than those already found. From the column for object 11 it follows that, apart from the objects already found, it is adjoined by object 6 (E11-6 = 2.89). Thus a compact group consisting of objects 1, 7, 11, 12 and 6 has been revealed. Looking again at Table 2, we make sure that this group includes all the objects corresponding to diagnosis Z2. The other objects are removed from those listed by distances that considerably exceed the threshold level. Thus some of the objects are united in a dense group, a cluster. To reveal the other clusters, the contents of the remaining columns must be analyzed.
The results are presented graphically in Fig. 2 in the form of graph schemes. It can be seen that at p = 3 six clusters are revealed: the first consists of objects 1, 6, 7, 11, 12; the second consists of objects 2, 3, 4, 8, 13; the third unites objects 9 and 10; and each of the remaining clusters consists of a single element (objects 5, 14 and 15), which stands alone because the distances to its nearest objects exceed the accepted threshold level. Note that the first of these clusters corresponds to the rows of Table 2 that refer to condition Z2, and the second is the set of objects characterized by condition Z1.
If the threshold value of the Euclidean distance is increased to p = 3.5, then (as follows from Fig. 2) the composition and number of the revealed clusters change somewhat: in addition to the clusters revealed earlier, a new one is formed, consisting of objects 5, 9, 10, 14 and 15, which refer to condition Z3. At a much greater threshold value (p = 8), all objects belonging to conditions Z3 and Z1 form a common cluster, while the objects belonging to condition Z2 still stand out clearly as a separate cluster.
Fig. 2. Graph schemes for different threshold distances p.
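The grouping by threshold distance can be sketched in Python as follows. The sketch assumes that an object joins a cluster when it lies within p of at least one object already in it (chaining through nearest neighbors); this reading reproduces the clusters listed above, but the function name and the exact procedure are assumptions, not the author's implementation.

```python
import math

# Table 2: object number -> (Y1, Y2, Y3).
points = {
    1: (7.2, 7.5, 12.7), 2: (18.4, 10.3, 12.1), 3: (19.2, 10.5, 7.0),
    4: (19.3, 8.7, 7.4), 5: (13.0, 10.3, 21.4), 6: (9.5, 6.2, 10.0),
    7: (7.3, 6.4, 13.2), 8: (19.3, 9.7, 9.8), 9: (16.8, 11.8, 19.8),
    10: (14.3, 12.6, 19.3), 11: (7.9, 7.9, 11.7), 12: (7.7, 6.0, 12.8),
    13: (18.8, 8.9, 7.7), 14: (18.1, 12.6, 17.0), 15: (14.3, 9.3, 18.4),
}

def threshold_clusters(points, p):
    """Group objects linked by chains of pairwise distances not exceeding p."""
    unvisited = set(points)
    clusters = []
    while unvisited:
        seed = min(unvisited)          # start a new cluster from the lowest number
        unvisited.remove(seed)
        cluster, frontier = [seed], [seed]
        while frontier:
            cur = frontier.pop()
            # Unassigned objects within the threshold of the current object.
            near = [j for j in unvisited
                    if math.dist(points[cur], points[j]) <= p]
            for j in near:
                unvisited.remove(j)
            cluster.extend(near)
            frontier.extend(near)
        clusters.append(sorted(cluster))
    return clusters

for p in (3.0, 3.5, 8.0):
    print(p, threshold_clusters(points, p))
# 3.0 [[1, 6, 7, 11, 12], [2, 3, 4, 8, 13], [5], [9, 10], [14], [15]]
# 3.5 [[1, 6, 7, 11, 12], [2, 3, 4, 8, 13], [5, 9, 10, 14, 15]]
# 8.0 [[1, 6, 7, 11, 12], [2, 3, 4, 5, 8, 9, 10, 13, 14, 15]]
```

Under this chaining assumption the final grouping does not depend on the order in which objects are visited; only the threshold p has to be chosen.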
The strategy considered for uniting objects into groups is very simple and illustrative, but its shortcomings are the subjectivity of the choice of the threshold distance and the change in cluster structure when the order of cluster formation changes. In our view, hierarchical agglomerative classification procedures [1 - 6] are preferable. Although these procedures are more laborious, they can be carried out successfully using, for instance, the computer package "Statistica". The results of applying hierarchical classification methods to the data of Table 2 are shown in Fig. 4.
Fig. 4. Results of hierarchical classification
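For readers without access to a statistical package, a hierarchical agglomerative procedure can be sketched in plain Python. The sketch below is not the procedure used to obtain Fig. 4: the text does not state which linkage rule was applied, so the nearest-neighbor (single) linkage rule, the function name and the choice to stop at three clusters are all assumptions made for illustration.

```python
import math

# Table 2: object number -> (Y1, Y2, Y3).
points = {
    1: (7.2, 7.5, 12.7), 2: (18.4, 10.3, 12.1), 3: (19.2, 10.5, 7.0),
    4: (19.3, 8.7, 7.4), 5: (13.0, 10.3, 21.4), 6: (9.5, 6.2, 10.0),
    7: (7.3, 6.4, 13.2), 8: (19.3, 9.7, 9.8), 9: (16.8, 11.8, 19.8),
    10: (14.3, 12.6, 19.3), 11: (7.9, 7.9, 11.7), 12: (7.7, 6.0, 12.8),
    13: (18.8, 8.9, 7.7), 14: (18.1, 12.6, 17.0), 15: (14.3, 9.3, 18.4),
}

def single_linkage(points, n_clusters):
    """Merge the two closest clusters repeatedly until n_clusters remain.

    The distance between clusters is the smallest distance between their
    members (single linkage, an assumed rule). Returns the final clusters
    and the merge history as (cluster_a, cluster_b, distance) tuples.
    """
    clusters = [{i} for i in points]   # every object starts as its own cluster
    merges = []
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(math.dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), d))
        clusters[a] |= clusters[b]     # merge b into a
        del clusters[b]
    return clusters, merges

clusters, merges = single_linkage(points, 3)
print(sorted(sorted(c) for c in clusters))
# [[1, 6, 7, 11, 12], [2, 3, 4, 8, 13], [5, 9, 10, 14, 15]]
print(merges[0][0], merges[0][1], round(merges[0][2], 2))  # [4] [13] 0.62
```

Stopped at three clusters, this rule recovers the groups that Table 2 labels Z2, Z1 and Z3. The merge history records the distance at which each union occurs, which is the information a dendrogram displays.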
Now consider some measures of the compactness of clusters in terms of the statistical features of the set of points. The coordinates of the points characterized by condition Z1 are shown in Table 4.
№ / Y1 / Y2
1 / 18.4 / 10.3
2 / 19.2 / 10.5
3 / 19.3 / 8.7
4 / 19.3 / 9.7
5 / 18.8 / 8.9
Table 4. Coordinates of the points corresponding to condition Z1
We introduce a new notion, the center of gravity of a cluster. This is the point in multivariate space whose coordinates are the arithmetic means of the corresponding coordinates of the cluster points: