Hierarchal Structure of Community and Link Analysis

1

Seema Mishra

Indian Institute of Information Technology, Allahabad
Robotics and AI Lab

G C Nandi Indian Institute of Information Technology, Allahabad
Robotics and AI Lab

1

1

ABSTRACT

Discovering the hierarchy of organizational structure in a dynamic social network can unveil significant patterns which can help in network analysis. In this paper, we formulated a methodology to establish the most influential person in a temporal communication network from the perspective of frequency of interactions which works on hierarchal structure. With the help of frequency of interactions, we have calculated the individual score of each person from PageRankalgorithm. Subsequently, a graph is generated that showed the influence of each individual in the network.Rigorous experiments we performed using Enron data set to establish a fact that our proposed methodology correctly identifies the influential persons over the temporal network. We have used EnronCompany’s email data set that describeshow employees of company interacted with each other. We could analyze from our methodology and verify from the facts in the Company’s dataset since after bankruptcy, the result of interactions and behaviors of the individual of the network are absolutely known. Our result shows that the proposedmethodology is generic and can be applied to other data sets of communication to identify influential at particular instances.

General Terms

Data Mining, Link Mining, Link Analysis, Social Network Analysis, Dynamic Social Network Analysis.

Keywords

Dynamic social network analysis, social network analysis, hierarchical structure.

.

1.INTRODUCTION

A network structure is the perfect epitome provides a formal way of representing data that emphasizes the association between entities. This representation has a substantial importance gives the insight of knowledge into the data. Since for the work to be done many systems these days are interconnected and behaviors of individuals reflect the function of whole system at large extent. Networks are primarily studied in mathematical framework i.e. graph [1].

In modern era, social network analysis is proliferated area of research, has been in existence for quite some time and experiencing a surge in popularity to understand the behavior of the users at individual and group level [Wasserman & Faust, 1994, Wellman, 1996]. Understanding the behavior of individual social networking methods assuage the analysts to revealing hidden patterns from social communication. In order to model the social network mathematically, most popular data structure typically known as graphs are used where the nodes depict the individual or group of person, or event or organization etc and eachlink/edge represents connection/relationship between two individual. Social network analysis attempts to understand the network and its components like nodes (social entities commonly known as actor or event) and connections (inter-connection, ties, and links). It has main focus of analyzing individuals and their interdependent relationships among them rather than individuals and their attributes as we deal in conventional data structure. The target of this paper is to contribute:

a)Proposing a hierarchical structure of social network changing with time

1.1TYPES OF SOCIAL NETWORK

Sociometric: It involves the entire population and focus on global structure pattern of social network.

Egocentric: It focus individual interaction pattern for analyzing social network. Limitation of this kind of analysis is that it is very difficult to collect case by case data.

2.DYNAMIC SOCIAL NETWORK ANALYSIS

Versatile power of social network is being applied to mining pattern of social interaction in wide ranging applications including: disease modeling [4] information transmission and behavior analysis [2, 3, 16] and business management and behavior analysis. Network analysis came in to picture as its practical applications in intelligence and surveillance [5] and has become popularized paradigm to uncover anti-social network such like criminal, terrorist and fraud network majorly after the tragically event of September 11th, 2001 which has shattered the whole world.

Social interaction could be in any form that depends on the type of data available [6]. It might be verbal or written communication (cell phones, emails, and blogs chatting), scientific collaboration (co-authorship network, citation network), browed websites, and group of animals. This mathematical network model is very successful in analysis of social network but major drawback is that it may miss the temporal aspect of interaction because social interaction is inherently dynamic in nature. The static model of interaction can give the information that could be inaccurate and decision made based merely on this contributed information might lead analyst to faulty direction.

Several shortcomings can be highlighted when dealing with static model of social network that could forbid acknowledging the casual relationship of pattern of social interaction [6]:

.

  • What is the rate of spreading diseases while modeling diseases and who is the central person whom should be vaccinated to control spread of it among group of person.
  • What are the causes and consequences of social structure evolution?

Dynamic social network analysis is emerging research area play a crucial role to fill gap between traditional social network analysis and time domain. Dynamic study of network includes classical network analysis, link analysis, and multi-agent systems.

Dynamic network analysis facilitates the analysis of multiple types of nodes (multi-node) and multiple types of links (multi-plex) simultaneously. On the contrary, static network analysis can only focus at most two mode data and analysis one type of link at a time. There are several characteristics of dynamic network:

  • Nature of nodes are dynamic, there properties changes with respect to time.
  • Deals with meta-network.
  • Network evolution is consequent of agent-based modeling.

Furthermore the networkanalysis exists in four levels [15]: Attribute oriented analysis, Position oriented analysis, and Structure oriented analysis, Dynamic network analysis. Fig1 shows the categorization. The attribute analysis captures the properties of vertices and edges and finds the causal relation

with the structure of network.

Fig 1: Level of network analysis

Position oriented analysis aims to investigate the mico level stage of network. It looks into every single entity i.e. node or organization and their characteristic in the network. Structure analysis is macro level analysis of network and investigates the average metrics of social network. Dynamic analysis can utilizes the all measures of aforementioned analysis in order to identify progressive behavior of network and individual.

2.1ROLE OF DATA MINING IN SOCIAL NETWORK

Social computing can make use of data mining techniques in following analysis

  • Community Detection
  • Classification
  • Link Prediction
  • Network Modeling

2.2COMMUNITY IN SOCIAL NETWORK

A community is sub part of whole network between which inter-community interaction is relatively frequent and strong than intra-community interaction. It can be in any form for example group, subgroup, and cluster. It may: a) citation network represents related papers on single topics, b) web pages on related topics.

Community detection is a classical problem in social network analysis. Commonly, it can be the problem of identifying sub-graphs of original graph and called vertex sparsifier[12]. These small networks uphold the relevant information of original group. Four levels of analyses are being conducted in community identification as shown in fig 1:

2.3ANALYSIS OF PREVIOUS RESEARCH

Hierarchical methods for community detection falls into two categories: agglomerative and divisive. In former case each node is assumed to be a community and repetitively group together. Similarly, in later case initially whole network is considered a community and divided subsequently into smaller one. Most methods are graph clustering and partition. Distance based structural equivalence [7] uses distance metrics to identify similar entities. In graph partition methods several algorithms has been proposed [8, 9]. Newman- Girvan method and spectral clustering methods [10, 11] uses a notion of modularity and utilizes edge between-ness metric to divide into groups.

In analyzing dynamic pattern, many methods use the temporal snapshots of interaction over the times [13, 14].

3.DISCOVERING HEIRARCHY OF GROUP

Before analyzing community hierarchy we define several basic terminologies. Hierarchy of community provides the power of each individual in a group. If somehow we know this chain we can find the leader of group. Regarding this we used well known algorithm PageRank which calculate the individual score I_score of each person to represent importancy of person. Higher the score of person more powerful the person is.

Definition: If be the collection of persons involved in communication. For any members and if I_score ()I_score() then can be the leader of .

3.1Proposed Model

Figure2 shows the architecture of system proposed .

First the Individual score of each member calculated with the help of Page Rank algorithm described following.

Input: Social Network G.Output:Individual score of each person Steps:

1.d = 0.85

2.epi = 0.01

3. del = 0,delta[150]

4.del_prev=0

5.float sum1 = 0

6.float sum2 = 0

7.float e = 0

8.do loop:

9.del_prev=del

10.iter++

11.fort i = 1 to 150

12. R[1][i] = ((1 - d) / N) + d*fun(i)

13.for j = 0 to 150

14.sum2 += R[1][j]

15.sum1 += R[0][j]

16.

17.e = sum1 - sum2

18.for j = 1 to 150

19. R[1][j] = R[1][j] + e*L[j]/sum_all

20. For j = 1 to 150

21.sum2 += R[1][j] - R[0][j]

22. del = sum2

23. for j = 1to 150

24.R[0][j] = R[1][j]

25while(epi > del & del_prev!=del)

26.for i=0 to150

27.R[0][i]=10000.0*R[0][i]

In I_score (), d is damping factor generally assumed to be around 0.85. R is vector stores the individual score of each person. N denotes the total number f persons in the network.Sum_all defines the sum of the weight of all edges. Fun( defines where sum of weights of all edges linked with and weights of edges linking and .

4.PRACTICAL IMPLEMENTATION AND ANALYSIS

In this section, we evaluate the capability of the proposed approach on discovering organizational structure and to exploring evolution of organizational structure in a dynamic social network and link between individuals.We performed the experiments on Enron email data set. Email communication data has become a practical source for research in network analysis like social network. Mostly the experiments are carried out on the artificial data due to the non-availability of real life communication data. The Enron email data set [27] has become a benchmark for this sort of research domain in network analysis. This data set was made public and posted on web by the Federal Energy Regulatory Commission during its investigation for fraud happened in company, in order to make it test bed for validating and testing the efficacy of methodologies developed for counter-terrorism, fraud detection and link analysis. Data is about 150 users communication mostly senior managers organized into folder. But this set has still lots of issues like integrity issue and duplicate messages issue. For preprocessing, first the names of distinct users were extracted and duplicated ids were neglected.

The proposed approach is implemented in DEV C++. The experiments were conducted on a 2.1GHz PC with Core(TM)2 Dual- Core Pentium 4 processor with 2 GB RAM.

On examining the results in figure 3, we analyzed the number of grouping in the month of May was maximum. Figure 2 shows that Jim was the person who headed the group from Jan to Feb. Monique was leader throughout the months form Jan to April.

1

Fig 3: Graph over the months columns

1

5.CONCLUSION

In this paper we introduced a concept of hierarchy of positions in group by taking temporal interaction data of twelve months in organization that shows how the position and link of group members changes when people joining and leaving the group. This hierarchal group interaction is significant to facilitate link analysis of individuals with the time period In the future we are planning to improve the integrity issues of preprocessed Enron data results of experiments.

6.REFERENCES

[1]Hanneman, Robert A. and Mark Riddle. Introduction to social network methods. Riverside, CA: University of California (published in digital form 2005.

[2]J. Baumes, M. Goldberg, M. Magdon-Ismail, and W. Wallace. Discovering hidden Inform., 2004

[3]J. Tyler, D. Wilkinson, and B. Huberman. Email asspectroscopy: Automated discovery of community structure within organizations. Proc. 1st Intl. Conf. on Comm. And Tech., 2003.

[4]M. Kretzschmar and M. Morris. Measures of concurrency innetworks and the spread of infectious disease. Math. Biosci.133:165–195, 1996.

[5]J. Baumes, M. Goldberg, M. Magdon-Ismail, and W. Wallace. Discovering hidden groups in communication networks. Proc. 2nd NSF/NIJ Symp. on Intel. and Security Inform., 2004.

[6]T. Y. Berger-Wolf and J. Saia. A framework for analysis of dynamic social networks. In Proc. KDD’06, 523–528, 2006.

[7]Santo Fortunato and Claudio Castellano - Community Structure in Graphs,chapter of Springer’s Encyclopedia of Complexity and System Science (2008).

[8]C. Chekuri, A. Goldberg, D. Karger, M. Levin and C. Stein. Experimental study of minimum cut algorithms. In Proc. 8thSAIM Syposium on Discreet Algorithm, P324-333, 1997.

[9]Andrew Y. Wu, et al., Mining scale-free networks usinggeodesic clustering, In Proc. KDD’04, pages 719-724, 2004.

[10]M.E.J Newman and M. Girvan - Finding and evaluating community structure in networks, Phys. Rev. E 69, 026113 (2004).

[11]M.E.J Newman - Modularity and community structure in networks, PNAS June 6, 2006 vol. 103 no. 23 8577-8582.

[12]A. Moitra. Approximation algorithms for multicommoditytype problems with guarantees independent of the graph size.FOCS, pages 3–12, 2009.

[13]Ding Zhou, Isaac Councill, Hongyuan Zha, C. Lee Giles. Discovering Temporal Communities from Social Network Documents.In Proc. of ICDM’07, pages 745 750, 2007.

[14]C. Tantipathananandh, Tanya Berger-Wolf, David Kempe. A Framework for Community Identification in Dynamic SocialNetworks. In Proc. of KDD’07, pages 717 - 726, 2007.

[15]Mueller, C., Meuthrath, B., Baumgraß, A.: Analyzing Wiki-based networks to improve knowledge processes in organizations. Accepted contribution in J.UCS Journal of Universal Computer Science, 2008.

[16]J. P. Scott. Social Network Analysis: A Handbook. SAGEPublications, January 2000.

1