Inf 723 Information & Computing (Fall 2008)

Jagdish S. Gangolly

Entropy of dialogues creates coherent structures in e-mail traffic

(Jean-Pierre Eckmann, Elisha Moses, and Danilo Sergi, PNAS, October 5, 2004, vol. 101, no. 40, 14333–14337)

Until recently, most graph-theoretic studies of social interaction were based on a static structure for the graphs. More recent studies have used metaphors from physics, biology, and non-linear dynamical systems in general to study the dynamics of the clustering of individuals in social networks. This paper is a prime example. The bibliography at the end of the paper lists a few other interesting articles.

Clustering of individuals in small networks is not static in nature, and the clustering changes depending on the needs of organisations. Such clusterings are usually contextual, thematic, and evolve into coherent self-organised structures.

One way to analyse the evolution of such structures is to study the synchronization of messages between individuals.

The metaphor used is that of the brain, where communication of information between neurons takes place through the exchange of spikes (also called action potentials), which are rapid changes in the polarity of the voltage from positive to negative and vice versa.

The experiment reported in the paper deals with over 2 million emails exchanged between about 10,000 users over 83 days. Considering only communications within the organization, the authors isolate for further study 309,125 emails between 3,188 users.

Mass mailings do not contribute to interaction because they do not usually elicit a response. After removing mass mailings (messages whose to: field lists more than 18 recipients) from the data, 202,695 emails remain.
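The filtering step can be sketched in a few lines of Python. This is only an illustration: the Email record and the function name drop_mass_mailings are hypothetical, and the only value taken from the text is the threshold of 18 recipients.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Tuple

MAX_RECIPIENTS = 18  # threshold quoted in the notes above


@dataclass(frozen=True)
class Email:
    """Hypothetical record holding only the header fields used here."""
    sender: str
    recipients: Tuple[str, ...]
    sent_at: datetime


def drop_mass_mailings(emails: List[Email],
                       max_recipients: int = MAX_RECIPIENTS) -> List[Email]:
    """Keep only messages addressed to at most `max_recipients` people."""
    return [e for e in emails if len(e.recipients) <= max_recipients]
```

A message with exactly 18 recipients is kept and one with 19 is dropped, which matches the "more than 18" wording.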

Taking co-links (or congruent nodes) as the fundamental concept of friendly interaction, the basic building block of social interaction is a triangle.

On the web, transitivity of congruences (if node A is congruent with both nodes B and C, one expects nodes B and C also to be congruent) is common. However, that is not usually so in the case of emails.
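To make the role of triangles concrete, here is a minimal Python sketch that lists the triangles in a set of links. It assumes the co-links have already been reduced to undirected pairs of users; the function name find_triangles is hypothetical and is not taken from the paper.

```python
from itertools import combinations


def find_triangles(links):
    """Return the sorted list of triangles (as sorted 3-tuples) in an undirected graph.

    `links` is an iterable of (u, v) pairs; direction and duplicates are ignored.
    """
    neighbours = {}
    for u, v in links:
        if u == v:
            continue  # ignore self-links
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)

    triangles = set()
    for u, nbrs in neighbours.items():
        # Any two neighbours of u that are themselves linked close a triangle.
        for v, w in combinations(sorted(nbrs), 2):
            if w in neighbours[v]:
                triangles.add(tuple(sorted((u, v, w))))
    return sorted(triangles)
```

If A is linked to both B and C but B and C are not linked, the function returns no triangle, which is the non-transitive situation described above for emails.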

Model:

The data extracted from the email logs include the from and to fields and the timestamps of messages. The message content is ignored, and so are the subject line and the size of the message.

On the timeline, communications between any two nodes A and B are represented by spikes (upward ticks for communications from A to B, and downward ticks for communications from B to A). t represents the time lapse between the communication from A to B and the response from B to A.

The response time delay probability distribution (Figure 2) shows a peak between 16 and 24 hours, mainly because the workday consists of 8 hours (followed by 16 hours of rest). This peak disappears when the delay is measured in ticks (units of messages sent in the system).
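The two ways of measuring the delay can be illustrated with a short Python sketch. The pairing rule is an assumption made for illustration (each reply from B is matched to the most recent unanswered message from A), and a "tick" is taken here to mean one message anywhere in the log; the paper may define both differently. The function name response_delays is hypothetical.

```python
from datetime import datetime, timedelta


def response_delays(messages, a, b):
    """Return (clock delay, tick delay) pairs for replies from b to a.

    `messages` is a list of (sent_at, sender, recipient) tuples sorted by time.
    The tick delay is the number of log entries between message and reply.
    """
    delays = []
    pending = None  # (tick, sent_at) of the latest unanswered a -> b message
    for tick, (sent_at, sender, recipient) in enumerate(messages):
        if sender == a and recipient == b:
            pending = (tick, sent_at)
        elif sender == b and recipient == a and pending is not None:
            sent_tick, sent_time = pending
            delays.append((sent_at - sent_time, tick - sent_tick))
            pending = None  # a reply answers at most one message
    return delays
```

A reply sent the next morning has a long clock delay but may be only a tick or two away if little mail was sent overnight, which is one way to see why the overnight peak can vanish in tick units.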

Notation:

d: number of days (83)

PA(i), i ∈ {0,1}: Probability of an email event from A on a given day = NA(i)/d

PAB(i,j), i, j ∈ {0,1}: Probability of emails both ways between A and B on a given day = NAB(i,j)/d

Influence of A on B (mutual information):
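In the notation above, the textbook definition of mutual information is I(A,B) = sum over i,j of PAB(i,j) log [ PAB(i,j) / (PA(i) PB(j)) ]. The Python sketch below computes this quantity from two daily 0/1 activity sequences. It is a sketch under assumptions: the base-2 logarithm and the function name mutual_information are choices made here, and any time shift the paper applies to capture the direction of influence is not modelled.

```python
import math
from collections import Counter


def mutual_information(days_a, days_b):
    """Mutual information (in bits) between two equal-length 0/1 daily sequences.

    days_a[k] is 1 if A sent a message on day k and 0 otherwise; likewise days_b.
    Probabilities are estimated as counts divided by the number of days d.
    """
    if len(days_a) != len(days_b) or not days_a:
        raise ValueError("sequences must be non-empty and of equal length")
    d = len(days_a)
    joint = Counter(zip(days_a, days_b))  # NAB(i, j)
    count_a = Counter(days_a)             # NA(i)
    count_b = Counter(days_b)             # NB(j)

    info = 0.0
    for (i, j), n_ij in joint.items():
        p_ij = n_ij / d
        p_i = count_a[i] / d
        p_j = count_b[j] / d
        # Only observed (i, j) pairs appear here, so p_ij > 0 and the log is defined.
        info += p_ij * math.log2(p_ij / (p_i * p_j))
    return info
```

Two identical sequences with activity on half of the days give 1 bit, while two statistically independent sequences give 0.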

Next, for each possible triangle, probability measures similar to the above are defined and computed. A measure of temporal coherence (IT), defined as a mutual information (Kullback-Leibler divergence), is then computed.

Next, a cut-off temporal coherence (IT0.5) is used to construct a conjugate graph of the original graph, in which each node is a triangle that exceeds the temporal coherence threshold and two such nodes are connected by a link if the two triangles share an edge.
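The construction of the conjugate graph can be sketched as follows in Python. The coherence values are assumed to be given as a dictionary from triangles to numbers; the names triangle_edges and conjugate_graph, and the choice of "greater than or equal" for the comparison, are assumptions made for illustration.

```python
from itertools import combinations


def triangle_edges(triangle):
    """Return the three undirected edges of a triangle given as a 3-tuple of nodes."""
    return {frozenset(pair) for pair in combinations(triangle, 2)}


def conjugate_graph(coherence, threshold):
    """Build the conjugate graph from per-triangle temporal coherence values.

    `coherence` maps each triangle (a 3-tuple of node names) to its coherence.
    Nodes of the result are the triangles at or above `threshold`; two of them
    are linked when they share an edge, i.e. have two vertices in common.
    """
    nodes = sorted(t for t, value in coherence.items() if value >= threshold)
    links = [
        (s, t)
        for s, t in combinations(nodes, 2)
        if triangle_edges(s) & triangle_edges(t)
    ]
    return nodes, links
```

Triangles that share only a single vertex are not linked, so the conjugate graph separates into groups of triangles that are glued together along common edges.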

Figure 4 shows the departmental organization, and Figure 5 shows the conjugate graph.