Clique Approach for Networks: Applications for Coauthorship Networks

doi:10.4236/sn.2014.32010

Paper Menu >>

Journal Menu >>

Social Networking, 2014, 3, 80-85

Published Online February 2014 (http://www.scirp.org/journal/sn)

http://dx.doi.org/10.4236/sn.2014.32010

OPEN ACCESS SN

Clique Approach for Networks: Applications for

Coauthorship Networks

Marcos Grilo Rosa1, Inácio de Sousa Fadigas1, Maria Teresinha Tamanini Andrade2,

Hernane Borges d e B arros Pereira3,4

1Universidade Estadual de Feira de Santana, Feira de Santana, Brazil

2Instituto Federal de Educação Ciência e Tecnologia, Simões Filho, Brazil

3Universidade do Estado da Bahia, Salvador, Brazil

4Programa de Modelagem Computacional, SENAI Cimatec, Salvador, Brazil

Email: gri l o@uefs.br, fadigas@uefs.br, tamanini@ifba.edu.br, hbbpereira@gmail.com

Received November 22, 2013; revised December 28, 2013; accepted February 6, 2014

cense, which permits unrestricted use, distribution, and reproduction in any medium, provide d the original work is properly cited. In

ABSTRACT

Coauthorship networks consist of links among groups of mutually connected authors that form a clique. Classic-

al approaches using Social Network Analysis indices do not account for this characteristic. We propose two new

cohesion indices based on a clique approach, and we redefine the network density using an index of variance of

density. We have applied these indices to two coauthorship networks, one comprising researchers that published

in Mathematics Education journals and the other comprising researchers from a Computational Modeling

Graduate Program. A contextualized and comparative analysis was performed to show the applicability and po-

tential of the indices for analyzing social networks data.

KEYWORDS

Clique Networks; Cohesion Indices; Coauthorship

1. Introduction

Coauthorship networks are an example of social net-

works in which two authors are linked if they have writ-

ten an article together. Due to the low number of authors,

which is generally less than ten except for projects from

large research groups, coauthorship networks represent a

significant social involvement between authors. Because

they comprise mutually connected groups (groups of an

article’s authors), these networks can be modeled using

structures from graph theory known as cliques. Suppose

there is a graph G = (V,

) with two sets: V, made of

vertices (or nodes), and

, consisting of elements called

edges with one or two vertices connected to each [1]. A

subgraph with a cliqu e structure is the maximal sub set of

mutually adjacent vertices in G. For simple graphs

(without loops or multiple edges), such as the graphs

used in Social Network Analysis, a subgraph originating

from a clique is a complete graph by definition.

Several aspects of coauthorship networks have been

studied. For example, Katz and Martin [2] showed that

using coauthorship to evaluate scientific collaboration is

advantageous because, in addition to being invariant and

verifiable, it is a relatively practical and inexp ensive me-

thod. Similarly, although Vanz and Stump [3] distin-

guished between collaboration and coauthorship, desig-

nating coauthorship as one facet of scientific collabora-

tion, this distinction has not hindered the use of coau-

thorship to assess collaboration, especially in bibliome-

trics and scientometrics. In the broader field of Social

Network Analysis, where collaborations are treated as

complex networks, Newman [4-6] studied the structure

of scientific collaboration networks and found evidence

for the small-world phenomenon. Maia and Caregnato [7]

used degree centrality, betweenness centrality and close-

ness centrality to analyze a coauthorship network of pr o-

fessors in the Epidemiology graduate program at the

Federal University of Pelotas (Universidade Federal de

Pelotas—UFPel). Mello, Crubellate and Rossoni [8],

M. G. ROSA ET AL.

OPEN ACCESS SN

used density, number of components and centrality in

coauthorship networks constructed using data extracted

from the Lattes Platform to measure the level of collabo-

ration in Administration graduate programs (sensu stric-

to). These studies used classical Social Network Analysis

indices, es pecially centrality, to emphasize authors in the

network.

On the other hand, there are some papers on clique and

line graphs [9-11] that deal with the detection of com-

munities or clusters. Evans [9] provided a method of

community detection in networks based on line graphs.

The proposed clique and line graphs are weighted graphs

and of fixed order. Each vertex of the original graph is a

clique and network analysis centers on the edges rather

than the vertices as in the classical approach of Social

Network Analysis. Evans applied their method on several

networks of which we highlight co-authorship a network

and a network of teams of a football league.

This article, however, seeks to redefine the classical

concepts of density and show new cohesion indices for

networks comprising cliques, as defined by Fadigas and

Pereira [12], which will then be applied and interpreted

using co-authorship networks. Thus, we intend to inves-

tigate relationships between cliqu es and not just relations

between vertices (i.e. actors). The main cohesion index

used to characterize a network, i.e., density, is paramete-

rized by two extreme cases of network topology: a net-

work containing only vertices (density of 0) and a net-

work where all the vertices are mutually connected (cli-

que), i.e., a network with a density of 1. In networks such

as the ones involved in coauthorship, a topology with a

null density does not exist by definition (authors who

publish alone do not form coauthorships). At the other

extreme, it would be difficult for real coauthorship net-

works to exhibit a density of one, as it would imply that

all of the authors have published in collaboration with all

of the other authors in the network at least once.

2. Clique Approach for Networks

In this section, we present an original approach for ana-

lyzing networks whose basic elements are cliques, pro-

posing new cohesion indices and redefining classical

indices [12]. The approach is based on the notion that

initially isolated cliques form networks by juxtaposition

and/or superposition. Fadigas and Pereira [12] define

juxtaposition as th e proces s where two cliques are linked

by a single common vertex. The authors denote processes

where cliques are linked by two or more common vertic-

es by superposition.

Figure 1 shows an initial configuration of discon-

nected cliques, b efor e the juxtaposition and/or superposi-

tion process. Figure 2 shows an example of a network

formed by applying the processes of juxtaposition and/or

Figure 1. Initial configuration of disconnected cliques.

Figure 2. Formation of a clique network by juxtaposition

and superposition.

superposition to the initial configuration shown in Figure

1. The networks that result from these processes are

called clique networks, in reference to the basic compo-

nents that may or may not be connected.

3. Cohesion Index Based on a Clique

Approach

From the initial configuration of disconnected cliques

and the processes of juxtaposition and/or superposition

that create the clique network, Fadigas and Pereira [12]

propose two novel cohesion indices: normalized density

and variance of density. In this article, three additional

cohesion indices are introduced: edge superposition, ver-

tex reduction factor and component reduction factor.

3.1. Normalized Density and Variance of Densit y

One of the main cohesion indices for Social Network

Analysis is the density

( )

∆

for undirected networks

with n vertices, which relates the number of edges in the

network

()

to the maximum possible number of

edges, given by (

( )

12nn−

). The density

( )

∆

of a

network is an index that varies from 0 to 1. When

0∆=

the network is totally disconnected and does not ade-

quately reflect the initial configuration of disconnected

cliques (Figure 1). The density of the initial configura-

tion of disconnected cliques can be calculated using the

following expression:

M. G. ROSA ET AL.

OPEN ACCESS SN

( )

000

∆= −

(1)

where n0 is the number of vertices in the initial configu-

ration of disconnected cliques. Fadigas and Pereira [12]

proposed a more appropriate normalization for density in

clique networks, denoted as the normalized density

( )

norm

∆

norm q

∆−∆

∆=

−∆

(2)

when

norm

∆=

, the actual network is equivalent to the

initial configuration of disconnected cliques. Fadigas and

Pereira [12] also proposed the variance of density

( )

()v∆

which measures the densification of the network com-

pared to the initial configuration of disconnected cliques

and can be calculated by the following expression:

( )

[]

εεε

∆≅× −=−





(3)

3.2. Edge Superposition

The variance of density depends on both the relationship

between the edges and the relationship between the ver-

tices. Therefore, it does not directly quantify clique su-

perposition.  is a superposition rate that compares the

number of edges in the initial configuration of discon-

nected cliques with the number of edges after juxtaposi-

tion and/or superposition. This rate is defined by Equa-

tion (4) and is applicable to both connected and discon-

nected networks.

0 max

εε

−

=−

(4)

In the above equation,

max

is the number of edge s in

the largest clique present in the initial configuration of

disconnected cliques. A superposition value of 0 occurs

when

εε

and, therefo re, there is no superposition of

the edges, although juxtaposition may (or may not) occur.

A superposition value of 1 occurs when

max

εε

, i.e.,

when the network comprises only the largest clique.

3.3. Vertex Reduction Factor

In parallel to

for the superposition of edges, (RV) is

a factor that measures the vertex juxtaposition rate and

can be used to compare the number of vertices in the

initial configuration of disconnected cliques with the

number of vertices in the network resulting from juxta-

position and/or superposition. The difference between the

number of vertices in the initial configuration of discon-

nected cliques and the number of vertices resulting from

the juxtaposition and/or superposition processes shows

how many vertices are shared between the two cliques,

and it can be parameterized by the number of vertices in

the resulting network. Symbolically, again denoting the

number of vertices in the initial configuration of discon-

nected cliques by n0, the number of vertices in the largest

clique of the initial configuration of disconnected cliques

by nmax and the number of vertices in the network after

the juxtaposition and/or superposition processes by n, we

obtain

0 max

Rnn

−

=−

(5)

The index varies from 0 to 1. A null value occurs

when n = n0, i.e., when there is no juxtaposition or su-

perposition. The maximum value, 1, occurs when the

network comprises only the largest clique in the initial

configura tion.

3.4. Component Reduction Factor

We observed in Section 3.1 that the number of cliques is

represented by nq in the initial clique configuration. We

consider this number to be the number of “components”

in the initial configuration. Thus, it is also possible to

quantify the reduction in the number of components as a

measure of the network cohesion. Therefore, the compo-

nent reduction (RC) measures how often cliques from the

initial configuration are connected to form larger com-

ponents, consequently reducing their number. It is nor-

malized in the same way as the previous indices, using

the minimum number of components that can result as

the base, which is 1. Considering that nq can be defined

as the number of components in the initial configuration

of disconnected cliques, let CC be the number of compo-

nents in the network. Thus, the factor can be expressed as

follows

−

=−

(6)

The values for RC vary from 0 to 1. A value of 0 oc-

curs when there is no juxtaposition or superposition.

Conversely, a value of 1 results when the network is

connected.

4. Application to Coauthorship Networks

After defining the indices using this new approach, we

chose two coauthorship networks to calculate and interp-

ret the indices. One of the networks consists of authors

who published in six Mathematics journals, which we

grouped under the name of Mathematics Education jour-

nals. The other network consists of researchers in a

computational modeling graduate program.

The coauthorship networks were constructed so that

each group of coauthors is mutually connected, and each

M. G. ROSA ET AL.

OPEN ACCESS SN

group is connected to another if they have at least one

author in common. To interpret coauthorship as a type of

collaboration, we excluded from the network authors

who only published alone. We should note that the ME

and GP networks have 1000 and 795 vertices and 572

and 11 components, respectively, before authors who

published alone were excluded. Figures 3(a) and (b)

show the networks, and Table 1 summarizes some prop-

erties for the networks .

5. Results and Interpretation

As the indices from our clique approach were applied to

(a)

(b)

Figure 3. Coauthorship networks: (a) ME Network, where

the thickness of the edges is proportional to the number of

studies published by the corresponding pair of vertices; (b)

GP Network, with the thickness of the edges calculated as

described above. (a) ME Network; (b) GP Network.

Table 1. Basic properties of the ME and GP networks.

Properties ME network GP network

Number of vertices (n) 588 792

Number of authors 1751 3234

Author/vertex ratio 2.9779 4.083

Components 160 8

Largest component (%) 7.14 90.28

Coauthored publications 944 1030

Mean clique size 1.85 2.97

only two networks, we performed an interpretive analysis

of the new indices proposed here and compared the two.

Table 1 shows that the ME and GP networks possess

numbers of vertices of the same order of magnitude (588

and 792, respectively). The same is true for the number

of coauthored publications (944 and 1030, respectively).

However, the number of authors in the GP Network,

represented by the number of vertices in the initial con-

figuration of the discon nec ted cliques , is almost twice the

number of authors in the ME Network. One significant

difference between the networks is the relative size of the

largest component, i.e., the largest group of intercon-

nected authors in the network. For the ME Network, the

largest group contains only approximately 7% of the au-

thors; however, this number is approximately 90% in the

GP Network. This difference is due to th e distinct nature

of the two networks: while the ME Network contains

authors who publish in six distinct journals, the GP Net-

work comprises only researchers who are part of the

same graduate program. However, despite the distinct

nature of the Mathematics Education journals, they form

the corpus of publications on the field, and the relatively

small number of authors in a single group allows little

collaboration, in terms of coauthorship.

The cohesion indices calculated for the two networks

are shown in Table 2. The density (binary) is calculated

without accounting for the number of times that a pair of

authors published together, while the valued density does

account for repeated joint publications.

The ratio between the two numbers provides the mean

number of publications per pair and reflects how often

authors publish together. Table 2 shows that, in the ME

Network, each pair published once, on average, while the

mean is nearly two in the GP Network.

The variance of density measures the “densification”

of the network, i.e., how much the groups of coauthors

coalesce, either in isolation or in pairs, when the network

is formed. The index measures the variance of the vertic-

es and the edges simultaneously, compared to the initial

Table 2. Cohesion indices using a clique a pproach.

Properties ME network GP network

Density (binary)

( )

∆

0.0037 0.0073

Density (valued)

( )

valued

∆

0.0040 0.0135

valued

∆∆

1.1029 1.8554

Normalized density

( )

norm

∆

0.0036 0.0127

Variance of density

( )

v∆

7.8784 15.4440

Edge superposit ion

( )

0.0972 0.5189

Vertex reduction factor

( )

0.6672 0.7857

Component reduction factor

( )

0.8736 0.9936

M. G. ROSA ET AL.

OPEN ACCESS SN

configuration of disconnected cliques. The results in

Table 2 show that the GP Network displays a higher

variance of density index score higher than the ME Net-

work. The theoretical maximum value for the variance of

density occur s when the network becomes a single clique,

i.e., all of the vertices are mutually connected. For the

ME Network, this maximum value is approximately

2200, and thus the value found herein (7.8784) corres-

ponds to only 0.4% of the maximum. In the case of the

GP Network, the maximum variance of the density is

1223, and the measured value (15.4440) is 1.3% of the

maximum. Comparatively, it can be stated that the GP

Network is more than threefold “densified” compared to

the ME Network. The variance of density index reflects

the coalescence of the authors that published alone but

also as coauthors (reduction of vertices without a reduc-

tion in the number of edges) in relation to the coales-

cence of pairs that published as coauthors (simultaneous

reduction of vertices and edges). To more precisely de-

termine what type of situation predominates in the net-

work, the superposition of edges and the vertex reduction

factor can be used.

The superposition of edges determines the proportion

of coauthor pairs in common; i.e., it is an index that

measures relationships, represented by the edges in the

network. The superposition of approximately 10% in the

ME Network indicates that few pairs of authors pub-

lished together more than once, unlike the GP Network,

where 52% of the pairs have more than one publication

together. This index, therefore, shows that there is more

scientific production by pairs of authors in the GP Net-

work than in the ME Network.

The vertex reduction factor, in turn, is directly related

to the author/vertex ratio in Ta bl e 1 . This index indicates

the percentage of authors with more than one publication.

The values obtained for the ME and GP networks indi-

cate that the researchers in the GP network display higher

individual productivity.

The values of the component reduction factor for the

two networks do not differ. Although the ME Network

has 160 components compared to only 08 for the GP

Network , the percentage difference is approximately

12%. However, almost all of the coauthorship groups

shared at least one vertex in common (link) in the GP

Network (99.4%), while a value of 87.4% was observed

in the ME Network.

6. Final Considerations

The clique approach in coauthorship networks allows the

social data to be analyzed in a way that is well suited for

the network topological structure. Network analysis us-

ing cohesion indices already allows new interpretations.

For example, considering the index that measures edge

superposition together with the vertex reduction factor

allowing us to clarify h ow the juxtaposition an d superpo-

sition processes create the network. Thus, we observed

that superposition predominated in the GP Network

compared to the ME Network. This effect also occurred

with the vertex reduction factor, but to a lesser extent.

These aspects result in a greater “densification” of the

GP Network, mostly due to the large number of pairs of

authors who have written more than one study together.

These results are consistent with the fact that the GP

Network comprises researchers connected through the

same research institution, while the ME Network in-

cludes researchers who may have stronger ties within

their own groups, but this collaboration is not shown

through their publication in jou rn a ls of the field .

The initial research us ing cohes ion indices showed that

other indices could potentially be added, and the dynam-

ics of network growth could be evaluated. Another aspect

that we emphasize is the applicability of the clique ap-

proach to other social networks with a similar structure,

such as actor-movie networks.

Finally, it is important to comment that th is work is an

ongoing research and initially it was published in the

proceedings of the 1st Brazilian Workshop on Social

Network Analysis and Mi ning [13].

REFERENCES

[1] J. L. Gross and J. Yellen, “Graph Theory and Its Applica-

tions. Discret Mathematics and Its Applications,” CRC

Press, Boca Raton, 2003.

[2] J. S. Katz and B. R. Martin, “What Is Research Collabo-

ration?” Research Policy, Vol. 26, N o . 1 , 1997, pp. 1-18.

http://dx.doi.org/10.1016/S0048-7333(96)00917-1

[3] S. A. d. S. Vanz and I. R. C. Stump, “Scientific Collabo-

ration: A Theoretical-Conceptual Review,” Perspectivas

em Ciência da Informação, Vol. 15, No. 2, 2010, pp. 42-

55. http://dx.doi.org/10.1590/S1413-99362010000200004

[4] M. E. J. Newman, “The Structure of Scientific Collabora-

tion Networks,” Proceedings of the National Academy of

Sciences, Vol. 98, No. 2, 2001, pp. 404-409.

http://dx.doi.org/10.1073/pnas.98.2.404

[5] M. E. J. Newman, “Scientific Collaboration Networks. I.

Network Construction and Fundamental Results,” Physi-

cal Review E, Vol. 64, No. 1, 2001, Article ID: 016131.

[6] M. E. J. Newman, “Scientific Collaboration Networks. II.

Shortest Paths, Weighted Networks, and Cen trality,” Phy-

sical Review E , Vol. 64, No. 1, 2001, Article I D: 016132.

[7] M. F. Maia and S. E. Caregnato, “Co-Authorship as an

Indicator of Scientific Collaboration Network,” Perspec-

tivas em Ciência da Informação, Vol. 13, No. 2, 2008, pp.

18-31.

[8] C. M. Mello, J. M. Crubellate and L. Rossoni, “Coauthor

Networks between Brazilian Graduate Administration

Program Faculty (Strictu Sensu): Structural and Dynamic

Aspects of Relationships,” Perspectivas em Ciência da

Informação, Vol. 15, No. 2, 2009, pp. 42-55.

M. G. ROSA ET AL.

OPEN ACCESS SN

[9] T. S. Evans, “Clique Graphs and Overlapping Communi-

ties,” Journal of Statistical Mechanics: Theory and Expe-

riment, Vol. 2010, 2010, Article ID: P12037.

http://dx.doi.org/10.1088/1742-5468/2010/12/P12037

[10] T. S. Evans and R. Lambiotte, “Line Graphs of Weighted

Networks for Overlapping Communities,” The European

Physical Journal B, Vol. 77, No. 2, 2010, pp. 265-272.

http://dx.doi.org/10.1140/epjb/e2010-00261-8

[11] T. S. Evans and R. Lambiotte, “Line Graphs, Link Parti-

tions, and Overlapping Communities,” Physical Review E,

Vol. 80, No. 1, 2009, Article ID: 016105.

http://dx.doi.org/10.1103/PhysRevE.80.016105

[12] I. S. Fadigas a nd H. B. B. Pereira, “A Network Approach

Based on Cliques,” Physica A: Statistical Mechanics and

its Applications, Vol. 392, No. 10, 2013, pp. 2576-2587.

[13] M. G. Rosa, I. S. Fadigas, M. T. T. Andrade and H. B. B.

Pereira, “Clique Approach for Networks: Applications for

Coauthorship Networks,” Proceedings of the Brazilian

Workshop on Social Network Analysis and Mining, XXXII

Congress of the Brazilian Computer Society Computer

Society, Curitiba, 2012.