Social Networking, 2014, 3, 134-141
Published Online February 2014 (http://www.scirp.org/journal/sn)
http://dx.doi.org/10.4236/sn.2014.32017
OPEN ACCESS SN
Studying Group Dynamics through Social Networks
Analysis in a Medical Community
Ruben P. Albuquerque1, Jonice Oliveira1, Fabrício F. Faria1, Rafael Monclar2, Jano M. de Souza2
1Graduate School in Computing Science (PPGI), Universidade Federal do Rio de Janeiro (UFRJ), Rio de Janeiro, Brasil
2Systems and Computing Engineering Graduate School (COPPE), Universidade Federal do Rio de Janeiro (UFRJ),
Rio de Janeiro, Brasil
Email: jonice@dcc.ufrj.br, rrpero@ppgi.ufrj.br, firminodefaria@ppgi.ufrj.br, rastumon@cos.ufrj.br, jano@cos.ufrj.br
Received December 26, 2013; revised 28 January 2014; accepted 19 February 2014
Copyright © 2014 Ruben P. Albuquerque et al. This is an open access article distributed under the Creative Commons Attribution
License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. In
accordance of the Creative Commons Attribution License all Copyrights © 2014 are reserved for SCIRP and the owner of the
intellectual property Ruben P. Albuquerque et al. All Copyright © 2014 are guarded by law and by SCIRP as a guardian.
ABSTRACT
In 2008, the Brazilian Department of Science and Technology created the INCTs (Brazilian Science and
Technology Institutes). One of them was the Cancer Control INCT. Given its importance, and considering that
different groups work together in the same area, it is important that they collaborate intensely. To
strengthen scientific collaboration, the BRINCA project was created to support a set of analyses of
the social networks of this particular INCT. These analyses were built by mining curricular and
publication databases and identifying different types of scientific relationships and areas. We were able
to observe, for instance, how researchers from related areas interact, which researchers were more
collaborative, and which ones were isolated from the network. These analyses were used by the INCT
coordination to understand and act to improve scientific collaboration.
KEYWORDS
Social Networks; Scientific Collaborations; Data Mining
1. Introduction
The Brazilian Government created the National Institutes
of Science and Technology (INCTs) to minimize the division
and disintegration that exist amongst scientific
groups. The proposal is to bring together different researchers,
universities and research groups of excellence, in Brazil and
abroad. One of these institutes is the Brazilian Institute of
Science and Technology for Cancer Control [1], which is
controlled by the National Institute of Cancer (INCA).
In this scenario, the BRINCA project (Balancing and
Analyses of Scientific Social Networks in Cancer
Control) was created. The main goals of this project are to
analyze how the Cancer Control INCT members
collaborate and how scientific knowledge flows amongst
the different researchers, institutes, and members
of the group.
An important aspect of our project is temporal
analysis: understanding how the network evolves over the
years, including which research areas are important and when they
became more relevant. To enable these analyses, a
computational environment was built to support the
collection and interpretation of historical data, as well as the
identification of possible problems in group dynamics.
This article consolidates and extends the initial
results [2] that were presented at the first Brazilian
Workshop on Social Network Analysis and Mining
(BraSNAM), a satellite event of the XXXII Brazilian
Computer Society Conference, in July 2012. In this article
we briefly describe recent work in the field of
medical social networks (Section 2). In Section 3 we detail
our proposal, the BRINCA project, and in Section 4 its
current results. In addition, we present related work
(Section 5) and conclude, pointing to some directions
for future work (Section 6).
2. Social Network Analysis in Medicine
Social network analysis in the medical context is two-fold.
First, it is used to contain the dissemination of disease, to
prevent it from reaching an endemic or epidemic level.
This can be done by analyzing the social networks
of those infected and predicting how the disease can
spread [3-5]. The second usage is the identification of
expert networks [6,7], which is the focus of this work.
3. BRINCA Project
The BRINCA Project aims to map the knowledge
exchanged amongst Cancer Control INCT researchers, as
well as to identify how groups develop their research efforts
and how professionals interact with each other. To that end, the
project identifies scientific social networks and
provides mechanisms for complex analyses,
with the goal of improving collaboration amongst
the main specialists.
The reports provided can help to detect weak or strong
points in the interaction between research groups, centres,
and countries, assisting in the guidance of scientific
development and funding policies [8,9].
In the following subsections, we describe the details of our approach to
the analysis of the Cancer Control INCT (INCTCC) social network.
3.1. Architecture
The steps of the architecture developed for our work are
shown in Figure 1.
The data sources are Lattes [10] and PubMed [11],
which are presented in more detail in Section 3.2.
We use Kettle [12] to orchestrate the extraction,
treatment and cleaning routines. The visualization layer is
composed of Gephi [13], Tableau [14] and independent
reports. The metrics were calculated by Gephi [13] and
stored in a Data Warehouse. Section 4 details the
visualizations and analyses.
3.2. Data Sources
The Lattes Curriculum is a Brazilian nation-wide
curricular database with all the curricula of scientific
professionals in Brazil. These curricula were downloaded
with the XML-Lattes Tool [15], and the PubMed data were
obtained through PubMed's own Web service interface.
After the data extraction, transformation and loading
processes, our data warehouse (multidimensional
database) stores different types of relationships between two
researchers over time. The types of scientific
relationships are:
Project participation: being a member of a project team;
Co-authorship: two people work together on a publication;
Advisory work: a professor supervises a student’s work;
Examination board participation: professors who participate in a committee to judge and evaluate a thesis;
Judgment commissions: professors who participate in a committee to judge and evaluate scientific work such as publications (programme committee) or project proposals, or to evaluate candidates in hiring processes; and
Other types of scientific production (e.g., patents).
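As an illustration only, the relationship types listed above could be represented as typed, dated records between two researchers. The sketch below is a minimal Python example; the class names, field names and sample records are hypothetical and are not taken from the BRINCA data warehouse, and treating every tie as undirected is an assumption made for simplicity.

```python
from dataclasses import dataclass
from enum import Enum


class RelationshipType(Enum):
    """The six relationship types listed above (identifier names are illustrative)."""
    PROJECT_PARTICIPATION = "project participation"
    CO_AUTHORSHIP = "co-authorship"
    ADVISORY_WORK = "advisory work"
    EXAMINATION_BOARD = "examination board participation"
    JUDGMENT_COMMISSION = "judgment commission"
    OTHER_PRODUCTION = "other scientific production"


@dataclass(frozen=True)
class Relationship:
    """One typed, dated tie between two researchers (hypothetical record layout)."""
    researcher_a: str
    researcher_b: str
    kind: RelationshipType
    year: int

    def pair(self) -> frozenset:
        # Order-independent key: the tie is treated as undirected (an assumption).
        return frozenset((self.researcher_a, self.researcher_b))


# Made-up example records, not data from the study.
records = [
    Relationship("R1", "R2", RelationshipType.CO_AUTHORSHIP, 2009),
    Relationship("R2", "R1", RelationshipType.CO_AUTHORSHIP, 2010),
    Relationship("R1", "R3", RelationshipType.ADVISORY_WORK, 2010),
]

# Distinct pairs of researchers linked by at least one co-authorship.
co_authored_pairs = {
    r.pair() for r in records if r.kind is RelationshipType.CO_AUTHORSHIP
}
```

Keeping the year on each record is what makes the temporal analyses of Section 4 possible, since the same pair can be filtered by type and by period.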
In addition to relationships, each researcher
has an individual profile, built from personal attributes
such as: Academic Level (PhD, MSc, or BSc);
Research and activity areas; Number of Publications (per
type, such as journals, proceedings, technical reports, …);
Number of Project participations; Number of Thesis
Advice participations; and Number of Participations in
Examination Boards. Research and activity areas indicate
which areas a researcher is connected with. Examples of
research and activity areas are HPV and thyroid cancer.
3.3. Multidimensional Model
All the details of scientific interactions, such as type and
frequency, and the members of a social network (with their
profiles) are stored in a Data Warehouse, which follows a
multidimensional model, shown in Figure 2.
Figure 1. BRINCA’s architecture. (Diagram labels: Data Sources (Lattes, PubMed); Extractor; Layer to Access Multidimensional Database; Metrics for Social Network Analysis; Visualization.)
Figure 2. Multidimensional model.
This model has a fact table that aggregates the
scientific production per year, via an association with the
Time and Scientific Production dimensions. The
Scientific Production dimension represents each production
made by one or more researchers (Researcher
Dimension), who can participate in groups (Group Dimension).
The Group Dimension is related to Research Groups and
holds information on their evaluation and location. Each
researcher can have one or more expertise areas (e.g.,
Genetics, Biochemistry).
Based on this model and using our analysis tools, we
were able to get the results presented in the next section.
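To make the idea of a fact table with dimensions more concrete, the following minimal Python sketch rolls scientific production up by year and by expertise area. The keys, attributes and sample rows are hypothetical; they do not reproduce the schema of Figure 2, only the general star-schema pattern it describes.

```python
from collections import defaultdict

# Researcher dimension (hypothetical keys and attributes).
researchers = {
    "R1": {"group": "G1", "areas": ["Genetics"]},
    "R2": {"group": "G1", "areas": ["Biochemistry"]},
    "R3": {"group": "G2", "areas": ["Genetics", "Biochemistry"]},
}

# Fact rows: one row per (year, production, researcher) participation.
facts = [
    (2009, "P1", "R1"),
    (2009, "P1", "R2"),
    (2010, "P2", "R1"),
    (2010, "P3", "R3"),
]


def productions_per_year(rows):
    """Count distinct productions in each year (roll-up along the Time dimension)."""
    seen = defaultdict(set)
    for year, production_id, _researcher_id in rows:
        seen[year].add(production_id)
    return {year: len(ids) for year, ids in sorted(seen.items())}


def productions_per_area(rows):
    """Count distinct productions per expertise area of the participating researchers."""
    seen = defaultdict(set)
    for _year, production_id, researcher_id in rows:
        for area in researchers[researcher_id]["areas"]:
            seen[area].add(production_id)
    return {area: len(ids) for area, ids in sorted(seen.items())}


per_year = productions_per_year(facts)
per_area = productions_per_area(facts)
```

Counting distinct production identifiers, rather than rows, avoids counting a co-authored production once per author.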
4. Current Results
The main concern of this project (described in Section 3) is
to understand the interactions amongst researchers and
the role of the Cancer Control INCT in promoting
scientific cooperation on cancer.
The work developed in the Cancer Control INCT is
classified by research theme. For each theme, there
are sub-projects [1], which have researchers associated with
them. Project members can be researchers from the INCT
or from other (domestic or foreign) institutions. To
produce the results below we used data from 122
researchers, not including the students involved in
the sub-projects.
One of the analyses identifies the most connected
researchers in the network. A researcher with a high degree
of relationships can be a person with a high level of
influence or specific expertise, not always in a supervising
position such as department manager or project leader.
The average number of relationships per researcher in the network is 8.496. In a big
network, it is usual to have subnets. The relationship
average of the most connected nodes in a subnet is 2.667.
Some nodes with a higher linkage degree are shown in
Figure 3. Red nodes are department or project heads.
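The text does not state how the relationship average is computed. Assuming it is the usual average degree of an undirected graph (each node's number of distinct neighbours, averaged over all nodes), it can be sketched as follows; the function name and the toy graph are illustrative only.

```python
def average_degree(nodes, edges):
    """Mean number of distinct neighbours per node in an undirected simple graph."""
    neighbours = {node: set() for node in nodes}
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops in this sketch
        neighbours[a].add(b)
        neighbours[b].add(a)
    return sum(len(adjacent) for adjacent in neighbours.values()) / len(neighbours)


# Toy graph: a triangle A-B-C plus one unconnected node D.
toy_nodes = ["A", "B", "C", "D"]
toy_edges = [("A", "B"), ("A", "C"), ("B", "C")]
toy_average = average_degree(toy_nodes, toy_edges)  # (2 + 2 + 2 + 0) / 4
```

Note that unconnected nodes are included in the denominator, so they lower the average.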
Of the 122 researchers, 8 have
no connection with other INCTCC researchers, although
they have external links. That is, they are nodes
disconnected from the rest of the network, as seen in Figure 4,
which shows members and their main research area (colour).
However, two of these researchers do not appear in Figure 4
because they are not associated with any area, so the figure shows only 6
disconnected nodes, in Medicine (green), Veterinary
Medicine (red), Pharmacology (pink), Pharmacy (blue)
and Computing Science (purple).
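Disconnected researchers of this kind are simply the nodes that never appear as an endpoint of an internal relationship. A minimal sketch, with hypothetical names and toy data, is:

```python
def isolated_nodes(nodes, edges):
    """Return the nodes that are not an endpoint of any edge, in input order."""
    connected = set()
    for a, b in edges:
        if a != b:  # a self-loop does not connect a node to anyone else
            connected.update((a, b))
    return [node for node in nodes if node not in connected]


# Toy data: only internal (member-to-member) relationships are listed as edges.
members = ["R1", "R2", "R3", "R4"]
internal_edges = [("R1", "R2"), ("R2", "R1")]
isolated = isolated_nodes(members, internal_edges)
```

Because only internal edges are passed in, a researcher with many external collaborators would still be reported as isolated, which matches the situation described above.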
The network has 8 researchers who act as “bridges”,
connecting large groups. Amongst these 8 researchers, 6
are central nodes (with no higher connection degree).
The metric used was betweenness centrality. The detection
of bridges is important for verifying the weak points of
our network, which is fragile at these points and could
easily be divided into subgroups if a member who is a
bridge leaves the group.
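In this work the betweenness values came from Gephi. For readers who want to see what the metric measures, the sketch below implements Brandes' algorithm for an undirected, unweighted graph in plain Python. It is an independent illustration on a toy graph, not the code used in BRINCA, and the scores are left unnormalised.

```python
from collections import deque


def betweenness_centrality(adjacency):
    """Unnormalised betweenness for an undirected, unweighted graph (Brandes)."""
    centrality = {node: 0.0 for node in adjacency}
    for source in adjacency:
        # Breadth-first search that counts shortest paths from `source`.
        order = []
        predecessors = {node: [] for node in adjacency}
        path_count = {node: 0 for node in adjacency}
        distance = {node: -1 for node in adjacency}
        path_count[source] = 1
        distance[source] = 0
        queue = deque([source])
        while queue:
            current = queue.popleft()
            order.append(current)
            for neighbour in adjacency[current]:
                if distance[neighbour] < 0:
                    distance[neighbour] = distance[current] + 1
                    queue.append(neighbour)
                if distance[neighbour] == distance[current] + 1:
                    path_count[neighbour] += path_count[current]
                    predecessors[neighbour].append(current)
        # Accumulate dependencies from the farthest nodes back to the source.
        dependency = {node: 0.0 for node in adjacency}
        for node in reversed(order):
            for pred in predecessors[node]:
                share = path_count[pred] / path_count[node]
                dependency[pred] += share * (1.0 + dependency[node])
            if node != source:
                centrality[node] += dependency[node]
    # Each unordered pair of endpoints was counted once from each side.
    return {node: value / 2.0 for node, value in centrality.items()}


# Toy graph: two triangles that are joined only through node "X".
toy_graph = {
    "A": ["B", "X"], "B": ["A", "X"],
    "X": ["A", "B", "C", "D"],
    "C": ["D", "X"], "D": ["C", "X"],
}
scores = betweenness_centrality(toy_graph)
bridge = max(scores, key=scores.get)
```

In the toy graph every shortest path between the two triangles passes through "X", so it receives the highest score; removing it would split the graph, which is the fragility discussed above.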
Figure 5 shows a piece of the INCTCC’s social
network, where 14 clusters were identified through the
modularity metric, but only 3 could be associated with INCTCC
areas. They are: Medicine (20.51% of the researchers),
Collective Health (16.24%), and Genetics (9.4%).
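The clusters themselves were found with Gephi's modularity tool. The sketch below does not search for clusters; it only evaluates the standard Newman modularity score Q of a partition that is already given, which is the quantity such tools try to maximise. The function name and the toy graph are illustrative assumptions.

```python
def modularity(edges, community_of):
    """Newman modularity Q of a node partition on an undirected, unweighted graph."""
    m = len(edges)
    internal = {}    # number of edges with both endpoints inside each community
    degree_sum = {}  # sum of node degrees inside each community
    for a, b in edges:
        ca, cb = community_of[a], community_of[b]
        degree_sum[ca] = degree_sum.get(ca, 0) + 1
        degree_sum[cb] = degree_sum.get(cb, 0) + 1
        if ca == cb:
            internal[ca] = internal.get(ca, 0) + 1
    return sum(
        internal.get(c, 0) / m - (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )


# Toy graph: two triangles connected by the single edge C-D.
toy_edges = [
    ("A", "B"), ("B", "C"), ("A", "C"),
    ("D", "E"), ("E", "F"), ("D", "F"),
    ("C", "D"),
]
two_clusters = {"A": 1, "B": 1, "C": 1, "D": 2, "E": 2, "F": 2}
one_cluster = {node: 1 for node in two_clusters}

q_two = modularity(toy_edges, two_clusters)
q_one = modularity(toy_edges, one_cluster)
```

Splitting the toy graph into its two triangles scores higher than keeping everything in one cluster, which is why a modularity-based method would report two clusters here.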
Analyzing internal and external interactions, we
identified 12 researchers (9.84%) with a greater number of
connections with external researchers (who are not
members of the INCTCC), compared with only 3 (2.46%) who
have strong internal connections with INCTCC researchers.
Having a “very intense connection” or “very strong
connection” describes a node whose frequency of interaction
with another member exceeds 70% of the highest
interaction frequency in the entire network, which is 30.
Since 70% of 30 is 21, any node with more than 21 interactions
with another can be considered as having a very strong relationship
with him/her. We identified 6 researchers (4.91%) with
very strong relationships with other researchers who act
in different areas.
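The 70% rule can be written down directly. In the sketch below the pair names and counts are made up, except that the highest count is set to 30 as reported above; integer arithmetic is used so that the comparison with the threshold of 21 is exact.

```python
def very_strong_pairs(interaction_counts, percent=70):
    """Pairs whose interaction count exceeds `percent`% of the network maximum."""
    highest = max(interaction_counts.values())
    # n > percent% of highest, written with integers to avoid rounding issues.
    return {
        pair: count
        for pair, count in interaction_counts.items()
        if count * 100 > percent * highest
    }


# Made-up interaction counts between pairs of researchers.
toy_counts = {
    ("R1", "R2"): 30,  # highest frequency in the network
    ("R1", "R3"): 21,  # exactly 70% of 30: not "more than 21"
    ("R2", "R4"): 22,
    ("R3", "R4"): 5,
}
strong = very_strong_pairs(toy_counts)
```

A pair with exactly 21 interactions is excluded, since the definition requires the frequency to exceed the threshold.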
Since the creation of the INCTCC in January 2009,
with its official implementation in June of that year, the
social network has changed. Figure 6 shows co-author
relationships before the creation of the INCTCC and during its
development.
These changes affected the average degree of the
network. Analyzing only the co-authorship relation, we have
the following average degrees: 1.009 (2007), 0.957
(2008), 0.410 (2009), 0.855 (2010) and 0.171 (2011). We
can see that the number of interactions amongst
specialists was decreasing, even after the INCT implementation
in 2009. However, there was a positive difference (an increase in
relationships) of 0.445 between 2009 and 2010, the INCTCC’s
first year of operation. This contrasts with
the difference of the previous period (2008-2009),
which was negative (a decrease of 0.547),
showing that interactions increased again after the
creation of the INCTCC. Meanwhile, in 2011 there was a significant
decrease in this value. This possibly occurred because many
publications were not yet registered in the Lattes
Curriculum or were still under review at
journals.
Figure 3. Example for most connected nodes in a subnet.
Figure 4. Disconnected nodes.
Figure 5. INCTCC network.
Figure 6. INCTCC social network from 2007 to 2011.
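The year-over-year differences discussed above follow directly from the reported average degrees. The short sketch below recomputes them; the values are those given in the text, while the function and variable names are ours. Decimal arithmetic is used so that the three-decimal figures are reproduced exactly.

```python
from decimal import Decimal

# Average co-authorship degree per year, as reported in the text.
average_degree_by_year = {
    2007: Decimal("1.009"),
    2008: Decimal("0.957"),
    2009: Decimal("0.410"),
    2010: Decimal("0.855"),
    2011: Decimal("0.171"),
}


def year_over_year_change(series):
    """Difference between each year's value and the previous year's value."""
    years = sorted(series)
    return {
        later: series[later] - series[earlier]
        for earlier, later in zip(years, years[1:])
    }


changes = year_over_year_change(average_degree_by_year)
```

The result shows a decrease of 0.547 from 2008 to 2009 and an increase of 0.445 from 2009 to 2010.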
We can see this in Figure 7, which shows the total
co-authorship interactions amongst researchers. Relationships
strengthen from 1993 until just before
the INCTCC’s creation. This makes sense, as researchers
knew each other and had built ties before the
creation of the Institute.
Following the same line of reasoning, we saw that new
relationships emerged from 2008 through 2011,
showing that the main goal of the INCTCC was being achieved.
The year 2008 saw 54 new co-authorship relations, with 47
in 2009, 79 in 2010, and 25 in 2011. Again, we should
remember that when these data were processed, the
production of 2011 was not complete. Even with the
decrease in the total number of relationships, the number of
new relations remained almost constant, increasing in
2010.
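The text does not define precisely when a co-authorship relation counts as new. One plausible reading, assumed here, is that a pair of researchers is new in a given year if they had not co-authored in any earlier year of the data. Under that assumption the yearly counts could be obtained as in this sketch, which uses made-up pairs.

```python
def new_relations_per_year(pairs_by_year):
    """For each year, count co-author pairs not seen in any earlier year."""
    seen = set()
    counts = {}
    for year in sorted(pairs_by_year):
        # frozenset makes ("A", "B") and ("B", "A") the same relation.
        current = {frozenset(pair) for pair in pairs_by_year[year]}
        counts[year] = len(current - seen)
        seen |= current
    return counts


# Made-up co-authorship pairs per year.
toy_pairs = {
    2008: [("A", "B"), ("B", "C")],
    2009: [("B", "A"), ("C", "D")],
    2010: [("A", "B")],
}
new_counts = new_relations_per_year(toy_pairs)
```

In the toy data, the pair A-B reappears in 2009 and 2010 but is counted as new only in 2008.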
We analyzed the number of publications over the years, as
shown in Figure 8. There is a 12% fall from 2008 to
2009, and the decrease in 2010 compared to 2008
is 29%. Probably, after the INCTCC creation,
researchers focused their research work on INCT areas.
Another possible explanation is the increase in the
number of new relationships. It is natural, when new
professional interactions start, that there is a period of
adjustment. This adaptation process involves learning about
a new partner’s work, understanding new processes
and methods, and reaching the research
maturity needed to obtain results. This adaptation
consumes time, and fewer results worthy of
publication are expected.
Figure 7. Total number of co-authorship relations over the years. (Axes: Year, 1993-2011; Number of Relationships, 0-200.)
Figure 8. Total publications over the years. (Axes: Year, 1993-2011; Total of publications, 0-1200.)
Figure 9 shows an example of interactions based on
areas of common interest and expertise. The image
shows interactions in the area of Collective Health. The
edge’s thickness indicates the number of interactions
between two people.
With the developed environment, based on a multidi-
mensional model, we can undertake several analyses.
This project allows us to have a clear view of research
group behaviour and identify key problems in scientific
collaboration.
5. Related Work
The most similar work is [7], whose focus is to
analyze the social networks of researchers in the field of
parasitic diseases such as dengue fever, Chagas disease
and malaria. Co-authorship was used to
infer relationships amongst researchers in a particular
area. Based on keywords extracted from the titles of articles,
the authors identified clusters. Our study differs in
its use of more data sources, namely
Lattes and PubMed, and in the identification of different
types of scientific relationships (not only co-authorship).
Furthermore, we automated the whole data treatment process.
We also identify relationships amongst groups (not only
amongst people), for example institutions and funding
agencies, and can visualize all the interactions in
a specific area.
The work presented in [16] tried to find patterns of
interaction amongst researchers in the field of tourism in
regions of Australia and New Zealand. For this, the author used
bibliometric information from the 1999-2005 period to
analyze co-authorship networks, inter-institutional
collaborations, and international collaborations. The similarity to
our work lies in the use of social network metrics
and in some of the types of networks used.
However, limitations in visualizing the networks
compromise the final result. The fact that we created a
visualization approach based on Gephi and Tableau helped us
significantly, as it provided more flexibility to configure
metrics and parameters.
Figure 9. Interactions in the area of collective health.
The research conducted by [17] is very close to ours,
but their focus is on Web Science researchers. They do
not use a multidimensional analysis to deal with data,
and do not identify different kinds of relationships, ei-
ther.
We can also mention the work of [18], which introduced
the concept of balancing and was the foundation for the
idea presented here. The main difference between the
two works is that Monclar et al. focused on a small community in
the Department of Systems Engineering and Computing
at COPPE/UFRJ, while this work focuses on all the
Cancer Control INCT researchers. The work of Monclar et al.
[18] did not use a multidimensional database and
therefore did not have the benefits of multidimensional
analyses.
We also identified the work of [19]. It is a study on the
behaviour of collaborative production, but focused on
open-source software projects. The data used to identify
relationships were obtained from the analysis of source
code, discussion forums, chats, and version updates. As in
our work, metrics were calculated to determine the structural
(degree, centrality, etc.) and topological
(density, diameter distribution, etc.) characteristics of the social
networks. The difference is that the focus was
not to improve the network, but only to identify it.
In the study by [20], we see a method to detect,
identify and visualize research groups in a university. The
method is quite simple, relying on the generation of a
matrix that lists the authors of the articles, from which
a social network of co-authors can be inferred. The visualization itself is
quite clear and simple to understand, although it is not
concerned with temporal analyses.
6. Conclusions and Future Work
Social network analysis helps to understand group
development and to identify relationship patterns. This
kind of analysis has been used in many situations,
amongst them the health care scenario.
In this article, we presented the BRINCA project,
whose goal is to support the analysis and visualization of
scientific social networks. This project was applied in the
area of Cancer Control in Brazil, in the scenario of the
National Science and Technology Institutes (INCTs).
The INCT is a mechanism to motivate collaboration
amongst universities and research institutions dealing
with strategic questions, in our case, cancer control.
This project is still under development, and we can
mention some future work and improvements. One of
them is enriching the analysis with data from
medical records and cancer treatments, so that we can compare
and identify the interaction between clinical treatment
and research.
Another challenge is the adoption of data mining
techniques to detect association rules and recurrence patterns.
Finally, we will study the benefits of the
approach created for the cancer control research
scenario in Brazil.
Acknowledgements
We would like to thank CNPq, CAPES, and FAPERJ for
their support, especially the support provided by the
projects “INCT para Controle do Câncer” (CNPq 573806/2008-0
and FAPERJ E26/170.026/2008) and “Projeto
Universal: CLOTO: Composição, Mineração, Análise e
Predição de Redes Sociais Utilizando Dados Ligados
Abertos e Contextualizado” (CNPq 487239/2012-1), and by the
grants “Jovem Cientista do Nosso Estado” (Young
Researcher of Rio de Janeiro, FAPERJ: E_23/2013) and
“Produtividade em Pesquisa-Nível 2” (Productivity in
Research-Level 2, CNPq: 308219/2010-4).
REFERENCES
[1] INCTCC, “INCT Activity Report 2010-Home-INCA,”
2010.
http://www1.inca.gov.br/inca/Arquivos/INCT/inct_project_2010.pdf
[2] R. A. Perorazio, F. F. Faria, R. Monclar, J. Oliveira and J.
Souza, “Estudando Dinâmicas de Grupo Através da
Utilização da Análise de Redes Sociais em uma Comunidade
Médica,” Proceedings of the Brazilian Workshop on
Social Network Analysis and Mining, XXXII Congress of the
Brazilian Computer Society, Curitiba, 2012.
[3] J. C. Cordeiro, “Redes Sociais e Saúde,” Revista Hispana
para el Análisis de Redes Sociales, Vol. 12, No. 10, 2007.
[4] A. S. Klovdahl, “Social Networks and the Spread of In-
fectious Diseases: The AIDS Example,” Social Science &
Medicine, Vol. 21, No. 11, 1985, pp. 1203-1216.
http://dx.doi.org/10.1016/0277-9536(85)90269-2
[5] M. Negreiros, et al., “Optimization Models, Statistical
and DSS Tools for Dengue Prevention and Combat,”
Efficient Decision Support Systems: Practice and
Challenges in Biomedical Related Domain, INTECH Open
Access Publisher, Vol. 1, 2011, pp. 115-160.
[6] R. S. Monclar, “Análise e Balanceamento de Redes So-
ciais no Contexto Científico,” M.Sc. Thesis, COPPE/
PESC, Universidade Federal do Rio de Janeiro, 2008.
[7] C. M. Morel, S. J. Serruya, G. O. Penna and R. Gui-
maraes, “Co-Authorship Network Analysis: A Powerful
Tool for Strategic Planning of Research, Development
and Capacity Building Programmes on Neglected Dis-
eases,” PLoS Neglected Tropical Diseases, Vol. 3, No. 8,
2009. http://dx.doi.org/10.1371/journal.pntd.0000501
[8] A. Parent, F. Bertrand, G. Côté, et al., “Scientometric
Study on Collaboration between India and Canada,
1990-2001,” 2003.
http://www.science-metrix.com/pdf/SM_2003_009_DFAIT_Indo-Canadian_S&T_Collaboration.pdf
[9] J. Owen-Smith, M. Riccaboni, F. Pammolli, et al., “A
Comparison of U.S. and European University-Industry
Relations in the Life Sciences,” Management Science,
Vol. 48, No. 1, 2002, pp. 24-42.
http://dx.doi.org/10.1287/mnsc.48.1.24.14275
[10] Lattes, “Lattes,” 2012. http://lattes.cnpq.br/
[11] Pubmed, “PubMed Home,” 2012.
http://www.ncbi.nlm.nih.gov/pubmed
[12] Pentaho, “Pentaho Kettle Project,” 2012.
http://kettle.pentaho.com
[13] Gephi, “Gephi,” 2012. http://gephi.org/
[14] Tableau, “Tableau Software,” 2012.
http://www.tableausoftware.com/
[15] G. O. Fernandes, J. Oliveira and J. M. Souza, “XMLattes -
A Tool for Importing and Exporting Curricula Data,”
International Conference on Information and Knowledge
Engineering, Las Vegas, 2011.
[16] P. Benckendorff, “Exploring the Limits of Tourism Re-
search Collaboration: A Social Network Analysis of Co-
Authorship Patterns in Australian and New Zealand Tour-
ism Research,” 20th Annual CAUTHE Conference, Ho-
bart, Australia, 2010.
[17] A. H. F. Laender, et al., “Building a Research Social Net-
work from an Individual Perspective,” ACM/IEEE Joint
Conference on Digital Libraries, Ottawa, 2011, pp. 427-
428.
[18] R. S. Monclar, J. Oliveira and J. M. Souza, “Analysis and
Balancing of Social Network to Improve the Knowledge
Flow on Multidisciplinary Teams,” 13th International
Conference on Computer Supported Cooperative Work in
Design, Santiago, Chile, 2009.
[19] S. F. De Sousa, M. A. Balieiro and C. R. B. de Souza,
“Análise Multidimensional de Redes Sociais de Projetos
de Software Livre,” Proceedings of the 2008 Simpósio
Brasileiro de Sistemas Colaborativos, Vila Velha, 27-29
October 2008, pp. 23-33.
http://dx.doi.org/10.1109/SBSC.2008.35
[20] A. Perianes-Rodriguez, C. Olmeda-Gómez and F.
Moya-Anegón, “Detecting, Identifying and Visualizing
Research Groups in Co-Authorship Networks,”
Scientometrics, Vol. 82, No. 2, 2010, pp. 307-319.
http://dx.doi.org/10.1007/s11192-009-0040-z