Journal of Intelligent Learning Systems and Applications, 2012, 4, 319-329 Published Online November 2012 (
Behind HumanBoost: Analysis of Users’ Trust Decision
Patterns for Identifying Fraudulent Websites
Daisuke Miyamoto1, Hiroaki Hazeyama2, Youki Kadobayashi2, Takeshi Takahashi3
1Information Technology Center, The University of Tokyo, Tokyo, Japan; 2Internet Engineering Laboratory, Graduate School of
Information Science, Nara Advanced Institute of Science and Technology, Nara, Japan; 3Security Architecture Laboratory, National
Institute of Information and Communications Technology, Tokyo, Japan.
Received May 29th, 2012; revised October 10th, 2012; accepted October 17th, 2012
This paper analyzes users’ trust decision patterns for detecting phishing sites. Our previous work proposed HumanBoost
[1] which improves the accuracy of detecting phishing sites by using users’ Past Trust Decisions (PTDs). Web users are
generally required to make trust decisions whenever their personal information is requested by a website. Human-
Boostassumed that a database of Web user’s PTD would be transformed into a binary vector, representing phishing or
not-phishing, and the binary vector can be used for detecting phishing sites, similar to the existing heuristics. Here, this
paper explores the types of the users whose PTDs are useful by running a subject experiment, where 309 participants-
browsed 40 websites, judged whether the site appeared to be a phishing site, and described the criterion while assessing
the credibility of the site. Based on the result of the experiment, this paper classifies the participants into eight groups
by clustering approach and evaluates the detection accuracy for each group. It then clarifies the types of the users who
can make suitable trust decisions for HumanBoost.
Keywords: Detection of Phishing Sites; Trust Decision; Credibility of Websites; Machine Learning; Cluster Analysis
1. Introduction
Phishing is a form of identity theft in which the targets
are users rather than computer systems. A phishing at-
tacker attracts victims to a spoofed website, a so-called
phishing site, and attempts to persuade them to provide
their personal information.
To deal with phishing attacks, a heuristics-based de-
tection method has begun to garner attention. A heuristic
is an algorithm to identify phishing sites based on users’
experience, and checks whether a site appears to be a
phishing site or not. Checking the life time of a registered
website is well-known heuristic as most phishing sites’
Uniform Resource Locator (URL) expires in short time
span. Based on the detection result from each heuristic,
the heuristic-based solution calculates the likelihood of a
site being a phishing site and compares the likelihood
with the defined discrimination threshold.
A current challenge of the heuristics-based solutions is
improving the detection accuracy. Our proposed Hu-
manBoost [1] aims at improving the machine learning-
based detection methods of phishing sites. The key con-
cept of HumanBoost is utilizing Web users’ Past Trust
Decisions (PTDs), which is the record of users’ past de-
cisions. Basically, humans have the potential to identify
phishing sites, even if existing heuristics cannot detect
them. HumanBoost constructs PTD databases for each
Web user, and uses each of the PTD record as a feature
vector for detecting phishing sites. For our pilot study, in
November 2007, we invited 10 participants and perfor-
med a subject experiment. The participants browsed 14
simulated phishing sites and six legitimate sites, and
judged whether or not the site appeared to be a phishing
site. We utilize participants’ trust decisions as a new heu-
ristic and we let Adaptive Boosting (AdaBoost) [2] in-
corporate it into eight existing heuristics. The results
show that the average error rate for HumanBoost was
13.4%, whereas for participants it was 19.0% and for
AdaBoost 20.0%.
This paper analyzes the users’ trust decision patterns
by investigating their decisions making. We assumed that
some participants’ PTDs were not useful, since we have
found such cases that the detection accuracy of Human-
Boost was lower than that of AdaBoost. In July 2010, we
invited 309 participants to perform a phishing Intelli-
gence Quotient (IQ) test, asked them the reason of that
they identified our prepared websites as legitimate or
phishing, and analyzed the criterion when they assessed
the credibility of the sites. Based on the analysis, we ex-
plored useful trust decision patterns, and found that such
Copyright © 2012 SciRes. JILSA
Behind HumanBoost: Analysis of Users’ Trust Decision Patterns for Identifying Fraudulent Websites
people, who were able to utilize their past experience and
assessed the credibility by utilizing both URL of the
website and security information of the Web browser
rather than content of Web page, tended to have useful
PTDs for HumanBoost. To the best of our knowledge,
this is the first study for orchestrating users’ trust deci-
sions and heuristics by machine learning in consideration
of their decisions making patterns.
The rest of this paper is organized as follows. Section
2 summarizes the related work, and Section 3 explains
our proposal and preliminary evaluation. Section 4 de-
scribes our evaluation conditions, and Section 5 shows
our experimental results. Section 6 discusses the avail-
ability of PTDsand the effectiveness of Extended Valida-
tion Secure Socket Layer (SSL) certificates. Finally, sec-
tion 7 summarizes our contributions.
2. Related Work
This section introduces existing detection method and
subject experiment reports as the related work.
2.1. Detection Method for Phishing Sites
There are two distinct approaches for identifying phish-
ing sites. One is URL filtering. It detects phishing sites
by comparing the URL of a site where a user visits with a
URL blacklist, which is composed of the URLs of phi-
shing sites. Unfortunately, the effectiveness of URL fil-
tering is limited. Sheng et al. reported [3] that URL bla-
cklists were ineffective when protecting users initially, as
most of them detected less than 20% of phishing sites at
hour zero. The rapid increase of phishing sites hinders
URL filtering to work sufficiently due to the difficulty of
building a perfect blacklist.
Another approach is a heuristic-based method, which
can detect phishing sites by calculating the likelihood of
being a phishing site. The detection accuracy of existing
heuristic-based solutions was, however, far from suitable
for practical use. To increase the detection accuracy,
Zhang et al. developed CANTINA [4], which employeda
novel heuristic, named “TF-IDF-Final” heuristic. When
the heuristic attempts to identify phishing sites, it feeds
the mixture of the domain name of the current website
and extracted words from content into Google. If the do-
main name matches the domain name of the top 30 sear-
ch results, the website is labeled legitimate.
Aside from developing new heuristics, the combina-
tion methods of heuristics were studied. Our previous
work [5] employed nine machine learning techniques for
detecting phishing sites. By employing eight heuristics
presented by CANTINA, we analyzed 3000 URLs, con-
sisting of 1500 legitimate sites and the same number of
phishing sites, reported on [6] from No-
vember 2007 to February 2008. Finally, we evaluated the
performance of machine learning-based detection meth-
ods in comparison to that of CANTINA. Our evaluation
results showed the best accuracy wasobserved for the
AdaBoost-based detection method. In most cases, ma-
chine learning-based detection methods performed better
2.2. Subject Experiments
Due to the nature of phishing attacks, subject experi-
ments were often used to verify the effectiveness of the
countermeasures, such as phishing prevention systems
and educational materials against phishing [7,8].
To know how people make their trust decision, Dha-
mija et al. showed 22 participants 20 websites and asked
them to determine which ones were fraudulent, and why
[9]. They found that 20% of the participants had not
looked at the address bar and/or the security indicators,
and it led to incorrect choices 40% of the time. Fogg et al.
observed that 2684 participants evaluated the credibility
of two websites and the participants commented about
the received signal from the sites [10]. They found that
the “design look” of the website was mentioned most fre-
quently, being present in 46.1% of the comments.
3. HumanBoost
This section outlines HumanBoost [1], a mechanism to
improve the detection accuracy of phishing sites.
3.1. Overview
The key concept of HumanBoost is utilizing Web users’
Past Trust Decisions (PTDs). Web users are generally re-
quired to make trust decisions whenever they input their
personal information into websites. In other words, we
assumed that a Web user outputs a binary variable, phi-
shing or legitimate, when the website requires users to
input their password. Note that existing heuristics for de-
tecting phishing sites, all of which were explained in [4],
are similar to output binary variables denoting phishing
or not phishing.
In HumanBoost, we assume that each Web user has
his/her own PTD database. The schema of the PTD data-
base consists of the website’s URL, actual conditions, the
result of the user’s trust decision, and the results from
existing heuristics. Note that we do not propose sharing
the PTD database among users due to the privacy con-
cerns. Given the number of existing heuristics N, the
PTD database can be regardedas a training dataset that
consists of N + 1 binary explanatory variables and one
binary response variable. We, therefore, employ a ma-
chine learning technique for studying this binary vector
for each user’s PTD database.
Copyright © 2012 SciRes. JILSA
Behind HumanBoost: Analysis of Users’ Trust Decision Patterns for Identifying Fraudulent Websites 321
3.2. Theoretical Background
In this study we employ the AdaBoost algorithm that
learns a strong algorithm which returns the output H by
combining a set of weak algorithms ht and a set of weight
The weights are learned through supervised training
off-line. Formally, AdaBoost uses a set of input data
,: 1,,
yi m where xi is the input, yi is the classi-
fication and m is the number of samples.
Each weak algorithm is only required to have an error
rate lower than 50%. The AdaBoost algorithm iterates
the calculation of a set of weight
i on the samples.
At t = 1, the samples are equally weighted so
The update rule consists of three stages. First, Ada-
Boost chooses the weight αt as shown in (2).
where i
is the weighted er-
ror rate of classifier ht. Second, AdaBoost updates the
weights by (3).
Di hxy
e if
e if
tti i
Di hx y
hx y
where t
is a normalization factor,
Finally, it outputs the final hypothesis Has shown in (1).
In the context of detecting phishing, αt is the weight
for each heuristic ht, and
is the weight for each
website while training t If t correctly classifies the
sites whose
iis high, εt would be low because it
only increases when
, hence, αt would be
high as shown in (2). The reason for high
i is that
the other heuristics often fail to label the site
rectly. In short, AdaBoost assigns high weight to a clas-
sifier that correctly labels a site that other classifiers had
labeled incorrectly.
We have two reasons of employing AdaBoost. One is
that it had performed better in our previous comparative
study [5], where it demonstrated the lowest error rate, the
highest f1 measure, and the highest AUC of the Ada-
Boost-based detection method, as mentioned in Section 2.
The other is that we expect AdaBoost to cover each user’s
weak points.Assuming that a user’s trust decision can be
treated as a classifier, AdaBoost would cover users’ weak
points by assigning high weights to heuristics that can
correctly judge a site that the user is likely to misjudge.
3.3. Preliminary Evaluation of HumanBoost
As a pilot study, we invited 10 participants, all Japanese
males, from the Nara Institute of Science and Technol-
ogy. Three had completed their master’s degree in engi-
neering within the last five years, and the others were
master’s degree students. In November 2007, the parti-
cipants browsed 14 simulated phishing sites and six le-
gitimate sites, as shown in Appendix A. Since all parti-
cipants lived in Nara Prefecture, we employed the Nanto
Bank, a Japanese regional bank in Nara, for website 7.
We used a within-subjects design, where every parti-
cipant saw every website and judged whether or not it
appeared to be a phishing site. In our test we asked 10
participants to freely browse the websites. Each partici-
pant’s PC was installed with Windows XP and Internet
Explorer (IE) version 6.0 as the browser. Other than con-
figuring IE to display International Domain Name (IDN),
we installed no security software and/or anti-phishing
toolbars. We also did not prohibit participants from ac-
cessing websites not listed in Appendix A. Some partici-
pants therefore inputted several terms into Google and
compared the URL of the site with the URLs of those
listed in Google’s search results.
By utilizing participants’ trust decisions as a new weak
hypothesis, we let AdaBoost incorporate the heuristic
into eight existing heuristics, namely Age of Domain,
Known Images, Suspicious URL, Suspicious Links, IP
Address, Dots in URL, Forms, and TF-IDF-Final heuris-
tics all of which were employed by CANTINA [4]. The
results show that the average error rate for HumanBoost
was 13.4%, whereas that for participants was 19.0% and
for the AdaBoost-based detection method 20.0%.
We then conducted a follow-up study. The study had
in March 2010, new participants, aged 23 to 30. All were
from the Japan Advanced Institute of Science and Tech-
nology. All were Japanese males, two had completed
their master’s degree in engineering within the last five
years, and the others were master’s degree students. Be-
fore conducting the follow-up study, we modified the
dataset described in Appendix A. Due to the renewal of
PayPal’s website during 2007-2010, we updated web-
sites 9 and 20 to mimic the current PayPal login pages.
Nanto Bank, website 7 in Appendix A, had changed both
the URL and the content of its login page. Nanto Bank is
also not well-known in Ishikawa Prefecture, where the
participants of the follow-up study lived. We therefore
changed website 7 to Hokuriku Bank (another Japanese
regional bank in Ishikawa). The domain name of Hokuri-
ku Bank is, the same as Nanto
The invited 11 participants were asked to label 20 web-
sites as legitimate or phishing. Different from the first
study, we prepared printed documents to expedite this
experiment. Instead of operating a browser, participants
looked at 20 screen shots of a browser that had just fin-
ished rendering each website. Additionally, showing a
Copyright © 2012 SciRes. JILSA
Behind HumanBoost: Analysis of Users’ Trust Decision Patterns for Identifying Fraudulent Websites
browser screen shot is often used for phishing IQ tests.
The results show that the average error rate for Human-
Boost was 10.7%, whereas that for participants was
31.4% and for AdaBoost 12.0%.
We found two problems in earlier research. One is that
our experiments invited biased sample. All participants
were male, and almost of them were belonged the gradu-
ate school of information technology, and the rest were
received master of engineering. Another is that some
participants’ PTD were not useful. In the case of such
participants, we observed that the average error rate of
HumanBoost were higher than that of AdaBoost.
4. Experimental Design
This paper attempts to solve problems described in Sec-
tion 3.3. In July 2010, we invited various participants to
thwart bias and investigated the criterion when partici-
pants assessed the credibility of our prepared web-
sites.This section describes how we setupour experiment
and the dataset description of the phishing IQ test.
4.1. Experimental Setup
A new phishing IQ test is performed to clarify the peo-
ple’s criterion on judging websites’ credibility. We let
the participants label these sites as legitimate or phishing,
and asked them to the reason of their decisions making
by using the form of the questionnaire. The questionnaire
items are listed as following.
4.1.1. Past Experience with Website
This is a check of whether a participant has experience of
using the websites in Appendix B. In the earlier subject
experiment [10], people sometimes drew on their past
experiences with a site to reach a conclusion while they
were just assessing the credibility of the sites. Accord-
ingly, our participants were asked that they have used the
site or not.
4.1.2. Perception of Website’s Credibility
This is a check of how a participant labeled a site as le-
gitimate or phishing. Each participant saw options, na-
mely “Content of Web page”, “URL of the site”, “Secu-
rity Information of Browser”, and “Other Reason”. They
also marked all that applied (multiple answers allowed),
and described their detail reason if selecting “Other Rea-
son” option.
The invited participants saw 20 screen shots of a bro-
wser that rendered the websites. These screen shots were
taken on Windows Vista and IE 8.0 because IE 6.0 was
out of date in July 2010.
After participants finished answering these question-
naires for the websites in Appendix B, we showed the
websites in Appendix A. The participants also judged the
sites, and we calculated the detection accuracy of each
participant, the AdaBoost-based detection method, and
In the experiment, participants knew that they would
be looking at a mixture of phishing sites and legitimate
sites. Since we recruited participants via an Internet re-
search company, for a contractual reason we were re-
quired to explain the purpose of our experiments to them.
We also had to inform them that we never abuse their
answer. Hence, the participants realized that they were
not personally suffering from our simulated phishing
attacks. Observing the activities of the participants in dif-
ferent mental states is interesting, but it is beyond the
scope of this paper.
Note that we used IE 8.0 for investigating users’ crite-
rion of credibility, and also used IE 6.0 for evaluating the
detection accuracy of HumanBoost. The main difference
between two conditions was that IE 8.0 was capable of
Extended Validation (EV) SSL certificates whereas IE
6.0 was not. With the changes to the SSL certificate in-
terface in modern browsers, a new identity indicator has
been introduced to provide a level of confidence in a
site’s identity. While the site employs an EV SSL cer-
tificate, the background of the address bar will be colored
green and the information displayed in the area to the
right of the lock icon alternates periodically between the
organization name/country code, and the CA who issued
the certificate. According to the experiments performed
by Robert et al. [11], EV SSL certificates facilitated for
people to identify the ownership of the website. In order
to clarify the difference, we planned that both of phishing
IQ tests included the website of Tokyo-Tomin Bank
which employed an EV SSL certificate. The differences
of browser versions will be discussed in Section 6.3.
4.2. Dataset Description
As shown in Appendix A, this paper prepared 14 phish-
ing sites and six legitimate ones. In comparison to the
typical phishing IQ test [9] that prepared 13 phishing
sites and seven legitimate ones, our existence rate of phi-
shing sites was not so extremely higher.
Our intention of choosing these websites is to show the
participants’ criterion for credibility. For checking if the
participants assess the websites by their past experience,
the dataset consisted of websites that the participants were
likely to use. We also employed Tokyo-Tomin Bank,
which is the one of Japanese regional bank in Tokyo, as
website 7 and 25 while we assumed that the participants
mainly lived around Tokyo area.
We also let the participants consider when they tried to
label correctly. The participants who were likely to check
by “URL of the site” would confuse to label the website
26 and 29, since these sites had almost the same URL as
the legitimate sites except for one letter. The URLs of the
Copyright © 2012 SciRes. JILSA
Behind HumanBoost: Analysis of Users’ Trust Decision Patterns for Identifying Fraudulent Websites 323
websites 22, 23, 36, 37 and 39 contained a legitimate-
sounding domain name.The websites 25 and 28 were le-
gitimate but the domain name of these sites had no indi-
cation of their bland names. For participants who tended
to check by “Security Information of Browsers,” the web-
sites 29 and 35 might be difficult because they were phi-
shing but equipped with valid SSL certificates. Conver-
sely, websites 34 and 40 were legitimate but did not em-
ploy valid SSL certificates though they required users to
login. Of course, our prepared phishing websites were
look alike of the legitimate ones. It might be difficult if
the participant relied on “Content of Web page.”
Of the recruited 309 participants, 42.4% (131) were
male and 57.6% (178) were female. Age ranged from 16
to 77 years old. 48.2% of participants (149) were office
workers, and 19.7% (61) were households and 5.8% (18)
were students. Of the students, 66.7% (12) were Bache-
lors, 11.1% (2) were high school students, 5.6% (1) were
masters’ degree students.
The other conditions of this study are the same as the
follow up study described in Section 3.3. In July 2010,
the 309 participants looked at 40 screen shots and judged
whether the site seems to be phishing or legitimate.
5. Experimental Results
This section described the result of clustering analysis to
explore the types of the participants whose PTDs are use-
ful. After classifying the participants into some groups,
we calculate the detection accuracy of each group for com-
parative study.
5.1. Factors of Ability for Correct Decision
We observed that the error rate was 42.7% if participants
answered that they have experience to use the site. Op-
positely, that was 48.6% if participants did not. Therefore,
we assumed that past experience with the site has posi-
tive effectiveness on identifying the site.
Next, we also observed the relationships among the
response for the perception of the website’s credibility
and the error rate, as shown in Table 1 where a check-
mark denotes that the option was selected. Based on the
result, we assumed that both labeling by URL of the site
and labeling by security information of the browser have
positive effectiveness on identifying the site, otherwise
labeling by content of Web page has negative effective-
We then analyzed the effectiveness of selecting the
“Other Reason” option as to whether our factors should
be affected by it. This option was selected by 39 partici-
pants, with 170 total times being selected; 2.8% of the all
times. According to the participants’ descriptions, the
main reason for choosing this option was “I don’t know”;
20 participants stated it 120 times (2.0%) in total. Of the
120 times, it was three times that the “Other Reason”
option was selected with “Content of Web page” option.
We regarded that the participant checked the sites by
“Content of Web page”. The rest of the 117 times, 19 par-
ticipants selected “Other Reason” option without check-
ing any other options. In our evaluation, these partici-
pants were treated as usual as other participants, but in
Section 6.2, we discuss the availability of PTD regarding
partici- pants who answered “I don’t know.”
In other cases, we found that the participants checked
“Other Reason” instead of checking other options. For
example, some participants stated, “I can see the legiti-
mate company’s logo” or “The page requires an account
number before logging into the website”. These answers
were regarded as selecting “Content of Web page” option.
Such cases were observed for 21 participants in 47 times
(0.8%). There were three other cases, but three partici-
pants stated that they checked by their experience. As we
mentioned that past experience was defined as a factor,
accordingly, we decided to ignore the effectiveness of
choosing “Other Reason”.
Based on these findings, we quantified the users’ fac-
tors of the ability to make correct decision. Note the abil-
ity were derived from past subject experiments [9,10],
but unfortunately, we could not find any consensus for
their quantification. We calculated the factors as follows
by using the available information that we could observe.
5.1.1. Factor 1: Utilization of Past Experience
This is a detection accuracy of the website which a par-
ticipant has an experience to use. For instance, if the par-
ticipant had experience of using 10 out of 20 sites and
correctly answered 8 of the 10 sites, this variable is 0.8.
5.1.2. Factor 2: Ignoring Signals from Content of Web
This is the probability that each participant did not select
“Content of Web page” as his/her criteria for credibility.
Phishing sites are lookalike legitimate websites; therefore,
the content and/or the design of the site give no hints for
Table 1. Perception of website’s credibility and error rates.
Content of
web page
the site
of browser
The average
error rate
v 61.9%
v 25.5%
v 36.8%
v v 51.7%
v v 60.0%
v v 17.9%
v v v 49.8%
Copyright © 2012 SciRes. JILSA
Behind HumanBoost: Analysis of Users’ Trust Decision Patterns for Identifying Fraudulent Websites
judging the site. For instance, if a participant checked
“Content of Web page” on six sites, the participants ig-
nored the signals of 14 sites, so this variable is 0.7.
5.1.3. Factor 3: Decision Making by the URL of the
This is the probability that each participant selected
“URL of the site” as his/her criteria for the credibility.
The key difference between phishing sites and legitimate
sites is the URL, so the detection based on the difference
of the URL would facilitate a correct answer.
5.1.4. Factor 4: Awareness of the SSL Padlock Icon
This is the probability that each participant selected “Se-
curity information of browser” as his/her criteria for the
credibility when the site showed an SSL certificate. As
shown in Appendix B, six out of 20 sites are SSL-ena-
bled. Notice that website 28 and 35 are phishing with va-
lid SSL certificates. In such case, users should check both
the SSL padlock icon and URL of the site.
5.1.5. Factor 5: Ignoring Unsuitable Security
This is the probability of each participant did not select
“Security information of browser” as his/her criteria for
the credibility when the site did not employ any SSL
certificate. Nevertheless some participant yielded that
assessing the site by “Security information of browser”
even if they could not identify the SSL padlock icon. The
number of the site without an SSL certificate is 14; al-
most of them were phishing, but website 34 and 40 are
5.2. Detection Accuracy of Each Cluster
We then categorized participants into some clusters and
explored the characteristics of each cluster. This paper-
employed Expectation-Maximization (EM) algorithm
[12] as a clustering method.
The number of clusters was eight. While there are no
best solutions for the problem of determining the number
of clusters to exact, we explored the suitable number
based on Bayesian Information Criterion (BIC) [13]. The
BIC is the value of the maximized log-likelihood meas-
ured with a penalty for the number of parameters in the
model.Given conditions explained above, the participants
were classified into eight clusters, as shown in Table 2,
where the first column denotes eight clusters, the sec-
ond column denotes the number of samples in each clus-
ter, and the rest columns denote a cluster center for each
The participants in cluster 1 and 4 tended to label a site
correctly if they had used the site. They also tended to
assess by “URL of the site” rather than “Content of Web
page.” The difference between cluster 1 and 4 were awa-
reness of the SSL indicators; the participants in cluster 1
checked the “Security information of browsers” carefully,
whereas the participants in cluster 4 did not. The partici-
pants in cluster 2 tended to rely on “Security information
of browsers”. However, they might be received incorrect
signals from web content as the factor 2 and 5 were low-
We then evaluated the detection accuracy at each clus-
ter by using the 20 sites listed in Appendix A. Based on
the detection results, we calculated the average error rate
for each participant group, the AdaBoost-based detection
method, and HumanBoost. To perform our evaluation in
a less biased way, we employed 4-fold cross validation.
Furthermore, our cross validation was repeated 10 times
in order to average the result. The results are summarized
in Figure 1, where the blue bars denote the average error
rate of each participant, the red bar denotes the average
error rate of the AdaBoost-based detection method, and
the yellow bars denote that of HumanBoost.
The average error rate for AdaBoost was 10.5%, for
participant was 40.5% and for HumanBoost was 9.6%.
The lowest average error rate of the participants group
Table 2. Clustering results of EM algorithm.
Mean of factors
Cluster Participant
1 2 3 4 5
1 48 0.8270.766 0.787 0.5430.787
2 30 0.5760.580 0.479 0.8350.240
3 61 0.5910.405 0.674 0.0470.965
4 17 0.7530.953 0.953 0.0390.992
5 71 0.4110.160 0.168 0.0580.950
6 11 0.5610.000 0.014 0.0001.000
7 34 0.1240.006 0.008 0.0001.000
8 37 0.4410.290 0.177 0.4210.822
Figure 1. The average error rates of each participant, Ada-
Boost-based detection method, and HumanBoost.
Copyright © 2012 SciRes. JILSA
Behind HumanBoost: Analysis of Users’ Trust Decision Patterns for Identifying Fraudulent Websites 325
was 27.8% in cluster 1, followed by cluster 2 (30.8%), 4
(37.4%), 3 (39.7%), 5 (45.6%), 8 (48.0%), 7 (48.4%) and
finally 6 (50.0%). The lowest average error rate of Hu-
manBoost was 8.2% in cluster 1, followed by cluster 2
(8.3%), 3 and 4 (9.3%), 5 (10.4%), 6 and 7 (10.7%), and
finally 8 (10.8%). In the cases of the participants’ cluster
1, 2, 3, 4 and 5, the detection accuracy in HumanBoost
was higher than AdaBoost. On the contrary, HumanBoo-
st could not perform better in the cases of the partici-
pants’ cluster 6, 7, and 8.
From the comparison among each group, we consid-
ered that being all of five factorshigher makes Human-
Boost performs better. However, we observed that Hu-
manBoost could not utilize the PTDs of the participants
in group 6-8, even though their factor 5 is higher. We
assumed that the participants in cluster 6 and 7 were
novices because they only received signals from “Con-
tent of Web page” while assessing the credibility. Due to
the lack of knowledge of security and security indicators,
they did not selected “Security information of brows-
ers.” Even if the cluster 6 has higher value in the factor 1,
the performance worse while the other factors were
lower. The cluster 8 sometimes checked the security in-
formation, but mainly received signals from “Content of
Web page” rather than SSL pad-rock icon.
In comparison to the average error rate of Human-
Boost in each cluster with that of AdaBoost, the clusters
1 and 2 evidently improve the accuracy by utilizing
PTDs. We therefore assumed that a user’s PTD might be
useful when the user tended to assess the credibility of
the URL of the site and/or an SSL indicator of the bro-
wser rather than content of Web page.
6. Discussion
This section discusses another approach for investigating
the availability of PTD. It then discusses the availability
of PTDs of participant who stated “I don’t know”. It fin-
ally explains the effectiveness of differences between the
conditions of our phishing IQ tests.
6.1. Availability of PTD from Theoretical Aspect
Theoretically, the key feature of AdaBoost is that a weak
hypothesis, which performs just slightly better than ran-
dom guessing, can be boosted into a strong hypothesis, as
we mentioned in Section 3.2. Aside from the clustering
approach, it is conceivable that the availability of PTD
can be verified by checking the average error rate of each
user’s PTDs since HumanBoost treats the PTD as one of
the weak hypotheses.
We therefore measured the error rate of PTDs for
every participant by using 20 websites in Appendix B,
classified them into percentile decades, and calculated
the average error rate of HumanBoost for each decade by
using 20 websites in Appendix A.
Table 3 summarizes the result. We found that the de-
tection accuracy of HumanBoost was still beneficial even
if the participant whose PTD has an error ratelower than
50%, but not lower than 60%. The error rates of PTDs in
the case of the average error rates of PTDs (= x) was
20% x < 30% was 7.9%, that was lesser than 8.3% in
the case of the 10% x < 20%.
From these findings, we assumed that participant’s
criterion for the credibility should be checked whenever
verifying the availability of HumanBoost even if the par-
ticipants could detect websites accurately.
6.2. Participant Who Stated “I Don’t Know”
Table 4 summarizes the clustering results of the 19 par-
ticipants who checked the “Other Reason” option and
stated “I don’t know” as the reason. In Table 4, the first
column denotes the eight clusters, the second column
denotes the number of the participants who belong to the
cluster, and the third column denotes the total times of
the site that the participant answered “I don’t know”.
We then measured the average error rate of the partici-
pant sand HumanBoost. The results are summarized in the
fourth column and fifth column in Table 4. The average
error rate of each participant was 45.0%, and that of the
Table 3. The error rate of PTD and HumanBoost.
Error rate of PTD Number of
The average error rate of
x < 10% 4 7.6%
10% x < 20% 16 8.3%
20% x < 30% 40 7.9%
30% x < 40% 36 8.1%
40% x < 50% 37 9.9%
50% x < 60% 63 9.7%
60% x < 70% 56 10.6%
70% x < 80% 54 11.2%
80% x 3 12.5%
Table 4. Error rates of participant who stated “I don’t
The average error rate
ClusterParticipantTotal times PTD HumanBoost
1 3 10 25.0% 9.7%
2 0 0
3 2 7 67.5% 7.3%
4 0 0
5 7 26 52.9% 9.2%
6 0 0
7 1 1 35.0% 10.5%
8 6 73 46.7% 9.2%
Copyright © 2012 SciRes. JILSA
Behind HumanBoost: Analysis of Users’ Trust Decision Patterns for Identifying Fraudulent Websites
HumanBoost was 9.2%. In comparison to the clustering
results in the average error rates for clusters 3, 5, 7 and 8
were lower than with participants who answered “I don’t
know”. However, the average error rate of cluster 1 was
higher than that for Figure 1.
The results showed that PTDs were usually available
even if the participant answered “I don’t know.” How-
ever, we also found that PTDs could not improve the
detection accuracy when the participants belonged to
cluster 1 who answered “I don’t know”. We assumed that
there were two types of participants who stated “I don’t
know”. One is those who are novices, who have no crite-
ria for identifying the websites. The other is that the par-
ticipants have their own criterion, but could not identify
the site to a lack of knowledge on the websites. In future
work, the background of “I don’t know” answers.
6.3. Effectiveness of Extended Validation SSL
Aswe explained in Section 4.1, this paper used IE 8.0 for
investigating users’ criterion of credibility, and also used
IE 6.0 for evaluating the detection accuracy of Human-
Boost. In order to clarify the difference, we focused on
analyzing the detection accuracy of website 7 in Appen-
dix A and website 25 in Appendix B, these were the le-
gitimate sites of Tokyo-Tomin Bank. Aside from the in-
terfaces for the EV SSL, the conditions were the same in
the two sites.
The average error rates for each cluster are shown in
Table 5. Due to the difficulty of identifying Tokyo-To-
min Bank, the average error rates were tended to be high-
er; the URL of Tokyo-Tomin Bank started https://www2. and the owner of the website was dis-
played as “NTT DATA CORPORATION”, all of which
were not associated with Tokyo-Tomin Bank. Albeit
such information confused the participants, the average
error rate was decreased in the almost all cases of using
IE 8.0. Especially, we observed that the detection accu-
racy of cluster 1, 3, 4 and 8 were dramatically improved.
We assumed that the participants regarded the site as
phishing by checking URL at first, but reconsidered when
they saw the EV SSL certificate. Experiments in the
same conditions between the investigation of users’ cri-
terion and the evaluation of HumanBoost were our future
7. Conclusions
This study illustrates the trust decision patterns suitable
for HumanBoost. For our analysis, we conducted a phi-
shing IQ test with 309 participants in July 2010. First, we
investigated the evaluation criteria of websites’ credibil-
ity from the standpoint of participants. They browsed 14
simulated phishing sites and six legitimate sites, judged
whether or not the site appeared to be a phishing site, and
Table 5. The average error rate of EV SSL capable browser
and non EV SSL capable browse r .
Cluster IE 6.0 IE 8.0
1 72.9% 47.9%
2 51.4% 54.1%
3 54.5% 27.3%
4 56.7% 30.0%
5 68.9% 50.8%
6 82.4% 70.6%
7 49.3% 43.7%
8 41.2% 29.4%
answered the reasons of their decision making in the
form of questionnaire.
The questionnaire items were “Past Experience with
the site,” and “Perception of Website’s Credibility,” the
latter required the participants marked all that applied;
the options were “Content of Web page”, “URL of the
site”, “Security information of browser” and “Other
Reason”. Based on their response, we defined five fac-
tors of ability for decision making, namely “Utilization
of past experience”, “Ignoring signals from content of
Web page”, “Decision making by the URL of the site”,
“Awareness of the SSL padlock icon” and “Ignoring un-
suitable security information.”
In order to explore the suitable decision patters for Hu-
manBoost, this study compared the detection accuracy.
According to our questionnaire, the participants were
classified into eight groups by EM clustering algorithm.
The participants also saw another 14 simulated phishing
sites and six legitimate sites, judged whether or not the
site appeared to be a phishing site. We analyzed that the
Past Trust Decisions (PTDs) of the participants who be-
longed to the particular clusters improved the detection
The key finding of our experiments is that the partici-
pantswith useful PTDs tend to evaluate sites’ URL and/
or browser’s SSL indicator rather than contents of Web
pages to judge the credibility of the sites. This habit leads
to make trust decisions correctly. The habit’s importance
will increase when ordinary Web users start to employ
machine learning technique for detecting malicious sites.
By integrating the trustful PTDs with state-of-the-art ma-
chine learning techniques, we believe the number of
phishing incident will be suppressed.
[1] D. Miyamoto, H. Hazeyama and Y. Kadobayashi, “Hu-
manBoost: Utilization of Users’ Past Trust Decision for
Identifying Fraudulent Websites,” Journal of Intelligent
Learning Systems and Applications, Vol. 2, No. 4, 2010,
Copyright © 2012 SciRes. JILSA
Behind HumanBoost: Analysis of Users’ Trust Decision Patterns for Identifying Fraudulent Websites
Copyright © 2012 SciRes. JILSA
pp.190-199. doi:10.4236/jilsa.2010.24022
[2] Y. Freund and R. E. Schapire, “A Decision-Theoretic Ge-
neralization of On-Line Learning and an Application to
Boosting,” Journal of Computer and System Science, Vol.
55, No. 1, 1997, pp. 119-139.
[3] S. Sheng, B. Wardman, G. Warner, L. F. Cranor, J. Hong
and C. Zhang, “An Empirical Analysis of Phishing Black-
lists,” 2009.
[4] Y. Zhang, J. Hong and L. Cranor, “CANTINA: A Con-
tent-Based Approach to Detect Phishing Web Sites,” Pro-
ceedings of the 16th World Wide Web Conference, Banff,
8-12 May 2007, pp. 649-656.
[5] D. Miyamoto, H. Hazeyama and Y. Kadobayashi, “An
Evaluation of Machine Learning-based Methods for De-
tection of Phishing Sites,” Australian Journal of Intelli-
gent Information Processing Systems, Vol. 10, No. 2,
2008, pp. 54-63.
[6] OpenDNS, “PhishTank—Join the Fight against Phish-
[7] M. Wu, R. C. Miller and S. L. Garnkel, “Do Security-
Toolbars Actually Prevent Phishing Attacks?” Proceed-
ings of Conference on Human Factors in Computing Sys-
tems, New York, 22-27 April 2006.
[8] P. Kumaraguru, Y. Rhee, A. Acquisti, L. F. Cranor, J. I.
Hong and E. Nunge, “Protecting People from Phishing:
The Design and Evaluation of an Embedded Training
Email System,” Proceedings of Conference on Human
Factors in Computing Systems, San Jose, 27 April-3 May
2007, pp. 905-914.
[9] R. Dhamija, J. D. Tygar and M. A. Hearst, “Why Phish-
ing Works,” Proceedings of Conference on Human Fac-
tors in Computing Systems, New York, 22-27 April 2006,
pp. 581-590.
[10] B. J. Fogg, L. Marable, J. Stanford and E. R. Tauber,
“How Do People Evaluate a Web Site’s Credibility? Re-
sults from a Large Study,” Technical Report, Stanford,
[11] R. Biddle, P. C. van Oorschot, A. S. Patrick, J. Sobey and
T. Whalen, “Browser Interfaces and Extended Validation
ssl Certificates: An Empirical Study,” Proceedings of the
2009 ACM Workshop on Cloud Computing Security, New
York, 9-13 November 2009, pp. 19-30.
[12] A. P. Dempster, N. Laird and D. Rubin, “Maximum Like-
lihood from Incomplete Data via the EM Algorithm,”
Journal of the Royal Statistical Society Series, Vol. 39,
No. 1, 1977, pp. 1-38.
[13] G. E. Schwarz, “Estimating the Dimension of a Model,”
Annals of Statistics, Vol. 6, No. 2, 1978, pp. 461-464.
Behind HumanBoost: Analysis of Users’ Trust Decision Patterns for Identifying Fraudulent Websites
Appendix A. Conditions of each site used for evaluating the detection accuracy.
# Website Legitimate/Phishing Lang Description
1 Legitimate EN URL (
2 Tokyo-Mitsubishi UFJ Phishing JP URL (
3 PayPal Phishing EN URL ( ··· %6f%6d)(URL Encoding Abuse)
4 Goldman Sachs Legitimate EN URL (, SSL
5 Natwest Bank Phishing EN URL (,
(Derived from
6 Bank of the West Phishing EN URL (, similar to the legitimate URL
7 Japanese Regional Bank Legitimate JP
(Nanto Bank for the first experiment in November 2007)
(Hokuriku Bank for the second experiment in March 2010)
(In this study, we employed Tokyo-Tomin Bank)
8 Bank of America Phishing EN URL (
(URL Scheme Abuse)
9 PayPal Phishing EN
URL (, first “a’’ letter is a Cyrillic small letter
а’’(U+430) (IDN Abuse)
10 Citibank Phishing EN URL (IP address)
11 Amazon Phishing EN
URL (, contains “amazon’’ in its path (Derived from
12 Xanga Legitimate EN URL (
13 Morgan Stanley Legitimate EN URL (, SSL
14 Yahoo Phishing EN URL (IP address)
15 U.S.D. of the Treasury Phishing EN URL ( (Derived from
16 Sumitomo Mitsui Card Phishing JP URL (
17 eBay Phishing EN URL (
18 Citibank Phishing EN
URL(シテイバンク.com), is pronounced “Shi Tee Ban Ku”, look-alike
“Citibank” in Japanese Letter)(IDN Abuse)
19 Apple Legitimate EN
URL (, SSL, popup warning by accessing non-SSL
20 PayPal Phishing EN
URL (, (URL Scheme
Copyright © 2012 SciRes. JILSA
Behind HumanBoost: Analysis of Users’ Trust Decision Patterns for Identifying Fraudulent Websites 329
Appendix B. Conditions of each site used for investigating users’ criterion while assessing the credibility of the websites.
# Website Phishing/LegitimateLangDescription
21 Japan Net Bank Legitimate EN URL(, EV SSL
22 Mizuho MYRAGE Club Phishing JP URL(
23 mixi Phishing EN URL(
24 Yahoo! Japan Phishing EN URL( (Derived from existed phishing domain)
25 Japanese Regional Bank Legitimate EN
3rd party URL(, EV SSL (In this study, we employed
Tokyo-Tomin Bank)
26 GungHo Games Phishing EN URL(
27 Google Mail Legitimate JP URL(, SSL
28 Mitsubishi-Tokyo UFJ Bank Legitimate EN URL(entry11.bk,, EV SSL
29 Sumitomo Mitsui Card Phishing EN URL(,SSL
30 Twitter Phishing EN URL(
31 Japan Railroad East Phishing EN URL(,
(Derived from existed malicious hosts)
32 Amazon Phishing EN URL(,
(Derived from existed malicious hosts)
33 ANA MYRAGE Club Phishing EN URL(IP address)
34 Ameba Legitimate EN URL(
35 Japan Post Holding Phishing EN URL(, SSL
36 RAKUTEN Phishing JP URL(
37 SQUARE ENIX Phishing EN URL( (Derived from existed phishing domain)
38 Goo Mail Phishing EN URL(IP address)
40 GREE Legitimate EN URL(
Copyright © 2012 SciRes. JILSA