Journal of Intelligent Learning Systems and Applications, 2010, 2, 190-199
doi:10.4236/jilsa.2010.24022 Published Online November 2010 (http://www.scirp.org/journal/jilsa)
Copyright © 2010 SciRes. JILSA
HumanBoost: Utilization of Users’ Past Trust
Decision for Identifying Fraudulent Websites
Daisuke Miyamoto1, Hiroaki Hazeyama2, Youki Kadobayashi2
1Information Security Research Center, National Institute of Information and Communications Technology, Koganei, Tokyo, Japan;
2Graduate School of Information Science, Nara Advanced Institute of Science and Technology, Ikoma, Nara, Japan.
E-mail: daisu-mi@nict.go.jp, hiroa-ha@is.naist.jp, youki-k@is.aist-nara.ac.jp
Received March 25th, 2010; revised August 25th, 2010; accepted September 16th, 2010.
ABSTRACT
This paper presents HumanBoost, an approach that aims at improving the accuracy of detecting so-called phishing
sites by utilizing users’ past trust decisions (PTDs). Web users are generally required to make trust decisions whenever
their personal information is requested by a website. We assume that a database of user PTDs would be transformed
into a binary vector, representing phishing or not-phishing, and the binary vector can be used for detecting phishing
sites, similar to the existing heuristics. For our pilot study, in November 2007, we invited 10 participants and performed
a subject experiment. The participants browsed 14 simulated phishing sites and six legitimate sites, and judged whether
or not the site appeared to be a phishing site. We utilize participants’ trust decisions as a new heuristic and we let
AdaBoost incorporate it into eight existing heuristics. The results show that the average error rate for HumanBoost was
13.4%, whereas for participants it was 19.0% and for AdaBoost 20.0%. We also conducted two follow-up studies in
March 2010 and July 2010, and observed that the average error rate for HumanBoost was below the others. We therefore
conclude that PTDs can serve as a new heuristic, and that HumanBoost has the potential to improve detection accuracy
for Web users.
Keywords: Phishing, Personalization, AdaBoost, Trust Decision
1. Introduction
Phishing is a form of identity theft in which the targets
are users rather than computer systems. A phishing at-
tacker attracts victims to a spoofed website, a so-called
phishing site, and attempts to persuade them to provide
their personal information. Damage suffered from
phishing is increasing. In 2005, the Gartner Survey re-
ported that 1.2 million consumers lost $929 million as a
result of phishing attacks [1]. A more recent survey conducted
in 2008 reported that more than 5 million
consumers lost $1.76 billion [2]. The number of phishing
sites is also increasing. According to trend reports pub-
lished by the Anti-Phishing Working Group [3], the
number of the reported phishing sites was 25,630 in
March 2008, far surpassing the 14,315 in July 2005.
To deal with phishing attacks, a heuristics-based de-
tection method has begun to garner attention. A heuristic
is an algorithm to identify phishing sites based on users’
experience, and checks whether a site appears to be a
phishing site. Checking the lifetime of the issued
website is a well-known heuristic, as most phishing
sites’ URLs expire within a short time span. Based on the
detection result from each heuristic, the heuristic-based
solution calculates the likelihood of a site being a phish-
ing site and compares the likelihood with the defined
discrimination threshold. Unfortunately, the detection
accuracy of existing heuristic-based solutions is nowhere
near suitable for practical use [4], even though various
heuristics have been discovered in earlier studies. In
our previous work [5], we attempted to improve this ac-
curacy by employing machine learning techniques for
combining heuristics, since we assumed that the inaccu-
racy is caused by heuristics-based solutions that cannot
use the heuristics appropriately. In most cases, machine
learning-based detection methods (MLBDMs) performed
better than existing detection methods. In particular, an
AdaBoost-based detection method showed the highest
detection accuracy.
In this paper, we propose HumanBoost, which aims at
improving AdaBoost-based detection methods. The key
concept of HumanBoost is utilizing Web users’ past trust
decisions (PTDs). Basically, humans have the potential
to identify phishing sites, even if existing heuristics can-
not detect them. If we can construct a database of PTDs
for each Web user, we can use the record of the user’s
trust decisions as a feature vector for detecting phishing
sites. HumanBoost also involves the idea of adjusting the
detection for each Web user. If a user is a security expert,
the predominant factor in detecting phishing sites
would be his/her trust decisions. Conversely, the existing
heuristic will have a strong effect on detection when the
user is a novice and his/her PTD has often failed.
In our study in November 2007, we invited 10 partici-
pants and performed a subject experiment. The partici-
pants browsed 14 simulated phishing sites and six le-
gitimate sites, and judged whether or not the site ap-
peared to be a phishing site. By utilizing participants’
trust decisions as a new weak-hypothesis, we let
AdaBoost incorporate the heuristic into eight existing
heuristics. The results show that the average error rate
for HumanBoost was 13.4%, whereas that for partici-
pants was 19.0% and for the AdaBoost-based detection
method 20.0%. We then conducted a follow-up study in
March 2010. This study had 11 participants under almost
the same conditions as the first. The results show that
the average error rate for HumanBoost was 10.7%,
whereas that for participants was 31.4% and for
AdaBoost 12.0%. We also invited 309 participants and
performed another follow-up study in July 2010. The
results show that the average error rate for HumanBoost
was 9.7%, whereas that for participants was 40.5% and
for AdaBoost 10.5%.
The rest of this paper is organized as follows. Section
2 summarizes the related work, and Section 3 explains
our proposal. Section 4 describes our preliminary evalua-
tion, and Section 5 presents a follow-up study. Section 6
discusses the availability of PTDs, ways of removing
bias, and issues in implementing a HumanBoost-capable
system. Finally, Section 7 concludes the paper.
2. Related Work
For mitigating phishing attacks, machine learning, which
facilitates the development of algorithms or techniques
by enabling computer systems to learn, has begun to
garner attention. PILFER, which was proposed by Fette
et al. [6], employed SVM to distinguish phishing emails
from other emails. Abu-Nimeh et al. compared the pre-
dictive accuracy of six machine learning methods [7].
They analyzed 1,117 phishing emails and 1,718 legiti-
mate emails with 43 features for distinguishing phishing
emails. Their research showed that the lowest error rate
was 7.72% for Random Forests. Ram Basnet et al. per-
formed an evaluation of six different machine learn-
ing-based detection methods [8]. They analyzed 973
phishing emails and 3,027 legitimate emails with 12 fea-
tures, and showed that the lowest error rate was 2.01%.
The experimental conditions differed between these studies,
but machine learning provided high accuracy for the de-
tection of phishing emails.
Apart from phishing emails, machine learning was
also used to detect phishing sites. Pan et al. presented an
SVM-based page classifier for detecting those websites
[9]. They analyzed 279 phishing sites and 100 legitimate
sites with eight features, and the results showed the average
error rate to be 16%. Our previous work [5] employed
nine machine learning techniques: AdaBoost, Bagging,
Support Vector Machines, Classification and Regression
Trees, Logistic Regression, Random Forests,
Neural Networks, Naïve Bayes, and Bayesian Additive
Regression Trees. We also employed eight heuristics
presented in [10] and analyzed 3,000 URLs, consisting
of 1,500 legitimate sites and the same number of phish-
ing sites, reported on PhishTank.com [11] from Novem-
ber 2007 to February 2008. Our evaluation results
showed the highest f1 measure at 0.8581, lowest error
rate at 14.15% and highest AUC at 0.9342; all of which
were observed for the AdaBoost-based detection method.
In most cases, MLBDMs performed better than the ex-
isting detection method.
Although earlier studies used machine learning, we
find that two problems remain. One is that the number of
features for detecting phishing sites is smaller than that for
detecting phishing emails, which indicates that developing
new heuristics for phishing sites is more difficult
than for phishing emails. The other is the lack of the
idea of protecting individual Web users. We consider
that protection methods should differ for each Web
user, since phishing attacks target individual users.
Our proposed HumanBoost aims at using past trust decisions
as a new heuristic. It also enables the detection
algorithm to be customized for each Web user through the machine
learning processes described in Section 3.2.

URL      Actual       The user’s       Heuristics   ...   Heuristics
         Condition    trust decision   #1                 #N
Site 1   Phishing     Phishing         Phishing     ...   Legitimate
Site 2   Phishing     Legitimate       Phishing     ...   Phishing
Site 3   Phishing     Phishing         Legitimate   ...   Phishing
...
Site M   Legitimate   Legitimate       Legitimate   ...   Phishing
Figure 1. Example of PTD and its schema.
3. HumanBoost
3.1. Overview
The key concept of HumanBoost is utilizing Web users’
past trust decisions (PTDs). Web users are generally re-
quired to make trust decisions whenever they input their
personal information into websites. In other words, we
assumed that a Web user outputs a binary variable,
phishing or legitimate, when the website requires users to
input their password. Note that existing heuristics for
detecting phishing sites, which we explain in Section 4.2,
similarly output binary variables denoting phishing
or not-phishing.
In HumanBoost, we assume that each Web user has
his/her own PTD database, as shown in Figure 1. The
schema of the PTD database consists of the website’s
URL, actual conditions, the result of the user’s trust de-
cision, and the results from existing heuristics. Note that
we do not propose sharing the PTD database among users
because of privacy concerns. The PTD database can
be regarded as a training dataset that consists of N + 1
binary explanatory variables and one binary response
variable. We, therefore, employ a machine learning tech-
nique for studying this binary vector for each user’s PTD
database.
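The data layout described above can be sketched as follows. This is only an illustrative sketch: the records reuse the example rows of Figure 1 (so only heuristics #1 and #N appear), and the names `PTD_DB`, `encode`, and `to_training_set`, as well as the +1/-1 encoding, are assumptions made here rather than part of the original system.

```python
# Hypothetical in-memory PTD database: one record per visited site.
# "P" = phishing, "L" = legitimate; the values mirror the example rows of Figure 1.
PTD_DB = [
    {"url": "Site 1", "actual": "P", "user": "P", "heuristics": ["P", "L"]},
    {"url": "Site 2", "actual": "P", "user": "L", "heuristics": ["P", "P"]},
    {"url": "Site 3", "actual": "P", "user": "P", "heuristics": ["L", "P"]},
    {"url": "Site M", "actual": "L", "user": "L", "heuristics": ["L", "P"]},
]


def encode(label):
    """Map a label to +1 (phishing) or -1 (legitimate); this sign convention is an assumption."""
    return 1 if label == "P" else -1


def to_training_set(db):
    """Return (X, y): each row of X is the user's decision followed by the N heuristic outputs."""
    X = [[encode(r["user"])] + [encode(h) for h in r["heuristics"]] for r in db]
    y = [encode(r["actual"]) for r in db]
    return X, y


X, y = to_training_set(PTD_DB)
```

Each row of `X` then holds N + 1 explanatory variables (here N = 2), and `y` holds the response variable, which is the form a supervised learner expects.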
3.2. Theoretical Background
In this study we employ the Adaptive Boosting
(AdaBoost) [12] algorithm, which learns a strong algorithm
by combining a set of weak algorithms $h_t$ and a set of
weights $\alpha_t$:
$$H_{ada} = \sum_{t} \alpha_t h_t \qquad (1)$$
The weights are learned through supervised training
off-line. Formally, AdaBoost uses a set of input data
$\{(x_i, y_i) : i = 1, \ldots, m\}$, where $x_i$ is the input and $y_i$ is the
classification.
Each weak algorithm is only required to make the
correct detection slightly more than half the time. The
AdaBoost algorithm iteratively calculates a set of
weights $D_t(i)$ over the samples. At $t = 1$, the samples are
equally weighted, so $D_1(i) = 1/m$.
The update rule consists of three stages. First,
AdaBoost chooses the weight as shown in (2).
$$\alpha_t = \frac{1}{2} \ln \frac{1 - \epsilon_t}{\epsilon_t} \qquad (2)$$
where $\epsilon_t = \Pr_{i \sim D_t}[h_t(x_i) \neq y_i]$. Second, AdaBoost updates
the weights by (3).
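$$D_{t+1}(i) = \frac{D_t(i)}{Z_t} \times \begin{cases} e^{-\alpha_t} & \text{if } h_t(x_i) = y_i \\ e^{\alpha_t} & \text{if } h_t(x_i) \neq y_i \end{cases} \qquad (3)$$
where $Z_t$ is a normalization factor chosen so that $\sum_i D_{t+1}(i) = 1$. Finally,
it outputs the final hypothesis as shown in (1).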
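The update rule in (1)-(3) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the implementation used in the experiments: each weak algorithm is taken to be one column of +1/-1 outputs (for example, one heuristic or the user's trust decision), the weak algorithm chosen in each round is the one with the lowest weighted error, the error is clamped to avoid a division by zero, and the final decision is the sign of the weighted sum in (1). The function names are hypothetical.

```python
import math


def adaboost_train(X, y, rounds=10):
    """X: rows of +1/-1 weak-algorithm outputs; y: +1/-1 labels.

    Returns a list of (alpha_t, column index) pairs.
    """
    m, n = len(X), len(X[0])
    D = [1.0 / m] * m  # D_1(i) = 1/m
    ensemble = []
    for _ in range(rounds):
        # Weighted error of every candidate weak algorithm (column).
        errors = [sum(D[i] for i in range(m) if X[i][j] != y[i]) for j in range(n)]
        j = min(range(n), key=errors.__getitem__)
        # Clamp epsilon_t so that the logarithm in Eq. (2) stays finite.
        eps = min(max(errors[j], 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - eps) / eps)  # Eq. (2)
        # Eq. (3): shrink weights of correctly labeled samples, grow the others.
        D = [D[i] * math.exp(-alpha if X[i][j] == y[i] else alpha) for i in range(m)]
        Z = sum(D)  # normalization factor Z_t
        D = [d / Z for d in D]
        ensemble.append((alpha, j))
    return ensemble


def adaboost_predict(ensemble, row):
    """Sign of the weighted sum in Eq. (1); +1 means phishing under the assumed encoding."""
    score = sum(alpha * row[j] for alpha, j in ensemble)
    return 1 if score >= 0 else -1
```

In this sketch, a column that is wrong only on samples the other columns get right still receives a positive weight, which is the behavior the weighting in (3) is meant to produce.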
We have two reasons for employing AdaBoost. One is
that it performed best in our previous comparative
study [5], where the AdaBoost-based detection method
demonstrated the lowest error rate, the highest f1 measure,
and the highest AUC, as mentioned in Section 2.
The other is that we expect AdaBoost to cover each
user’s weak points. Theoretically, the boosting algorithms
assign high weight to a classifier that correctly labels a site
that other classifiers had labeled incorrectly, as shown in
(3). Assuming that a user’s trust decision can be treated as
a classifier, AdaBoost would cover users’ weak points by
assigning high weights to heuristics that can correctly
judge a site that the user is likely to misjudge.
4. Experiment and Results
To check the availability of PTDs, we invited partici-
pants and performed a phishing IQ test to construct PTDs,
in November 2007. This section describes the dataset
of the phishing IQ test, introduces the heuristics
that we used, explains our experimental
design, and finally shows the results.
4.1. Dataset Description
Similar to the typical phishing IQ tests performed by
Dhamija et al. [13], we prepared 14 simulated phishing
sites and six legitimate ones, all of which contained Web
forms in which users could input their personal informa-
tion such as user ID and password. The conditions of the
sites are shown in Table 1.
Websites 1, 4, 7, 12, 13, and 19 were actual company websites,
but these sites contained defective features that
could mislead participants into labeling them as phishing.
Websites 1 and 12 required users to input their password,
though they did not employ SSL. Website 4
was Goldman Sachs with the domain name webid2.gs.com.
Since “gs” can imply multiple meanings, the
domain name can confuse participants. Similarly, website
13 contained “clientserv” in its domain name. Website
7 was Nanto Bank, a Japanese regional bank mainly
operating in Nara Prefecture, where almost all the participants
lived, but its domain name was www2.paweb.anser.or.jp,
which gives no indication of the bank’s name.
Website 19 was Apple Computer Inc. and employed a
valid SSL certificate, but web browsers displayed an alert window
because the page accessed non-SSL content.
The rest were phishing sites. Websites 5, 11, and 15
were derived from actual phishing sites based on a report
from Phishtank.com. The other phishing sites were simulated
phishing sites that mimic actual websites by using the
phishing practices described in the following sections.
4.1.1. Confusing URL
Websites 2, 6, 16, and 17 were made to look like well-
known sites, but with slightly different or confusing
URLs. It is increasingly common for a phishing attacker
(phisher) to register a similar or otherwise
legitimate-sounding domain name such as
www-bk-mufg.jp. Website 6 was
hosted at www.bankofthevvest.com, with two “v”s in-
stead of a “w” in its domain name. According to the
phishing IQ test conducted by Dhamija [13], the phishing
site that fooled the most participants was an exact replica
of the Bank of the West homepage and hosted at this
domain name.
4.1.2. IP Address Abuse
Websites 10 and 14 employed IP address abuse; instead
of showing the domain name, the IP address appears in
the browsers’ address bar. For website 10, a phisher
copied the contents of the actual Citibank homepage into
a website and created URLs using IP addresses. The IP
address does not point to Citibank, but some participants
would not be aware of this and think the site is legiti-
mate.
4.1.3. IDN Abuse
Websites 9 and 18 employed Internationalized Domain Name
(IDN) abuse, a modern phishing technique. Fu et al. indicated
[14] that the letter “а” in the Cyrillic alphabet is
nearly indistinguishable from the Latin letter “a”. For
instance, the URL of website 9 is www.xn--pypal-4ve.com,
which is clearly different from www.paypal.com.
Yet, the domain name can be displayed as www.pаypal.com
in web browsers.
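The relationship between the displayed name and the ASCII form quoted above can be reproduced with Python's built-in `idna` codec. The sketch below is illustrative only; the helper names `to_ascii` and `scripts_in` are hypothetical, and detecting mixed scripts by the first word of each character's Unicode name is a simplification assumed here, not a technique taken from the paper.

```python
import unicodedata


def to_ascii(hostname):
    """Convert a Unicode hostname to its ASCII (Punycode) form with Python's IDNA codec."""
    return hostname.encode("idna").decode("ascii")


def scripts_in(label):
    """Return the script names (first word of each letter's Unicode name) used in a label."""
    return {unicodedata.name(ch).split()[0] for ch in label if ch.isalpha()}


# U+0430 is CYRILLIC SMALL LETTER A, which looks like the Latin letter "a".
spoofed = "www.p\u0430ypal.com"
ascii_form = to_ascii(spoofed)
mixed = scripts_in("p\u0430ypal")
```

A label that mixes Latin and Cyrillic letters, as `mixed` shows for this example, is one simple signal that a displayed name may not be what it appears to be.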
Table 1. Conditions of each website.
#   Website                 Real/Spoof  Lang  Description
1   Live.com                real        EN    URL (login.live.com)
2   Tokyo-Mitsubishi UFJ    spoof       JP    URL (www-bk-mufg.jp), similar to the legitimate URL (www.bk.mufg.jp)
3   PayPal                  spoof       EN    URL (www.paypal.com.%73%69 ... %6f%6d) (URL Encoding Abuse)
4   Goldman Sachs           real        EN    URL (webid2.gs.com), SSL
5   Natwest Bank            spoof       EN    URL (onlinesession-0815.natwest.com.esb6eyond.gz.cn), derived from PhishTank.com
6   Bank of the West        spoof       EN    URL (www.bankofthevvest.com), similar to the legitimate URL (www.bankofthewest.com)
7   Nanto Bank              real        JP    URL (www2.paweb.anser.or.jp), SSL, third party URL
8   Bank of America         spoof       EN    URL (bankofamerica.com@index.jsp-login-page.com) (URL Scheme Abuse)
9   PayPal                  spoof       EN    URL (www.paypal.com), first “a” letter is a Cyrillic small letter “а” (U+430) (IDN Abuse)
10  Citibank                spoof       EN    URL (IP address) (IP Address Abuse)
11  Amazon                  spoof       EN    URL (www.importen.se), contains “amazon” in its path, derived from PhishTank.com
12  Xanga                   real        EN    URL (www.xanga.com)
13  Morgan Stanley          real        EN    URL (www.morganstanleyclientserv.com), SSL
14  Yahoo                   spoof       EN    URL (IP address) (IP Address Abuse)
15  U.S.D. of the Treasury  spoof       EN    URL (www.tarekfayed.com), derived from PhishTank.com
16  Sumitomo Mitsui Card    spoof       JP    URL (www.smcb-card.com), similar to the legitimate URL (www.smbc-card.com)
17  eBay                    spoof       EN    URL (secuirty.ebayonlineregist.com)
18  Citibank                spoof       EN    URL (シテイバンク.com), pronounced “Shi Tee Ban Ku”, a look-alike of “Citibank” in Japanese letters (IDN Abuse)
19  Apple                   real        EN    URL (connect.apple.com), SSL, popup warning by accessing non-SSL content
20  PayPal                  spoof       EN    URL (www.paypal.com@verisign-registered.com) (URL Scheme Abuse)
4.1.4. URL Scheme Abuse
The URLs of websites 8 and 20 contained an at-mark
(@) symbol. When the symbol is used in a URL, all text
before it is ignored and the browser references only the
information following it as a hostname. For website 8,
the URL is
http://bankofamerica.com@index.jsp-login-page.com.
Even though the URL appears to point to bankofamerica.com, web browsers
ignore that part and connect to
index.jsp-login-page.com.
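The way a URL containing an at-mark is split can be checked with Python's standard `urllib.parse` module, which separates the text before the at-mark (the user information) from the hostname. The helper names below are hypothetical and the snippet is only a sketch of the parsing behavior, not of any browser's implementation.

```python
from urllib.parse import urlsplit


def effective_host(url):
    """Return the hostname that the URL actually refers to, ignoring any text before '@'."""
    return urlsplit(url).hostname


def text_before_at(url):
    """Return the user-information part that precedes '@', or None if there is none."""
    return urlsplit(url).username


url_8 = "http://bankofamerica.com@index.jsp-login-page.com"
host_8 = effective_host(url_8)
decoy_8 = text_before_at(url_8)
```

For the URL of website 8, `host_8` is the attacker's host, while `decoy_8` is the familiar-looking name that only serves to mislead the reader.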
4.1.5. URL Encoding Abuse
URL encoding is an accepted method of representing
characters within a URL that may need special syntax
handling to be correctly interpreted. This is achieved by
encoding the character to be interpreted with a sequence
of three characters. This triplet sequence consists of the
percent character “%” followed by the two hexadecimal
digits representing the octet code of the original charac-
ter. For instance, the US-ASCII character set represents a
letter “s” with hexadecimal code 73, so its URL-encoded
representation is %73. Website 3 disguised its domain
name through URL encoding abuse so that it appeared to be
that of PayPal, Inc.
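The encoding rule described above is easy to verify with the standard library. In this sketch, `percent_encode_all` is a hypothetical helper that encodes every character (as an attacker might, even though letters do not need encoding), and `urllib.parse.unquote` reverses it.

```python
from urllib.parse import unquote


def percent_encode_all(text):
    """Encode every ASCII character as '%' followed by its two-digit hexadecimal code."""
    return "".join("%{:02x}".format(byte) for byte in text.encode("ascii"))


encoded_s = percent_encode_all("s")
decoded = unquote("%73%69")
```

Because browsers decode such sequences transparently, a fully encoded string hides the real characters from a casual reader while referring to exactly the same resource.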
4.2. Heuristics
Our experiment employs eight types of heuristics, all of
which were employed by CANTINA [15]. To the best of
our knowledge, CANTINA is the most successful tool
for combining heuristics, since it has achieved high accuracy
in detecting phishing sites without using a URL
blacklist.
4.2.1. Age of Domain (H1)
This is a check of whether the domain was registered
more than 12 months ago. If it was, the heuristic deems it
a legitimate site. Otherwise it deems it a phishing site. A
shortcoming of this heuristic is that the domains of newly created
legitimate sites have been registered for less than one year. In this case,
the heuristic will fail. Another shortcoming is that the domain
names of many phishing sites were in fact registered
more than a year ago. In particular, modern phishing sites
are often discovered on a host owned by a legitimate
company: some vulnerability in that host allowed a
phisher to penetrate it and set up a phishing site. In such
cases, the domain name was often registered a long time
ago, and thus the heuristic fails to classify it correctly.
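A minimal sketch of this check is shown below. It assumes that the registration date has already been obtained (for example from a WHOIS record, which is not queried here), counts complete calendar months, and treats exactly 12 complete months as old enough; the function name and these boundary choices are assumptions for illustration.

```python
from datetime import date


def age_of_domain_says_legitimate(registered_on, checked_on, min_months=12):
    """Return True when the domain has been registered for at least min_months complete months."""
    months = (checked_on.year - registered_on.year) * 12 + (checked_on.month - registered_on.month)
    if checked_on.day < registered_on.day:
        months -= 1  # the current month is not yet complete
    return months >= min_months


old_domain = age_of_domain_says_legitimate(date(2006, 1, 15), date(2007, 11, 1))
new_domain = age_of_domain_says_legitimate(date(2007, 6, 1), date(2007, 11, 1))
```

Both failure modes discussed above are visible in this form: a newly registered legitimate domain returns False, and a phishing page hosted on a long-registered, compromised domain returns True.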
4.2.2. Known Images (H2)
This is a check of whether a page contains inconsistent
use of well-known logos such as those of eBay, PayPal,
Citibank, Bank of America, Fifth Third Bank, Barclays
Bank, ANZ Bank, Chase Bank, and Wells Fargo Bank.
For instance, if a site contains eBay logos but is not on
an eBay domain, the heuristic deems it a phishing site.
However, pattern matching on digitized
images might lead to many misjudgments. This heuristic
also fails when legitimate sites employ
these logo files. Even if a company has a business
relationship with PayPal and uses the PayPal logo on its
website, the heuristic labels it as a phishing site.
4.2.3. Suspicious URL (H3)
This is a check of whether the site URL contains an
at-mark (@) symbol or a hyphen (-) in the domain name.
If so, the heuristic deems it a phishing site because
phishing attackers are likely to use these symbols in the
domain names of phishing sites. The weakness of the
heuristic is that some legitimate sites (e.g., aist-nara.ac.jp)
use a hyphen in their domain name. Several
phishing sites also do not have an at-mark or a hyphen in
their domain name.
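One possible reading of this check is sketched below, using URLs that appear in this paper. The function name is hypothetical, and the interpretation (an at-mark anywhere in the network-location part, or a hyphen in the hostname) is an assumption rather than CANTINA's exact rule.

```python
from urllib.parse import urlsplit


def suspicious_url(url):
    """Return True if the URL has '@' in its network location or '-' in its hostname."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    return "@" in parts.netloc or "-" in host


flag_at_mark = suspicious_url("http://bankofamerica.com@index.jsp-login-page.com")
flag_hyphen = suspicious_url("http://www-bk-mufg.jp")
missed_spoof = suspicious_url("http://www.bankofthevvest.com")
false_alarm = suspicious_url("http://aist-nara.ac.jp")
```

The last two values illustrate the weaknesses named above: the look-alike domain of website 6 passes unflagged, while a legitimate hyphenated domain is flagged.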
4.2.4. Suspicious Links (H4)
Similar to the Suspicious URL heuristic, this one checks
if a link on the page contains an at-mark or a hyphen.
The weak points of this heuristic are the same as those of the
Suspicious URL heuristic.
4.2.5. IP Address (H5)
This is a check of whether the domain name of the site is
an IP address. Though legitimate sites rarely link to
pages via an IP address, phishers often attract victims to
phishing sites by IP address links. The heuristic fails if
the URL of a phishing site uses a fully qualified domain
name, or that of a legitimate site is an IP address.
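A sketch of this check with the standard `ipaddress` module follows. The function name is hypothetical, and the example address comes from a range reserved for documentation, not from the experiment.

```python
import ipaddress
from urllib.parse import urlsplit


def host_is_ip_address(url):
    """Return True if the host part of the URL is a literal IPv4 or IPv6 address."""
    host = urlsplit(url).hostname
    if host is None:
        return False
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


ip_url = host_is_ip_address("http://192.0.2.10/login")  # documentation-only address
named_url = host_is_ip_address("http://www.xanga.com/")
```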
4.2.6. Dots in URL (H6)
This is a check of whether the URL of the site contains
five or more dots. According to Fette et al. [6], dots can
be abused by attackers to construct legitimate-looking
URLs. One technique is to have a sub-domain. Another
is to use a redirection script, which to the user may, for
instance, appear like a site hosted at google.com, but in
reality will redirect the browser to phishing.com. In both
of these examples, either by the inclusion of a URL into
an open redirect script or by the use of a number of
sub-domains, there are a large number of dots in the
URL. The heuristic fails if there are fewer than five dots
in the URL of a phishing site. For instance, a phishing
site that was reported in November 2008 and placed at
http://kitevolution.com/os/chat6/plugins/safehtml/www.paypal.com/canada/cgi-bin/webscr.php?cmd=_login-run
includes only four dots. Conversely, the URLs of some
legitimate sites can have five or more dots.
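The dot count for the example URL above can be checked directly. The function name is hypothetical, and the second URL is a made-up example on the reserved `example.com` domain.

```python
def dots_heuristic_says_phishing(url, threshold=5):
    """Return True if the URL contains at least `threshold` dots."""
    return url.count(".") >= threshold


reported_url = (
    "http://kitevolution.com/os/chat6/plugins/safehtml/"
    "www.paypal.com/canada/cgi-bin/webscr.php?cmd=_login-run"
)
dot_count = reported_url.count(".")
flagged = dots_heuristic_says_phishing(reported_url)
many_subdomains = dots_heuristic_says_phishing("http://a.b.c.d.e.example.com/")
```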
4.2.7. Forms (H7)
This is a check of whether the page contains web input
forms. It scans the HTML for <input> tags that accept
text and are accompanied by labels such as “credit card”
and “password”. If this is the case, the heuristic deems it
a phishing site. Unfortunately, this heuristic fails
whenever phishers use digital images of such
words rather than actual text.
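A simplified version of this scan can be written with Python's standard `html.parser` module. The sketch below is an assumption-laden illustration: it only checks that the page has a text-like `<input>` and that one of the two keywords quoted above appears somewhere in the visible text, without verifying that the label is adjacent to the input; the class and function names are hypothetical.

```python
from html.parser import HTMLParser

# The two labels quoted in the text; any longer keyword list would be an assumption.
KEYWORDS = ("credit card", "password")


class FormScanner(HTMLParser):
    """Collect visible text and note whether a text-like <input> tag is present."""

    def __init__(self):
        super().__init__()
        self.has_text_input = False
        self.text = []

    def handle_starttag(self, tag, attrs):
        if tag == "input":
            input_type = (dict(attrs).get("type") or "text").lower()
            if input_type in ("text", "password"):
                self.has_text_input = True

    def handle_data(self, data):
        self.text.append(data.lower())


def forms_heuristic_says_phishing(html):
    """Return True if the page has a text input and mentions one of the keywords."""
    scanner = FormScanner()
    scanner.feed(html)
    scanner.close()
    page_text = " ".join(scanner.text)
    return scanner.has_text_input and any(k in page_text for k in KEYWORDS)


text_label = forms_heuristic_says_phishing('<form>Password: <input type="password"></form>')
image_label = forms_heuristic_says_phishing('<form><img src="label.png"><input type="text"></form>')
```

The second call shows the evasion mentioned above: when the label is rendered as an image, no keyword appears in the text and the page is not flagged.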
4.2.8. TF-IDF-Final (H8)
This is a check of whether the site is phishing by em-
ploying TF-IDF-Final, an extension of the Robust Hy-
perlinks algorithm [14]. When the heuristic attempts to
identify phishing sites, it feeds the mixture of word lexi-
cal signatures and the domain name of the current web
site into Google. If the domain name matches the domain
name of the top 30 search results, the web site is labeled
legitimate. Some phishing sites, however, can be made to
rank more highly in search results by manipulation of the
search result page.
4.3. Experimental Design
We used a within-subjects design, where every participant
saw every website and judged whether or not it appeared
to be a phishing site. In our test we asked 10 participants
to freely browse the websites. Windows XP and Internet
Explorer (IE) version 6.0 were installed on each
participant’s PC. Other than configuring
IE to display IDN, we installed no security
software or anti-phishing toolbars. We also did not
prohibit participants from accessing websites not listed in
Table 1. Some participants therefore inputted several
terms into Google and compared the URL of the site
with the URLs of those listed in Google’s search results.
In this experiment, we used the average error rate as a
performance metric. To average the outcome of each test,
we performed 4-fold cross validation and repeated it 10
times. However, the experiment involved
a small, homogeneous test population; it
would therefore be difficult to generalize the results to typical
phishing victims. We discuss our plan for removing
the bias in Section 6.
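The evaluation procedure (4-fold cross validation repeated 10 times, reporting the average error rate) can be sketched as follows. The function signature, the shuffling scheme, and the fixed seed are assumptions made for illustration; `train` and `predict` stand for any learner, such as an AdaBoost implementation.

```python
import random


def repeated_kfold_error(samples, labels, train, predict, k=4, repeats=10, seed=0):
    """Average error rate over `repeats` runs of k-fold cross validation."""
    rng = random.Random(seed)
    n = len(samples)
    fold_errors = []
    for _ in range(repeats):
        order = list(range(n))
        rng.shuffle(order)
        folds = [order[f::k] for f in range(k)]  # k disjoint test folds
        for test_idx in folds:
            held_out = set(test_idx)
            train_idx = [i for i in order if i not in held_out]
            model = train([samples[i] for i in train_idx], [labels[i] for i in train_idx])
            wrong = sum(predict(model, samples[i]) != labels[i] for i in test_idx)
            fold_errors.append(wrong / len(test_idx))
    return sum(fold_errors) / len(fold_errors)
```

With 20 websites and k = 4, each fold holds out five sites and trains on the remaining 15.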
4.4. Experiment Results
First, we invited 10 participants, all Japanese males, from
the Nara Institute of Science and Technology. Three had
completed their master’s degree in engineering within
the last five years, and the others were master’s degree
students. We asked participants to label the websites described
in Table 1. The results by each participant are
shown in Table 2. A hash mark (#) denotes the number
of websites in Table 1, P1 - P10 denote the 10 participants,
“F” denotes that a participant failed to judge the website,
and the empty block denotes that a participant succeeded
in judging it correctly.
Next, we determined the detection accuracy of the
AdaBoost-based detection method. We used eight heu-
ristics and outputted a binary variable representing
phishing or not-phishing. The detection results by each
heuristic are shown in Table 2, where H1 - H8 denote
the eight heuristics, numbered as in Section 4.2.
Finally, we measured the detection accuracy of HumanBoost.
We constructed 10 PTD databases. In other
words, we made 10 sets of 20 * 9 binary vectors. Under
the same conditions described above, we calculated the
average error rate for each case of HumanBoost.

Table 2. Detection results by each participant and heuristic, in November 2007.
    Participants                            Heuristics
#   P1  P2  P3  P4  P5  P6  P7  P8  P9  P10 H1  H2  H3  H4  H5  H6  H7  H8
1   -   -   -   -   -   -   F   F   -   -   -   -   -   -   -   F   -   -
2   F   F   -   -   -   -   -   -   -   -   -   -   -   F   F   F   F   -
3   -   F   -   -   -   -   -   -   -   -   F   F   -   F   F   F   -   -
4   -   -   F   -   -   F   -   -   F   -   -   F   -   -   -   -   -   -
5   -   -   -   -   -   -   F   F   -   -   -   -   F   -   F   -   F   -
6   -   F   F   -   F   F   -   -   -   -   -   F   F   F   F   F   F   -
7   F   F   -   F   -   F   -   -   -   -   -   F   -   -   -   -   F   F
8   -   -   -   -   -   -   -   -   -   -   -   -   -   F   F   F   F   -
9   -   -   -   -   -   -   -   -   -   -   F   F   -   F   F   F   -   -
10  -   -   -   F   -   -   -   -   -   -   -   F   F   -   -   F   -   -
11  -   -   -   -   -   -   F   -   -   -   F   F   F   F   F   F   -   -
12  -   -   -   -   -   -   F   -   F   -   -   F   -   -   -   -   F   F
13  -   F   F   -   -   -   -   -   -   -   -   -   -   -   -   -   F   -
14  - - - - - - F- - F- F FF F F -
15  -   -   -   F   -   -   -   -   -   -   -   F   -   -   F   F   -   -
16  -   F   -   F   -   -   -   F   -   -   F   F   F   F   F   F   -   -
17  -   F   -   F   -   -   -   F   -   -   -   F   F   F   F   -   F   -
18  - - - - - - - - - - FF - - F -
19  F   -   -   F   -   -   F   F   F   -   -   F   -   -   -   -   F   -
20  -   -   F   -   -   -   -   -   -   -   -   -   -   F   F   F   F   -

Figure 2. Average error rates of each participant, the AdaBoost-based detection method, and HumanBoost in the pilot study, in November 2007. Plotted values (%):
             P1    P2    P3    P4    P5    P6    P7    P8    P9    P10   Average
Participant  15.0  35.0  20.0  30.0  5.0   15.0  25.0  30.0  15.0  0.0   19.0
HumanBoost   19.4  21.7  1.1   17.2  6.1   17.8  16.1  15.6  18.9  0.0   13.4
AdaBoost: 20.0
The results are summarized in Figure 2, where the
gray bars denote the error rate of each participant, the
white bar denotes the average error rate of the
AdaBoost-based detection method, and the black bars
denote that of HumanBoost. The average error rate for
HumanBoost was 13.4%, 19.0% for the participants and
20.0% for the AdaBoost-based detection method. The
lowest false positive rate was 19.6% for HumanBoost,
followed by 28.1% for AdaBoost and 29.7% for the par-
ticipants. The lowest false negative rate was 8.5% for
HumanBoost, followed by 13.5% for AdaBoost, 14.0%
for the participants.
We found that the average error rate of some participants
increased by employing HumanBoost. We analyzed
the assigned weights and found that some heuristics
were assigned higher weights than these users’ trust
decisions. For instance, participant 9 had labeled three
legitimate sites as phishing sites, whereas the existing
heuristics had labeled these three sites correctly. His trust
decisions were therefore inferior to the existing heuristics,
and we assume that this is the reason for the increase
in error rate.
5. Follow-up Study
Increasing the number of participants enables
us to generalize the outcome of HumanBoost. In this
section, we explain the two follow-up studies
performed in 2010. Note that the pilot study was performed
in November 2007 and the follow-up studies
were performed in March 2010 and July 2010; therefore,
there may be differences arising from the demographics of the
participants and from substantial media coverage of phishing.
5.1. A Case of the Follow-up Study in March 2010
Our follow-up study had 11 new participants, aged 23 to
30. All were from the Japan Advanced Institute of Science
and Technology. All were Japanese males; two had
completed their master’s degree in engineering within
the last five years, and the others were master’s degree
students.
Before conducting the follow-up study, we modified
the dataset described in Table 1. Because PayPal redesigned
its website between 2007 and 2010, we updated websites
9 and 20 to mimic the current PayPal login pages.
In addition, Nanto Bank, website 7 in Table 1, had
changed both the URL and the content of its login page.
Nanto Bank is also not well known in Ishikawa Prefecture,
where the participants of the follow-up study lived.
We therefore changed website 7 to Hokuriku Bank (another
Japanese regional bank in Ishikawa). The domain
name of Hokuriku Bank is www2.paweb.anser.or.jp,
the same as that of Nanto Bank.
In March 2010, we invited the 11 participants and asked them
to label 20 websites as legitimate or phishing. Unlike
the pilot study described in Section 4, we prepared
printed documents to expedite this experiment. Instead of
operating a browser, participants looked at 20 screenshots
of a browser that had just finished rendering each
website. Showing browser screenshots is a
common practice in phishing IQ tests.
The detection results by each participant and each
heuristic are shown in Table 3. A hash mark denotes the
number of websites, P11 - P21 denote the 11 participants,
H1 - H8 denote the eight heuristics, “F” denotes that a
participant or heuristic failed to judge the website, and
the empty block denotes that a participant or heuristic
succeeded in judging correctly. We also calculated the
average error rate for each participant, the AdaBoost-
based detection method, and HumanBoost.
The results are shown in Figure 3, where the gray bars
denote the error rate of each participant, the white bar
denotes the average error rate of the AdaBoost-based
detection method, and the black bars denote that of Hu-
manBoost. The lowest error rate was 10.7% for Human-
Boost, followed by 12.0% for AdaBoost and 31.4% for
the participants. The lowest false positive rate was 15.4%
for AdaBoost, followed by 18.1% for HumanBoost and
39.9% for the participants. The lowest false negative rate
was 6.1% for HumanBoost, followed by 8.4% for
AdaBoost and 25.9% for the participants. In comparison
to the pilot study, the average error rate of the
participants increased due to the difference in the
experimental design: the pilot study allowed participants
to operate a browser, but the follow-up study did not.
Nevertheless, HumanBoost again achieved higher detection
accuracy than both the participants and the
AdaBoost-based detection method.

Table 3. Detection results by each participant and each heuristic in the follow-up study, in March 2010

     Participants                                  Heuristics
 #   P11 P12 P13 P14 P15 P16 P17 P18 P19 P20 P21   H1  H2  H3  H4  H5  H6  H7  H8
 1   -   -   -   -   -   F   -   -   -   F   F     -   -   -   -   -   F   F   -
 2   -   F   -   -   -   -   -   F   -   -   -     -   F   -   F   F   F   F   -
 3   -   -   F   F   F   -   -   -   F   -   -     F   F   -   F   F   F   -   -
 4   F   -   -   -   -   -   -   F   F   -   -     -   -   -   -   -   -   -   -
 5   -   -   F   -   -   -   -   -   -   -   -     -   F   F   -   F   -   F   -
 6   -   -   -   F   -   -   F   -   F   -   -     -   F   F   F   F   F   F   -
 7   -   F   F   F   F   -   F   F   -   -   F     -   -   -   -   -   -   F   F
 8   -   -   F   -   -   F   -   -   F   -   -     -   F   -   F   F   F   F   -
 9   -   -   -   -   -   -   -   -   -   -   -     F   F   -   F   F   F   -   -
10   F   -   -   -   -   F   -   -   F   -   -     -   F   F   -   -   F   -   -
11   -   -   F   F   -   -   -   -   F   -   -     F   F   F   F   F   F   -   -
12   -   -   -   -   -   -   -   -   F   F   -     -   F   -   F   -   -   F   -
13   -   F   F   -   F   -   -   F   F   -   F     -   F   -   -   -   -   F   -
14   -   -   -   -   -   F   -   -   F   F   -     F   F   F   F   F   F   F   -
15   -   F   -   F   F   -   -   -   -   -   -     -   F   -   -   F   F   -   -
16   -   F   F   F   F   F   F   F   -   F   -     -   F   F   F   F   F   -   -
17   -   -   F   F   -   F   -   -   F   -   -     -   -   F   F   F   -   F   -
18   -   -   -   -   -   F   -   -   -   -   -     -   F   F   F   -   -   F   -
19   F   -   F   F   F   F   -   -   -   -   F     -   F   -   -   -   -   F   -
20   -   -   F   F   -   F   -   -   F   -   -     -   F   -   F   F   F   F   -

Figure 3. Average error rates of each participant, the
AdaBoost-based detection method, and HumanBoost in the
follow-up study, in March 2010

            Participant   HumanBoost
P11         15.0%         11.1%
P12         25.0%          7.8%
P13         50.0%         11.1%
P14         45.0%         11.7%
P15         30.0%          7.8%
P16         45.0%         13.9%
P17         15.0%         11.1%
P18         25.0%          6.7%
P19         55.0%         13.9%
P20         20.0%         13.3%
P21         20.0%          9.4%
Average     31.4%         10.7%
AdaBoost-based detection method: 12.0%
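The participant error rates reported for Figure 3 can be recomputed from the participant columns (P11 - P21) of Table 3. The following is a minimal sketch of that arithmetic, not the authors' tooling; the names TABLE3_PARTICIPANTS, PARTICIPANTS, and error_rates are introduced here for illustration only, and "-" stands for the empty block of the table.

```python
# Participant columns (P11..P21) of Table 3, one string per website (rows 1..20).
# "F" = the participant misjudged the website, "-" = judged correctly.
TABLE3_PARTICIPANTS = [
    "-----F---FF", "-F-----F---", "--FFF---F--", "F------FF--",
    "--F--------", "---F--F-F--", "-FFFF-FF--F", "--F--F--F--",
    "-----------", "F----F--F--", "--FF----F--", "--------FF-",
    "-FF-F--FF-F", "-----F--FF-", "-F-FF------", "-FFFFFFF-F-",
    "--FF-F--F--", "-----F-----", "F-FFFF----F", "--FF-F--F--",
]
PARTICIPANTS = [f"P{n}" for n in range(11, 22)]


def error_rates(rows):
    """Return the error rate (in percent) of each column of the F/- grid."""
    n_sites = len(rows)
    n_columns = len(rows[0])
    return [
        100.0 * sum(row[c] == "F" for row in rows) / n_sites
        for c in range(n_columns)
    ]


rates = error_rates(TABLE3_PARTICIPANTS)
for name, rate in zip(PARTICIPANTS, rates):
    print(f"{name}: {rate:.1f}%")
print(f"Average: {sum(rates) / len(rates):.1f}%")
```

The printed values match the participant bars of Figure 3 (15.0% for P11 through 20.0% for P21, with an average of 31.4%).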
5.2. A Case of the Follow-up Study in July 2010
In order to collect more users’ PTDs, we recruited
participants via an Internet research company. In this
section, we briefly summarize the results.
Of the 309 recruited participants, 42.4% (131) were
male and 57.6% (178) were female. Their ages ranged from
16 to 77 years. Of the participants, 48.2% (149) were
office workers, 19.7% (61) were homemakers, and 5.8% (18)
were students. Of the students, 66.7% (12) were
undergraduate students, 11.1% (2) were high school
students, and 5.6% (1) was a master’s degree student.
Most participants lived in the Tokyo area. We therefore
changed website 6 to Tokyo Tomin Bank (another Japanese
regional bank in Tokyo). The domain name of Tokyo Tomin
Bank is also www2.paweb.answer.or.jp. The other
conditions of this study were the same as in the
follow-up study described in Section 5.1. In July 2010,
the 309 recruited participants looked at 20 screenshots
and judged whether each site seemed to be phishing or
legitimate.
Based on the detection results, we also calculated the
average error rate for each participant, the AdaBoost-
based detection method, and HumanBoost. The lowest
error rate was 9.7% for HumanBoost, followed by 10.5%
for AdaBoost and 40.5% for the participants. The lowest
false positive rate was 18.3% for AdaBoost, followed by
19.5% for HumanBoost and 57.4% for the participants.
The lowest false negative rate was 5.5% for HumanBoost,
followed by 7.1% for AdaBoost and 33.2% for the par-
ticipants.
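The three rates reported above can be computed from a list of true labels and a list of predicted labels. The sketch below assumes that a false positive is a legitimate site labeled as phishing and that a false negative is a phishing site labeled as legitimate; the function name confusion_rates and the six-site toy data are hypothetical and are not taken from the study.

```python
def confusion_rates(truth, predicted):
    """Return (error rate, false positive rate, false negative rate).

    truth / predicted: equal-length sequences of "phishing" or "legitimate".
    Assumed definitions: FP = legitimate site labeled as phishing,
    FN = phishing site labeled as legitimate.
    """
    pairs = list(zip(truth, predicted))
    on_legit = [p for t, p in pairs if t == "legitimate"]
    on_phish = [p for t, p in pairs if t == "phishing"]
    fp_rate = sum(p == "phishing" for p in on_legit) / len(on_legit)
    fn_rate = sum(p == "legitimate" for p in on_phish) / len(on_phish)
    error_rate = sum(t != p for t, p in pairs) / len(pairs)
    return error_rate, fp_rate, fn_rate


# Toy data (hypothetical): four phishing sites and two legitimate sites.
truth = ["phishing"] * 4 + ["legitimate"] * 2
predicted = ["phishing", "phishing", "phishing", "legitimate",
             "phishing", "legitimate"]
error, fp, fn = confusion_rates(truth, predicted)
print(f"error={error:.3f} FP={fp:.3f} FN={fn:.3f}")
```

On the toy data, one of the two legitimate sites and one of the four phishing sites are misjudged, so the false positive rate is 0.5, the false negative rate is 0.25, and the overall error rate is 2/6.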
6. Discussion
6.1. Comparative Study with SVM
As mentioned in Section 2, the limited number of
heuristics is one of the biggest issues in detecting
phishing sites. We attempted to utilize users’ PTDs as a
new heuristic and incorporated it into the existing
heuristics by using AdaBoost. Although AdaBoost has two
advantages, as explained in Section 3.2, it is necessary
to check whether PTDs are also useful with other machine
learning techniques.
In this section, we employ a Support Vector Machine
(SVM), another typical machine learning technique for
classification. We used SVM instead of AdaBoost to
incorporate the heuristics, and calculated the average
error rate with and without users’ PTDs. For clarity, we
refer to the method that uses the eight existing
heuristics with SVM as the SVM-based detection method,
and to the method that additionally incorporates PTDs as
HumanSVM.
First, we calculated the average error rates by using
the dataset from the pilot study described in Section 4.
The conditions were the same as in the pilot study,
except for the machine learning method. The results
showed that the average error rate was 14.3% for
HumanSVM and 21.7% for the SVM-based detection method.
Second, we used the dataset from the follow-up study
performed in March 2010, described in Section 5.1. The
results showed that the average error rate was 11.4% for
HumanSVM and 18.3% for the SVM-based detection method.
Finally, we calculated the average error rates by using
the dataset from the follow-up study performed in July
2010, described in Section 5.2. The results showed that
the average error rate was 11.2% for HumanSVM and 18.9%
for the SVM-based detection method.
We observed that using PTDs decreased the average error
rate in all cases. We also observed that the average
error rates of HumanSVM (14.3%, 11.4%, and 11.2%) were
higher than those of HumanBoost (13.4%, 10.7%, and 9.7%).
Although HumanBoost performed better than HumanSVM, we
consider that PTDs can serve as a new heuristic for
detecting phishing sites.
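The idea of treating a PTD as one more binary heuristic can be illustrated with a small, self-contained sketch of discrete AdaBoost. This is not the authors' implementation: each heuristic column is used directly as a weak hypothesis, the data are a hypothetical eight-site toy set (+1 = phishing, -1 = legitimate), and only the training error is reported, whereas the study used cross validation. All function and variable names are introduced here for illustration.

```python
import math


def train_adaboost(columns, labels, rounds):
    """Discrete AdaBoost in which each binary column is a weak hypothesis.

    columns[j][i] is the verdict (+1 or -1) of heuristic j on site i.
    Returns a list of (column index, alpha) pairs.
    """
    n = len(labels)
    weights = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        # Weighted error of every column under the current distribution.
        errors = [
            sum(w for w, h, y in zip(weights, col, labels) if h != y)
            for col in columns
        ]
        best = min(range(len(columns)), key=errors.__getitem__)
        err = max(errors[best], 1e-10)  # avoid log(0) for a perfect column
        if err >= 0.5:
            break  # no column is better than chance
        alpha = 0.5 * math.log((1.0 - err) / err)
        model.append((best, alpha))
        # Increase the weight of misjudged sites, then renormalize.
        weights = [
            w * math.exp(-alpha * y * h)
            for w, h, y in zip(weights, columns[best], labels)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]
    return model


def predict(model, verdicts):
    """Weighted vote of the selected columns for one site."""
    score = sum(alpha * verdicts[j] for j, alpha in model)
    return 1 if score >= 0 else -1


def training_error(model, columns, labels):
    wrong = sum(
        predict(model, [col[i] for col in columns]) != y
        for i, y in enumerate(labels)
    )
    return wrong / len(labels)


# Hypothetical toy data: sites 0-3 are phishing, sites 4-7 are legitimate.
labels = [1, 1, 1, 1, -1, -1, -1, -1]
h_a = [-1, -1, 1, 1, -1, -1, -1, -1]   # misses phishing sites 0 and 1
h_b = [1, 1, -1, -1, -1, -1, -1, -1]   # misses phishing sites 2 and 3
ptd = [1, 1, 1, 1, 1, 1, -1, -1]       # user distrusted legitimate sites 4 and 5

without_ptd = [h_a, h_b]
with_ptd = [h_a, h_b, ptd]
err_without = training_error(train_adaboost(without_ptd, labels, 3), without_ptd, labels)
err_with = training_error(train_adaboost(with_ptd, labels, 3), with_ptd, labels)
print(err_without, err_with)
```

On this toy data the PTD column errs on different sites than the two heuristics, so the training error falls from 0.25 without the PTD column to 0.0 with it. The example only shows the mechanism; it says nothing about the size of the effect on real data.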
6.2. Removing the Bias
In this section, we discuss our plan for removing bias.
Removing bias is generally important for a
participant-based test. Although we used cross
validation, bias may still be present due to a biased
dataset and/or biased samples.
In particular, we assume that labeling our prepared
websites was much more difficult than labeling typical
phishing websites and/or legitimate sites. As explained
in Section 4.1, we designed our study with reference to
typical phishing IQ tests. Since almost all of our
prepared websites contained traps, participants often
failed to label the sites correctly. These traps also
hindered the existing heuristics from classifying the
websites. This may explain why the average error rates
remained relatively high.
We position our laboratory tests as a first step, and
have decided to perform a large-scale field test. One
approach toward field testing is to implement a
HumanBoost-capable phishing prevention system. This
would be possible by distributing the system as a
browser extension with some form of data collection and
getting a large population of users to agree to use it.
6.3. Issues on the Implementation of
HumanBoost-Capable Systems
Here we consider some issues that arise in implementing
a HumanBoost-capable system. Suppose that HumanBoost
were available in phishing-prevention systems.
The weak point of a HumanBoost-capable system is that
it always works only after the user has finished making
a trust decision. Generally, phishing-prevention systems
aim to prevent users from visiting phishing sites.
Unlike these systems, HumanBoost requires users to judge
whether their confidential information can be input to
the site.
Another problem is the difficulty of convincing users
to reconsider their trust decisions. When users attempt
to browse a phishing site, typical phishing-prevention
systems display an alert message. In HumanBoost, such
messages would be shown after the user has made the
trust decision. If the user relies on his/her own trust
decision, the HumanBoost-capable system will not be
effective even if it alerts correctly.
To solve these problems, the HumanBoost-capable
system should be able to cancel the input or the
submission of users’ confidential information, instead
of blocking the phishing site. The system should monitor
events such as entering data into input forms and/or
clicking buttons. The system should also hook these
event handlers so that no information is sent if the
site is deemed to be phishing. It is possible to
implement such a system as a browser extension, as
mentioned in Section 6.2.
The HumanBoost-capable system should also have an
interface that prompts users to remake their trust
decisions. For instance, the system could show an alert
window that interrupts the user’s browsing. The alert
window should contain text that explains to the user why
the site appears to be phishing. It is also possible to
implement such a system as a browser extension.
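The control flow described in this section (observe the user's trust decision, classify, then cancel the submission and alert) can be sketched independently of any browser API. The following is a language-neutral illustration written in Python, not extension code; GuardResult, guard_submission, and the majority-vote classifier are hypothetical stand-ins, and a real system would plug in a trained HumanBoost classifier and the browser's own event hooks.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class GuardResult:
    allow_submit: bool
    alert: str  # empty string when no alert is shown


def guard_submission(
    user_trusts_site: bool,
    heuristic_verdicts: Sequence[int],
    classify: Callable[[Sequence[int]], int],
) -> GuardResult:
    """Decide whether a form submission may proceed.

    user_trusts_site: trust decision implied by the user starting to submit.
    heuristic_verdicts: +1 (phishing) or -1 (legitimate) per existing heuristic.
    classify: hypothetical classifier over the heuristics plus the PTD column,
              returning +1 (phishing) or -1 (legitimate).
    """
    ptd = -1 if user_trusts_site else 1
    verdict = classify(list(heuristic_verdicts) + [ptd])
    if verdict == 1 and user_trusts_site:
        # Cancel the submission instead of blocking the whole site,
        # and tell the user why, so the trust decision can be remade.
        return GuardResult(
            False,
            "This site appears to be a phishing site; "
            "the submission was cancelled.",
        )
    return GuardResult(True, "")


def majority_vote(verdicts: Sequence[int]) -> int:
    """Placeholder classifier used only for this example."""
    return 1 if sum(verdicts) > 0 else -1


blocked = guard_submission(True, [1, 1, 1], majority_vote)
allowed = guard_submission(True, [-1, -1, 1], majority_vote)
print(blocked.allow_submit, allowed.allow_submit)
```

In the first call the three heuristics outvote the user's trust decision, so the submission is cancelled and an alert text is returned; in the second call the submission proceeds.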
7. Conclusions
In this paper, we presented an approach called
HumanBoost to improve the accuracy of detecting phishing
sites. The key concept was utilizing users’ past trust
decisions (PTDs). Since Web users may be required to
make trust decisions whenever they input their personal
information into websites, we considered recording these
trust decisions for learning purposes. We simply assumed
that the record can be described by a binary variable,
representing phishing or not-phishing, and found that
the record was similar to the output of the existing
heuristics.
As our pilot study, in November 2007, we invited 10
participants and performed a subject experiment. The
participants browsed 14 simulated phishing sites and six
legitimate sites, and judged whether or not each site
appeared to be a phishing site. We utilized participants’
trust decisions as a new heuristic and let AdaBoost
incorporate it into eight existing heuristics.
The results showed that the average error rate for
HumanBoost was 13.4%, whereas that of the participants
was 19.0% and that of AdaBoost was 20.0%. We also
conducted a follow-up study in March 2010. This study
invited 11 participants and was performed in the same
fashion as the pilot study. The results showed that the
average error rate for HumanBoost was 10.7%, whereas
that of the participants was 31.4% and that of AdaBoost
was 12.0%. Finally, we invited 309 participants and
performed another follow-up study in July 2010. The
results showed that the average error rate for
HumanBoost was 9.7%, whereas that of the participants
was 40.5% and that of AdaBoost was 10.5%. We therefore
concluded that PTDs can serve as new heuristics and that
HumanBoost has the potential to improve detection
accuracy for Web users.
We then checked whether PTDs are useful for another
machine learning-based detection method. As a case
study, we employed SVM and measured detection accuracy
with and without PTDs. The results showed that
utilizing PTDs increased the detection accuracy, which
indicates that the benefit of PTDs as a new heuristic is
not limited to AdaBoost.
8. Acknowledgements
We thank the members of the Internet Engineering
Laboratory at the Nara Institute of Science and
Technology and the Shinoda Laboratory at the Japan
Advanced Institute of Science and Technology for
participating in our experiments.
REFERENCES
[1] T. McCall and R. Moss, “Gartner Survey Shows Frequent
Data Security Lapses and Increased Cyber Attacks Dam-
age Consumer Trust in Online Commerce,” 2005.
http://www.gartner.com/press_releases/asset_129754 _11.
html
[2] C. Pettey and H. Stevens, “Gartner Says Number of
Phishing Attacks on U.S. Consumers Increased 40 Per-
cent in 2008,” April 2009. http://www.gartner.com/it/
page.jsp?id=936913
[3] Anti-Phishing Working Group, “Phishing Activity Trends
Report-Q1, 2008,” August 2008. http://www.apwg.
com/reports/apwgreport_Q1_2008.pdf, 0
[4] Y. Zhang, S. Egelman, L. Cranor and J. Hong, “Phinding
Phish: Evaluating Anti-Phishing Tools,” Proceedings of
the 14th Annual Network and Distributed System Security
Symposium, USA, February 2007.
[5] D. Miyamoto, H. Hazeyama and Y. Kadobayashi, “An
Evaluation of Machine Learning-Based Methods for De-
tection of Phishing Sites,” Australian Journal of Intelli-
gent Information Processing Systems, Vol. 10, No. 2,
2008, pp. 54-63.
[6] I. Fette, N. Sadeh and A. Tomasic, “Learning to Detect
Phishing Emails,” Proceedings of the 16th International
Conference on World Wide Web, Canada, May 2007, pp.
649-656.
[7] S. Abu-Nimeh, D. Nappa, X. Wang and S. Nair, “A
Comparison of Machine Learning Techniques for Phish-
ing Detection,” Proceedings of the 2nd Annual An-
ti-Phishing Working Groups eCrime Researchers Summit,
USA, October 2007, pp. 60-69.
[8] R. Basnet, S. Mukkamala and A. H. Sung, “Detection of
Phishing Attacks: A Machine Learning Approach,” Stud-
ies in Fuzziness and Soft Computing, Vol. 226, February
2008, pp. 373-383.
[9] Y. Pan and X. Ding, “Anomaly Based Web Phishing
Page Detection,” Proceedings of the 22nd Annual Com-
puter Security Applications Conference on Annual Com-
puter Security Applications Conference, USA, September
2006, pp. 381-392.
[10] Y. Zhang, J. Hong and L. Cranor, “CANTINA: A Con-
tent-Based Approach to Detect Phishing Web Sites,”
Proceedings of the 16th World Wide Web Conference,
China, May 2007, pp. 639-648.
[11] Open DNS, “Phishtank-Join the Fight against Phishing,”
http://www.phishtank.com.
[12] Y. Freund and R. E. Schapire, “A Decision-Theoretic
Generalization of On-Line Learning and an Application
to Boosting,” Journal of Computer and System Science,
Vol. 55, No. 1, August 1997, pp. 119-137.
[13] R. Dhamija, J. D. Tygar and M.A. Hearst, “Why Phishing
Works,” Proceedings of Conference on Human Factors
in Computing Systems, April 2006, pp. 581-590.
[14] A. Y. Fu, X. Deng, L. Wenyin and G. Little, “The Meth-
odology and an Application to Fight against Unicode At-
tacks,” Proceedings of the 2nd Symposium on Usable
Privacy and Security, USA, July 2006, pp. 91-101.
[15] T. A. Phelps and R. Wilensky, “Robust Hyperlinks:
Cheap, Everywhere, Now,” Proceedings of the 8th Inter-
national Conference on Digital Documents and Elec-
tronic Publishing, September 2000, pp. 28-43.