Journal of Intelligent Learning Systems and Applications, 2010, 2, 190-199
doi:10.4236/jilsa.2010.24022 Published Online November 2010 (http://www.scirp.org/journal/jilsa)
Copyright © 2010 SciRes. JILSA

HumanBoost: Utilization of Users' Past Trust Decision for Identifying Fraudulent Websites

Daisuke Miyamoto1, Hiroaki Hazeyama2, Youki Kadobayashi2
1Information Security Research Center, National Institute of Information and Communications Technology, Koganei, Tokyo, Japan; 2Graduate School of Information Science, Nara Institute of Science and Technology, Ikoma, Nara, Japan.
E-mail: daisu-mi@nict.go.jp, hiroa-ha@is.naist.jp, youki-k@is.aist-nara.ac.jp
Received March 25th, 2010; revised August 25th, 2010; accepted September 16th, 2010.

ABSTRACT

This paper presents HumanBoost, an approach that aims at improving the accuracy of detecting so-called phishing sites by utilizing users' past trust decisions (PTDs). Web users are generally required to make trust decisions whenever their personal information is requested by a website. We assume that a database of a user's PTDs can be transformed into a binary vector, representing phishing or not-phishing, and that this binary vector can be used for detecting phishing sites, similar to the existing heuristics. For our pilot study, in November 2007, we invited 10 participants and performed a subject experiment. The participants browsed 14 simulated phishing sites and six legitimate sites, and judged whether or not each site appeared to be a phishing site. We utilized the participants' trust decisions as a new heuristic and let AdaBoost incorporate it into eight existing heuristics. The results show that the average error rate for HumanBoost was 13.4%, whereas for participants it was 19.0% and for AdaBoost 20.0%. We also conducted two follow-up studies in March 2010 and July 2010, and observed that the average error rate for HumanBoost remained below the others.
We therefore conclude that PTDs are usable as a new heuristic, and that HumanBoost has the potential to improve detection accuracy for Web users.

Keywords: Phishing, Personalization, AdaBoost, Trust Decision

1. Introduction

Phishing is a form of identity theft in which the targets are users rather than computer systems. A phishing attacker attracts victims to a spoofed website, a so-called phishing site, and attempts to persuade them to provide their personal information. Damage suffered from phishing is increasing. In 2005, the Gartner Survey reported that 1.2 million consumers lost $929 million as a result of phishing attacks [1]. A more recent survey conducted in 2008 reported that more than 5 million consumers lost $1.76 billion [2]. The number of phishing sites is also increasing. According to trend reports published by the Anti-Phishing Working Group [3], the number of reported phishing sites was 25,630 in March 2008, far surpassing the 14,315 in July 2005.

To deal with phishing attacks, heuristics-based detection methods have begun to garner attention. A heuristic is an algorithm that identifies phishing sites based on users' experience and checks whether a site appears to be a phishing site. Checking the lifetime of a site's domain is a well-known heuristic, since the URLs of most phishing sites expire within a short time span. Based on the detection result from each heuristic, a heuristics-based solution calculates the likelihood of a site being a phishing site and compares that likelihood with a defined discrimination threshold. Unfortunately, the detection accuracy of existing heuristics-based solutions is nowhere near suitable for practical use [4], even though various heuristics have been discovered by former studies.
In our previous work [5], we attempted to improve this accuracy by employing machine learning techniques for combining heuristics, since we assumed that the inaccuracy is caused by heuristics-based solutions failing to use the heuristics appropriately. In most cases, machine learning-based detection methods (MLBDMs) performed better than existing detection methods. In particular, an AdaBoost-based detection method showed the highest detection accuracy.

In this paper, we propose HumanBoost, which aims at improving AdaBoost-based detection methods. The key concept of HumanBoost is utilizing Web users' past trust decisions (PTDs). Basically, humans have the potential to identify phishing sites even when existing heuristics cannot detect them. If we can construct a database of PTDs for each Web user, we can use the record of the user's trust decisions as a feature vector for detecting phishing sites. HumanBoost also involves the idea of adjusting detection for each Web user. If a user is a security expert, the most predominant factor in detecting phishing sites would be his/her trust decisions. Conversely, the existing heuristics will have a strong effect on detection when the user is a novice whose PTDs have often failed.

In our study in November 2007, we invited 10 participants and performed a subject experiment. The participants browsed 14 simulated phishing sites and six legitimate sites, and judged whether or not each site appeared to be a phishing site. By utilizing participants' trust decisions as a new weak hypothesis, we let AdaBoost incorporate this heuristic into eight existing heuristics. The results show that the average error rate for HumanBoost was 13.4%, whereas that for participants was 19.0% and that for the AdaBoost-based detection method 20.0%. We then conducted a follow-up study in March 2010. This study had 11 participants under almost the same conditions as the first. The results show that the average error rate for HumanBoost was 10.7%, whereas that for participants was 31.4% and for AdaBoost 12.0%. We also invited 309 participants and performed another follow-up study in July 2010. The results show that the average error rate for HumanBoost was 9.7%, whereas that for participants was 40.5% and for AdaBoost 10.5%.

The rest of this paper is organized as follows. Section 2 summarizes the related work, and Section 3 explains our proposal. Section 4 describes our preliminary evaluation, and Section 5 presents the follow-up studies.
Section 6 discusses the availability of PTDs, methods for removing bias, and issues in implementing a HumanBoost-capable system. Finally, Section 7 concludes our contribution.

2. Related Work

For mitigating phishing attacks, machine learning, which facilitates the development of algorithms or techniques by enabling computer systems to learn, has begun to garner attention. PFILTER, proposed by Fette et al. [6], employed SVM to distinguish phishing emails from other emails. Abu-Nimeh et al. compared the predictive accuracy of six machine learning methods [7]. They analyzed 1,117 phishing emails and 1,718 legitimate emails with 43 features for distinguishing phishing emails. Their research showed that the lowest error rate was 7.72%, achieved by Random Forests. Basnet et al. performed an evaluation of six different machine learning-based detection methods [8]. They analyzed 973 phishing emails and 3,027 legitimate emails with 12 features, and showed that the lowest error rate was 2.01%. The experimental conditions differed between these studies, but machine learning provided high accuracy for the detection of phishing emails.

Apart from phishing emails, machine learning has also been used to detect phishing sites. Pan et al. presented an SVM-based page classifier for detecting such websites [9]. They analyzed 279 phishing sites and 100 legitimate sites with eight features, and the results showed the average error rate to be 16%. Our previous work employed nine machine learning techniques [5]: AdaBoost, Bagging, Support Vector Machines, Classification and Regression Trees, Logistic Regression, Random Forests, Neural Networks, Naïve Bayes, and Bayesian Additive Regression Trees. We also employed eight heuristics presented in [10] and analyzed 3,000 URLs, consisting of 1,500 legitimate sites and the same number of phishing sites, reported on PhishTank.com [11] from November 2007 to February 2008.
Our evaluation results showed the highest f1 measure (0.8581), the lowest error rate (14.15%), and the highest AUC (0.9342), all of which were observed for the AdaBoost-based detection method. In most cases, MLBDMs performed better than the existing detection method.

Although these earlier studies used machine learning, we find that there are two problems. One is that the number of features available for detecting phishing sites is smaller than that for detecting phishing emails, which indicates that developing new heuristics for phishing sites is more difficult than for phishing emails. The other is the lack of protection tailored to the individual Web user. We consider that protection methods should differ for each Web

URL | Actual Condition | The user's trust decision | Heuristic #1 | … | Heuristic #N
Site 1 | Phishing | Phishing | Phishing | … | Legitimate
Site 2 | Phishing | Legitimate | Phishing | … | Phishing
Site 3 | Phishing | Phishing | Legitimate | … | Phishing
… | … | … | … | … | …
Site M | Legitimate | Legitimate | Legitimate | … | Phishing

Figure 1. Example of PTD and its schema.
user, as long as phishing attacks target individual users. Our proposed HumanBoost aims at using past trust decisions as a new heuristic. It also enables the detection algorithm to be customized for each Web user by the machine learning processes described in Section 3.2.

3. HumanBoost

3.1. Overview

The key concept of HumanBoost is utilizing Web users' past trust decisions (PTDs). Web users are generally required to make trust decisions whenever they input their personal information into websites. In other words, we assume that a Web user outputs a binary variable, phishing or legitimate, when a website requires the user to input a password. Note that the existing heuristics for detecting phishing sites, which we explain in Section 4.2, similarly output binary variables denoting phishing or not-phishing.

In HumanBoost, we assume that each Web user has his/her own PTD database, as shown in Figure 1. The schema of the PTD database consists of the website's URL, its actual condition, the result of the user's trust decision, and the results from the existing heuristics. Note that we do not propose sharing the PTD database among users, due to privacy concerns. The PTD database can be regarded as a training dataset that consists of N + 1 binary explanatory variables and one binary response variable. We therefore employ a machine learning technique to learn from this binary vector for each user's PTD database.

3.2. Theoretical Background

In this study we employ the Adaptive Boosting (AdaBoost) [12] algorithm, which learns a strong hypothesis by combining a set of weak hypotheses h_t with a set of weights α_t:

H_ada(x) = sign( Σ_t α_t h_t(x) )   (1)

The weights are learned through supervised training off-line. Formally, AdaBoost uses a set of input data {x_i, y_i : i = 1, …, m}, where x_i is the input and y_i is the classification.
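As an illustration of that input format, a user's PTD database (Figure 1) can be serialized directly into such (x_i, y_i) pairs. The sketch below is a minimal representation with hypothetical sites and labels, not data from our experiments; it only fixes an encoding (phishing = +1, legitimate = −1).

```python
# Sketch: a PTD database as AdaBoost training data.
# Rows are hypothetical examples, not data from the paper.
PHISHING, LEGITIMATE = 1, -1

# Each row pairs x = (user's trust decision, heuristic #1, ..., heuristic #N)
# with y = the site's actual condition.
ptd_database = [
    ((PHISHING,   PHISHING,   PHISHING,   LEGITIMATE), PHISHING),
    ((LEGITIMATE, PHISHING,   PHISHING,   PHISHING),   PHISHING),
    ((LEGITIMATE, LEGITIMATE, LEGITIMATE, PHISHING),   LEGITIMATE),
]

xs = [x for x, _ in ptd_database]   # N + 1 binary explanatory variables
ys = [y for _, y in ptd_database]   # one binary response variable
```

Each x vector thus carries the user's own decision alongside the heuristics' outputs, which is what lets AdaBoost weigh the user against the heuristics.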
Each weak hypothesis is only required to make correct detections in slightly over half the cases. The AdaBoost algorithm iterates the calculation of a set of weights D_t(i) on the samples. At t = 1, the samples are equally weighted, so D_1(i) = 1/m. The update rule consists of three stages. First, AdaBoost chooses the weight as shown in (2):

α_t = (1/2) ln((1 − ε_t) / ε_t)   (2)

where ε_t = Pr_{i~D_t}[h_t(x_i) ≠ y_i]. Second, AdaBoost updates the sample weights by (3):

D_{t+1}(i) = (D_t(i) / Z_t) · e^{−α_t} if h_t(x_i) = y_i, and D_{t+1}(i) = (D_t(i) / Z_t) · e^{α_t} if h_t(x_i) ≠ y_i   (3)

where Z_t is a normalization factor chosen so that Σ_i D_{t+1}(i) = 1. Finally, it outputs the final hypothesis as shown in (1).

We have two reasons for employing AdaBoost. One is that it performed best in our previous comparative study [5], where the AdaBoost-based detection method demonstrated the lowest error rate, the highest f1 measure, and the highest AUC, as mentioned in Section 2. The other is that we expect AdaBoost to cover each user's weak points. Theoretically, boosting algorithms assign a high weight to a classifier that correctly labels a site that other classifiers have labeled incorrectly, as shown in (3). Assuming that a user's trust decision can be treated as a classifier, AdaBoost would cover users' weak points by assigning high weights to heuristics that can correctly judge the sites that the user is likely to misjudge.

4. Experiment and Results

To check the availability of PTDs, we invited participants and performed a phishing IQ test to construct PTDs in November 2007. This section describes the dataset of the phishing IQ test, introduces the heuristics that we used, explains our experimental design, and finally shows the results.

4.1. Dataset Description

Similar to the typical phishing IQ tests performed by Dhamija et al. [13], we prepared 14 simulated phishing sites and six legitimate ones, all of which contained Web forms in which users could input their personal information such as user ID and password.
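The procedure of Section 3.2, selecting a weak hypothesis, computing α_t by (2), and reweighting by (3), can be sketched in a few lines when the weak hypotheses are precomputed binary judgments (e.g., the columns of a PTD database). This is an illustrative toy implementation with hypothetical data, not the authors' code.

```python
import math

def adaboost(columns, y, rounds):
    """columns[j][i] in {-1,+1}: judgment of weak hypothesis j on site i;
    y[i] in {-1,+1}: actual condition. Returns a list of (alpha, j) pairs."""
    m = len(y)
    D = [1.0 / m] * m                       # D_1(i) = 1/m
    ensemble = []
    for _ in range(rounds):
        # pick the weak hypothesis with the lowest weighted error eps_t
        errs = [sum(D[i] for i in range(m) if h[i] != y[i]) for h in columns]
        j = min(range(len(columns)), key=lambda k: errs[k])
        eps = errs[j]
        if eps == 0:                        # perfect hypothesis: keep it, stop
            ensemble.append((1.0, j))
            break
        if eps >= 0.5:                      # no better than chance: stop
            break
        alpha = 0.5 * math.log((1 - eps) / eps)              # equation (2)
        # equation (3): up-weight misclassified samples, then normalize by Z_t
        D = [D[i] * math.exp(-alpha if columns[j][i] == y[i] else alpha)
             for i in range(m)]
        Z = sum(D)
        D = [d / Z for d in D]
        ensemble.append((alpha, j))
    return ensemble

def predict(ensemble, columns, i):
    """Final hypothesis (1): sign of the alpha-weighted vote on site i."""
    s = sum(alpha * columns[j][i] for alpha, j in ensemble)
    return 1 if s >= 0 else -1

# Hypothetical toy data: three weak hypotheses judging four sites.
columns = [[1, 1, -1, -1], [1, -1, 1, -1], [-1, 1, 1, -1]]
actual = [1, 1, 1, -1]
ensemble = adaboost(columns, actual, rounds=3)
labels = [predict(ensemble, columns, i) for i in range(4)]
```

In the toy run, each weak hypothesis misjudges one site, yet the weighted vote labels all four sites correctly, which is exactly the "covering weak points" behavior the text describes.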
The conditions of the sites are shown in Table 1. Websites 1, 4, 7, 12, 13, and 19 were actual company websites, but these sites contained defective features that could mislead participants into labeling them as phishing. Websites 1 and 12 required users to input their password, though they employed no SSL certificate. Website 4 was Goldman Sachs, with the domain name webid2.gs.com. Since "gs" can imply multiple meanings, the domain name could confuse participants. Similarly, website 13 contained "clientserv" in its domain name. Website 7 was Nanto Bank, a Japanese regional bank mainly operating in Nara Prefecture, where almost all the participants lived, but its domain name www2.paweb.anser.or.jp gives no indication of the bank's name. Website 19 was Apple Computer Inc. and employed valid SSL, but web browsers displayed an alert window because the page accessed non-SSL content.
The rest were phishing sites. Websites 5, 11, and 15 were derived from actual phishing sites based on reports from PhishTank.com. The other phishing sites were simulated phishing sites that mimicked actual websites by using the phishing practices described in the following sections.

4.1.1. Confusing URL

Websites 2, 6, 16, and 17 were made to look like well-known sites, but with slightly different or confusing URLs. A phishing attacker (phisher) registering a similar or otherwise legitimate-sounding domain name such as www-bk-mufg.jp is increasingly common. Website 6 was hosted at www.bankofthevvest.com, with two "v"s instead of a "w" in its domain name. According to the phishing IQ test conducted by Dhamija [13], the phishing site that fooled the most participants was an exact replica of the Bank of the West homepage hosted at this domain name.

4.1.2. IP Address Abuse

Websites 10 and 14 employed IP address abuse; instead of showing the domain name, the IP address appears in the browser's address bar. For website 10, a phisher copied the contents of the actual Citibank homepage into a website and created URLs using IP addresses. The IP address does not point to Citibank, but some participants would not be aware of this and would think the site is legitimate.

4.1.3. IDN Abuse

Websites 9 and 18 employed Internationalized Domain Name (IDN) abuse, a modern phishing technique. Fu et al. indicated [14] that the letter "а" in the Cyrillic alphabet is visually indistinguishable from the Latin letter "a". For instance, the URL of website 9 is www.xn--pypal-4ve.com, which is clearly different from www.paypal.com. Yet the domain name can be shown as www.pаypal.com in web browsers.

Table 1. Conditions of each website.
# | Website | Real/Spoof | Lang | Description
1 | Live.com | real | EN | URL (login.live.com)
2 | Tokyo-Mitsubishi UFJ | spoof | JP | URL (www-bk-mufg.jp), similar to the legitimate URL (www.bk.mufg.jp)
3 | PayPal | spoof | EN | URL (www.paypal.com.%73%69 ... %6f%6d) (URL Encoding Abuse)
4 | Goldman Sachs | real | EN | URL (webid2.gs.com), SSL
5 | Natwest Bank | spoof | EN | URL (onlinesession-0815.natwest.com.esb6eyond.gz.cn), derived from PhishTank.com
6 | Bank of the West | spoof | EN | URL (www.bankofthevvest.com), similar to the legitimate URL (www.bankofthewest.com)
7 | Nanto Bank | real | JP | URL (www2.paweb.anser.or.jp), SSL, third-party URL
8 | Bank of America | spoof | EN | URL (bankofamerica.com@index.jsp-login-page.com) (URL Scheme Abuse)
9 | PayPal | spoof | EN | URL (www.paypal.com), first "a" is a Cyrillic small letter "а" (U+0430) (IDN Abuse)
10 | Citibank | spoof | EN | URL (IP address) (IP Address Abuse)
11 | Amazon | spoof | EN | URL (www.importen.se), contains "amazon" in its path, derived from PhishTank.com
12 | Xanga | real | EN | URL (www.xanga.com)
13 | Morgan Stanley | real | EN | URL (www.morganstanleyclientserv.com), SSL
14 | Yahoo | spoof | EN | URL (IP address) (IP Address Abuse)
15 | U.S.D. of the Treasury | spoof | EN | URL (www.tarekfayed.com), derived from PhishTank.com
16 | Sumitomo Mitsui Card | spoof | JP | URL (www.smcb-card.com), similar to the legitimate URL (www.smbc-card.com)
17 | eBay | spoof | EN | URL (secuirty.ebayonlineregist.com)
18 | Citibank | spoof | EN | URL (シテイバンク.com), pronounced "Shi Tee Ban Ku", a look-alike of "Citibank" in Japanese letters (IDN Abuse)
19 | Apple | real | EN | URL (connect.apple.com), SSL, popup warning caused by accessing non-SSL content
20 | PayPal | spoof | EN | URL (www.paypal.com@verisign-registered.com) (URL Scheme Abuse)
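Three of the URL tricks listed in Table 1 can be observed directly with Python's standard library. The snippet below is an illustrative sketch: the percent-encoded "www.paypal.com" string is a hypothetical example (the actual URL of website 3 is elided in Table 1).

```python
from urllib.parse import unquote, urlsplit

# IDN abuse (website 9): a Cyrillic "а" (U+0430) in place of the Latin
# "a" yields a visually identical but distinct domain label, whose
# punycode ASCII form exposes the difference.
spoofed = "p\u0430ypal"
assert spoofed != "paypal"                      # different code points
print(spoofed.encode("punycode"))               # b'pypal-4ve' -> xn--pypal-4ve

# URL scheme abuse (website 8): text before "@" is treated as userinfo,
# so the browser actually connects to the host after it.
parts = urlsplit("http://bankofamerica.com@index.jsp-login-page.com/")
print(parts.username)                           # bankofamerica.com
print(parts.hostname)                           # index.jsp-login-page.com

# URL encoding abuse (website 3): percent-decoding reveals the hidden
# characters; %73 is the letter "s". (Hypothetical encoded string.)
print(unquote("%73"))                           # s
print(unquote("www.%70%61%79%70%61%6c.com"))    # www.paypal.com
```

The punycode form xn--pypal-4ve matches the registered name of website 9 quoted in Section 4.1.3.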
4.1.4. URL Scheme Abuse

The URLs of websites 8 and 20 contained an at-mark (@) symbol. When this symbol is used in a URL, all text before it is ignored (treated as user information) and the browser references only the part following it as the hostname. For website 8, the URL is http://bankofamerica.com@index.jsp-login-page.com. Even though it appears to be bankofamerica.com, web browsers ignore that part and are directed to index.jsp-login-page.com.

4.1.5. URL Encoding Abuse

URL encoding is an accepted method of representing characters within a URL that may need special syntax handling to be correctly interpreted. This is achieved by encoding the character with a sequence of three characters: the percent character "%" followed by the two hexadecimal digits representing the octet code of the original character. For instance, the US-ASCII character set represents the letter "s" with hexadecimal code 73, so its URL-encoded representation is %73. Website 3 disguised its domain name by URL encoding abuse to make the domain name mimic that of PayPal, Inc.

4.2. Heuristics

Our experiment employs eight types of heuristics, all of which were employed by CANTINA [15]. To the best of our knowledge, CANTINA is the most successful tool for combining heuristics, since it has achieved high accuracy in detecting phishing sites without using a URL blacklist.

4.2.1. Age of Domain (H1)

This heuristic checks whether the domain was registered more than 12 months ago. If it was, the heuristic deems the site legitimate; otherwise it deems it a phishing site. One shortcoming of this heuristic is that newly created legitimate sites have been registered for less than a year, in which case the heuristic fails. Another shortcoming is that the domain names of many phishing sites were in fact registered over a year ago.
In particular, modern phishing sites are often discovered on hosts owned by legitimate companies: some vulnerability in the host allowed a phisher to penetrate it and set up a phishing site. In such cases, the domain name was often registered a long time ago, and thus the heuristic fails to classify it correctly.

4.2.2. Known Images (H2)

This heuristic checks whether a page contains inconsistent use of well-known logos such as those of eBay, PayPal, Citibank, Bank of America, Fifth Third Bank, Barclays Bank, ANZ Bank, Chase Bank, and Wells Fargo Bank. For instance, if a site contains eBay logos but is not on an eBay domain, the heuristic deems it a phishing site. However, pattern matching in a digitized image might lead to many misjudgments. This heuristic also fails when legitimate sites employ these logo files: even if a company has a business relationship with PayPal and uses the PayPal logo on its website, the heuristic labels it a phishing site.

4.2.3. Suspicious URL (H3)

This heuristic checks whether the site's URL contains an at-mark (@) symbol or a hyphen (-) in the domain name. If so, the heuristic deems it a phishing site, because phishing attackers are likely to use these symbols in the domain names of phishing sites. The weakness of this heuristic is that some legitimate sites (e.g., aist-nara.ac.jp) use a hyphen in their domain name, while several phishing sites have neither an at-mark nor a hyphen in their domain name.

4.2.4. Suspicious Links (H4)

Similar to the Suspicious URL heuristic, this one checks whether a link on the page contains an at-mark or a hyphen. Its weak points are the same as those of the Suspicious URL heuristic.

4.2.5. IP Address (H5)

This heuristic checks whether the domain name of the site is an IP address. Though legitimate sites rarely link to pages via an IP address, phishers often attract victims to phishing sites through IP address links.
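The string-based heuristics H3 and H5 reduce to a few lines each. The sketch below is our own illustrative rendering of the checks as described above, not CANTINA's implementation; the domain extraction is deliberately naive.

```python
import re

def _domain(url):
    """Naive extraction of the host part: strip the scheme, cut at '/'."""
    return re.sub(r"^[a-z]+://", "", url).split("/")[0]

def suspicious_url(url):
    """H3 sketch: at-mark or hyphen in the domain name part.
    (H4 applies the same test to each link found on the page.)"""
    d = _domain(url)
    return "@" in d or "-" in d

def ip_address(url):
    """H5 sketch: the host is a dotted-quad IP address."""
    return re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", _domain(url)) is not None
```

As the text notes, both checks misfire in practice: suspicious_url flags legitimate hyphenated domains such as aist-nara.ac.jp, and ip_address says nothing about phishing sites hosted under ordinary domain names.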
The heuristic fails if the URL of a phishing site uses a fully qualified domain name, or if that of a legitimate site is an IP address.

4.2.6. Dots in URL (H6)

This heuristic checks whether the URL of the site contains five or more dots. According to Fette et al. [6], dots can be abused by attackers to construct legitimate-looking URLs. One technique is to use a sub-domain. Another is to use a redirection script, which to the user may, for instance, appear like a site hosted at google.com, but in reality will redirect the browser to phishing.com. In both of these examples, either through the inclusion of a URL in an open redirect script or through the use of a number of sub-domains, there are a large number of dots in the URL. The heuristic fails if there are fewer than five dots in the URL of a phishing site. For instance, a phishing site reported in November 2008 and hosted at http://kitevolution.com/os/chat6/plugins/safehtml/www.paypal.com/canada/cgi-bin/webscr.php?cmd=_login-run includes only four dots. Conversely, the URLs of some legitimate sites can have five or more dots.

4.2.7. Forms (H7)

This heuristic checks whether the page contains web input
forms. It scans the HTML for <input> tags that accept text and are accompanied by labels such as "credit card" and "password". If this is the case, the heuristic deems the site a phishing site. Unfortunately, this heuristic fails whenever phishers use digital images of such words rather than actual text.

4.2.8. TF-IDF-Final (H8)

This heuristic checks whether the site is a phishing site by employing TF-IDF-Final, an extension of the Robust Hyperlinks algorithm [14]. To identify phishing sites, it feeds a mixture of word lexical signatures and the domain name of the current website into Google. If the domain name matches the domain name of one of the top 30 search results, the website is labeled legitimate. Some phishing sites, however, can be made to rank highly in search results by manipulation of the search result page.

4.3. Experimental Design

We used a within-subjects design, where every participant saw every website and judged whether or not it appeared to be a phishing site. In our test we asked 10 participants to freely browse the websites. Each participant's PC ran Windows XP with Internet Explorer (IE) version 6.0 as the browser. Other than configuring IE to display IDNs, we installed no security software or anti-phishing toolbars. We also did not prohibit participants from accessing websites not listed in Table 1. Some participants therefore entered several terms into Google and compared the URL of the site with the URLs listed in Google's search results.

In this experiment, we used the average error rate as a performance metric. To average the outcome of each test, we performed 4-fold cross validation and repeated it 10 times.
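The Forms heuristic (H7, Section 4.2.7) can be sketched with the standard html.parser module. This is an illustrative guess at the logic described above (a text input field co-occurring with a sensitive label), not CANTINA's implementation, and it inherits the weakness noted in the text: labels rendered as images are invisible to it.

```python
from html.parser import HTMLParser

class FormSniffer(HTMLParser):
    """H7 sketch: detect text input fields accompanied by sensitive labels."""
    SENSITIVE = ("password", "credit card")

    def __init__(self):
        super().__init__()
        self.has_text_input = False
        self.sensitive_label = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        # <input> with no type attribute defaults to a text field
        if tag == "input" and a.get("type", "text") in ("text", "password"):
            self.has_text_input = True

    def handle_data(self, data):
        if any(word in data.lower() for word in self.SENSITIVE):
            self.sensitive_label = True

def looks_like_phishing_form(html):
    sniffer = FormSniffer()
    sniffer.feed(html)
    return sniffer.has_text_input and sniffer.sensitive_label
```

A page containing only images of the word "password" would pass this check unflagged, which is exactly the failure mode Section 4.2.7 describes.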
However, we considered that the experiment involved a small, homogeneous test population; it would therefore be difficult to generalize the results to typical phishing victims. We discuss our plan for removing this bias in Section 6.

4.4. Experiment Results

First, we invited 10 participants, all Japanese males, from the Nara Institute of Science and Technology. Three had completed their master's degree in engineering within the last five years, and the others were master's degree students. We let participants label the websites described in Table 1. The results for each participant are shown in Table 2. A hash mark (#) denotes the number of the website in Table 1, P1 - P10 denote the 10 participants, "F" denotes that a participant failed to judge the website, and an empty cell denotes that a participant judged it correctly.

Next, we determined the detection accuracy of the AdaBoost-based detection method. We used the eight heuristics, each of which outputs a binary variable representing phishing or not-phishing. The detection results for each heuristic are shown in Table 2, where H1 - H8 denote the eight heuristics whose numbers correspond to Section 4.2.

Finally, we measured the detection accuracy of HumanBoost. We constructed 10 PTD databases. In other

Table 2. Detection results by each participant and heuristic, in November 2007.
# | P1 P2 P3 P4 P5 P6 P7 P8 P9 P10 | H1 H2 H3 H4 H5 H6 H7 H8
1 - - - - - - FF- - - - - - - F - -
2 F F - - - - - - - - - - - FF F F -
3 - F - - - - - - - - FF- FF F - -
4 - - F - - F - - F- - F- - - - - -
5 - - - - - - FF- - - - F - F - F -
6 - F F - F F - - - - - FF FF F F -
7 F F - F - F - - - - - F- - - - F F
8 - - - - - - - - - - - - - FF F F -
9 - - - - - - - - - - FF- FF F - -
10 - - - F - - - - - - - FF- - F - -
11 - - - - - - F- - - FFF F F F - -
12 - - - - - - F- F- - F- - - - F F
13 - F F - - - - - - - - - - - - - F -
14 - - - - - - F- - F- F FF F F -
15 - - - F - - - - - - - F- - F F - -
16 - F - F - - - F - - FFF F F F - -
17 - F - F - - - F - - - FFFF - F -
18 - - - - - - - - - - FF - - F -
19 F - - F - - FFF - - F- - - - F -
20 - - F - - - - - - - - - - FF F F -
Figure 2. Average error rates of each participant, the AdaBoost-based detection method, and HumanBoost in the pilot study, in November 2007.

words, we made 10 types of 20 × 9 binary vectors. Under the same conditions described above, we calculated the average error rate for each case of HumanBoost. The results are summarized in Figure 2, where the gray bars denote the error rate of each participant, the white bar denotes the average error rate of the AdaBoost-based detection method, and the black bars denote that of HumanBoost. The average error rate for HumanBoost was 13.4%, versus 19.0% for the participants and 20.0% for the AdaBoost-based detection method. The lowest false positive rate was 19.6% for HumanBoost, followed by 28.1% for AdaBoost and 29.7% for the participants. The lowest false negative rate was 8.5% for HumanBoost, followed by 13.5% for AdaBoost and 14.0% for the participants.

We found that the average error rate of some participants increased by employing HumanBoost. We analyzed the assigned weights and found that some heuristics were assigned higher weights than those users' trust decisions. For instance, participant 9 had labeled three legitimate sites as phishing sites, whereas the existing heuristics had labeled these three sites correctly. His trust decision was therefore inferior to the existing heuristics, and we assume this is the reason for the increase in error rate.

5. Follow-up Study

Increasing the number of participants essentially enables us to generalize the outcome of HumanBoost.
In this section, we explain the two follow-up studies performed in 2010. Note that the pilot study was performed in November 2007 and the follow-up studies were performed in March 2010 and July 2010; there may therefore be differences owing to the demographics of the participants and to the substantial media coverage of phishing in the interim.

5.1. A Case of the Follow-up Study in March 2010

Our follow-up study had 11 new participants, aged 23 to 30. All were from the Japan Advanced Institute of Science and Technology. All were Japanese males; two had completed their master's degree in engineering within the last five years, and the others were master's degree students.

Before conducting the follow-up study, we modified the dataset described in Table 1. Due to the renewal of PayPal's website during 2007 - 2010, we updated websites 9 and 20 to mimic the current PayPal login pages. In particular, Nanto Bank, website 6 in Table 1, had changed both the URL and the content of its login page. Nanto Bank is also not well known in Ishikawa Prefecture, where the participants of the follow-up study lived. We therefore changed website 6 to Hokuriku Bank (another Japanese regional bank, in Ishikawa). The domain name of Hokuriku Bank's login page is www2.paweb.anser.or.jp, the same as Nanto Bank's.

In March 2010, we invited 11 participants and asked them to label 20 websites as legitimate or phishing. Unlike the pilot study described in Section 4, we prepared printed documents to expedite this experiment. Instead of operating a browser, participants looked at 20 screenshots of a browser that had just finished rendering each website. Showing a browser screenshot is a common practice in phishing IQ tests. The detection results for each participant and each heuristic are shown in Table 3.
A hash mark denotes the number of the website, P11 - P21 denote the 11 participants, H1 - H8 denote the eight heuristics, "F" denotes that a participant or heuristic failed to judge the website, and an empty cell denotes that a participant or heuristic judged it correctly. We also calculated the average error rate for each participant, the AdaBoost-based detection method, and HumanBoost.

The results are shown in Figure 3, where the gray bars denote the error rate of each participant, the white bar denotes the average error rate of the AdaBoost-based detection method, and the black bars denote that of HumanBoost. The lowest error rate was 10.7% for HumanBoost, followed by 12.0% for AdaBoost and 31.4% for the participants. The lowest false positive rate was 15.4% for AdaBoost, followed by 18.1% for HumanBoost and 39.9% for the participants. The lowest false negative rate was 6.1% for HumanBoost, followed by 8.4% for AdaBoost and 25.9% for the participants. In comparison
Table 3. Detection results by each participant and each heuristic in the follow-up study, in March 2010.

# | P11 P12 P13 P14 P15 P16 P17 P18 P19 P20 P21 | H1 H2 H3 H4 H5 H6 H7 H8
1 - - - - - F - - - FF- - - - - F F -
2 - F - - - - - F- - - - F- FF F F -
3 - - F FF - - - F- - FF- FF F - -
4 F - - - - - - FF- - - - - - - - - -
5 - - F - - - - - - - - - FF- F - F -
6 - - - F- - F- F- - - FFFF F F -
7 - F F FF - FF - - F- - - - - - F F
8 - - F - - F - - F - - - F- FF F F -
9 - - - - - - - - - - - FF- FF F - -
10 F - - - - F - - F- - - FF- - F - -
11 - - F F- - - - F- - FFFFF F - -
12 - - - - - - - - FF- - F- F- - F -
13 - F F - F - - FF - F- F- - - - F -
14 - - - - - F - - FF- FFFFF F F -
15 - F - FF - - - - - - - F- - F F - -
16 - F F F F F F F- F- - FFFF F - -
17 - - F F - F - - F - - - - FFF - F -
18 - - - - - F - - - - - - FFF- - F -
19 F - F FF F - - - - F- F- - - - F -
20 - - F F - F - - F - - - F - FF F F -

Figure 3. Average error rates of each participant, the AdaBoost-based detection method, and HumanBoost in the follow-up study, in March 2010.

to the pilot study, the average error rate of the participants increased due to the difference in experimental design: the pilot study allowed participants to operate a browser, but the follow-up study did not. Nevertheless, we observed that HumanBoost achieved higher detection accuracy.

5.2. A Case of the Follow-up Study in July 2010

In order to collect more users' PTDs, we recruited participants via an Internet research company. In this section, we summarize the results briefly.
Of the 309 recruited participants, 42.4% (131) were male and 57.6% (178) were female. Ages ranged from 16 to 77. 48.2% of the participants (149) were office workers, 19.7% (61) were homemakers, and 5.8% (18) were students. Of the students, 66.7% (12) were undergraduates, 11.1% (2) were high school students, and 5.6% (1) was a master's degree student. They mainly lived in the Tokyo area. We therefore changed website 6 to Tokyo Tomin Bank (another Japanese regional bank in Tokyo). The domain name of Tokyo Tomin Bank is also www2.paweb.answer.or.jp. The other conditions of this study were the same as those of the follow-up study described in Section 5.1. In July 2010, the 309 recruited participants looked at 20 screenshots and judged whether each site seemed to be phishing or legitimate. Based on the detection results, we again calculated the average error rate for each participant, the AdaBoost-based detection method, and HumanBoost. The lowest error rate was 9.7% for HumanBoost, followed by 10.5% for AdaBoost and 40.5% for the participants. The lowest false positive rate was 18.3% for AdaBoost, followed by 19.5% for HumanBoost and 57.4% for the participants. The lowest false negative rate was 5.5% for HumanBoost, followed by 7.1% for AdaBoost and 33.2% for the participants.

6. Discussion

6.1. Comparative Study with SVM

As mentioned in Section 2, the limited number of heuristics is one of the biggest issues in detecting phishing sites. We attempted to utilize users' PTDs as a new heuristic and incorporated it into the existing heuristics by using AdaBoost. Although AdaBoost has two advantages, as explained in Section 3.2, it is necessary to check whether PTDs are also useful for other machine learning techniques. In this section, we employ the Support Vector Machine (SVM), another typical machine learning technique for classification. We used SVM instead of AdaBoost to incorporate the heuristics, and calculated the average error rate with and without users' PTDs. To clarify the explanation, we call the method that uses the eight existing heuristics with SVM the SVM-based detection method, and we call the method that additionally incorporates PTDs HumanSVM. First, we calculated the average error rates using the dataset from the pilot study, described in Section 4. The conditions were the same as in the pilot study, except for the machine learning method. The results showed that the average error rate for HumanSVM was 14.3%, and that for the SVM-based detection method was 21.7%. Second, we used the dataset from the follow-up study performed in March 2010, described in Section 5.1. The results showed that the average error rate for HumanSVM was 11.4%, and that for the SVM-based detection method was 18.3%. Finally, we calculated the average error rates using the dataset from the follow-up study performed in July 2010, described in Section 5.2. The results showed that the average error rate for HumanSVM was 11.2%, and that for the SVM-based detection method was 18.9%. We observed that the average error rates decreased with the use of PTDs in all cases. We also observed that the average error rates of HumanSVM (14.3%, 11.4%, and 11.2%) were higher than those of HumanBoost (13.4%, 10.7%, and 9.7%).
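The with/without-PTD comparison in this section can be sketched as follows. This is an illustration using scikit-learn on synthetic data: the eight simulated heuristic outputs, the PTD column, the noise rates (heuristics correct ~75% of the time, the user ~85%), and all variable names are our assumptions for the sketch, not the paper's actual heuristics or dataset.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 400
y = rng.integers(0, 2, n)  # ground truth: 1 = phishing, 0 = legitimate
# Eight synthetic heuristics, each agreeing with the truth ~75% of the time.
H = np.array([np.where(rng.random(n) < 0.75, y, 1 - y) for _ in range(8)]).T
# The user's past trust decision, agreeing with the truth ~85% of the time.
ptd = np.where(rng.random(n) < 0.85, y, 1 - y).reshape(-1, 1)

for name, clf in [("AdaBoost", AdaBoostClassifier(n_estimators=50)),
                  ("SVM", SVC(kernel="rbf"))]:
    # Cross-validated error without and with the PTD as a ninth feature.
    err_without = 1 - cross_val_score(clf, H, y, cv=4).mean()
    err_with = 1 - cross_val_score(clf, np.hstack([H, ptd]), y, cv=4).mean()
    print(f"{name}: error without PTD {err_without:.3f}, "
          f"with PTD {err_with:.3f}")
```

Because the PTD column here is more informative than any single heuristic, both learners typically reach a lower cross-validated error once it is appended, mirroring the HumanBoost/HumanSVM results above.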
Although HumanBoost performed better than HumanSVM, we conclude that the utilization of PTDs is available as a new heuristic for detecting phishing sites.

6.2. Removing the Bias

In this section, we discuss our plan for removing bias, which is generally important for a participant-based test. Although we used cross-validation, the presence of bias can still be assumed, due to a biased dataset and/or biased samples. In particular, we assume that labeling our prepared websites was much more difficult than labeling typical phishing and/or legitimate sites. As explained in Section 4.1, we designed our study with reference to typical phishing IQ tests. Since almost all of our prepared websites contained traps, participants often failed to label the sites correctly. These traps also hindered the existing heuristics from classifying the websites, which may be why the average error rates remained relatively high. We position our laboratory tests as a first step, and plan to perform a field test on a large scale. One approach toward field testing is implementing a HumanBoost-capable phishing prevention system. This is possible by distributing it as a browser extension with some form of data collection and getting a large population of users to agree to use it.

6.3. Issues in the Implementation of HumanBoost-Capable Systems

Here we consider some issues that arise in implementing a HumanBoost-capable system. Imagine that HumanBoost were available in phishing prevention systems. The weak point of a HumanBoost-capable system is that it always works after the user finishes making a trust decision. Generally, phishing prevention systems are meant to prevent users from visiting phishing sites. Unlike these systems, HumanBoost requires users to judge whether their confidential information can be input to the site. Another problem is the difficulty of convincing users to reconsider their trust decisions.
When users attempt to browse a phishing site, typical phishing prevention systems display an alert message. In HumanBoost, such messages would be shown after the trust decision has been made. Even if the system alerts correctly, it will not help if the user continues to rely on his/her own trust decision. To solve these problems, the HumanBoost-capable system should be able to cancel the input or submission of users' confidential information, instead of blocking the phishing site. The system should monitor events such as entering data into input forms and/or clicking buttons, and hook these event handlers so that no information is sent if the site is deemed to be phishing. It is possible to implement such a system as a browser extension, as mentioned in Section 6.2. The HumanBoost-capable system should also have an interface that encourages users to re-make their trust decisions. For instance, the system could show an alert window that interrupts the user's browsing. The alert window should contain text that convinces the user by explaining why the site seems to be phishing. This, too, can be implemented as a browser extension.

7. Conclusions

In this paper, we presented an approach called HumanBoost to improve the accuracy of detecting phishing sites. The key concept is utilizing users' past trust decisions (PTDs). Since Web users may be required to make trust decisions whenever they input their personal information into websites, we considered recording these trust decisions for learning purposes. We simply assumed that each record can be described by a binary variable, representing phishing or not-phishing, and found that the record is similar to the output of the existing heuristics. As our pilot study, in November 2007, we invited 10 participants and performed a subject experiment. The participants browsed 14 simulated phishing sites and six legitimate sites, and judged whether or not each site appeared to be a phishing site. We utilized the participants' trust decisions as a new heuristic and let AdaBoost incorporate it into eight existing heuristics. The results showed that the average error rate for HumanBoost was 13.4%, whereas that of the participants was 19.0% and that of AdaBoost was 20.0%. We also conducted a follow-up study in March 2010. This study invited 11 participants and was performed in the same fashion as the pilot study. The results showed that the average error rate for HumanBoost was 10.7%, whereas that of the participants was 31.4% and that of AdaBoost was 12.0%. Finally, we invited 309 participants and performed another follow-up study in July 2010. The results showed that the average error rate for HumanBoost was 9.7%, whereas that of the participants was 40.5% and that of AdaBoost was 10.5%. We therefore concluded that PTDs are available as new heuristics, and that HumanBoost has the potential to improve detection accuracy for Web users. We then checked whether PTDs are useful for another machine learning-based detection method. As a case study, we employed SVM and measured detection accuracy with and without PTDs. The results showed that utilizing PTDs increased the detection accuracy, reinforcing the conclusion above. 8.
Acknowledgements

We thank the members of the Internet Engineering Laboratory at the Nara Institute of Science and Technology and of the Shinoda Laboratory at the Japan Advanced Institute of Science and Technology for attending our experiments.

REFERENCES

[1] T. McCall and R. Moss, "Gartner Survey Shows Frequent Data Security Lapses and Increased Cyber Attacks Damage Consumer Trust in Online Commerce," 2005. http://www.gartner.com/press_releases/asset_129754_11.html
[2] C. Pettey and H. Stevens, "Gartner Says Number of Phishing Attacks on U.S. Consumers Increased 40 Percent in 2008," April 2009. http://www.gartner.com/it/page.jsp?id=936913
[3] Anti-Phishing Working Group, "Phishing Activity Trends Report-Q1, 2008," August 2008. http://www.apwg.com/reports/apwgreport_Q1_2008.pdf
[4] Y. Zhang, S. Egelman, L. Cranor and J. Hong, "Phinding Phish: Evaluating Anti-Phishing Tools," Proceedings of the 14th Annual Network and Distributed System Security Symposium, USA, February 2007.
[5] D. Miyamoto, H. Hazeyama and Y. Kadobayashi, "An Evaluation of Machine Learning-Based Methods for Detection of Phishing Sites," Australian Journal of Intelligent Information Processing Systems, Vol. 10, No. 2, 2008, pp. 54-63.
[6] I. Fette, N. Sadeh and A. Tomasic, "Learning to Detect Phishing Emails," Proceedings of the 16th International Conference on World Wide Web, Canada, May 2007, pp. 649-656.
[7] S. Abu-Nimeh, D. Nappa, X. Wang and S. Nair, "A Comparison of Machine Learning Techniques for Phishing Detection," Proceedings of the 2nd Annual Anti-Phishing Working Groups eCrime Researchers Summit, USA, October 2007, pp. 60-69.
[8] R. Basnet, S. Mukkamala and A. H. Sung, "Detection of Phishing Attacks: A Machine Learning Approach," Studies in Fuzziness and Soft Computing, Vol. 226, February 2008, pp. 373-383.
[9] Y. Pan and X.
Ding, "Anomaly Based Web Phishing Page Detection," Proceedings of the 22nd Annual Computer Security Applications Conference, USA, September 2006, pp. 381-392.
[10] Y. Zhang, J. Hong and L. Cranor, "CANTINA: A Content-Based Approach to Detect Phishing Web Sites," Proceedings of the 16th World Wide Web Conference, China, May 2007, pp. 639-648.
[11] OpenDNS, "PhishTank-Join the Fight against Phishing," http://www.phishtank.com
[12] Y. Freund and R. E. Schapire, "A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting," Journal of Computer and System Sciences, Vol. 55, No. 1, August 1997, pp. 119-137.
[13] R. Dhamija, J. D. Tygar and M. A. Hearst, "Why Phishing Works," Proceedings of the Conference on Human Factors in Computing Systems, April 2006, pp. 581-590.
[14] A. Y. Fu, X. Deng, L. Wenyin and G. Little, "The Methodology and an Application to Fight against Unicode Attacks," Proceedings of the 2nd Symposium on Usable Privacy and Security, USA, July 2006, pp. 91-101.
[15] T. A. Phelps and R. Wilensky, "Robust Hyperlinks: Cheap, Everywhere, Now," Proceedings of the 8th International Conference on Digital Documents and Electronic Publishing, September 2000, pp. 28-43.