### Paper Menu >>

### Journal Menu >>

The process of modular exponentiation and modular multiplication is essentially very time consuming process

especially when involves exponentiation of very large integers. They can slow down the whole process of de-

cryption and this is non-economical in commercial applications such as smart cards and real-time web commu-

nication which need fast response time. However, various methods have been introduced to speed up both oper-

ations.

First and the most basic method to increase the computational speed of this equation is repeated squaring al-

gorithm. This algorithm is mostly used in power constrained hardware such as smart card due to its low com-

plexity in calculation. However, it was exploited by Kotcher in the earliest timing attack on cryptosystem’s

hardware. Consider the decryption process of RSA.

(mod )

d

PC N≡

Kotcher cleverly observes this process as signal which has been corrupted by the ’noise’ of unknown bits of

d

. The attack emulates the decryption process and captures the calculated time of decryption using chosen pri-

vate keys until each bits of actual

d

is exposed.

Generally, for chosen private keys,

u

and

v

which have size of -bits

011 11

, ,,,0,,,

i inn

u uuuuuu

−+ −

=……

011 11

, ,,,0,,,

i inn

v vvvvvv

−+ −

=……

the only difference between both chosen private keys is bit at point

i

(Assume that both

u

and

v

already

have same arrangement of bits with actual private key,

d

up until position

?

i1−

).

Let

j

C

be the calculation time to complete decryption using the actual private key,

d

. Also let

u

T

be the

calculation time from bit

0

u

to

i

u

using first emulated key,

u

and

v

T

be the calculation time from bit

0

v

to

i

v

using second emulated key,

v

. We assign

( )

,,uvj uv

T CT

′= −

Hence, variance of timing after point i for both

u

and

v

are regarded as

( )

( )

varvarvar( )

juu e

CTT T

′

−= +

and

( )

( )( )

varvar var

jvve

CTTT

′

−= +

where

e

T

is the measurement error. If we assume

u

has the correct bit (i.e.

d

has bit 0 at point

i

), then

()( )

var var

uv

TT

′′

<

due to increased similarities in

u

in terms of bits compared to

v

.

However, this kind of attack is only successful if the implementation uses repeated squaring algorithm. In

more advanced RSA implementations, repeated squaring algorithm is complemented with Montgomery reduc-

tion algorithm for every multiplication operation. Montgomery reduction algorithm, skips modular reduction

A. H. A. Ghafar, M. R. K. Ariffin

3

which uses the ‘expensive’ division operation. Montgomery reduction transforms the multiplication of

(mod )ab N⋅

to

(mod )and(mod )aaRN bbR N

′′

= =

for which we said

a′

and

b′

are in Montgomery form and

2

k

R=

for

k∈

[4]. The algorithm executes

multiplication in

R

reduction where a computer can easily and cheaply calculate because the division of

2

k

in binary is just a simple shift by

k

bits. One good explanation and example of Montgomery reduction can be

read here [5]. The pseudo codes of both algorithms and how they work together are explained in Algorithm 1.

To calculate modular exponentiation, most implementations use the Chinese Remainder Theorem (CRT) ap-

proach. This approach divides the modular exponentiation which is in the form of modular

N pq= ⋅

to the

form of modulo

p

and

q

. Values of

p

and q are normally stored in decryption machine. The logic behind

this approach is because the value of

p

and

q

are smaller than

N

, hence the modular exponentiation will be

more computationally efficient due to their decreased size. To get the original result, the results in modulo

p

and modulo

q

are combined.

Later, Schindler produced a significant observation for RSA implementation that uses repeated squaring, CRT

and Montgomery reduction approach [6]. Schindler observes that the extra reduction in Montgomery algorithm

to ensure

sN

≤

(refer Algorithm 2) caused atiming difference for different

a

and

b

. From this observa-

tion, Schindler derived the probability of an extra reduction occurring for

(mod )

d

p

ap

which is

(mod )

(extra reduction)2

ap

PR

=

(1)

Observation from (1) shows that there is discontinuity after certain number of extra reduction process which

happens to be a multiple of

p

.

When a is approached by values less than a multiple of

p

as shown in Figure 1, number of reduction

Algorithm 1. Repeated squaring algorithm.

Algorithm 2. Montgomery algorithm.

A. H. A. Ghafar, M. R. K. Ariffin

4

Figure 1. Number of extra reductions in a Montgomery re-

duction as function (1) of the input a.

increases. But, right after the multiple of

p

, the number of extra reduction drops significantly. This drop can

help us to derive the multiple of

p

hence factorize

N

.

Another algorithm known as the Karatsuba multiplication is regarded as the most efficient multiplication al-

gorithm of two numbers with the same numbers of digits [7]. This works by assuming that addition is much

cheaper than multiplication operation. The algorithm is explained as follows.

Given that we want to multiply

a

and

b

that have n numbers of digits. We can write

?

1

01

10

n

aa a

−

= +⋅

?

1

01

10

n

bb b

−

= +⋅

Basic arithmetic shows that

( )

()

??

?? ?

??

11

01 01

11

21

0 0011011

121

0 0011011

(10 )(10 )

1010 10

10 10

nn

nn n

nn

abaabb

ab ababab

abab abab

−−

−− −

−−

⋅=+⋅+ ⋅

=+⋅ +⋅ +⋅

= +++⋅

If we define

0 00

z ab=

10110

zab ab= +

2 11

z ab=

It is easy to see that

?210 1010

zaabbzz

=−− −−

And the product of

a

and

b

is

( )

??

121

01 2

1010 .

nn

ab zzz

−−

+

⋅= +⋅⋅

Karatsuba algorithm cuts the work factor

2

()On

in long normal multiplication to only

1.585

()On

.

Both Montgomery multiplication and Karatsuba algorithm have been extensively used in real implementation

of RSA. RSA implementation in OpenSSL used repeated squaring, Montgomery multiplication and sliding

windows while utilized Karatsuba when multiplying a and b that have same size.

There are two vulnerabilities which have been exploited in this attack. First is the extra reduction happened in

Montgomery algorithm. This has been explained in Schindler’s attack while the other is the usage of long

multiplication and Karatsuba algorithm. Both multiplications show different time significantly [8]. Brumley and

Boneh showed how a timing attack launched on RSA implementation in OpenSSL [9]. The attack is so elegant,

it was done remotely between two servers in two different buildings. Every time the Mongomery form of C

A. H. A. Ghafar, M. R. K. Ariffin

5

slightly moves toward less than

p

, the number of extra reductions in the Montgomery reduction (Algorithm 2)

increases. It involves the multiplication of two numbers that are about the same size hence Karatsuba

multiplication is used. Meanwhile, whenever the value of

C

moves to be larger than

p

, the re- duction in

Algorithm 2 will decrease. Long multiplication is used here because the multiplication involves two numbers

that have significant different sizes.

The attack’s aim is to find the binary structure of

p

:

( )

0111 2

,,, ,,,,

ii nn

pbbb bbb

−−

=……

where

01b=

and has size of

n

-bits. This will solve the factorization of

N pq= ⋅

and lead to a total break of

RSA algorithm. The following method is used to recover the bit of

p

at point-

i

:

1) Suppose that bits of

p

have been determined from

0

b

to

1i

b

−

. Let

( )

0 011

, ,,,0,0,0

i

C bbb

−

=……

and

( )

1 011

,,,,1,0, 0

i

C bbb

−

= ……

That is, the only difference between the two chosen ciphertexts is the bit at point-

i

.

0

C

and

1

C

are the

candidates of prime

p

.

2) Assign

0

()TC

and

1

()

TC

as the measured decryption times for

0

C

and

1

C

respectively.

3) Assign

( )

01

()TC TC∆= −

i.e the timing difference between

0

()TC

and

1

()

TC

.

4) If

01

C pC<<

, then

∆

is “large”. This indicates

0

i

b=

. If

01

CCp<<

, then

∆

is “small”. This

indicates

1

i

b=

. The thresholds of “large” and “small” are set by the previous values of

∆

.

5) The previous steps are repeated to obtain bits at points

123

,,,

ii i

bbb

++ +

…

until half of the bits

p

have been

recovered.

After obtaining half of bits of

p

, Coppersmith has extensive methods to recover the whole bits of

p

and

consequently factorizes

N

[10]. Value of

reveals whether extra reductions of Montgomery multiplication

or multiplication (normal versus Karatsuba) predominates during the decryption process at point

i

.

3. AAβ Algorithm

The Rabin cryptosystem utilized the square root modulo problem as its cryptographic primitive [11]. The

alluring factor in Rabin cryptosystem is it has a very small exponent key,

2e=

. But the cryptosystem has a

decryption failure which is caused by the 4-to-1 mapping during decryption. This trade-off deterred Rabin

cryptosystem to be widely used in modern application.

In 2013, Ariffin et al. proposed a new cryptosystem which also utilized the same problem but without any

decryption failure. The algorithm is workable with lower complexity order compared to DHKE, El-Gammal,

RSA and ECC [1]. Algorithms 3-5 show the whole cryptosystem.

The decryption algorithm uses multiplication together with a one time modular reduction. This lack of

expensive operation makes

AA

β

to be an ideal candidate for future public key cryptosystem.

We now analyze the operation given by

(mod )WC dpq≡⋅

Algorithm 3. AAβ key generation.

A. H. A. Ghafar, M. R. K. Ariffin

6

Algorithm 4. AAβ encryption.

Algorithm 5. AAβ dec ryption.

and discuss whether the time calculation of this operation can be exploited. Later, we will enhance the capability

of an attacker to obtain

W

which consequently will continue onto the calculation of

1

4(mod )

p

p

xW p

+

≡

and

1

4

(mod )

q

q

xW q

+

≡

assuming that we can distinguish the operational time for all multiplications that occurs within the decryption

machine. In the next section, we will show how the attack is launched on the

AA

β

decryption algorithm.

4. Timing Attack on AAβ Algorithm

From Section 2, it is obvious that timing attack manipulates the timing difference in exponentiation multiplica-

tion with the aim to find first, the bits of private key bit per bit and second, factor

N pq= ⋅

(i.e. for a public

key cryptosystem which accommodates hard factorization problem as its underlying strength). We have also

observed the multiple stages of the decryption algorithm of

AA

β

(Algorithm 5) in Section 3. Despite its

intricacy, the algorithm is still faster than RSA. Since this paper only discusses the theoretical analysis of timing

attack due to lack of real-world implementation of

AA

β

algorithm, it is reasonable to assume that future

implementation of this algorithm will adhere to the vastly used implementation in terms of multiplication and

A. H. A. Ghafar, M. R. K. Ariffin

7

modular exponentiation. Based on this assumption, we now observe the first multiplication in the

AA

β

decryption algorithm which is

(mod )

WC dpq≡⋅

(2)

Since

7

~2n

C

and

2

~2

n

d

, the inequality in the size of

C

and

d

will not give us a choice to use

Karatsuba reduction in the implementations of

AA

β

. This is because Karatsuba algorithm requires both

multiplication elements to be in the same size to make it efficient as stated in Section 2.

As for other method, Montgomery multiplication, it is also not practical in this multiplication process because

the reduction only works better than long multiplication if there is an exponentiation operation involved.

However Equation (2) does not involve exponentiation.

In short, to compute

Cd⋅

efficiently, one only use long multiplication. Next, we increase the advantage for

the attacker by assuming that value

W

can be manipulated or collected. This loose assumption is realistic due

to misuse implementation or hardware faulty. We also assume that equation