
Homomorphic Encryption: This encryption method addresses the problem of several parties jointly conducting mining tasks over the private inputs they provide. It ensures that computation over the transformed data remains exact while the data itself stays secure.
Homomorphic encryption schemes allow certain computations on encrypted values. In particular, an encryption scheme is additively homomorphic if there is some operation ⊗ on encryptions such that for all cleartext values a and b, E(a) ⊗ E(b) = E(a + b).
Let (G, E, D, M) be a homomorphic encryption scheme, where G is an algorithm for generating keys, E and D are the encryption and decryption functions, and M is the message space. It has the following properties:
• (G, E, D) is semantically secure.
• E(a1) × E(a2) = E(a1 + a2), for a1, a2 ∈ M.
• E(a1)^α = E(α · a1), for a1 ∈ M and α ∈ N [4,7,12].
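As a concrete illustration of these properties, the following Python sketch implements a toy Paillier-style additively homomorphic scheme and checks both identities. The tiny primes and all names in the snippet are illustrative assumptions for exposition only, not part of the scheme used in this paper.

import random
from math import gcd

# Toy Paillier-style scheme (tiny primes, illustration only -- not secure).
p, q = 293, 433
n, n2 = p * q, (p * q) ** 2
g = n + 1
lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)    # lambda = lcm(p-1, q-1)
mu = pow((pow(g, lam, n2) - 1) // n, -1, n)     # L(g^lam mod n^2)^(-1) mod n

def E(m):
    # Encrypt m in M = Z_n; fresh randomness r makes E probabilistic.
    r = random.randrange(2, n)
    while gcd(r, n) != 1:
        r = random.randrange(2, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def D(c):
    # Decrypt via m = L(c^lam mod n^2) * mu mod n, with L(x) = (x - 1) // n.
    return ((pow(c, lam, n2) - 1) // n) * mu % n

a1, a2, alpha = 17, 25, 3
assert D(E(a1) * E(a2) % n2) == a1 + a2        # E(a1) x E(a2) = E(a1 + a2)
assert D(pow(E(a1), alpha, n2)) == alpha * a1  # E(a1)^alpha = E(alpha * a1)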
Secure Scalar Product Protocol: In this scheme the scalar product (dot product) of two vectors is computed securely; that is, the actual values of the vectors are not revealed, only the result. Let X = (x1, ..., xn) be a vector held by Alice and Y = (y1, ..., yn) a vector held by Bob. They need to securely compute the scalar product X · Y = ∑i=1..n (xi · yi), so that at the end of the protocol Alice learns only X · Y and nothing about Y, and likewise Bob learns nothing about X [3,7,11].
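A minimal sketch of one way such a protocol can be realized with the additively homomorphic scheme above, reusing E, D, and n2 from the previous snippet (the vectors are made-up examples): Alice encrypts her entries, Bob combines them homomorphically without seeing her values, and Alice decrypts only the dot product.

X = [2, 0, 5, 1]              # Alice's private vector
Y = [3, 4, 1, 7]              # Bob's private vector

enc_X = [E(x) for x in X]     # Alice -> Bob: encryptions of her entries

# Bob uses E(x)^y = E(x*y) and E(a)*E(b) = E(a+b) to form E(sum_i xi*yi).
c = 1
for ex, y in zip(enc_X, Y):
    c = c * pow(ex, y, n2) % n2

print(D(c))                   # Alice decrypts: 2*3 + 0*4 + 5*1 + 1*7 = 18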
Secure Add and Compare: This primitive is based on homomorphic encryption. It builds a circuit that takes two inputs from each party, sums the first inputs of both parties and the second inputs of both parties, and returns the result of comparing the two sums. It is a semantically secure approach.
Consider two parties, P1 and P2, such that P1 holds numbers a1 and b1 and P2 holds numbers a2 and b2. This scheme securely determines whether a1 + a2 < b1 + b2 without revealing the following two pieces of information to the other party: (1) the numbers possessed by each party; and (2) the difference between a1 + a2 and b1 + b2 [7].
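The construction in [7] is circuit-based; as a rough stand-in, the sketch below uses a simplified blinding heuristic built from the toy scheme above (E, D, n, n2): P2 masks the difference multiplicatively with a random factor so that P1 recovers essentially only its sign. This is an illustrative assumption for small inputs, not the actual protocol of [7].

a1, b1 = 10, 30               # P1's private values
a2, b2 = 25, 12               # P2's private values

ea1, eb1 = E(a1), E(b1)       # P1 -> P2: encryptions of a1 and b1

# P2 forms E(d) with d = (a1 + a2) - (b1 + b2), using E(-x) = E(x)^(n-1),
# then blinds it as E(r*d) = E(d)^r for a small random r > 0.
ed = ea1 * E(a2) % n2
ed = ed * pow(eb1 * E(b2) % n2, n - 1, n2) % n2
r = random.randrange(1, 50)
erd = pow(ed, r, n2)

# P1 decrypts r*d; residues above n // 2 encode negative values, so P1
# learns the comparison bit but not the difference itself.
rd = D(erd)
print("a1 + a2 < b1 + b2:", rd > n // 2)   # 35 < 42 -> True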
3. Privacy Preserving Clustering Algorithm
3.1. Privacy Preserving Clustering Algorithm
Assume that a data set D consists of n instances with m attributes and is vertically partitioned between Alice and Bob. Alice holds her own database with the first m1 attributes, and Bob holds his own database with the last m2 attributes, where m1 + m2 = m.
Alice and Bob want to find the final k clusters over the total partitioned data while protecting their own privacy, so both parties learn the final k clusters and nothing else.
This paper presents a novel approach to hierarchical clustering for a vertically partitioned data set that also takes privacy into account. The assumptions made during the process are:
1. Both parties hold the same number of data records.
2. Alice holds some of the attributes for all records, and Bob holds the remaining attributes (a toy illustration of this partitioning is given below).
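A toy illustration of this setting, with made-up data values and split point m1 = 2; the variable data stands in for the data set D of the text:

# n = 3 records with m = 4 attributes, vertically partitioned at m1 = 2.
data = [
    [5.1, 3.5, 1.4, 0.2],
    [4.9, 3.0, 1.3, 0.2],
    [6.2, 2.9, 4.3, 1.3],
]
m1 = 2
alice_part = [row[:m1] for row in data]   # Alice: first m1 attributes, all records
bob_part   = [row[m1:] for row in data]   # Bob: last m2 = m - m1 attributes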
3.2. Clustering on Vertically Partitioned Data Set
This algorithm runs in a divide, conquer, and merge strategy. In the first step, each party computes k clusters on its own data set. In the next stage, each party computes the distance between every record and each of its k cluster centers, so an n × k matrix is computed by each party. Up to this stage there is no privacy issue, as all computations are done by the parties on their private databases. In the third stage, both parties send their distance matrices to the other party; along with the matrix, the k cluster centers are also sent to the other party in randomized form. Next, each party computes all possible combinations of cluster centers from the total 2k clusters, so that finally k² cluster centers are formed. Now each party has information about these k² clusters, and they compute the distance between each point and the k² cluster centers. The closest cluster is thus chosen for each of the n data points, and finally the clusters are merged into k clusters (a sketch of the combination and assignment step follows).
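One reading of this combination-and-assignment step is sketched below, under the assumptions that a full candidate center is the concatenation of one of Alice's centers (first m1 attributes) with one of Bob's (last m2 attributes) and that squared Euclidean distance is used; the function names are illustrative.

def combine_centers(alice_centers, bob_centers):
    # Pair every Alice center with every Bob center: k * k = k^2 candidates.
    return [a + b for a in alice_centers for b in bob_centers]

def assign(records, centers):
    # Assign each record to the index of its closest candidate center.
    def d2(r, c):
        return sum((x - y) ** 2 for x, y in zip(r, c))
    return [min(range(len(centers)), key=lambda j: d2(r, centers[j]))
            for r in records]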
This merging is done by choosing the best pair of clusters Ci and Cj and replacing them with (Ci ∪ Cj). The best pair of clusters is the one with the least error. The error between the clusters can be computed using the following formula.
Let C1 and C2 be two clusters being considered for a merge, let C.weight denote the number of objects in cluster C, and let dist2(C1, C2) be the distance between clusters C1 and C2 [6]; then

Error(C1 ∪ C2) = C1.weight × C2.weight × dist2(C1, C2)    (1)
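A sketch of this merge step driven by Eq. (1) follows. Representing a cluster by its center and weight, taking dist2 as the squared Euclidean distance between centers, and using the weighted mean as the merged center are assumptions made here for illustration.

def dist2(c1, c2):
    # Assumed: squared Euclidean distance between the cluster centers [6].
    return sum((u - v) ** 2 for u, v in zip(c1["center"], c2["center"]))

def error(c1, c2):
    # Eq. (1): Error(C1 U C2) = C1.weight * C2.weight * dist2(C1, C2).
    return c1["weight"] * c2["weight"] * dist2(c1, c2)

def merge_to_k(clusters, k):
    # Repeatedly merge the least-error pair until only k clusters remain.
    while len(clusters) > k:
        i, j = min(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda ij: error(clusters[ij[0]], clusters[ij[1]]))
        ci, cj = clusters[i], clusters[j]
        w = ci["weight"] + cj["weight"]
        center = [(ci["weight"] * u + cj["weight"] * v) / w
                  for u, v in zip(ci["center"], cj["center"])]
        clusters = [c for t, c in enumerate(clusters) if t not in (i, j)]
        clusters.append({"center": center, "weight": w})
    return clusters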
3.3. Secure Hierarchical Clustering on Vertically
Partitioned Data Set
The previous section gave this approach in brief; this section analyzes the algorithm and its privacy in detail. First, Alice and Bob each locally compute k clusters from their own part of the data. Next, each party computes its n × k distance matrix. Alice computes her corresponding distance matrix MAlice:
MAlice =
| a1,1  a1,2  ...  a1,k |
| a2,1  a2,2  ...  a2,k |
|  ...   ...  ...   ... |
| an,1  an,2  ...  an,k |
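A sketch of how such a local distance matrix can be computed (Euclidean distance is assumed here; the metric is an illustrative choice):

def distance_matrix(records, centers):
    # Entry (i, j): distance from record i to local cluster center j.
    return [[sum((x - c) ** 2 for x, c in zip(r, cen)) ** 0.5
             for cen in centers]
            for r in records]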