A social network records the interactions among its members, and these interactions constitute the structure and attributes of the network. Because the interaction relationships carry a large amount of personal privacy information, releasing social network data directly can disclose private information. To address the dynamic nature of social network data release, this paper proposes a dynamic social network data publishing method that satisfies differential privacy, named DDPA (Dynamic Differential Privacy Algorithm). DDPA improves on a privacy protection algorithm for static social network data publishing. It adds Laplace-distributed noise to network edge weights: as the number of iterations increases, DDPA identifies the edge weight information that has changed and applies the privacy budget to it. Experiments on real data sets show that DDPA satisfies users' privacy requirements in social networks while reducing the execution time incurred by iterations and the information loss rate of the graph structure.
The innovation of the knowledge society has ushered in the era of “Internet +”, in which medical data, smart-city big data and large-scale education data lead the trend of Internet change. Social networks are a new application mode in this Internet setting, and data dissemination in social networks has great research value and practical significance. A large amount of users' personal privacy information may be leaked when social network data is analyzed and mined. Social networks that evolve and change over time are called dynamic social networks; a dynamic social network is concerned with the changes over time in the interactions between social members. Privacy strategies for static social network data release usually cannot adapt efficiently to this dynamic development. Privacy protection for dynamic social network data release therefore has far-reaching theoretical significance and practical value in the fields of information security and cyberspace security. Existing privacy protection technologies include anonymization, data encryption, differential privacy, private information retrieval, and accountability systems. Existing social network privacy protection methods mainly study static social network data dissemination.
Social network privacy protection technology mainly includes clustering-based privacy protection for social network data release and graph-modification-based privacy protection for social network data publishing. Terzi [
If a static social network data release method is applied directly to a dynamic social network, it can meet the requirements of the privacy protection policy, but both the time overhead and the information loss of the graph structure increase. For the dynamic social network data dissemination method, Ying [
Karwa and Chen [
The key problem in protecting the privacy of dynamic social network data is how to protect users' sensitive information effectively at an acceptable time cost while keeping the loss rate of weight information small. The main contributions of this paper are as follows.
1) This paper makes an improvement on the social network differential privacy data publishing algorithm based on MCL (Markov Clustering algorithm) [
2) This paper introduces a strict differential privacy preserving model and designs the DDPA algorithm, which satisfies ε-differential privacy. DDPA identifies the edge weight information that changes as the number of iterations increases and applies the privacy budget ε to it. The algorithm achieves privacy protection by injecting Laplace-distributed noise into the edge weights recorded for the clustered nodes;
3) This paper runs experiments on a real social network data set. Compared with applying the MDPA algorithm directly, DDPA satisfies users' privacy requirements in the social network while reducing both the execution time and the loss rate of weight information.
Definition 1. Dynamic social network
A dynamic social network is defined as $G_I = (V_I, E_I)$ with $E_I = \{ (x, y) \mid x, y \in V_I \}$, where $V_I$ is the set of users in the social network at iteration $I$ and $E_I$ is the set of edges representing the interactions between users at iteration $I$. $G = \{ G_1, G_2, G_3, \cdots, G_I \}$ is the collection of social network graphs at $I = 1, 2, \cdots, N$, and $G' = \{ G'_1, G'_2, G'_3, \cdots, G'_I \}$ is the collection of privacy-protected social network graphs at $I = 1, 2, \cdots, N$.
The dynamic social network graph is shown in
Definition 2. Edge weight information as a ternary group
A ternary group is defined as T = (i, j, x), where i and j are node numbers in the social network graph and x is the weight of the edge between them; x is 0 when the two nodes are not connected. For example, T = (1, 2, 5) indicates that there is an edge between node 1 and node 2 with a weight of 5.
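The ternary-group representation can be illustrated with a small Python sketch. It assumes an undirected graph stored as a dictionary keyed by node pairs; the helper names `build_triples` and `weight_vector` are hypothetical and do not come from the paper.

```python
from typing import Dict, List, Tuple

Triple = Tuple[int, int, float]  # (i, j, x): node i, node j, edge weight x


def build_triples(weights: Dict[Tuple[int, int], float]) -> List[Triple]:
    """Turn a weighted edge map into a sorted list of (i, j, x) ternary groups.

    Assumption: the graph is undirected, so (i, j) and (j, i) are the same edge.
    Edges with weight 0 are dropped, following the convention that x = 0 means
    the two nodes are not connected.
    """
    triples = []
    for (i, j), x in weights.items():
        if x == 0:
            continue
        a, b = (i, j) if i <= j else (j, i)  # canonical node order
        triples.append((a, b, x))
    return sorted(triples)


def weight_vector(triples: List[Triple]) -> List[float]:
    """Extract the weights in the same order as the ternary groups."""
    return [x for _, _, x in triples]


edges = {(1, 2): 5, (3, 2): 1, (1, 4): 0}
T = build_triples(edges)
X = weight_vector(T)
```

Keeping the triples in a fixed sorted order makes it straightforward to compare the ternary groups of two successive snapshots later on.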
Definition 3. Sensitivity: $\Delta q$ is the sensitivity of the query function $q$, which is defined as follows:
$$\Delta q = \max_{D_1, D_2} \left| q(D_1) - q(D_2) \right| \quad (1)$$
The data sets $D_1$ and $D_2$ differ in only one element. In this paper, we consider two social network data sets $G_{I1}$ and $G_{I2}$ that differ in only one element. The global sensitivity is set to the maximum weight that can occur on the differing edge, $\Delta f = W_{\max}$.
Definition 4. ε-weight vector
The social network graph $G_{I1}$ is initialized and Markov clustering is carried out on it. This yields the result set V of node clusters, and the weight information of each cluster is then recorded. Following the order of the cluster set, the weights form the ternary groups $T = \{ T_1, T_2, \cdots, T_n \}$:
$$T_1 = (i_1, j_1, x_1), \quad T_2 = (i_2, j_2, x_2), \quad \ldots, \quad T_n = (i_n, j_n, x_n) \quad (2)$$
Let $X = (x_1, x_2, \cdots, x_n)$. According to the query sensitivity $\Delta f$ and the privacy budget parameter ε, a Laplace noise vector of length d (the dimension of $X$) is constructed, written $\mathrm{Lap}(\Delta f / \varepsilon)^{X}$. Then $P_i = X + \mathrm{Lap}(\Delta f / \varepsilon)^{X}$ is a weight vector satisfying ε-differential privacy.
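As a minimal sketch of how such a weight vector could be perturbed, the following Python code draws independent Laplace noise with scale Δf/ε for every component of X. It assumes Δf is taken as the largest absolute weight in X when no sensitivity is supplied; the function names and the `seed` parameter are illustrative and not part of the paper.

```python
import random
from typing import List, Optional


def laplace_sample(scale: float, rng: random.Random) -> float:
    """Draw from Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_weight_vector(x: List[float], epsilon: float,
                        sensitivity: Optional[float] = None,
                        seed: Optional[int] = None) -> List[float]:
    """Return X plus independent Laplace(sensitivity / epsilon) noise per entry."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if not x:
        return []
    if sensitivity is None:
        # Assumption: use the largest weight in X as W_max.
        sensitivity = max(abs(v) for v in x)
    scale = sensitivity / epsilon
    if scale == 0:
        return list(x)  # all weights are zero, nothing to perturb
    rng = random.Random(seed)
    return [v + laplace_sample(scale, rng) for v in x]


P = noisy_weight_vector([5.0, 1.0, 3.0], epsilon=1.0, seed=0)
```

A smaller ε gives a larger noise scale and therefore stronger perturbation, which matches the trade-off between privacy and weight accuracy discussed in the experiments.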
Definition 5. Weight information loss rate of a graph
Let $G = (V, E)$ be the original graph and $G' = (V', E')$ the graph after privacy protection has been added. The loss rate of weight information caused by the change of weights is:
$$WIL(G, G') = \frac{\sum_{(i,j) \in E} \left| W(i,j) - W'(i,j) \right|}{W(G)} \quad (3)$$
Here $W'(i,j)$ is the weight of edge $(i,j)$ after privacy protection has been added, and $W(G)$ is the sum of all edge weights of the network graph.
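Equation (3) translates directly into code. The sketch below assumes both graphs are given as dictionaries from edges to weights, and that an edge missing from the protected graph counts as weight 0; that second choice is an assumption, since the paper does not say how missing edges are handled.

```python
from typing import Dict, Tuple

Edge = Tuple[int, int]


def weight_information_loss(original: Dict[Edge, float],
                            protected: Dict[Edge, float]) -> float:
    """Compute WIL(G, G'): summed absolute weight change over the edges of G,
    divided by the total edge weight W(G) of the original graph."""
    total = sum(original.values())
    if total == 0:
        raise ValueError("W(G) is zero, so the loss rate is undefined")
    diff = sum(abs(w - protected.get(edge, 0.0)) for edge, w in original.items())
    return diff / total


g = {(1, 2): 5.0, (2, 3): 1.0, (3, 4): 4.0}
g_protected = {(1, 2): 6.0, (2, 3): 0.5, (3, 4): 4.0}
loss = weight_information_loss(g, g_protected)  # (1 + 0.5 + 0) / 10
```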
Applying a static social network data privacy publishing algorithm directly to a dynamic social network leads to high execution time and a large information loss rate of the graph structure. This paper improves on MCL-based differentially private network data publishing and designs DDPA, a dynamic social network data publishing algorithm that satisfies ε-differential privacy.
To introduce the flow of the DDPA algorithm, the MDPA algorithm is decomposed into two parts, Algorithm 1 and Algorithm 2.
Algorithm 1: Takes as input the initial social network graph G, the expansion parameter e and the inflation parameter p, and outputs the ε-weight vector of the initial graph G.
Algorithm 2: Takes as input the ε-weight vector of the initial graph G and the privacy budget parameter ε, and outputs the privacy-preserving graph G'.
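The paper does not list the internals of Algorithm 1, but the parameters e and p are the expansion and inflation parameters of Markov clustering. The following is a rough pure-Python sketch of the standard MCL loop, given only to show where those two parameters act; the self-loop initialization, the convergence tolerance and the cluster read-out rule are common conventions assumed here, not details taken from the paper.

```python
from typing import List, Set

Matrix = List[List[float]]


def normalize_columns(m: Matrix) -> Matrix:
    """Scale every column so that it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[r][c] for r in range(n)) for c in range(n)]
    return [[m[r][c] / sums[c] if sums[c] else 0.0 for c in range(n)]
            for r in range(n)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    n = len(a)
    return [[sum(a[r][k] * b[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]


def mcl(adjacency: Matrix, e: int = 2, p: float = 2.0,
        iterations: int = 50, tol: float = 1e-9) -> List[Set[int]]:
    """Markov clustering: alternate expansion (matrix power e) and
    inflation (element-wise power p, then column normalization)."""
    n = len(adjacency)
    if n == 0:
        return []
    # Assumed convention: add self-loops before normalizing.
    m = [[adjacency[r][c] + (1.0 if r == c else 0.0) for c in range(n)]
         for r in range(n)]
    m = normalize_columns(m)
    for _ in range(iterations):
        prev = m
        expanded = m
        for _ in range(e - 1):  # expansion: m raised to the power e
            expanded = matmul(expanded, m)
        inflated = [[v ** p for v in row] for row in expanded]  # inflation
        m = normalize_columns(inflated)
        if max(abs(m[r][c] - prev[r][c])
               for r in range(n) for c in range(n)) < tol:
            break
    # Each row that retains mass is an attractor; its non-zero columns form
    # a cluster. Duplicate clusters from several attractors are merged.
    clusters: List[Set[int]] = []
    for r in range(n):
        members = {c for c in range(n) if m[r][c] > 1e-6}
        if members and members not in clusters:
            clusters.append(members)
    return clusters
```

A larger inflation parameter p generally yields finer clusters, which in turn changes how the edge weights are grouped before noise is added.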
The publishing of social network data is dynamic, and the graph structure is updated iteratively. DDPA is an improvement on the privacy protection algorithm for static social network data publishing. MDPA adds noise to the whole network graph, whereas DDPA adds noise only to the edge weights that have changed. DDPA identifies the edge weight information that changes as the number of iterations increases and applies the privacy budget ε to it. As a result, DDPA greatly reduces the execution cost of the algorithm and reduces the loss rate of weight information.
The algorithm steps are described as follows:
Input: social network graph $G_I$ at the I-th iteration, protected social network graph $G'_I$ at the I-th iteration, social network graph $G_{I+1}$ at the (I+1)-th iteration, privacy budget parameter ε, expansion parameter e, inflation parameter p;
Output: protected social network graph $G'_{I+1}$ at the (I+1)-th iteration
Step 1 Execute Algorithm 1: traverse $G_I$ and build the weight information ternary groups $T_I$ (i, j, x) and the vector $X_I$
Step 2 Execute Algorithm 2: create the privacy-protected social network graph $G'_I$, together with the weight information ternary groups $T'_I$ and the vector $X'_I$ that belong to $G'_I$
Step 3 Execute Algorithm 1: traverse $G_{I+1}$ and build the weight information ternary groups $T_{I+1}$ (i, j, x) and the vector $X_{I+1}$
Step 4 Compare $T_I$ and $T_{I+1}$, identify the ternary groups $T_c$ that belong to modified edges, and generate the weight vector $X_c$ corresponding to $T_c$
Step 5 Compare $T'_I$ and $T_{I+1}$, identify the ternary groups $T_a$ that belong to added edges, and generate the weight vector $X_a$ corresponding to $T_a$
Step 6 With $S_i$ as the sampling frequency, randomly sample $X_c$ and generate Laplace noise $N_c$ that satisfies differential privacy
Step 7 With $S_i$ as the sampling frequency, randomly sample $X_a$ and generate Laplace noise $N_a$ that satisfies differential privacy
Step 8 Replace the changed edge information in the weight vector $X'_I$ of $T_I$ with $X_c$, and add the edge information increment to $X'_I$, so that $X'_I = X_c$
Step 9 According to the query sensitivity $\Delta f$ and the privacy budget parameter ε, construct a Laplace noise vector of length d: $\mathrm{Lap}(\Delta f / \varepsilon)^{X'_I}$
Step 10 Generate the vector for $G'_{I+1}$ that satisfies differential privacy: $DDPA(G'_{I+1}) = P_i = X'_I + \mathrm{Lap}(\Delta f / \varepsilon)^{X'_I}$
Step 11 Publish the protected social network graph $G'_{I+1}$ of the (I+1)-th iteration
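The core idea of the steps above, perturbing only modified and added edges while reusing the weights already released for unchanged edges, can be sketched as follows. This is a simplified illustration under several assumptions that are not in the paper: graphs are dictionaries from edges to weights, the sampling step with frequency $S_i$ is omitted, removed edges are simply dropped, and $W_{\max}$ is taken from the new snapshot. The name `ddpa_step` is hypothetical.

```python
import random
from typing import Dict, Optional, Tuple

Edge = Tuple[int, int]


def laplace_sample(scale: float, rng: random.Random) -> float:
    """Draw from Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def ddpa_step(g_prev: Dict[Edge, float],
              g_prev_protected: Dict[Edge, float],
              g_next: Dict[Edge, float],
              epsilon: float,
              seed: Optional[int] = None) -> Dict[Edge, float]:
    """Release a protected version of g_next, adding fresh Laplace noise
    only to edges whose weight changed or that were newly added."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = random.Random(seed)
    # Assumption: sensitivity is the largest weight in the new snapshot.
    w_max = max((abs(w) for w in g_next.values()), default=0.0)
    scale = w_max / epsilon
    released: Dict[Edge, float] = {}
    for edge, w in g_next.items():
        unchanged = (edge in g_prev and g_prev[edge] == w
                     and edge in g_prev_protected)
        if unchanged:
            # Reuse the value that was already published for this edge.
            released[edge] = g_prev_protected[edge]
        else:
            noise = laplace_sample(scale, rng) if scale > 0 else 0.0
            released[edge] = w + noise
    return released


g1 = {(1, 2): 5.0, (2, 3): 1.0}
g1_protected = {(1, 2): 5.4, (2, 3): 0.7}
g2 = {(1, 2): 5.0, (2, 3): 2.0, (3, 4): 4.0}  # one modified, one added edge
g2_protected = ddpa_step(g1, g1_protected, g2, epsilon=1.0, seed=1)
```

Because only the changed part of the graph is processed, the work per iteration scales with the number of modified and added edges rather than with the size of the whole graph.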
The DDPA algorithm for dynamic social network data release improves on the differentially private data publishing method for static social networks that is based on the Markov clustering algorithm. The MDPA algorithm has already been proved to satisfy ε-differential privacy. It therefore only remains to prove that, after the changes in edge weight information have been identified, the ε-weight vector DDPA(GI) satisfies differential privacy.
According to the definition of differential privacy, consider two dynamic social network data sets GI1 and GI2 that differ in only one element. Given a privacy algorithm DDPA, let Range(DDPA) be the range of DDPA. If every output of the DDPA algorithm on the data sets GI1 and GI2 satisfies the following inequality, then the DDPA algorithm satisfies ε-differential privacy.
$$\Pr[DDPA(G_{I1}) \in P_i] \le e^{\varepsilon} \Pr[DDPA(G_{I2}) \in P_i] \quad (4)$$
Proof: Let $p_i \in P_i$, where $P_i$ has the same dimension as $X_i$. From the conditional probability,
$$
\begin{aligned}
\frac{\Pr[DDPA(G_{I1}) = p]}{\Pr[DDPA(G_{I2}) = p]}
&= \prod_{i=1}^{X} \frac{\Pr[DDPA(G_{I1})_i = p_i \mid p_1, \cdots, p_{i-1}]}{\Pr[DDPA(G_{I2})_i = p_i \mid p_1, \cdots, p_{i-1}]} \\
&\le \prod_{i=1}^{X} \exp\left\{ \frac{\left| DDPA(G_{I1})_i - DDPA(G_{I2})_i \right|}{\sigma} \right\} \\
&= \exp\left\{ \frac{\left\| DDPA(G_{I1}) - DDPA(G_{I2}) \right\|_1}{\sigma} \right\} \\
&= \exp\left\{ \frac{X(G_{I1}) + \mathrm{Lap}(\Delta f / \varepsilon) - X(G_{I2}) - \mathrm{Lap}(\Delta f / \varepsilon)}{W_{\max} / \varepsilon} \right\} \\
&= \exp\left\{ \frac{X(G_{I1}) - X(G_{I2})}{W_{\max} / \varepsilon} \right\}
\end{aligned}
$$
Since $X(G_{I1}) - X(G_{I2}) \le W_{\max}$,
$$\exp\left\{ \frac{X(G_{I1}) - X(G_{I2})}{W_{\max} / \varepsilon} \right\} \le \exp\left\{ \frac{W_{\max}}{W_{\max} / \varepsilon} \right\} = \exp\{\varepsilon\} = e^{\varepsilon} \;\Rightarrow\; \frac{\Pr[DDPA(G_{I1}) = p_i]}{\Pr[DDPA(G_{I2}) = p_i]} \le e^{\varepsilon}$$
Since $p_i \in P_i$,
$$\frac{\Pr[DDPA(G_{I1}) \in P_i]}{\Pr[DDPA(G_{I2}) \in P_i]} \le e^{\varepsilon}$$
and therefore $\Pr[DDPA(G_{I1}) \in P_i] \le e^{\varepsilon} \Pr[DDPA(G_{I2}) \in P_i]$.
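The per-coordinate inequality used in the proof is the standard bound on the ratio of two Laplace densities. As a brief sketch, assume the noise on each coordinate is an independent Laplace variable with scale $\sigma = \Delta f / \varepsilon$, and write $x_i^{(1)}$ and $x_i^{(2)}$ (notation introduced here, not in the paper) for the true i-th weight in $G_{I1}$ and $G_{I2}$:

```latex
\frac{\frac{1}{2\sigma}\exp\left(-\left|p_i - x_i^{(1)}\right| / \sigma\right)}
     {\frac{1}{2\sigma}\exp\left(-\left|p_i - x_i^{(2)}\right| / \sigma\right)}
= \exp\left(\frac{\left|p_i - x_i^{(2)}\right| - \left|p_i - x_i^{(1)}\right|}{\sigma}\right)
\le \exp\left(\frac{\left|x_i^{(1)} - x_i^{(2)}\right|}{\sigma}\right)
```

The last step follows from the triangle inequality. Multiplying this bound over all coordinates gives the exponential of the $\ell_1$ distance between the two weight vectors divided by $\sigma$, which is at most $e^{\varepsilon}$ when that distance is bounded by $\Delta f$.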
The experimental environment is an Intel(R) Core(TM) i5-4590 CPU @ 3.30 GHz with 4.00 GB of memory, running Microsoft Windows 7 Ultimate. The programming languages are C++ and Matlab. The experimental data set is Lesmis, which is a weighted social network graph [
The experiments in this paper consist of three parts. The first part tests the execution time of the DDPA algorithm. The second part tests the graph weight information loss rate of the DDPA algorithm. The third part compares the DDPA algorithm and the MDPA algorithm in terms of execution time and weight information loss rate. Each reported result is the average of five runs.
The execution time test result sets for the DDPA algorithm are shown in
The experiment shows how the execution time changes with ε and p. The values of ε are 0.05, 0.1, 1 and 10. For the same iteration, increasing ε has little effect on execution time. As the number of iterations increases, the amount of differing edge weight information that needs to be identified in each iteration decreases, so when ε is fixed, the execution time of the DDPA algorithm decreases correspondingly.
The test results for the weighted information loss rate of graph in the DDPA algorithm are shown in
The experiment shows how the weight information loss rate of the graph under the DDPA algorithm changes with ε and p. The values of ε are 0.05, 0.1, 1 and 10. The results show that, for the same iteration, the weight information loss rate of the graph structure decreases as ε increases. When the privacy budget parameter ε is fixed, the weight information loss rate of the graph structure increases with the number of iterations. This is because the Laplace noise decreases as ε increases: the perturbed weights move closer to the real values, and the loss rate of weight information becomes smaller.
This experiment is a comparison between the MDPA algorithm and the DDPA algorithm in the execution time and the weight information loss rate. The test results are shown in
In the experiment, the values of ε are 0.05, 0.1, 1 and 10. The experimental results in
To improve on privacy protection algorithms for static social network data publishing, this paper designs a dynamic social network data publishing algorithm based on differential privacy. The algorithm identifies the edge weight information that changes as the number of iterations increases, applies a privacy budget satisfying ε to it, reduces the time cost of the algorithm, and reduces the loss rate of weight information. The limitation of this paper is that it only considers the addition or removal of edges and changes in edge weights. Changes in the nodes make the allocation of the privacy budget more complicated. Future work will study the case of node changes in depth, with the aim of enhancing the degree of privacy protection and reducing the loss rate of weight information while satisfying the privacy budget.
This work was supported by the Ministry of Education's Research Program (2017A20004) and the National Science and Technology Support Project (2013BAK07B04).
Liu, Z.P., Dong, Y.W., Zhao, X. and Zhang, B. (2017) A Dynamic Social Network Data Publishing Algorithm Based on Differential Privacy. Journal of Information Security, 8, 328-338. https://doi.org/10.4236/jis.2017.84021