Security assurance has been a paramount issue for cloud services over the most recent decade. Therefore, MapReduce, a programming framework for processing and generating large data sets, should be optimized and securely implemented. But conventional operations on ciphertexts are not applicable, so there is a foremost need to enable particular sorts of computation to be carried out on encrypted data, and additionally to optimize data processing at the Map stage. Thereby, schemes like DGHV and Gen10 have been presented to address the data privacy issue. However, the private encryption key (DGHV) or the key's parameters (Gen10) are sent to the untrusted cloud server, which compromises information security. Therefore, in this paper we propose an optimized homomorphic scheme (Op_FHE_SHCR) which speeds up ciphertext (Rc) retrieval and addresses metadata dynamics and authentication through our secure Anonymiser agent. Additionally, for the efficiency of our proposed scheme regarding computation cost and security analysis, we utilize a scalar homomorphic approach instead of applying a blinding probabilistic polynomial-time computation, which is computationally expensive. In doing so, we apply an optimized ternary search tries (TST) algorithm in our metadata repository, which utilizes a Merkle hash tree structure to manage metadata authentication and dynamics.
The rapid development of outsourced data processing and storage in distributed computing frameworks, together with the mining of complex and huge data collections, has expanded the availability of useful data to the various organizations of modern society in an exponential way. But data privacy protection is a principal issue in huge-dataset management in the cloud environment, as the dataset proprietor no longer has physical control of the dataset, as noted by the Cloud Security Alliance (CSA) [
alternative to address the data privacy assurance issue, which the information security community considers a paramount topic. Thereby, schemes like DGHV [
Map Reduce in [
• Homomorphism
A homomorphism between two algebras, A and B, over a field (or ring) K, is a map F: A → B such that for all k in K and x, y in A:
・ F(x + y) = F(x) + F(y)
・ F(xy) = F(x)F(y)
・ F(kx) = kF(x)
If F is bijective then F is said to be an isomorphism between A and B.
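As a concrete illustration of these properties (our own example, not from the paper), the evaluation map F(p) = p(2) from the polynomial algebra R[x] to R is an algebra homomorphism, and the three properties can be checked directly:

```python
# Illustrative check of the homomorphism properties using the evaluation
# map F(p) = p(2) from R[x] to R. Polynomials are coefficient lists,
# lowest degree first. This is a textbook example, not the paper's scheme.

def poly_add(p, q):
    n = max(len(p), len(q))
    return [(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0)
            for i in range(n)]

def poly_mul(p, q):
    r = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r

def poly_scale(k, p):
    return [k * c for c in p]

def F(p, x=2):
    # Horner evaluation: F(p) = p(x)
    acc = 0
    for c in reversed(p):
        acc = acc * x + c
    return acc

p, q, k = [1, 3], [2, 0, 5], 4           # p = 1 + 3x, q = 2 + 5x^2
assert F(poly_add(p, q)) == F(p) + F(q)   # F(x + y) = F(x) + F(y)
assert F(poly_mul(p, q)) == F(p) * F(q)   # F(xy) = F(x)F(y)
assert F(poly_scale(k, p)) == k * F(p)    # F(kx) = kF(x)
```

F is not bijective here (many polynomials evaluate to the same number), so it is a homomorphism but not an isomorphism.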
• Homomorphic Encryption
As ever more data is outsourced into distributed storage, frequently unencrypted, considerable trust is required in the cloud providers. The CSA ranks data breaches as the top issue for cloud security [
Homomorphic encryption allows complex mathematical operations to be performed on encrypted data without compromising the encryption. Homomorphic encryption is expected to play an important part in cloud computing, allowing companies to process and store encrypted data in a public cloud and take advantage of the cloud provider's analytic services. It was first proposed in 1978 by Rivest et al. [
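A simple way to see the homomorphic property in an encryption setting is textbook RSA, which is multiplicatively homomorphic: the product of two ciphertexts decrypts to the product of the plaintexts. The toy parameters below are for illustration only and are insecure:

```python
# Toy demonstration of a (partially) homomorphic encryption property:
# textbook RSA satisfies E(m1) * E(m2) mod n = E(m1 * m2).
# Tiny, insecure parameters chosen purely for illustration.

n, e, d = 3233, 17, 2753         # n = 61 * 53, classic toy RSA triple

def enc(m):
    return pow(m, e, n)

def dec(c):
    return pow(c, d, n)

m1, m2 = 12, 7
c1, c2 = enc(m1), enc(m2)
c_prod = (c1 * c2) % n           # operate on ciphertexts only
assert dec(c_prod) == (m1 * m2) % n   # decrypts to the product of plaintexts
```

A fully homomorphic scheme such as Gentry's supports both addition and multiplication on ciphertexts, which is what allows arbitrary computation over encrypted data.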
Security protection issues in the MapReduce framework have started to draw intensive attention. Data confidentiality protection issues have been widely examined, and productive progress has been accomplished by security practitioners. We briefly review a few existing models for privacy protection in the MapReduce framework.
The authors in [
Encryption:
Decryption:
But to apply homomorphic encryption in this scheme, the authors make a few modifications to the ciphertexts so that the Reduce function can find the identical keys and then group them, as follows:
For a ciphertext
Then, the authors compare (
As discussed in [
Our contribution:
Note that the improvement in this paper lies mainly in the optimization of the input-file decomposition (map phase) and the ciphertext retrieval algorithm (reduce phase), by addressing metadata dynamics and the authentication path through a logical Merkle tree repository structure (optimized space-time cost).
As clearly shown by the research community, homomorphic encryption can carry out some operations over encrypted data effectively, but it is a very expensive scheme in terms of computation and communication costs [
Thus, our optimized algorithm (Op_FHE_SHCR), through successful experiments (see the section below), performs three times faster than the original FHE_SHCR scheme [
As mentioned in the previous section, our proposed scheme further addresses the metadata authentication and dynamics issue for strong data privacy protection. Therefore, we introduce a logical agent, the Anonymiser, in the master control program. The Anonymiser has three pieces, namely: the Decomposition Table, Query Processing, and the Metadata Repository. Their functions can be briefly described as follows:
• Decomposition table: responsible for defining the exact set of attributes (A) for particular input files, in the optimal number.
• Query Processing: filters the candidate map workers' query requests generated by the master program to produce anonymous query-based requests on data location for processing.
• Metadata Repository: keeps the data decompositions produced by the decomposition table and forwards them to the Query Processing unit to generate new anonymous query requests. For the efficiency of the proposed scheme, we use a Merkle hash tree structure to deal with metadata authentication and dynamics [
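The interaction between the three components can be sketched as below. All class and method names here are our own illustrative choices, not an API defined in the paper:

```python
# Minimal sketch of the Anonymiser's three components and their dataflow.
# Names and the anonymisation method (a truncated SHA-256 digest) are our
# assumptions for illustration only.

import hashlib

class DecompositionTable:
    """Defines the exact set of attributes for an input file."""
    def __init__(self):
        self.entries = {}
    def decompose(self, file_id, attributes):
        self.entries[file_id] = set(attributes)
        return self.entries[file_id]

class MetadataRepository:
    """Keeps the decompositions and hands them to query processing."""
    def __init__(self):
        self.index = {}
    def store(self, file_id, attributes):
        self.index[file_id] = attributes
    def lookup(self, file_id):
        return self.index.get(file_id, set())

class QueryProcessing:
    """Turns a map worker's request into an anonymous, query-based one."""
    def anonymise(self, file_id):
        # Replace the real identifier with an opaque digest.
        return hashlib.sha256(file_id.encode()).hexdigest()[:16]

# Wiring the pieces together:
table, repo, qp = DecompositionTable(), MetadataRepository(), QueryProcessing()
attrs = table.decompose("patient_records.csv", ["age", "zip", "diagnosis"])
repo.store("patient_records.csv", attrs)
anon = qp.anonymise("patient_records.csv")
assert repo.lookup("patient_records.csv") == {"age", "zip", "diagnosis"}
assert "patient_records" not in anon      # real name never leaves the agent
```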
All input files are transformed into a set of symbols (A) as
File uploading: Suppose that a data owner wants to process a file F identified by
file F consists of a keyword set W. Then, the owner randomly chooses a symmetric key
To enhance the searching efficiency, a symbol-based tree is utilized to build an index stored in private cloud (metadata repository). More precisely, divide the output of one-way function f into l parts and predefine a set
Update: Assume the data owner wants to outsource a file F identified by
i) Step 1: Public cloud parses
ii) Step 2: Public cloud starts with the root node of tree: it scans all the children of the root node and checks whether there exists some child node 1 such that the symbol contained in node 1 equals
iii) Step 3: Assume that current node
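The update steps above can be sketched as follows. The symbol alphabet, the number of parts l, and what a leaf stores are our assumptions; the paper's exact parameters are not shown here:

```python
# Hedged sketch of the update procedure: the one-way function's output is
# split into l symbol parts, then the tree is descended from the root,
# scanning children for a matching symbol and creating nodes as needed.

import hashlib

def symbols(file_id, l=8):
    # Split a one-way function's output into l parts (here: hex chunks).
    h = hashlib.sha256(file_id.encode()).hexdigest()
    step = len(h) // l
    return [h[i*step:(i+1)*step] for i in range(l)]

def insert(root, file_id):
    node = root                              # Step 2: start at the root
    for sym in symbols(file_id):
        child = node.setdefault("children", {}).get(sym)
        if child is None:                    # no child holds this symbol
            child = {"children": {}}
            node["children"][sym] = child    # create the missing node
        node = child                         # Step 3: descend and repeat
    node["file"] = file_id                   # leaf records the file id

root = {"children": {}}
insert(root, "report.doc")
insert(root, "image.png")
# A file is reachable by replaying its symbol path:
node = root
for sym in symbols("report.doc"):
    node = node["children"][sym]
assert node["file"] == "report.doc"
```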
Search: Assuming that the legitimate user wants to search outsourced files with keyword w and privileges
So to address Merkle tree traversal problem, our scheme uses some tools from the efficient algorithm in [
The TST is space efficient, but its size grows with the number of strings (N). The traversal problem is therefore how to compute efficiently the authentication path for all leaves, one after another from the first leaf up to the last, at a minimum space-time cost. Hence, it implies analyzing an optimal distribution of the singleton attribute (
from the scheme in [
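For reference, a ternary search trie stores one character per node with three links (lower, equal, higher), which is what makes it space efficient compared to a full trie. The following is the textbook TST insert/search, not the paper's optimized variant:

```python
# Illustrative ternary search trie (TST): each node holds one character
# and three links. Textbook structure, shown to make the data structure
# under discussion concrete; not the paper's optimized algorithm.

class TSTNode:
    __slots__ = ("ch", "lo", "eq", "hi", "value")
    def __init__(self, ch):
        self.ch, self.lo, self.eq, self.hi, self.value = ch, None, None, None, None

def insert(node, key, value, i=0):
    ch = key[i]
    if node is None:
        node = TSTNode(ch)
    if ch < node.ch:
        node.lo = insert(node.lo, key, value, i)
    elif ch > node.ch:
        node.hi = insert(node.hi, key, value, i)
    elif i < len(key) - 1:
        node.eq = insert(node.eq, key, value, i + 1)
    else:
        node.value = value        # end of key: store its value here
    return node

def search(node, key, i=0):
    if node is None:
        return None
    ch = key[i]
    if ch < node.ch:
        return search(node.lo, key, i)
    if ch > node.ch:
        return search(node.hi, key, i)
    if i < len(key) - 1:
        return search(node.eq, key, i + 1)
    return node.value

root = None
for k, v in [("map", 1), ("reduce", 2), ("merkle", 3)]:
    root = insert(root, k, v)
assert search(root, "merkle") == 3
assert search(root, "mapper") is None    # prefix alone does not match
```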
Let
Therefore the probability for the (ith) element to be a singleton in the universal decomposition table by selecting one of the (n) choices (entries) is
Let the variable
Let
We aim to find the smallest number of singletons needed to populate the Merkle tree in the metadata repository efficiently.
It implies to minimize
Subject to
Therefore, we obtain the optimized number of singletons by rewriting the above distribution as a constrained optimization problem [
KKT condition: Primal variable:
Let us consider an optimization problem of the form: minimize f(x) subject to
With
Using the Lagrange multiplier and the duality theorem, the solution of the problem (P) is determined as follows:
Then
Then,
Case1:
Case 2:
Finally the optimal number of singleton quasi-identifiers for a decomposition table of (n) entries with maximum total number of distinct values (N) is
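The singleton counting step can be sanity-checked numerically. Under the standard assumption that N distinct values fall uniformly and independently into n table entries, the expected number of singleton entries is N·(1 − 1/n)^(N−1); this is the classic balls-in-bins result, stated here as our assumption rather than as the paper's exact distribution:

```python
# Monte Carlo check of the expected singleton count under a uniform
# balls-in-bins model: N values assigned independently to n entries.
# The closed form N*(1 - 1/n)**(N-1) is the standard result, used here
# as an assumption, not a formula taken from the paper.

import random
from collections import Counter

def simulate_singletons(n, N, trials=2000, seed=7):
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        counts = Counter(rng.randrange(n) for _ in range(N))
        total += sum(1 for c in counts.values() if c == 1)
    return total / trials

n, N = 50, 80
expected = N * (1 - 1/n) ** (N - 1)
observed = simulate_singletons(n, N)
assert abs(observed - expected) / expected < 0.05   # within 5% of theory
```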
Regardless of the advances in wireless sensor networks (WSN) and control systems moving into the cloud, there are still enormous challenges in terms of security protection over outsourced data processing and storage [
Pseudo code:
This algorithm initializes the selected feature subset (splitting the input file into subsets) denoted by
another feature to generate a new candidate. That is, the new feature in the chosen candidate is added to the selected feature subset. Thus, the algorithm iteratively adds one feature (or a fixed number of features, if the floating strategy is used) to grow the selected feature subset until the threshold is met. It should be pointed out that the main difference between the proposed algorithm and the existing ones in the literature is that our algorithm produces highly correlated data subsets based on the hashing index value. Therefore, the ciphertext retrieval process at the reduce stage is more efficient in terms of speed.
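The forward-selection loop described above can be sketched as follows. The scoring function here is a stand-in based on hashing-bucket agreement; the paper's exact correlation criterion is not reproduced:

```python
# Sketch of the described loop: start from an empty subset, repeatedly add
# the feature whose hashing-based score improves the subset most, stop at
# a threshold. `bucket` and `subset_score` are illustrative stand-ins.

import hashlib
from collections import Counter

def bucket(value, buckets=16):
    # Hashing index value used to group correlated records.
    return int(hashlib.md5(str(value).encode()).hexdigest(), 16) % buckets

def subset_score(rows, features):
    # Stand-in score: fraction of rows sharing the most common bucket
    # signature (higher = more highly correlated subsets).
    if not features:
        return 0.0
    sigs = Counter(tuple(bucket(r[f]) for f in features) for r in rows)
    return max(sigs.values()) / len(rows)

def select_features(rows, candidates, threshold=0.5):
    selected = []
    while candidates:
        best = max(candidates, key=lambda f: subset_score(rows, selected + [f]))
        selected.append(best)            # add one feature per iteration
        candidates.remove(best)
        if subset_score(rows, selected) >= threshold:
            break                        # threshold met: stop growing
    return selected

rows = [{"a": 1, "b": i % 2, "c": i} for i in range(10)]
picked = select_features(rows, ["a", "b", "c"], threshold=0.9)
assert picked[0] == "a"   # constant column yields one shared bucket
```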
The design of our OP_FHE_SCHR cryptosystem is implemented using the HElib-master-2015.03 library in the Dev C++ IDE. We utilize the WDBC test training dataset from a cancer-management project. Our security algorithm is implemented in four steps using the Gentry cryptosystem [
The efficiency of the candidate solution is demonstrated by its experimental results, which are compared with the existing blinding fully homomorphic FHE_DFI_LM algorithm, the previous FHE_SCHR, and our new optimized Op_FHE_SCHR algorithm. Recall that the improvement in this paper lies mainly in the optimization of the ciphertext retrieval time and of the metadata dynamics and authentication path in the logical Merkle tree repository (optimized space-time cost).
Average performance:

ALGORITHM | Setup time (ms) | Encryption time (ms) | Ciphertext retrieval time (ms) | Decryption time (ms)
---|---|---|---|---
FHE_SCHR | 11,684 | 37,419 | 37,419 | 7994
FHE_DFI_LM | 13,078 | 77,507 | 77,507 | 41,085
Op_FHE_SCHR | 5932 | 37,120 | 12,476 | 7990
Based on the results of the experiments conducted, it is clearly evident that the proposed optimized scheme Op_FHE_SCHR speeds up the setup (input-file decomposition) and ciphertext retrieval times without compromising the cryptosystem. Thereby, graph 5 shows that the proposed alternative is more efficient regarding ciphertext retrieval and computation-cost reduction. Note that homomorphic cryptosystems are extremely expensive [
Two security requirements are to be achieved, namely data confidentiality and integrity. Therefore, our security scheme is based on a hybrid encryption design. Since the file is encrypted with a hybrid encryption as
time algorithm has the probability at least
Recall that this paper is based on [
i) Pre-image resistance: that is, given a hash value (h), it is difficult to find a message m such that h = hash (m).
ii) Collision resistance: that is, finding two messages m1 ≠ m2 such that hash (m1) = hash (m2) is difficult.
This data structure is a complete binary tree with an n-bit hash value associated with each node. Each internal node value is the result of a hash of the node values of its children. Merkle trees are designed so that a leaf value h (
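The structure just described can be made concrete with a small sketch: each internal node hashes the concatenation of its children, and a leaf is verified against the published root by recomputing hashes along its authentication path (the sibling hashes from leaf to root). This is a generic Merkle tree, assuming SHA-256 and a power-of-two number of leaves:

```python
# Small Merkle tree sketch: internal nodes hash their children; a leaf is
# verified against the root via its authentication path of sibling hashes.
# SHA-256 and a power-of-two leaf count are our simplifying assumptions.

import hashlib

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_levels(leaves):
    # leaves: already-hashed leaf values, length a power of two
    levels = [leaves]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([H(prev[i] + prev[i + 1]) for i in range(0, len(prev), 2)])
    return levels                        # levels[-1][0] is the root

def auth_path(levels, index):
    # Sibling hashes from leaf to root for leaf `index`.
    path = []
    for level in levels[:-1]:
        path.append(level[index ^ 1])    # sibling at this level
        index //= 2
    return path

def verify(leaf, index, path, root):
    h = leaf
    for sib in path:
        h = H(sib + h) if index % 2 else H(h + sib)
        index //= 2
    return h == root

leaves = [H(s.encode()) for s in ["m0", "m1", "m2", "m3"]]
levels = build_levels(leaves)
root = levels[-1][0]
path = auth_path(levels, 2)
assert verify(leaves[2], 2, path, root)          # genuine leaf verifies
assert not verify(H(b"tampered"), 2, path, root)  # altered leaf is rejected
```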
To summarize the security analysis, we can say that by implementing a secure front-end database management agent (the Anonymiser) on top of the FHE_SHCR security mechanism [
In this paper, the requirements are to optimize outsourced data processing at the map stage and prevent intermediate data disclosure at the reduce phase, in order to reinforce data privacy in the MapReduce framework. Therefore, we implement a secure front-end database management agent, the Anonymiser, with its three components (Decomposition Table, Query Processing, and Metadata Repository) to enhance the data security mechanism of our proposed solution. The cryptographic tool is a scalar homomorphic encryption that performs certain sorts of computation over encrypted data in a more secure and optimized design. The optimized cryptosystem Op_FHE_SCHR is, by the experimental results, an efficient candidate for communication- and computation-cost reduction. Practically, it takes as input files an optimized decomposition table (for map workers) and improves the speed and accuracy of the ciphertext retrieval process (for reduce workers) in the MapReduce environment. Furthermore, we address the metadata dynamics and space-time cost constraints for the traversal of the Merkle tree structure in our metadata repository by applying an optimized ternary search tries (TST) algorithm.
This work has been supported by MoE-CMCC (Ministry of Education of China- China Mobile Communications Corporation) Joint Science Fund under grant MCM20130661.
Martin, K., Wang, W.Y. and Agyemang, B. (2017) Optimized Homomorphic Scheme on Map Reduce for Data Privacy Preserving. Journal of Information Security, 8, 257-273. https://doi.org/10.4236/jis.2017.83017