Physical document verification is a necessary task in the process of reviewing applications for a variety of services, such as loans, insurance, and mortgages. This process consumes a large amount of time, money, and human resources, which leads to limited business throughput. Furthermore, physical document verification poses a critical risk to clients’ personal information, as they are required to provide sensitive details and documents to verify their information. In this paper, we present a systematic approach to address shortcomings in the current state of the processes used for physical document verification. Our solution leverages a semi-trusted party data source ( i.e. a governmental agency) and cryptographic protocols to provide a secure digital service. We make use of homomorphic encryption and secure multi-party computation to develop a series of protocols for private integer comparison and (non-) membership testing. Secure boolean evaluation and secure result aggregation schemes are proposed to combine the results of the evaluation of multiple predicates and produce the final outcome of the verification process. We also discuss possible improvements and other applications of the proposed secure system of protocols. Our framework not only provides a cost-efficient and secure solution for document verification, but also creates space for a new service.
Recent advances in technology have led to the introduction of many digital and automated services, such as e-shopping, e-learning, and e-banking. These digi- talised services not only reduce the cost of operation, but also increases through- put for businesses. However, several tasks in these services continue to involve a considerable amount of human effort. Physical document verification is a necessary task in the process of reviewing applications for many services, such as loans, insurances, and mortgages. This process consumes a large amount of time, money, and human resources. Consider the example of a loan or an insurance application. The applicants are usually required to provide numerous documents to certify their relevant personal information, such as birth certificate, statement of monthly income, marriage certificate, medical records, and so on. At the same time, the loan/insurance provider requires a considerable amount of human resource to verify and store these documents. This process can take several weeks to complete, and serves to limit business throughput.
Moreover, the process of physical document verification incurs a critical privacy risk for applicants. They provide to a third party (i.e. the service provider) many sensitive documents, such as birth certificates, IDs, health records, and so on. All these documents are stored in the provider’s database. If the client applies for multiple schemes or subscriptions, multiple copies of his/her per- sonal data are stored in different places. Since data can be leaked from the server, storing personal information in multiple third-party databases is not recommen- ded. One source of such a leak is employees who do not follow the company’s privacy policies, and may, intentionally or unintentionally, reveal sensitive client information. Even when the provider claims to enforce strict policies pertaining to privacy, there is still a chance that the database systems are vulnerable to malicious external attacks.
In this paper, we propose a systematic approach to address the abovementioned shortcomings of the current state of the process of physical document verifi- cation. We assume that there is a trusted data source that stores the certified personal information of clients. We also assume that a list of requirements (maybe involving the divulgence of private information) needs to be fulfilled by the applicant to qualify for a given scheme or subscription. We present a series of protocols that allow the verifier and the data keeper to communicate with each other and securely verify the applicant’s information according to the requirements proposed by the verifier. The main contributions of this paper are as follows:
1) The proposed system digitalizes the process of document verification and hence enhances business throughput.
2) The system protects user confidentiality from both the verifier and the data keeper. The details of the requirements proposed by the verifier also remain hidden from the data keeper.
3) The proposed approach creates space for new services for information data storage and verification.
The remainder of the paper is organised as follows: In the next section, we review related work in the literature. Section 3 contains our problem formulation as well as the scenario we consider. Section 4 contains a discussion of our security model and assumptions as well as the underlying cryptographic techniques we leverage (i.e. Paillier’s encryption scheme). The proposed solution to the problem of secure personal information verification is described in Section 5, which systematically discusses four stages of the solution. Section 6 presents experimental evaluations of the proposed sub-protocols. The final section discusses future work and our conclusions.
The scenario we consider shares characteristics with the problem of zero- knowledge proof. Zero-knowledge proof systems, introduced by Goldwasser et al. [
Another approach to consider is private set intersection [
While sharing similar purposes as the above approaches, our system considers a different setting in the context of zero-knowledge proof systems. The commu- nication and verification processes are conducted by a verifier and a semi-honest data keeper, rather than by a verifier and a client. Our approach leverages the computation model of secure multi-party computation introduced by C. Yao [
Definitions. Our proposed system involves three general parties―the client, the verifier, and the data keeper―as illustrated in
・ The client. The client wishes to privately prove that his/her personal data satisfy the predicates predefined by the verifier.
・ The verifier. The verifier (we call the verifier Bob) provides a series of pre- dicates that need to be satisfied by the client.
ID | Name | Age | Sex | Income | Nationality | Marital Status |
---|---|---|---|---|---|---|
1 | Julia | 29 | F | 30,000 | Singaporean | Married |
2 | Deny | 32 | M | 35,000 | Singaporean | Single |
3 | Christina | 38 | F | 80,000 | Myanmar | Single |
4 | Alwen | 41 | M | 120,000 | Indonesian | Married |
5 | Dino | 37 | M | 90,000 | Malaysian | Divorced |
・ The data keeper. The data keeper (whom we call Alice) stores the personal data of the client and provides a security guarantee for the data storage.
The database stored by the data keeper consists of n records. Each record describes a client by m attributes.
A personal information verification scheme is a Boolean function on a data record. The Boolean function is informally described by single and complex predicates. We assume that the single predicates are equality, inequality, mem- bership, and non-membership.
・ An equality predicate examines whether a variable x is equal to a certain value a:
・ An inequality predicate inputs a variable x and a certain value a, and outputs 1 when
・ The membership and non-membership predicates check whether variable x belongs (or does not belong) to a set A of elements:
A complex statement contains multiple single predicates and a set of logical expressions
Workflow. To illustrate, consider the following example: A client first outsources his/her data to a data keeper called Alice. Alice can be a govern- mental agency. The client and Alice are responsible for ensuring the correctness of the information. The client can pay a small fee to Alice for keeping track of his/her data. The verifier Bob provides a subscription scheme. In order to subscribe to the scheme, the client needs to satisfy a number of statements for personal information, including age, income, nationality, health condition, etc. He/She wants to prove that he/she qualifies for the scheme, but does not want to reveal exact information. At the same time, he/she also wishes to hide the fact that he/she is applying the certain scheme through others (such as Alice). He/She should anonymously authenticate Bob to communicate with Alice. Bob interacts with Alice by our proposed approach. Finally, Bob should be able to decide whether the client qualifies for the given scheme.
In this paper, the privacy/security of the proposed protocols is measured by the amount of information disclosed during execution. We adopt the security definitions and proof techniques from the literature on secure multi-party computation to analyse. The secure multi-party computation problem involves multiple parties collaboratively performing various types of computation with- out compromising the privacy of data. In the mid 1980s, C. Yao [
There are two common adversarial models under secure multi-party compu- tation: semi-honest and malicious. In the malicious model, the adversary has the ability to arbitrarily deviate from the protocol specifications. On the other hand, in the semi-honest model, an attacker (i.e. one of the participating parties) is expected to follow the prescribed steps of the protocol. However, the attacker is subsequently free to compute additional information based on his or her private input, output and messages received during the execution of the secure protocol. Although the assumptions of the semi-honest adversarial model are weaker than those of the malicious model, we insist that this assumption is realistic under the problem settings. We assume that the data keepers are trusted (as they are governmental agencies) to ensure the confidentiality of sensitive client data. It is difficult to imagine them colluding with other companies to damage their own reputation. Moreover, it is often non-trivial for one party to maliciously deviate from a particular protocol which may be hidden in a complex process.
In short, we assume that the verifier and the data keeper are semi-honest. They will correctly follow the protocol specifications. However, at the same time, they are also curious about the applicants’ information. In general, secure personal information as described in Section 5 should meet the following privacy requirements:
・ Client-to-verifier privacy. The verifier should not be able to gain any details concerning the client’s personal data stored in the data keeper’s database, except for those he can learn from the result (i.e. the client qualifies or not).
・ Client-to-data-keeper privacy. At any point during protocol execution, the identity of the applicant should not be revealed to the data keeper.
・ End user’s privacy. The verifier should not be able to obtain any information relating to other clients stored in the data keeper’s database.
・ Verifier-to-data-keeper privacy. The details of the predicates should not be leaked to the data keeper. This requirement is particularly applicable to private services where the selection criteria may be private to the provider.
An additive homomorphic encryption scheme is a cryptosystem that allows arithmetic (i.e. addition) operations to be performed on the ciphertext without decryption or knowing the actual values. Efficient additive homomorphic cryptosystems have been proposed, such as the Pallier cryptosystem [
The Paillier cryptosystem consists of three algorithms:
1)
2)
3)
The security of the Paillier encryption scheme relies on the computational hardness assumption of a novel mathematical problem called composite residuosity. The decision version of this problem class assumes that no polynomial-time algorithm can distinguish the N-th residues modulo
The Paillier cryptosystem is additive homomorphic encryption. If we consider two operators ´ and + in the ciphertext and the plaintext domains, respectively,
・ Additive Homomorphism:
・ Homomorphic Multiplication:
・ Semantic Security: Informally, a semantically secure [
The above computation is performed modulo
Our proposed approach to the secure information verification problem consists of four stages:
1) Setup―During this phase, the client goes through an anonymous authen- tication process so that the verifier is authenticated to communicate with the data keepers for the verification stage. In addition, the data keeper and the verifier generate an encryption key pair and exchange the public key of the homomorphic cryptosystem. These public keypairs are utilized for secure com- munication and computation at the later stages.
2) Single Predicate―In this stage, the data consumer evaluates a predicate for each entity in the dataset of the data keeper. The output of this stage is the encryption of either 1 or 0, depending on whether the entity satisfies the pre- dicate.
3) Secure Complex Predicate Evaluation―Based on the results of the previous stage, the verifier collaborates with the data keeper to compute the result of the complex logical combination of Boolean predicates. Again, the output of this stage is the encryption of either 1 or 0 depending on whether the entity satisfies the predicate.
4) Aggregation of Output Data―At this stage, the final result is aggregated, decrypted and shown to the verifier. Since the data keeper computes the de- cryption, we propose a secure protocol to generate the outcome so that the data keeper cannot obtain any information concerning the final result.
In the setup phase, the client is first required to complete anonymous authen- tication with the data keeper Alice, who then allows the verifier Bob to initiate the secure information verification process on the records of Alice’s database. When the clients agree to their personal information being stored in Alice’s database, she issues to each client a credential to be used for authentication. Each time a client subsequently requests access to Alice’s database, he/she uses her credentials for verification with Alice, who begins communication with Bob for the information verification process.
Traditional password-based authentication systems expose the identity of the client to the data keeper Alice. Hence, they violate the client-to-data-keeper privacy requirement. To satisfy this, it is desirable to have an authentication scheme that promises unlinkability, i.e. the server should not be able to link user requests such that access to the same user cannot be recognised as such.
As the anonymous authentication process is not our main contribution here, we only briefly review possible approaches to satisfy this requirement. The most feasible solution is anonymous credentials introduced by D. Chaum [
Applied to our problem setting, the client first generates a non-interactive zero-knowledge proof (i.e. applying the Fiat--Shamir transform) of his/her credentials with Alice. He/She transfers the proof to Bob, who submits the proof to Alice. Finally, Alice authenticates Bob to communicate and verify the client’s information.
Following the authentication process, Alice and Bob generate two Paillier key pairs using the
In the single predicate evaluation stage, for each data record and each attribute that needs to be verified, the verifier Bob and the data keeper Alice together perform one of the following protocols: equality predicate evaluation, inequality predicate evaluation and (non-) membership predicate evaluation. The output of each protocol is an encrypted bit maintained by Bob. The resulting bit is encrypted under the data keeper’s public key so that Bob cannot obtain any information relating to the other entities in the database. We now describe the three protocols to securely evaluate the results of these predicates.
A secure equality predicate evaluation tests whether two private inputs x and y are equal:
The computation in Steps 1 - 2 transforms the problem into a secure equal-to-zero protocol. In this protocol, Alice holds an encrypted message with value a. The message is encrypted under Bob’s key; hence, neither Alice nor Bob has information concerning the value a. The remaining part of the protocol involves compare a with 0. In the last step, Bob is required to compute an AND operator on the ciphertext space. In binary setting, the AND operator is exactly a multiplication scheme. We describe a secure multiplication scheme as in Protocol 4. The protocol allows Bob to compute the product of two ciphertexts where he does not know the decryption key.
The computations on lines 2 and 5 of Protocol 2 are performed by using the homomorphic property of Paillier encryption. During the protocol, Bob only works on encrypted data while the server receives two random numbers. Hence, no information regarding x and y is obtained by Bob and S. The correctness of the protocol is trivial, as
Analysis. We now analyse the correctness and security of the equality predicate evaluation protocol (Protocol 1). Due to the transformation in Steps 1 - 2, we only need to examine the remaining parts, where the two parties together compare the encrypted value a with 0.
We note that in Step 7, Alice reserves bit
The security of the two parties follows the semantic security properties of the employed encryption scheme―the Paillier cryptosystem. Alice only obtains the encryption version of y. On the other hand, Bob receives a randomized value
The inequality predicate considers two parties that pose two private integral values x and y and wish to evaluate the predicate
We propose a variant of Blake’s protocol [
Blake’s protocol allows us to obliviously transfer one over two secrets depending on the result of the secure comparison. The protocol considers the scenario where there are two parties holding two private inputs x and y. The second party holds two secrets
In Step 3.b, Bob is required to compute the XOR of two encrypted bits
Analysis. We first show that the protocol correctly computes the desired functionality. The flag vector
Let k be the first position where
Since there is a negligible minority of elements of
We now prove the security of the protocol. Due to the universal security of the secure equality evaluation protocol, we only need to consider the first part (i.e. Steps 1 - 3). Privacy for Alice trivially holds because of the semantic security properties of the employed encryption scheme―the Paillier cryptosystem. Bob only receives from Alice a list of encryption messages, and obtains no more information about Alice’s private input.
Bob’s privacy against the semi-honest party Alice is proven by constructing a simulator
A membership predicate allows the verifier to examine whether an attribute of the client falls into certain categories. A simple example is the case where the verifier wishes to know if an applicant works in the education industry (e.g. teacher, student, librarian, school counsellor, etc.). A non-membership predicate is the complement of the membership query, and tests whether a particular value is excluded from a set.
The membership predicate evaluation protocol is presented in Protocol 4. The non-membership predicate can be easily derived from Protocol 4 by applying the NOT operator discussed in Section 5.4.
In the protocol, Alice is required to evaluate the encrypted polynomial
Analysis. We first analyse the correctness of the protocol. If
The security the protocol can be proven with two simulators that generate the views of the two parties, Alice and Bob. For Alice, a simulator that generates and sends n random encrypted values is a valid simulator. Due to semantic security, she cannot distinguish the simulator from a real-world scenario. Similarly for Bob, a random number
At this stage, Bob holds the encrypted result of the evaluation for each data record, with each attribute in a complex predicate that needs to be verified. This sub-section discusses three basic primitives that operate on the encrypted inputs at this stage. With these primitives, Bob has the capability to compute the results of the encryption of the desired bit to evaluate each data record. The output of this stage is an encrypted bit for each data record. This bit indicates whether the given record satisfies the complex statement.
The inputs of the three primitives are either one encrypted bit (NOT opera- tion) or two encrypted bits (AND and OR operations). They are described as follows:
1)
2)
3)
As the input of this stage, for each entity in Alice’s database, Bob holds an encrypted bit that determines whether the data record qualifies the complex statement. In order to ensure there is exactly one qualified data record in case the application is successful, we introduce one special attribute to the final complex predicate. The attribute is the secret identification of the client in the database.
We assume that when the client registers his/her data with Alice the data keeper, Alice generates a secret random number
1) The client encrypts the random secret under Bob’s key, obtain
2) The client anonymously sends the encryption of secret value to Alice.
3) Alice and Bob perform secure equality evaluation (starting from step 2) and get the result
4) Bob applies AND operation with
With the additional step, now Bob holds an array of encrypted bits with all 0s and at most one bit 1. Bob uses a homomorphism to compute the encrypted sum of these bits; the result is the encryption of either 1 or 0. He can send it to Alice for decryption and obtain the final result to determine whether the applicant qualifies. However, this may compromise Bob’s privacy, especially when he wants to hide his business progress. In order to maintain his privacy, we introduce one step for the randomization of the decryption process as follows:
1) Bob computes the encrypted sum using a homomorphism to obtain
2) Bob generates a random number r, and computes
3) Alice decrypts c to obtain
4) Bob computes the result
Finally, Bob is able to decide the result of the verification process by bit s.
We first consider the security of the entire system, since all intermediate results revealed to Alice and Bob are either random or semantically secure encryptions of numbers. Furthermore, the outputs of all sub-protocols (only seen by Bob) are always encrypted under Alice’s key. Under the assumptions of the semi-honest model, we claim that the sequential composition of these sub- protocols leaks no details of the client or the predicates proposed by the verifier.
The second issue we consider is the practical implementation of the system. Since the same procedure is applied for all the data entries, the verification results for each data record can be computed in parallel. That means we are able to construct multiple verification threads, each one is corresponding to one data entry. By the batch verification approach, we can improve the running time of the whole process by a factor of n/m, where n is the number of data records and m is the number of threads.
While the same procedure is applied for each data record, the data keeper is not able to know who is the applicant. In practice, there are some cases that the data keeper (e.g. a governmental agency) is allowed to know the identity of the applicant, where this rigorous security feature is then not required. The proposed solution can be modified, and inherently improves performance. Specifically, the client can perform a simple authentication rather than an anonymous solution to allow the verifier to communicate with the data keeper. The verification process only needs to be performed on the only one data record identified by the client. Hence, the cost of the proposed solution is reduced by a factor of n where n is the number of data records in the data keeper’s database.
In our proposed solution, an applicant qualifies only if he/she satisfies all criteria specified by a single predicate or complex predicates. Hence, we can define a complex predicate to cover all criteria using the AND operation. We also can extend our protocol to adapt to threshold criteria, where the applicant qualifies only if he/she satisfies more than k criteria. The idea is to compute the sum of each predicates evaluation (in encrypted form) and apply a slightly modified version of Protocol 3 to compare the encrypted value with threshold k.
We implemented our proposed method, and calculated the CPU time required to run our sub-protocols from Section 5. Our experiments were conducted on a Windows 10.0 machine with a 3-GHz processor and 16 GB of RAM. We used the Paillier cryptosystem as the underlying additive homomorphic encryption scheme and implemented the proposed sub-protocols in Java.
We first examined the operation of the secure equality evaluation and the secure inequality evaluation protocols. Two factors affect the performance of these protocols: the Paillier key size and the domain size of the input.
The third single-predicate evaluation building block was the (non-) membership predicate. The run time of the building block depends on three factor: the Paillier key size, the number of elements in the set and the bit size of the inputs, where bit size only affects the final step of Protocol 4, which is the secure equality evaluation protocol.
Size | Prtcl.1 | Prtcl.3 |
---|---|---|
32 | 796 | 1769 |
64 | 1472 | 3542 |
160 | 3277 | 8623 |
Size | Prtcl.1 | Prtcl.3 |
---|---|---|
32 | 4477 | 12,047 |
64 | 9983 | 24,755 |
160 | 22,569 | 57,393 |
Key Size | Secure Negation | Secure AND | Secure OR |
---|---|---|---|
512 | 4 ms | 20 ms | 22 ms |
1024 | 17 ms | 73 ms | 86 ms |
2048 | 81 ms | 517 ms | 558 ms |
run times of the two remaining factors and the performance of the building block.
We had made a similar observation earlier: the cost of the secure membership evaluation protocol when the key size was 1024 bits was roughly six to seven times more efficient than with a length of 2048 bits for the Paillier key. The computational cost of the protocol also increased linearly with the size of the set. Finally, the run time of the three protocols that evaluated the Boolean functions are shown in
In order to verify the feasibility of the whole proposed system, we conducted an experiment on a simulated dataset. We consider a complex statement veri- fication comprising of 10 single predicates linking together by two boolean operations AND, OR. The running time for verifying single data record was 25 seconds, and it took approximately 1 hour to verify one thousand data record in the parallel mode of 10 threads running simultaneously.
In this paper, we proposed a framework for privacy-preserving verification of personal information. We used the secure multi-party computation model and homomorphic encryption to develop a systematic solution to the problem in four stages. We showed that the proposed scheme can protect the clients privacy from both the verifier and the data keeper, and at the same time provides privacy to the former. Different ways to further enhance the performance of the pro- posed method and a scheme extension for threshold verification were discussed. The experimental results highlighted the efficiency and feasibility of our pro- posed scheme under different security settings.
Do, H.G. and Ng, W.K. (2017) Private Personal Information Verification. Journal of Information Security, 8, 223-239. https://doi.org/10.4236/jis.2017.83015