Based on the distribution of the divisors of semiprimes, this article proposes an approach that finds the smaller divisor of a semiprime by parallel computing. The approach combines a deterministic search with a probabilistic search, requires little memory and can be implemented on ordinary multicore computers. Experiments show that certain semiprimes of 27 to 46 decimal digits can be correctly factorized with the approach on a personal computer in the expected time.
A semiprime is an odd composite number $N$ that has exactly two distinct prime divisors, say $p$ and $q$, such that $3 \le p < q$. Factorization of semiprimes has been a difficult problem in mathematics and computer science, especially factorization of an RSA number, which is a large semiprime, as introduced and overviewed in articles [
This section lists the preliminaries that include definitions, symbols and lemmas, which are necessary for later sections.
Throughout this article, a semiprime $N = pq$ means that $p$ and $q$ are both odd prime numbers with $3 \le p < q$. An odd interval $[a, b]$ is a set of consecutive odd numbers that takes $a$ as the lower bound and $b$ as the upper bound; for example, $[3, 11] = \{3, 5, 7, 9, 11\}$. The symbol $\lfloor x \rfloor$ is the floor function, an integer function of a real number $x$ that satisfies $\lfloor x \rfloor \le x < \lfloor x \rfloor + 1$; the symbol GCD means the greatest common divisor. Let $m > 0$; then $e_m^b$, $e_m^p$, $e_m^q$ and $e_m^0$ are defined by
$$e_m^b = N_{(m+1,\,2^{m-1}-1)} = 2^m N - 1$$
$$e_m^p = 2^m N - p$$
$$e_m^q = 2^m N - q$$
$$e_m^0 = e_m^b - 2\left(\left\lfloor \frac{\sqrt{N}+1}{2} \right\rfloor - 1\right)$$
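As a small numerical illustration of these definitions, the following Python sketch evaluates the four terms for a toy semiprime and checks that the term $e_m^p$ shares the divisor $p$ with $N$. The function name `level_terms` and the toy values $N = 91$, $m = 2$ are assumptions introduced here for illustration; the square root in $e_m^0$ is evaluated with an integer square root, which gives the same floor.

```python
from math import gcd, isqrt

def level_terms(N, p, q, m):
    """Return (e_m^0, e_m^p, e_m^q, e_m^b) for the semiprime N = p*q."""
    e_b = 2**m * N - 1
    e_p = 2**m * N - p
    e_q = 2**m * N - q
    # floor((sqrt(N) + 1) / 2) computed exactly with integers
    half = (isqrt(N) + 1) // 2
    e_0 = e_b - 2 * (half - 1)
    return e_0, e_p, e_q, e_b

# Toy example (hypothetical values): N = 7 * 13, level m = 2
e_0, e_p, e_q, e_b = level_terms(91, 7, 13, 2)
print("odd interval:", list(range(e_0, e_b + 1, 2)))
print("e_m^p =", e_p, "GCD(N, e_m^p) =", gcd(91, e_p))
```

Because $e_m^p = 2^m N - p$, any common divisor of $N$ and $e_m^p$ is a common divisor of $N$ and $p$, which is why the GCD recovers $p$.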
Symbol A Δ = B 2 , which was defined in [
Lemma 1. (See in [
Lemma 2. (See in [
$$e_{mk_1}^{l} < e_{mk_1}^{r} \le e_{mk_2}^{l} < e_{mk_2}^{r}$$
where $e_{mk_1}^{l} = e_m^b - 2(l_{bk_1} - 1)$, $e_{mk_2}^{l} = e_m^b - 2(l_{bk_2} - 1)$, $e_{mk_1}^{r} = e_m^b - 2(l_{sk_1} - 1)$ and $e_{mk_2}^{r} = e_m^b - 2(l_{sk_2} - 1)$.
Particularly, when $k_2 = k_1 + 1$ it holds
$$e_{mk_1}^{l} < e_{mk_1}^{r} = e_{mk_2}^{l} < e_{mk_2}^{r}$$
Lemma 3. (See in [
1) $I_i \cap I_{i+1} = e_{mi}^{r}$;
2) $\bigcup_{i=1}^{\omega} I_i = [e_m^0, e_{m\omega}^{r}]$;
3) $\bigcup_{i=1}^{\infty} I_i = [e_m^0, e_m^b]$; as illustrated in
Lemma 4. (See in [
Lemma 5. (See in [
Lemma 6. (See in [
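$\left\lfloor \frac{1}{4} N^{\frac{\alpha-\beta}{2\alpha}} \right\rfloor - 1$. Particularly, arbitrary $\alpha \ge 2$ yields $\Delta_k \le \left\lfloor \frac{\sqrt[2\alpha]{N}}{4} \right\rfloor$ for $k \ge \left\lfloor N^{\frac{\alpha-1}{3\alpha}} \right\rfloor$ and $\Delta_k \ge \left\lfloor \frac{\sqrt[2\alpha]{N}}{4} \right\rfloor - 1$ for $k \le \left\lfloor N^{\frac{\alpha-1}{3\alpha}} \right\rfloor - 1$.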
Lemma 7. (See in [
then there are intervals that contain at most $\left\lfloor \frac{\sqrt[4]{N}}{4} \right\rfloor$ nodes and there are intervals that contain at least $\left\lfloor \frac{\sqrt[4]{N}}{4} \right\rfloor - 1$ nodes.
Based on the previous lemmas, theorems and corollaries, one can easily draw the following conclusions.
1) If $N = pq$ is a semiprime, then there is a term $e_m^p$ that lies in the odd interval $I_0$ and satisfies $p = \mathrm{GCD}(N, e_m^p)$;
2) If $I_0$ is subdivided into a series of subintervals as defined in Lemma 3, then $e_m^p \in I_k$ with $k = \lfloor q/p \rfloor$, and the bigger $k$ is, the fewer nodes $I_k$ contains. Among all the subintervals, $I_1$, $I_2$ and $I_3$ together cover half of $I_0$. Lemma 6 shows that, when $k \le \left\lfloor N^{\frac{\beta}{3\alpha}} \right\rfloor - 1$, there are at least $\left\lfloor \frac{1}{4} N^{\frac{\alpha-\beta}{2\alpha}} \right\rfloor - 1$ nodes in $I_k$.
These conclusions provide a guideline for designing a new algorithm for integer factorization, as the following subsections demonstrate.
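The claim that the first few subintervals dominate $I_0$ can be checked numerically on a toy scale. The sketch below is only an approximation of the lemmas: it assumes that a candidate odd divisor $d \le \sqrt{N}$ belongs to class $k = \lfloor N/d^2 \rfloor$ (which equals $\lfloor q/p \rfloor$ when $d = p$), and it ignores the shared endpoints of neighbouring intervals. The function name `nodes_per_class` and the test semiprime are hypothetical.

```python
from collections import Counter
from math import isqrt

def nodes_per_class(N):
    """Count odd candidates d in [3, sqrt(N)] by class index k = floor(N / d^2)."""
    counts = Counter()
    for d in range(3, isqrt(N) + 1, 2):
        counts[N // (d * d)] += 1
    return counts

# Toy semiprime (hypothetical): N = 10007 * 10009
counts = nodes_per_class(10007 * 10009)
total = sum(counts.values())
first_three = counts[1] + counts[2] + counts[3]
print("share of classes 1..3:", first_three / total)
print("sizes of classes 1..5:", [counts[k] for k in range(1, 6)])
```

Under these assumptions the classes shrink quickly as $k$ grows, which matches the statement that the bigger $k$ is, the fewer nodes $I_k$ contains.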
It is now clear that finding $e_m^p$ amounts to a successful factorization of $N = pq$. Since $e_m^p$ lies in one of $I_1, I_2, \cdots, I_\omega$, it can surely be found by searching the intervals one by one. By Lemma 7, some intervals contain a small number of nodes that can be searched in a short time, while others contain too many nodes to be searched when $N$ is very big. It is therefore necessary to know which intervals contain a small number of nodes and which contain a large number, and then to perform a brute-force search on the small ones and other searches on the large ones. Since the brute-force search is a time-consuming process, a Tolerable Number (TN) can be defined as an upper bound on the number of nodes that are sure to be searched in a Tolerable Time (TM), which was introduced in FU’s article [
$TN = \left\lfloor \frac{\sqrt[2\alpha]{N}}{4} \right\rfloor$ and thus $\omega_0 = \left\lfloor N^{\frac{\alpha-1}{3\alpha}} \right\rfloor$. Since, by Lemma 6, the number of nodes in $I_\omega$ is smaller than that in $I_{\omega_0}$ if $\omega > \omega_0$, TN is a critical number for the brute-force search. All the intervals $I_{\omega_0+1}, I_{\omega_0+2}, \cdots$ can be searched by the brute-force search.
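The relation between $\alpha$, TN and $\omega_0$ can be evaluated exactly with integer arithmetic. The following sketch assumes an integer $\alpha \ge 2$ and goes from $\alpha$ to TN and $\omega_0$ (the text goes the other way, from a chosen TN to $\alpha$); the helper names `iroot`, `tolerable_number` and `omega0` are hypothetical and not taken from the article.

```python
def iroot(n, r):
    """Largest integer x with x**r <= n, for n >= 0 and r >= 1."""
    lo, hi = 0, 1
    while hi**r <= n:          # grow hi until hi**r > n
        hi *= 2
    while hi - lo > 1:         # invariant: lo**r <= n < hi**r
        mid = (lo + hi) // 2
        if mid**r <= n:
            lo = mid
        else:
            hi = mid
    return lo

def tolerable_number(N, alpha):
    """TN = floor(N^(1/(2*alpha)) / 4)."""
    return iroot(N, 2 * alpha) // 4

def omega0(N, alpha):
    """omega_0 = floor(N^((alpha-1)/(3*alpha)))."""
    return iroot(N ** (alpha - 1), 3 * alpha)

# Toy value (hypothetical): N = 10^12, alpha = 2
print(tolerable_number(10**12, 2), omega0(10**12, 2))
```

Taking the integer root first and dividing by 4 afterwards is safe because flooring before an integer division does not change the final floor.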
Now turn to the big intervals $I_1, I_2, \cdots, I_{\omega_0}$, each of which contains more than TN nodes. Using TN to subdivide each of these big intervals yields a series of new small odd intervals, and assigning each newly subdivided interval to a process in a parallel computing system allows the brute-force search to be performed in TM, as tested by FU [
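The subdivision step described above can be sketched as follows. This is a minimal illustration, assuming that an odd interval is given by its odd endpoints and that each piece should hold at most TN odd numbers; the function name `split_odd_interval` is hypothetical, and the assignment of pieces to MPI processes is not shown.

```python
def split_odd_interval(a, b, tn):
    """Split the odd interval [a, b] into consecutive pieces of at most tn odd numbers."""
    if a % 2 == 0 or b % 2 == 0 or a > b or tn < 1:
        raise ValueError("need odd endpoints with a <= b and tn >= 1")
    pieces = []
    lo = a
    while lo <= b:
        hi = min(lo + 2 * (tn - 1), b)   # last odd number of this piece
        pieces.append((lo, hi))
        lo = hi + 2                      # first odd number of the next piece
    return pieces

# Toy example: the odd interval [3, 21] cut into pieces of at most 4 nodes
print(split_odd_interval(3, 21, 4))
```

Each returned pair is itself an odd interval, so every piece can be handed to a separate worker without any overlap.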
To summarize, setting a TN, calculating an $\alpha$ from $TN = \left\lfloor \frac{\sqrt[2\alpha]{N}}{4} \right\rfloor$ and an $\omega$ from $\omega = \left\lfloor N^{\frac{\alpha-1}{3\alpha}} \right\rfloor$, and letting $k = \lfloor q/p \rfloor$ vary results in a subdivision of the interval $I_0$ into two kinds of subintervals, small ones and big ones, as shown in
Now consider a big odd subinterval $I$ that contains $n$ terms. Suppose the objective odd number $o = ps$ lies at the $m$-th position. Referring to the analysis in [
Based on the strategy for algorithm design stated in the previous section, a parallel algorithm, called the TNPTN MPI Algorithm, is designed to find an objective node $N_{obj}$ that has the common divisor $p$ with $N = pq$. The algorithm assumes that $k = \lfloor q/p \rfloor$ varies from 1 to an upper bound and assigns to each $k$ a process that searches for $N_{obj}$. Its input data are the big semiprime $N$, the number TN of maximal steps that a brute-force search performs, and a number $N_{rand}$ with $N_{rand} < TN$ that sets how many random odd integers are picked in the searched interval with the multi-dimensional random-number generator introduced in [
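The work of a single process can be pictured with the sketch below. It is not the TNPTN MPI Algorithm itself: it is a sequential illustration under stated assumptions, namely that a piece with at most TN nodes is searched deterministically, that a larger piece is sampled `n_rand` times, and that Python's `random.Random` stands in for the multi-dimensional random-number generator used in the article. The function name `search_piece` and its parameters are hypothetical.

```python
import random
from math import gcd

def search_piece(N, lo, hi, tn, n_rand, rng):
    """Search the odd interval [lo, hi] for a node sharing a proper divisor with N.

    Returns the divisor found, or None if this piece yields nothing.
    """
    size = (hi - lo) // 2 + 1
    if size <= tn:
        # Deterministic part: brute force over every odd node in the piece.
        candidates = range(lo, hi + 1, 2)
    else:
        # Probabilistic part: n_rand random odd nodes (stand-in generator).
        candidates = (lo + 2 * rng.randrange(size) for _ in range(n_rand))
    for node in candidates:
        d = gcd(N, node)
        if 1 < d < N:
            return d
    return None

# Toy example (hypothetical values): N = 7 * 13, odd interval [355, 363]
rng = random.Random(2018)
print(search_piece(91, 355, 363, tn=10, n_rand=3, rng=rng))
```

In a parallel setting each process would call such a routine on its own piece and report the first divisor found; the probabilistic branch may return `None` even when the piece contains the objective node.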
Numerical experiments were carried out on a Sugon workstation with a Xeon(R) E5-2650 V3 processor of 20 cores and 128 GB of memory, using C++ MPI programming with the GMP big-number library. Several big semiprimes with 27 to 46 decimal digits were factorized, as shown in
It is conventional to compare a new approach with older ones, although sometimes two things are not directly comparable. Accordingly, this section makes such comparisons and then looks ahead to future work.
It can be seen from the implementation of the algorithms listed in the previous section that each of Algorithms 1, 2 and 3 needs memory only for storing a few integers used in the computation, so they cost little memory. Since the whole procedure is a parallel one, the time it costs depends on the resources joining the computation. Theoretically, it is at most $O\left(2\sqrt[4]{N}\,(\log_2 N)^3\right)$ bit operations, provided that $\left\lfloor \frac{\sqrt[4]{N}}{2} \right\rfloor$ processes join the computation.
Now turn to the older approaches. Ever since John Pollard proposed his Rho algorithm in 1975, a probabilistic algorithm that is efficient in factoring small integers, many algorithms for integer factorization have been developed. As stated in the introductory section, the GNFS has been regarded as the fastest approach to factorizing big integers under both sequential and parallel computing, and almost all the factorized RSA numbers were factorized by it with parallel computing. So here the new approach is compared only with Pollard’s Rho approach and the GNFS approach.
Semiprime | Decimal digits | Divisors | Time (s) |
---|---|---|---|
N1 = 521900076822691495534066493 | 27 | 15098125637513 × 34567209821461 | 20134.371596 |
N2 = 63281217910257742583918406571 | 29 | 125778791843321 × 503115167373251 | 22487.545370 |
N3 = 194920496263521028482429080527 | 30 | 289673451203483 × 672897345109469 | 326720.128666 |
N4 = 2400000000000001550240000000000042854447 | 40 | 37678804836791 × 63696287883753452357619017 | 236949.403121 |
N5 = 1427247692705959880439315947500961989719490561 | 46 | 2305843009213693951 × 618970019642690137449562111 | 336797.313147 |
First, compare with Pollard’s Rho approach. In terms of memory cost, the new approach resembles Pollard’s Rho approach in that both cost little memory. In terms of time consumption, the two have almost the same efficiency according to the experiments in articles [
Next compare with the GNFS approach. The GNFS approach is a deterministic one that can be parallelised. As article [
There is another approach stated by Kurzweg U H [
It can now be seen that the approach proposed in this article is worthy of investigation because it is derived from theorems and corollaries that are proved mathematically. Although the new approach has few successful cases of factoring RSA numbers, it does factorize many odd integers, as many older approaches did in the past. As a new approach, it of course leaves a lot of research to improve and perfect it. For example, the probabilistic searching procedure is very rough and needs improving, and the time complexity of the algorithm has not yet been evaluated, both because of its probabilistic trait and because of the authors’ limited knowledge in the required area. This points out the direction of future work. It is hoped that the approach will receive more attention and prove successful in the future.
The research work is supported by the State Key Laboratory of Mathematical Engineering and Advanced Computing under Open Project Program No. 2017A01, the Department of Guangdong Science and Technology under project 2015A030401105, the Foshan Bureau of Science and Technology under project 2016AG100311, and Project gg040981 from Foshan University. The authors sincerely thank them all.
Li, J.H. (2018) A Parallel Probabilistic Approach to Factorize a Semiprime. American Journal of Computational Mathematics, 8, 175-183. https://doi.org/10.4236/ajcm.2018.82013