 Int. J. Communications, Network and System Sciences, 2012, 5, 638-643 http://dx.doi.org/10.4236/ijcns.2012.529074 Published Online September 2012 (http://www.SciRP.org/journal/ijcns) Local Search Heuristics for NFA State Minimization Problem* Andrey V. Tsyganov Department of Higher Mathematics, Ulyanovsk State Pedagogical University, Ulyanovsk, Russia Email: andrew.tsyganov@gmail.com Received July 2, 2012; revised July 26, 2012; accepted August 6, 2012 ABSTRACT In the present paper we introduce new heuristic methods for the state minimization of nondeterministic finite automata. These methods are based on the classical Kameda-Weiner algorithm joined with local search heuristics, such as sto- chastic hill climbing and simulated annealing. The description of the proposed methods is given and the results of the numerical experiments are provided. Keywords: Nondeterministic Finite Automata; State Minimization; Heuristics; Local Search; Parallelism 1. Introduction Finite automata (FA) are widely used in various fields and especially in the theory of formal languages. We suppose that the reader is familiar with the basics of automata theory (see, for example, ) and provide only some necessary definitions. Let be an alphabet and n where i be a word. A set of words* is called a language12waa a:ia L. ,,,,The nondeterministic finite automaton (NFA) is a 5-tuple AQIF QQQ, where is a finite set of states,  is a finite alphabet,  is a transi-tion relation, IQ and FQ are respectively the sets of initial and final states. Transitions of the automa-ton A are often described by the transition function . The NFA is called deterministic (DFA) iff :Q2Q1I and ,:a ,1qaqQ (or ,1qa). Finite automata may be used to recognize and define languages. Two automata are called equivalent if they recognize one and the same language. For each NFA the equivalent DFA may be constructed using the powerset construction process (each state of such DFA is a subset of states of the original NFA). For a word the reverse word is 12 nwaa a1nnwaa a1, for a language L the reverse language is LwwL,,,, and for an automaton AQIF the reverse automaton is ,,,,AQFI where ,,qaqij iff ,,qaqjiFor a given language L a DFA which recognizes it and has the minimum possible number of states is called the canonical automaton and a DFA which recognizes . L and has the minimum possible number of states is called the reverse canonical automaton (these automata are unique for L up to isomorphism). The NFA state minimization problem is formulated as follows: for a given NFA A find an automaton which is equivalent to it and has the minimum possible number of states. Note that solution of this problem may not be unique. As it is shown in  the state minimization problem for NFA is PSPACE-complete. The worst case complexity for the same problem for DFA is logOQ Q . All known exact NFA state minimization methods use different types of exhaustive search and are computa-tionally hard. Very often they become impractical even for relatively small automata. This is one of the reasons why they are not implemented in software tools that deal with finite automata and related structures, such as AMoRE , FSM , Vaucanson , JFlap . More-over, only few of such tools provide heuristic NFA state minimization algorithms. In the present paper we propose new heuristic methods for NFA state minimization problem which are based on the classical Kameda-Weiner algorithm  and well- known local search heuristics (metaheuristics). The nov-elty of these methods is that the most time consuming part of the exact algorithm is replaced with fast heuristic procedures. The obtained methods are not exact but they allow to reduce minimization time. *The work is supported by the Ministry of Education and Science of the Russian Federation (grant n umber 1.919.2011). Copyright © 2012 SciRes. IJCNS A. V. TSYGANOV 639The remainder of the paper has the following structure. In Sections 2 and 3 a brief description of the Ka-meda-Weiner algorithm and local search heuristics is given, in Section 4 the proposed algorithm is described and in Section 5 the results of some numerical experi-ments are provided. 2. Kameda-Weiner Algorithm Lets us consider the brief description of the Kameda- Weiner algorithm (for detailed description see ). Suppose that NFA A is given. The algorithm searches for the minimum state NFA(s) equivalent to A using bi-nary matrix RAM (Reduced Automaton Matrix) which is constructed as follows. First, canonical automaton B and reverse canonical automaton C for A are constructed. Note that each state of these automata is a subset of states of the automaton A. Then, for each nonempty states i of and p1, ,Bi mjq of the element of the RAM is defined by the following formula ,ijr1,Cj n,.ijijpqpq0,1,ijr Let X be a subset of rows and Y a subset of columns of the RAM. Then XY1,3 2,32,1,2,3 2ab is called a (prime) grid if it sat-isfies the following conditions: 1) all intersections of its rows and columns contain 1 s; and 2) neither X nor Y can be enlarged without v iolating the fist condition. The set of grids cover RAM if each 1 in it belongs to at least one grid in the set. A minimum cover of RAM is a cover which consists of the minimum number of grids. Let us consider the NFA A with transitio n table shown in Table 1 (the example is taken from ). Tables 2 and 3 show the canonical automaton B and the reverse canonical automaton C of A respectively. RAM of A is presented in Table 4. There are 4 grids in RAM: 1) , 2) , 3) , 4) . It is easy to see that the first two grids make the minimum cover of RAM. 3 1,2 31,2,3Given a cover of RAM one can construct an NFA which may be equivalent to the original NFA A (in this case the cover is called legitimate). This is done by the means of the special intersection rule. The number of states in the constructed NFA equals to the number of grids in the cover. In the considered example the minimum cover of RAM is legitimate and yields the minimum state NFA shown in Table 5. The general schema of the Kameda-Weiner minimiza-tion technique is described by Algorithm 1. Note that steps 1, 3 and 4 of this algorithm theoretically have ex-ponential complexity. On practice the construction of the canonical automata usually performed rather quick and the most time consuming parts of the algorithm are steps Table 1. NFA A.  1 1,3 2  2 1 2,3  3 1 3 Table 2. Canonical automaton B. ab  12,3b2b1b  21b3b1b  31,3b3b1b Table 3. Reverse canonical automaton C. ab  11c2c4c  21, 2,3c2c2c 32,3c1c2c4c 4c4c  Table 4. RAM of A. 1 1,2,3 2,3 2,3 0 1 1 1 1 1 0 1,3 1 1 1 Table 5. Minimum state NFA equivalent to A. ab 1 p 12,pp 2p  2p 1p 2p Algorithm 1. Kameda-Weiner algorithm. Require: NFA A 1:Cons truct canonical automata B and C 2:Const ruct RAM 3:Find all g rids of RAM 4: Find minimum legiti mate c ove r(s) of RAM and construct minimum state NFA(s) using intersec tion rule Ensure: Minimum state NFA(s) equivalent to A Copyright © 2012 SciRes. IJCNS A. V. TSYGANOV 640 3 and 4 which perform the exhaustive search for grids and covers respectively. 3. Local Search Heuristics Local Search (LS) is a group of metaheuristic optimiza-tion techniques which are widely used especially in com-binatorial optimization. Its general schema is described by Algori thm 2. Each LS algorithm starts from some initial solution (line 1) and then iteratively updates it (lines 2 - 4) until stop condition is satisfied (in the most simple case it is the maximum number of steps). The Neighbor() function finds neighbors of the current solution Solution and the Update() function changes the current solution depending on the found neighbors. The quality of the solutions is compared using the special Cost() function. The simplest LS algorithm is called Hill Climbing (HC). In HC the Update() function always selects the best neighbor of the current solution (the steepest move). The Stochastic Hill Climbing (SHC) is a variant of HC which randomly chooses one of the neighbors and decides whether to move to it or to consider another neighbor. The disadvantage of the HC algorith ms is that th ey can easily stuck in the local optimum. The Simulated An-nealing (SA) is a more complicated LS algorithm which tries to avoid this problem. Algorithm 3 shows in detail the minimization process using SA. The distinctive featur e of the algorithm is the usage of the control parameter T (temperature) which slowly de-creases as the number of iterations k increases. The Neighbor() function generates a neighbor of the current solution and the Update() function accepts it with the probability 1,Pif 0;e,if 0,Tost Solution (1) where . Cost NewSolutionC As it follows from (1) the algorithm always accepts the generated neighbor if it is not worse than the current so-lution. If the neighbor is worse than the current solution it is accepted with some probability which decreases as the temperature decreases. So, the worse moves are made more often in the beginning of the optimization process. In classical SA the best solution is not stored (lines 2, 3, 8 - 12 are missing) and the algorithm returns the current solution. More details on heuristic optimization algorithms can be found in [8,9]. 4. Combinig Kameda-Weiner Algorithm with Local Search Heuristics First of all let us consider in more detail the last step of Algorithm 2. Local search. Require: InitialSolution 1: Solution := InitialSolution 2: repeat 3: Solution := Update(Neigh bo r(Solution)) 4: until StopCondition() Ensure: Solution Algorithm 3. Simulated annealing. Require: InitialSolution 1: Solution := InitialSolution 2: BestSolution := Solution 3: BestCost := Cost(Solution) 4: T := InitialTemperatue() 5: k := 0 6: while not StopCondition() do 7: NewSolution := Neighbor(Solution) 8: NewCost := Cost(NewSolution) 9: if NewCost < BestCost then 10: BestSolution := NewSolution 11: BestCost := NewCost 12: end if 13: Solution := Update(Solution, NewSolution, T) 14: k := k + 1 15: T := Update Temperature(T, k) 16: end while Ensure: BestSolution the Kameda-Weiner algorithm, i.e. the exhaustive search for minimum legitimate covers (see Algorithm 4). Here M is a set of minimum state NFAs. IsLegitimateCover() function tests whether the set of grids is a legitimate cover and IntersectionRule() constructs NFA using the cover. The bounds and of the main loop may minbe calculated as follows: imaximin 2logBiN where BN is the number of states in canonical automaton B, min, 1, 1iNNNmax GA BN where GN is the number of grids in RAM, A is the number of states in A and B is the number of states in B. NFor each step of the outer loop in the inner loop (lines 5 - 10) all possible i-combinations of grids have to be analyzed. The idea of the heuristic methods proposed in this paper is to replace this computationally hard process Copyright © 2012 SciRes. IJCNS A. V. TSYGANOV 641Algorithm 4. Exhaustive search for minimum legitimate covers. 1: Calculate and minimaxi2: :Mminimaxi:MM 3: for i from to do 4: found := false 5: for all i -combinations of grids Comb do 6: if IsLegitimateCover(Comb) then 7: {IntersectionRule(Comb)} 8: found := true 9: end if 10: end for 11: if found = true then 12: break 13: end if 14: end for by non-exhaustive LS procedures (of course, this may result in obtaining approximate solutions, i.e. reduced NFAs). This approach is described by Algorithm 5. So, we use LS in the Kameda-Weiner algorithm to find minimum covers of RAM and then analyze their size and legitimacy. Now let us consider the details of this process. The solution of both LS methods (SHC and SA) for the considered problem is a cover which is coded by a binary vector where 1 in i-th position means that i-th grid is included in a cover and 0 means that it is no t included. In the considered example vector means that only first two grids are included in the cover. 1,1,0,01,0,0,0minimaxTo start search LS algorithms need the initial so lution. The simplest way to setup the initial solution is to use the trivial solution with all bits set to 1. To obtain nontrivial initial solution Algorithm 6 may be used. The Cost() function simply counts the number of 1s in the vector and the Neighbor() function inverts several bits in it. After creating a neighbor of the current solutio n we need to check its feasibility (i.e. to check whether the obtained set of grids covers all 1s in RAM). If the con-structed neighbor is not a feasible solution then we add one or several 1s to it using algorithm similar to Algo-rithm 6. (e.g., in the considered example the solution is not feasible and it needs to be corrected ). To ensure the diversity of minimum covers one may run LS several times or use parallel versions of LS where each thread starts from its own initial solution. Note also that if the minimum legitimate covers not found then both exact end heuristic methods return the canonical automaton B if it has less number of states than the given Algorithm 5. Heuristic search for minimum legitimate cov-ers. 1: Calculate and i 2:Find minimum cover(s) of RAM using LS 3: if cover(s) of size less or equal to found then maxi4: Selec t mi n i mum le g i t i ma t e cover(s) and construct minimum state NFA(s) using intersection rule 5: end if Algorithm 6. Construction of initial solution for LS algo-rithms. 1: repeat 2: Take next 1 in the RAM 3: Randomly choose a grid that covers this 1 4: Set to 1 the corresponding bit of the InitialSolution 5: Eliminate all 1s that are covered by the chosen grid from the RAM 6: until there are 1s in the RAM automaton A. 5. Numerical Experiments We have implemented the proposed NFA minimization methods in the ReFaM project. ReFaM (Rational Ex-pressions and Finite Automata Minimization) is a part of the HeO (Heuristic Optimization) library. This library is a cross-platform open source project written in C++ that provides several parallel metaheuristic optimization methods such as Genetic Algorithm (GA), Simulated Annealing (SA), Stochastic Hill Climbing (SHC) and Branch and Bound (BnB). These methods are imple-mented as algorithmic skeletons using metaprogramming, pattern design and different parallelization techniques (OpenMP and MPI). The latest version of the library may be obtained via SVN (the homepage of the project: http://code.google.com/p/heo/). In the exact version of the Kameda-Weiner algorithm the search for grids of the RAM and the search for minimum legitimate covers were parallelized using OpenMP and MPI techniques. In the heuristic versions of this algorithm the parallel versions of SHC and SA algo-rithms of the HeO library were used. Each version of the algorithm is implemented in a separate solver. Let us compare the performance of the exact and heu-ristic solvers for the random sample of the 100 pairwise inequivalent trim NFAs generated with the following parameters: number of states 6Q, number of initial states 1I, number of final states 2F, alphabet Copyright © 2012 SciRes. IJCNS A. V. TSYGANOV 642 size 2 , transition density 20.3TQND, where T is the number of automaton transitions. The computa-tional experiments were conducted using the following SMP system: Intel Core 2 Quad Q6600 2.4 Ghz, 4 Gb RAM, MS Wind o ws XP P r of es si onsl SP 3. First of all let us consider the results of the exact solver which are presen ted in Table 6. Here, m and n are the number of rows and number of columns in RAM respectively; d is the density of ones in RAM; G is the number of grids in RAM; MN is the number of the minimum state NFAs; MQ is the number of states in the mini mum state NFAs; min and max are minimal and maximal values;  is the mean value;  is the stan-dard deviation. The results were obtained using 4 threads. The average minimization time is 1.1 seconds and the total number of the minimum state automata found for the whole sample is 268, 9 automata have no equivalent minimum state NFA. As it can be seen from the table some of the automata in the sample have multiple mini-mum state NFAs (the mean value is 2.95) and the aver-age number of states in the minimum state NFAs is 3.52. Now let us consider the results of the heuristic solvers obtained with different number of threads N = 1, 2, 4, 8, 16, 32, 64 which are presented in Tables 7 and 8. Since LS is used only at the last stage of the Kameda-Weiner algorithm the first 4 columns of Table 6 will be the same for heuristic solvers and we replace them with the fol-lowing columns: T—the total number of minimum (reduced) state NFAs found for the sample, U— num-ber of unminimized automata, T—average minimization time in seconds (for columns NNMN and MQGN the mean values are provided). As it can be seen from these tables the total number of the minimum state NFAs increases and the number of unminimized automata decreases as the number of threads grows. The average minimization time is very small and remains almost constant up to 4 threads and then increases proportionally to the number of threads because the hardware used for experiments supports si-multaneous execution only of 4 threads. 6. Conclusion In the present paper we have considered new heuristic algorithms for NFA state minimization problem which is known to be computationally hard. These algorithms are a combination of the classical Kameda-Weiner algorithm and local search heuristics which are widely used in combinatorial optimization. The essential feature of the proposed algorithm is that the most time consuming part of the exact algorithm is replaced with fast local search procedures. Numerical experiments have shown that such combination is much less time consuming and allows to Table 6. Exact algorithm results. m n d MN MQmin 1 1 0.63 1 0 1 max 16 14 1.00 92 32 5  5.70 6.00 0.73 15.40 2.95 3.52  2.83 2.85 0.05 17.74 5.21 0.95 Table 7. SA algorithm results. TNUN N MN MQ T 1 69 31 1.00 3.48 0.0071 2 76 28 1.06 3.43 0.0071 4 98 25 1.31 3.47 0.0078 8 114 20 1.43 3.49 0.0148 16 122 20 1.53 3.41 0.0258 32 137 19 1.69 3.41 0.0491 64 157 16 1.87 3.45 0.0917 Table 8. SHC algorithm results. TNUN N MN MQ T 1 69 31 1.00 3.52 0.0057 2 75 28 1.04 3.42 0.0056 4 95 26 1.28 3.46 0.0059 8 112 21 1.42 3.52 0.0111 16 128 24 1.68 3.42 0.0192 32 148 16 1.76 3.46 0.0364 64 161 16 1.92 3.45 0.0674 obtain acceptable results. In the future we plan to con-centrate on the other time consuming part of the Ka-meda-Weiner algorithm, i.e. the exhaustive search for grids of the RAM. REFERENCES  J. E. Hopcroft and J. D. Ullman, “Introduction to Auto-mata Theory, Languages, and Computation,” Adison- Wesley Publishing Company, Reading, 1979.  T. Jiang and B. Ravikumar, “Minimal NFA Problems Are Hard,” SIAM Journal on Computing, Vol. 22, No. 6, 1993, pp. 1117-1141. doi:10.1137/0222067  V. Kell, A. Maier, A. Potthoff, W. Thomas and U. Wer-muth, “AMORE: A System for Computing Automata, Monoids and Regular Expressions,” Proceedings of the 6th Annual Symposium on Theoretical Aspects of Com-puter Science on STACS 89, Springer-Verlag, New York, Copyright © 2012 SciRes. IJCNS A. V. TSYGANOV Copyright © 2012 SciRes. IJCNS 6431989, pp. 537-538.  M. Mohri, F. Pereira and M. Riley, “AT&T General- Purpose Finite-State Machine Software Tools,” 1997. http://www.research.att.com/sw/tools/fsm  S. Lombardy, R. Poss, Y. Régis-Gianas and J. Sa-karovitch, “Introducing VAUCANSON,” In: O. H. Ibarra and Z. Dang, Eds., Implementation and Application of Automata, CIAA 2003, Santa Barbara, 16-18 July 2003, pp. 96-107.  S. H. Rodger, “JFLAP: An Interactive Formal Languages and Automata Package,” Jones and Bartlett Publishers, Inc., USA, 2006.  T. Kameda and P. Weiner, “On the State Minimization of Nondeterministic Finite Automata,” IEEE Transactions on Computers, Vol. C-19, No. 7, 1970, pp. 617-627. doi:10.1109/T-C.1970.222994  J. Hromkovic, “Algorithmics for Hard Problems—Intro- duction to Combinatorial Optimization, Randomization, Approximation, and Heuristics,” Springer, Berlin, 2001.  F. Glover and G. A. Kohenberger, “Handbook of Meta-heuristics,” Kluwer Academic Publishers, Boston, 2003.