Paper Menu >>
Journal Menu >>
![]() J. Software Engineering & Applications, 2009, 2: 111-115 doi:10.4236/jsea.2009.22016 Published Online July 2009 (www.SciRP.org/journal/jsea) Copyright © 2009 SciRes JSEA An Optimal Algorithm for Prufer Codes* Xiaodong Wang1, 2, Lei Wang3, Yingjie Wu1 1Department of Computer Science, Fuzhou University, Fuzhou, China; 2Department of Computer Science, Quanzhou Normal Uni- versity, Quanzhou, China; 3College of Computing, Georgia Institute of Technology, Atlanta GA 30332, USA. Email: wangxd@fzu.edu.cn Received February 19th, 2009; revised February 23rd, 2009; accepted February 24th, 2009. ABSTRACT This paper studies the algorithms for coding and decoding Prufer codes of a labeled tree. The algorithms for coding and decoding Prufer codes of a labeled tree in the literatures require time usually. Although there exist linear time algorithms for Prufer-like codes [1,2,3], the algorithms utilize the integer sorting algorithms. The special range of the integers to be sorted is utilized to obtain a linear time integer sorting algorith m. The Prufer code problem is reduced to integer sortin g. In this paper we consider the Prufer code problem in a different a ngle and a more direct manner. We start from a naïve algorithm, then improved it gradually and finally we obtain a very prac tical linear time algorithm. The techniques we used in this paper are of interest in their own right. (Onlogn) Keywords: Design of Algorithm, Labeled Trees, Prufer Codes, Integer Sorting 1. Introduction Labeled trees are of interest in practical and theoretical areas of computer science. For example, Ethernet has a unique path between terminal devices, thus being a tree: labeling the tree nodes is necessary to uniquely identify each device in the network. An interesting alternative to the usual representations of tree data structures in com- puter memories is based on coding labeled trees by means of strings of node labels. This representation was first used in the proof of Cayley’s theorem [4] to show a one-to-one correspondence between free labeled trees on n nodes and strings of length n-2. In addition to this purely mathematical use, string-based coding of trees has many practical applications. For instance, they make it possible to generate random uniformly distrib- uted trees and random connected graphs: the generation of a random string followed by the use of a fast decod- ing algorithm is typically more efficient than random tree generation by the addition of edges, since in the latter case one must pay attention not to introduce cy- cles. In addition, tree codes are employed in genetic algorithms, where chromosomes in the population are represented as strings of integers, and in heuristics for computing minimum spanning trees with additional constraints, e.g., on the number of leaves or on the di- ameter of the tree itself. Not last, tree codes are used for data compression and for computing the tree and for- est volumes of graphs. Let T be a labeled tree whose nodes are numbered from 0 to n-1. For some vertex v in T, the degree of v, denoted by d[v], is the number of edges incident to v. If d[v]=1, then v is called a leaf. According to Prufer’s proof, any sequence of n-2 numbers, each number in {0,1,…,n-1} can determine a unique labeled tree of n nodes. Such a number sequence is the Prufer code of a labeled tree. Algorithm A is the straightforward imple- mentation of Prufer’s proof. Algorithm A Input: A labeled tree T of n nodes as a list of n-1 edges. Output: c, the Prufer code of T. Method: Step 1. B←{0,1,…,n-1}. Step 2. For i ← 0 to n-3 do Step 2.1. x ← min{kB: k is a leaf }. Step 2.2. B←B-{x}. Step 2.3. Remove x and its incident edge (x, y) from T. Step 2.4. c[i]←y. Step 3. Return c. End of Algorithm For example, let the input labeled tree T be the graph depicted in Figure 1. After the Algorithm A terminates, c = [2,4,0,1,3,3] which is the Prufer code of T . *Supported by Natural Science Foundation of China under Grant No. 60172017 and Natural Science Foundation of Fujian under Grant No. A0510008. ![]() An Optimal Algorithm for Prufer Codes 112 Figure 1. A labelled tree The algorithms for coding and decoding Prufer codes of a labeled tree in the literatures require tim- e usually. As stated in [3], although the problem of pro- ducing a Prufer code in linear time is an exercise in two books [5, 6], there exists no explicit publication of a so- lution. In [3] an time algorithm for generating a Prufer code from a labeled tree was described, but the time algorithm utilized the integer sorting algo- rithms. The special range of the integers to be sorted was utilized to obtain a linear time integer sorting algorithm. The Prufer code problem is reduced to integer sorting. In this paper we consider the Prufer code problem in a dif- ferent angle and present a very practical linear time algo- rithm directly. The techniques we used in this paper are of interest in their own right. (Onlogn) ) ) )) ()On ()On 2. A Linear Time Algorithm for Coding The most time consuming step in Algorithm A is Step 2. The leaf elimination scheme of the Prufer code implicitly defines a root for the labeled tree T. Actually, it is easy to see that the last element in the code is n-1 which is the root of the labeled tree T. Given a list of n-1 edges, we can build the parent array f and the degree array d of the labeled tree T with only preprocessing time, wh- ich allows checking and updating a node's degree in time. With this preprocessing, the Step 2.3 of Alg- orithm A can be implemented in time. Using a he- ap, the leaf with the smallest number can be found in time. Hence, Step 2 takes time to- tally. The time complexity of Algorithm A is also . ()On (1)O (On l On l (1)O ogn () ogn (On logn We can improve the algorithms further. An insight into the problem is that for the heap operations in the Step 2.1 of Algorithm A, there is a special kind of nodes that they are deleted from the heap immediately after they are inserted into the heap. This kind of nodes can be treated easily without heap operation. The remained heap operation can be replaced by a linear scan of the degree array d. Therefore the total time to find x in the Step 2.1 is reduced to O(n). The linear time algorithm can be presented detailed as follows. Coding Algorithm: 1: index ← x ← min{0≤k<n: d[k]=1} 2: for i ← 0 to n-3 do 3: y ← f[x] 4: c[i] ← y 5: d[y] ← d[y]-1 6: if y<index and d[y]=1 then x ← y 7: else index ← x ← min{index<k<n: d[k]=1} In the algorithm described above, d[v] is the degree of node v and f[v] is the parent node of the node v. The variable index is a cursor of degree array d. On line 6, when node y becomes a leaf and the label of y is less than the current cursor index, the node y must be the kind of nodes that are deleted from the heap immediately after they are inserted into the heap. In this case, the node y becomes the next node with minimal label. Oth- erwise, on line 7 the cursor index moves to the next leaf node in the degree array d. The computing time of line 1 and line 7 is , since the cursor index goes through the degree array d from left to right once. The remaining time is clearly , leading to complexity for the entire algorithm. ()On (On)()On 3. A Linear Time Algorithm for Decoding Decoding is to build the tree T corresponding to the given Prufer code c. As far as c is computed, each node label in it represents the parent of a leaf eliminated from T. Hence, in order to reconstruct T , it is sufficient to co- mpute the ordered sequence of labels of the eliminated leaves, say s: for each i {0,1,…,n-1}, the pair (c[i], s[i]) will thus be an edge in the tree. We first observe that the leaves of T are exactly those nodes that do not appear in the code, as they are not par- ents of any node. Each internal node, say v, in general may appear in c more than once; each appearance corre- sponds to the elimination of one of its children, and therefore to decreasing the degree of v by 1. After the rightmost occurrence in the code, v is clearly a leaf and thus becomes a candidate for being eliminated. Therefore, the times of a node v appears in c is exactly the degree of v minus 1. A linear scan of code array c can determine the degree array d. Using a heap, the leaf with the smallest number can be found in time, leading an time de- coding algorithm. Like the coding algorithm, we can improve the algorithms further. An insight into the prob- lem is also that for the heap operations of the decoding algorithm, there is a special kind of nodes that they are deleted from the heap immediately after they are inserted into the heap. This kind of nodes can be treated easily without heap operation. The remained heap operation can be replaced by a linear scan of the degree array d. (O nlogn(Onlogn Copyright © 2009 SciRes JSEA ![]() An Optimal Algorithm for Prufer Codes 113 The linear time algorithm can be presented detailed as follows. Decoding Algorithm: 1: index ← x ← min{0≤k<n: d[k]=1} 2: for i ← 0 to n-2 do 3: y ← c[i] 4: add edge (x,y) to T 5: d[y] ← d[y]-1 6: if y<index and d[y]=1 then x ← y 7: else index ← x ← min{index<k<n: d[k]=1} Like the coding algorithm, the time required by the decoding is also . ()On 4. Examples We use the labeled tree T depicted in Figure 1 as an ex- ample to demonstrate the algorithms for coding and de- coding Prufer codes described above. The input of the labeled tree T is an edge list {0,1}{0,4}{1,3}{4,2}{3,6} {3,7}{2,5}. There are 8(n) nodes labeled 0,1,2,3,4,5,6,7 and 7(n-1) edges. It is easy to see that the last element 7 is the root of the labeled tree T. For the given edge list, we can build the parent array f and the degree array d of the labeled tree T by a depth first search with only time. ()On With this two arrays, the Step 2.3 of Algorithm A can be implemented in time. (1)O For our linear time coding algorithm, we first go through the degree array d to find an index such that in- dex=min{0≤k<n: d[k]=1}. It is clear that the index equals 5 for the first time in our example as shown in Table 1. Table 1. The parent array f and the degree array d of T i 0 1 2 3 4 5 6 7 f[i] 1 3 4 7 0 2 3 -1 d[i] 2 2 2 3 2 1 1 1 ↑ index Table 2. After edge {2,5} is deleted i 0 1 2 3 4 [5] 6 7 f[i] 1 3 4 7 0 2 3 -1 d[i] 2 2 1 3 2 1 1 1 ↑ index Table 3. After edge {4,2} is deleted i 0 1 [2]3 4 [5] 6 7 f[i]1 3 4 7 0 2 3 -1 d[i]2 2 1 3 1 1 1 1 ↑ index Table 4. After the edges {0,4}, {0,1} and {1,3} are deleted i [0][1][2]3 [4] [5] 6 7 f[i]1 3 4 7 0 2 3 -1 d[i]1 1 1 2 1 1 1 1 ↑ index Table 5. After the edge {3,6} is deleted i [0][1][2]3 [4] [5] [6] 7 f[i]1 3 4 7 0 2 3 -1 d[i]1 1 1 1 1 1 1 1 ↑ index f[index]=f[5]=2 is the first value of the Prufer code c. The edge {2,5} is deleted and d[2] is decreased by 1 on line 5 of the coding algorithm. The status of the tree T now becomes Table 2. Follows the coding algorithm on line 6 the next node to be deleted is 2. The father node of 2 is node 4, the next value of the Prufer code c. The edge {4,2} is deleted and d[4] is decreased by 1 on line 5 of the coding algorithm. The status of the tree T now becomes Table 3. Similarly, in the next three steps we obtain the next three values 0,1 and 3 of the Prufer code c. The edges {0,4}, {0,1} and {1,3} are deleted accordingly. The Prufer code we have obtained up to now is (2,4,0,1,3). The status of the tree T now becomes Table 4. Look at the Table 4, the next node to be deleted is 6 which is determined on line 7 of our coding algorithm. We do not scan the degree array d from the beginning. We scan the degree array d from index (5) to right and the index is moved to 6. This is a key point to see the algo- rithm running in linear time. The father node of 6 is node 3, the next value of the Prufer code c. The edge {3,6} is deleted and d[3] is de- creased by 1 on line 5 of the coding algorithm. The status of the tree T now becomes Table 5. Copyright © 2009 SciRes JSEA ![]() An Optimal Algorithm for Prufer Codes 114 Table 6. The initio status of the decoding algorithm i 0 1 2 3 4 5 6 7 c[i] 2 4 0 1 3 3 7 -1 d[i] 2 2 2 3 2 1 1 1 ↑ index Table 7. After edge {2,5} is added i 0 1 2 3 4 [5] 6 7 c[i] 2 4 0 1 3 3 7 -1 d[i] 2 2 1 3 2 1 1 1 ↑ index Table 8. After edge {2,4} is added i 0 1 [2] 3 4 [5] 6 7 c[i] 2 4 0 1 3 3 7 -1 d[i] 2 2 1 3 1 1 1 1 ↑ index We now have the n-2 values of the Prufer code (2,4,0,1,3,3). The algorithm terminates. Using the same labeled tree T depicted in Figure 1 as an example we can demonstrate the algorithm for de- coding Prufer codes as follows. The input of the decoding algorithm is the Prufer code c of the tree T. In our example the Prufer code of the tree T is (2,4,0,1,3,3). We fist noted that the times of a node v appears in the Prufer code c is exactly the degree of v minus 1. A linear scan of code array c can determine the degree array d as shown in Table 6. For our linear time decoding algorithm, we first go through the degree array d to find an index such that in- dex=min{0≤k<n:d[k]=1}. It is clear that the index equals 5 for the first time in our example as shown in Table 6 which is the first leaf node deleted in the coding algo- rithm. c[0]=2 is the other node label of the edge to be added. The edge {2,5} is added and d[2] is decreased by 1 on line 5 of the decoding algorithm. The status of the tree T now becomes Table 7. Follows the decoding algorithm on line 6 the next node to be added is 2. c[1]=4 is the other node label of the edge to be added. The edge {2,4} is added and d[4] is decreased by 1 on line 5 of the decoding algorithm. The status of the tree T now becomes Table 8. Similarly, in the next three steps we obtain the next three edges {0,4}, {0,1} and {1,3}. The status of the tree T now becomes Table 9. Look at the Table 9. The next node to be added is 6 which is determined on line 7 of our decoding algorithm. We do not scan the degree array d from the beginning. We scan the degree array d from index (5) to right and the index is moved to 6. This is a key point to see the decoding algorithm running in linear time. The other node label of the edge to be added is c[5]=3. The edge {3,6} is added and d[3] is decreased by 1 on line 5 of the decoding algorithm. The status of the tree T now becomes Table 10. Follows the decoding algorithm on line 6 the next node to be added is 3. c[6]=7 is the other node label of the edge to be added. The edge {3,7} is added. We now have the n-1 edges of the tree T. The decoding algorithm terminates. 5. Conclusions Prufer codes for labeled trees have many practical appli- cations. The algorithms for coding and decoding Prufer codes of a labeled tree are of interest in practical and theoretical areas of computer science. The existing linear time algorithms for Prufer-like codes utilized the integer sorting algorithms. The special range of the integers to be sorted is utilized to obtain a linear time integer sorting algorithm. The optimal algorithms for coding and de- coding Prufer codes presented in this paper are very practical linear time algorithms. The techniques used in these algorithms are of interest in their own right. Table 9. After the edges {0, 4}, {0,1} and {1,3} are added i [0][1][2]3 [4] [5] 6 7 c[i] 2 4 0 1 3 3 7 -1 d[i] 1 1 1 2 1 1 1 1 ↑ index Table 10. After the edge {3,6} is added i [0][1][2]3 [4] [5] [6]7 c[i] 2 4 0 1 3 3 7 -1 d[i] 1 1 1 1 1 1 1 1 ↑ index Copyright © 2009 SciRes JSEA ![]() An Optimal Algorithm for Prufer Codes Copyright © 2009 SciRes JSEA 115 6. Acknowledgments This work was supported by Natural Science Foundation of China under Grant No. 60172017 and Natural Science Foundation of Fujian under Grant No. A0510008. REFERENCES [1] S. Caminiti, I. Finocchi, and R. Petreschi, “A unified approach to coding labeled trees,” in Proceedings of the 6th Latin American Symposium on Theoretical Infor- matics (LATIN’04), LNCS 2976, pp. 339-348, 2004. [2] S. Caminiti, I. Finocchi, and R. Petreschi, “On coding la- beled trees,” To appear on Theoretical Computer Sci- ence, 2006. [3] H. C. Chen and Y. L. Wang, “An efficient algorithm for generating Prufer codes from labeled trees,” Theory of Computing Systems, Vol. 33, pp. 97–105, 2000. [4] A. Cayley, “A theorem on trees,” Quarterly Journal of Mathematics, Vol. 23, pp. 376–378, 1889. [5] L. Devroye, “Non-uniform random variate generation,” Springer-Verlag, New York, Exercise 2, pp. 666, 1986. [6] A. Nijenhuis and H. S. Wilf, “Combinatorial algorithms for computers and calculators,” Second Editon, Academic Press, New York, Exercise 46, pp. 293, 1978. |