J. Software Engineering & Applications, 2009, 2: 111-115
doi:10.4236/jsea.2009.22016 Published Online July 2009 (www.SciRP.org/journal/jsea)
Copyright © 2009 SciRes JSEA
An Optimal Algorithm for Prufer Codes*
Xiaodong Wang1, 2, Lei Wang3, Yingjie Wu1
1Department of Computer Science, Fuzhou University, Fuzhou, China; 2Department of Computer Science, Quanzhou Normal Uni-
versity, Quanzhou, China; 3College of Computing, Georgia Institute of Technology, Atlanta GA 30332, USA.
Email: wangxd@fzu.edu.cn
Received February 19th, 2009; revised February 23rd, 2009; accepted February 24th, 2009.
ABSTRACT
This paper studies the algorithms for coding and decoding Prufer codes of a labeled tree. The algorithms for coding
and decoding Prufer codes of a labeled tree in the literatures require time usually. Although there exist
linear time algorithms for Prufer-like codes [1,2,3], the algorithms utilize the integer sorting algorithms. The special
range of the integers to be sorted is utilized to obtain a linear time integer sorting algorith m. The Prufer code problem
is reduced to integer sortin g. In this paper we consider the Prufer code problem in a different a ngle and a more direct
manner. We start from a naïve algorithm, then improved it gradually and finally we obtain a very prac tical linear time
algorithm. The techniques we used in this paper are of interest in their own right.
(Onlogn)
Keywords: Design of Algorithm, Labeled Trees, Prufer Codes, Integer Sorting
1. Introduction
Labeled trees are of interest in practical and theoretical
areas of computer science. For example, Ethernet has a
unique path between terminal devices, thus being a tree:
labeling the tree nodes is necessary to uniquely identify
each device in the network. An interesting alternative to
the usual representations of tree data structures in com-
puter memories is based on coding labeled trees by
means of strings of node labels. This representation was
first used in the proof of Cayley’s theorem [4] to show
a one-to-one correspondence between free labeled trees
on n nodes and strings of length n-2. In addition to this
purely mathematical use, string-based coding of trees
has many practical applications. For instance, they
make it possible to generate random uniformly distrib-
uted trees and random connected graphs: the generation
of a random string followed by the use of a fast decod-
ing algorithm is typically more efficient than random
tree generation by the addition of edges, since in the
latter case one must pay attention not to introduce cy-
cles. In addition, tree codes are employed in genetic
algorithms, where chromosomes in the population are
represented as strings of integers, and in heuristics for
computing minimum spanning trees with additional
constraints, e.g., on the number of leaves or on the di-
ameter of the tree itself. Not last, tree codes are used for
data compression and for computing the tree and for-
est volumes of graphs.
Let T be a labeled tree whose nodes are numbered
from 0 to n-1. For some vertex v in T, the degree of v,
denoted by d[v], is the number of edges incident to v. If
d[v]=1, then v is called a leaf. According to Prufer’s
proof, any sequence of n-2 numbers, each number in
{0,1,…,n-1} can determine a unique labeled tree of n
nodes. Such a number sequence is the Prufer code of a
labeled tree. Algorithm A is the straightforward imple-
mentation of Prufer’s proof.
Algorithm A
Input: A labeled tree T of n nodes as a list of n-1
edges.
Output: c, the Prufer code of T.
Method:
Step 1. B{0,1,…,n-1}.
Step 2. For i 0 to n-3 do
Step 2.1. x min{kB: k is a leaf }.
Step 2.2. BB-{x}.
Step 2.3. Remove x and its incident edge (x, y) from T.
Step 2.4. c[i]y.
Step 3. Return c.
End of Algorithm
For example, let the input labeled tree T be the graph
depicted in Figure 1. After the Algorithm A terminates,
c = [2,4,0,1,3,3] which is the Prufer code of T .
*Supported by Natural Science Foundation of China under Grant No.
60172017 and Natural Science Foundation of Fujian under Grant No.
A0510008.
An Optimal Algorithm for Prufer Codes
112
Figure 1. A labelled tree
The algorithms for coding and decoding Prufer codes
of a labeled tree in the literatures require tim-
e usually. As stated in [3], although the problem of pro-
ducing a Prufer code in linear time is an exercise in two
books [5, 6], there exists no explicit publication of a so-
lution. In [3] an time algorithm for generating a
Prufer code from a labeled tree was described, but the
time algorithm utilized the integer sorting algo-
rithms. The special range of the integers to be sorted was
utilized to obtain a linear time integer sorting algorithm.
The Prufer code problem is reduced to integer sorting. In
this paper we consider the Prufer code problem in a dif-
ferent angle and present a very practical linear time algo-
rithm directly. The techniques we used in this paper are
of interest in their own right.
(Onlogn)
) )
))
()On
()On
2. A Linear Time Algorithm for Coding
The most time consuming step in Algorithm A is Step 2.
The leaf elimination scheme of the Prufer code implicitly
defines a root for the labeled tree T. Actually, it is easy
to see that the last element in the code is n-1 which is the
root of the labeled tree T. Given a list of n-1 edges, we
can build the parent array f and the degree array d of the
labeled tree T with only preprocessing time, wh-
ich allows checking and updating a node's degree in
time. With this preprocessing, the Step 2.3 of Alg-
orithm A can be implemented in time. Using a he-
ap, the leaf with the smallest number can be found in
time. Hence, Step 2 takes time to-
tally. The time complexity of Algorithm A is also
.
()On
(1)O
(On
l
On
l
(1)O
ogn
()
ogn
(On
logn
We can improve the algorithms further. An insight
into the problem is that for the heap operations in the
Step 2.1 of Algorithm A, there is a special kind of nodes
that they are deleted from the heap immediately after
they are inserted into the heap. This kind of nodes can be
treated easily without heap operation. The remained heap
operation can be replaced by a linear scan of the degree
array d. Therefore the total time to find x in the Step 2.1
is reduced to O(n).
The linear time algorithm can be presented detailed as
follows.
Coding Algorithm:
1: index x min{0k<n: d[k]=1}
2: for i 0 to n-3 do
3: y f[x]
4: c[i] y
5: d[y] d[y]-1
6: if y<index and d[y]=1 then x y
7: else index x min{index<k<n: d[k]=1}
In the algorithm described above, d[v] is the degree of
node v and f[v] is the parent node of the node v. The
variable index is a cursor of degree array d. On line 6,
when node y becomes a leaf and the label of y is less
than the current cursor index, the node y must be the
kind of nodes that are deleted from the heap immediately
after they are inserted into the heap. In this case, the
node y becomes the next node with minimal label. Oth-
erwise, on line 7 the cursor index moves to the next leaf
node in the degree array d. The computing time of line 1
and line 7 is , since the cursor index goes through
the degree array d from left to right once. The remaining
time is clearly , leading to complexity for
the entire algorithm.
()On
(On)()On
3. A Linear Time Algorithm for Decoding
Decoding is to build the tree T corresponding to the
given Prufer code c. As far as c is computed, each node
label in it represents the parent of a leaf eliminated from
T. Hence, in order to reconstruct T , it is sufficient to co-
mpute the ordered sequence of labels of the eliminated
leaves, say s: for each i {0,1,…,n-1}, the pair (c[i], s[i])
will thus be an edge in the tree.
We first observe that the leaves of T are exactly those
nodes that do not appear in the code, as they are not par-
ents of any node. Each internal node, say v, in general
may appear in c more than once; each appearance corre-
sponds to the elimination of one of its children, and
therefore to decreasing the degree of v by 1. After the
rightmost occurrence in the code, v is clearly a leaf and
thus becomes a candidate for being eliminated. Therefore,
the times of a node v appears in c is exactly the degree of
v minus 1. A linear scan of code array c can determine
the degree array d.
Using a heap, the leaf with the smallest number can be
found in time, leading an time de-
coding algorithm. Like the coding algorithm, we can
improve the algorithms further. An insight into the prob-
lem is also that for the heap operations of the decoding
algorithm, there is a special kind of nodes that they are
deleted from the heap immediately after they are inserted
into the heap. This kind of nodes can be treated easily
without heap operation. The remained heap operation
can be replaced by a linear scan of the degree array d.
(O nlogn(Onlogn
Copyright © 2009 SciRes JSEA
An Optimal Algorithm for Prufer Codes 113
The linear time algorithm can be presented detailed as
follows.
Decoding Algorithm:
1: index x min{0k<n: d[k]=1}
2: for i 0 to n-2 do
3: y c[i]
4: add edge (x,y) to T
5: d[y] d[y]-1
6: if y<index and d[y]=1 then x y
7: else index x min{index<k<n: d[k]=1}
Like the coding algorithm, the time required by the
decoding is also .
()On
4. Examples
We use the labeled tree T depicted in Figure 1 as an ex-
ample to demonstrate the algorithms for coding and de-
coding Prufer codes described above. The input of the
labeled tree T is an edge list {0,1}{0,4}{1,3}{4,2}{3,6}
{3,7}{2,5}. There are 8(n) nodes labeled 0,1,2,3,4,5,6,7
and 7(n-1) edges.
It is easy to see that the last element 7 is the root of the
labeled tree T. For the given edge list, we can build the
parent array f and the degree array d of the labeled tree T
by a depth first search with only time.
()On
With this two arrays, the Step 2.3 of Algorithm A can
be implemented in time.
(1)O
For our linear time coding algorithm, we first go
through the degree array d to find an index such that in-
dex=min{0k<n: d[k]=1}. It is clear that the index equals
5 for the first time in our example as shown in Table 1.
Table 1. The parent array f and the degree array d of T
i
0 1 2 3 4 5 6 7
f[i] 1 3 4 7 0 2 3 -1
d[i] 2 2 2 3 2 1 1 1
index
Table 2. After edge {2,5} is deleted
i
0 1 2 3 4 [5] 6 7
f[i] 1 3 4 7 0 2 3 -1
d[i] 2 2 1 3 2 1 1 1
index
Table 3. After edge {4,2} is deleted
i
0 1 [2]3 4 [5] 6 7
f[i]1 3 4 7 0 2 3 -1
d[i]2 2 1 3 1 1 1 1
index
Table 4. After the edges {0,4}, {0,1} and {1,3} are deleted
i
[0][1][2]3 [4] [5] 6 7
f[i]1 3 4 7 0 2 3 -1
d[i]1 1 1 2 1 1 1 1
index
Table 5. After the edge {3,6} is deleted
i
[0][1][2]3 [4] [5] [6] 7
f[i]1 3 4 7 0 2 3 -1
d[i]1 1 1 1 1 1 1 1
index
f[index]=f[5]=2 is the first value of the Prufer code c.
The edge {2,5} is deleted and d[2] is decreased by 1 on
line 5 of the coding algorithm. The status of the tree T
now becomes Table 2.
Follows the coding algorithm on line 6 the next node to
be deleted is 2. The father node of 2 is node 4, the next
value of the Prufer code c. The edge {4,2} is deleted and
d[4] is decreased by 1 on line 5 of the coding algorithm.
The status of the tree T now becomes Table 3.
Similarly, in the next three steps we obtain the next
three values 0,1 and 3 of the Prufer code c. The edges
{0,4}, {0,1} and {1,3} are deleted accordingly. The Prufer
code we have obtained up to now is (2,4,0,1,3). The status
of the tree T now becomes Table 4.
Look at the Table 4, the next node to be deleted is 6
which is determined on line 7 of our coding algorithm. We
do not scan the degree array d from the beginning. We
scan the degree array d from index (5) to right and the
index is moved to 6. This is a key point to see the algo-
rithm running in linear time.
The father node of 6 is node 3, the next value of the
Prufer code c. The edge {3,6} is deleted and d[3] is de-
creased by 1 on line 5 of the coding algorithm. The status
of the tree T now becomes Table 5.
Copyright © 2009 SciRes JSEA
An Optimal Algorithm for Prufer Codes
114
Table 6. The initio status of the decoding algorithm
i
0 1 2 3 4 5 6 7
c[i] 2 4 0 1 3 3 7 -1
d[i] 2 2 2 3 2 1 1 1
index
Table 7. After edge {2,5} is added
i
0 1 2 3 4 [5] 6 7
c[i] 2 4 0 1 3 3 7 -1
d[i] 2 2 1 3 2 1 1 1
index
Table 8. After edge {2,4} is added
i
0 1 [2] 3 4 [5] 6 7
c[i] 2 4 0 1 3 3 7 -1
d[i] 2 2 1 3 1 1 1 1
index
We now have the n-2 values of the Prufer code
(2,4,0,1,3,3). The algorithm terminates.
Using the same labeled tree T depicted in Figure 1 as
an example we can demonstrate the algorithm for de-
coding Prufer codes as follows.
The input of the decoding algorithm is the Prufer code
c of the tree T. In our example the Prufer code of the tree
T is (2,4,0,1,3,3). We fist noted that the times of a node v
appears in the Prufer code c is exactly the degree of v
minus 1. A linear scan of code array c can determine the
degree array d as shown in Table 6.
For our linear time decoding algorithm, we first go
through the degree array d to find an index such that in-
dex=min{0k<n:d[k]=1}. It is clear that the index equals
5 for the first time in our example as shown in Table 6
which is the first leaf node deleted in the coding algo-
rithm.
c[0]=2 is the other node label of the edge to be added.
The edge {2,5} is added and d[2] is decreased by 1 on
line 5 of the decoding algorithm. The status of the tree T
now becomes Table 7.
Follows the decoding algorithm on line 6 the next
node to be added is 2. c[1]=4 is the other node label of
the edge to be added. The edge {2,4} is added and d[4] is
decreased by 1 on line 5 of the decoding algorithm. The
status of the tree T now becomes Table 8.
Similarly, in the next three steps we obtain the next
three edges {0,4}, {0,1} and {1,3}. The status of the tree
T now becomes Table 9.
Look at the Table 9. The next node to be added is 6
which is determined on line 7 of our decoding algorithm.
We do not scan the degree array d from the beginning.
We scan the degree array d from index (5) to right and
the index is moved to 6. This is a key point to see the
decoding algorithm running in linear time.
The other node label of the edge to be added is
c[5]=3. The edge {3,6} is added and d[3] is decreased by
1 on line 5 of the decoding algorithm. The status of the
tree T now becomes Table 10.
Follows the decoding algorithm on line 6 the next
node to be added is 3. c[6]=7 is the other node label of
the edge to be added. The edge {3,7} is added. We now
have the n-1 edges of the tree T. The decoding algorithm
terminates.
5. Conclusions
Prufer codes for labeled trees have many practical appli-
cations. The algorithms for coding and decoding Prufer
codes of a labeled tree are of interest in practical and
theoretical areas of computer science. The existing linear
time algorithms for Prufer-like codes utilized the integer
sorting algorithms. The special range of the integers to
be sorted is utilized to obtain a linear time integer sorting
algorithm. The optimal algorithms for coding and de-
coding Prufer codes presented in this paper are very
practical linear time algorithms. The techniques used in
these algorithms are of interest in their own right.
Table 9. After the edges {0, 4}, {0,1} and {1,3} are added
i
[0][1][2]3 [4] [5] 6 7
c[i] 2 4 0 1 3 3 7 -1
d[i] 1 1 1 2 1 1 1 1
index
Table 10. After the edge {3,6} is added
i
[0][1][2]3 [4] [5] [6]7
c[i] 2 4 0 1 3 3 7 -1
d[i] 1 1 1 1 1 1 1 1
index
Copyright © 2009 SciRes JSEA
An Optimal Algorithm for Prufer Codes
Copyright © 2009 SciRes JSEA
115
6. Acknowledgments
This work was supported by Natural Science Foundation
of China under Grant No. 60172017 and Natural Science
Foundation of Fujian under Grant No. A0510008.
REFERENCES
[1] S. Caminiti, I. Finocchi, and R. Petreschi, “A unified
approach to coding labeled trees,” in Proceedings of the
6th Latin American Symposium on Theoretical Infor-
matics (LATIN’04), LNCS 2976, pp. 339-348, 2004.
[2] S. Caminiti, I. Finocchi, and R. Petreschi, “On coding la-
beled trees,” To appear on Theoretical Computer Sci-
ence, 2006.
[3] H. C. Chen and Y. L. Wang, “An efficient algorithm for
generating Prufer codes from labeled trees,” Theory of
Computing Systems, Vol. 33, pp. 97–105, 2000.
[4] A. Cayley, “A theorem on trees,” Quarterly Journal of
Mathematics, Vol. 23, pp. 376–378, 1889.
[5] L. Devroye, “Non-uniform random variate generation,”
Springer-Verlag, New York, Exercise 2, pp. 666, 1986.
[6] A. Nijenhuis and H. S. Wilf, “Combinatorial algorithms
for computers and calculators,” Second Editon, Academic
Press, New York, Exercise 46, pp. 293, 1978.