J. Software Engineering & Applications, 2009, 2: 316-322
doi:10.4236/jsea.2009.25041 Published Online December 2009 (http://www.SciRP.org/journal/jsea)
Copyright © 2009 SciRes JSEA
Basic Elements Knowledge Acquisition Study in
the Chinese Character Intelligent Formation
System
Mingyou LIU, Chengsen DUAN, Youguo PI
South China University of Technology, Guangzhou, China.
Email: liumingyou1025@gmail.com
Received July 27th, 2009; revised August 22nd, 2009; accepted September 7th, 2009.
ABSTRACT
In the Chinese character intelligent formation system without Chinese character library, it is possible that the same
basic element in different Chinese characters is different in position, size and shape. The geometry transformation from
basic elements to the components of Chinese characters can be realized by affine transformation, the transformation
knowledge acquisition is the premise of Chinese character intelligent formation. A novel algorithm is proposed to ac-
quire the affine transformation knowledge of basic elements automatically in this paper. The interested region of Chi-
nese character image is determined by the structure of the Chinese character. Scale invariant and location invariant of
basic element and Chinese character image are extracted with SIFT features, the matching points of the two images are
determined according to the principle of Minimum Euclidean distance of eigenvectors. Using corner points as identifi-
cation features, calculating the one-way Hausdorff distance between corner points as the similarity measurement from
the affine image to the Chinese character sub-image, affine coefficients are determined by optimal similarity. 70244
Chinese characters in National Standards GB18030-2005 character set are taken as the experimental object, all the
characters are performed and the experimental courses and results are presented in this paper.
Keywords: Chinese Character Intelligent Formation, Knowledge Acquisition, Affine Transformation, Hausdorff Distance
1. Introduction
The concept of Chinese character intelligent information
that reuse basic elements to form Chinese characters was
proposed based on prototype authentication mechanism
of cognitive psychology in the reference [1], the author
identified that replace the Chinese character library by
Chinese character intelligent information. In his further
research, the framework of the Chinese character intelli-
gent formation system which contains basic elements
library, knowledge library and Chinese character intelli-
gent formation model [2] was proposed, and the structure
knowledge of the Chinese character is obtained by sim-
ple grid [35].
The spelling of alphabetic character has a one-dimen-
sional characteristic; it is pieced together in order under a
few rules. But the Chinese character has the two-dimen-
sional characteristic, the same basic element in different
Chinese characters is possible different in its size, shape
and position. The mapping from basic elements to the
components of the Chinese character can be realized by
affine transformation. A method that reuses radicals to
generate Chinese characters based on global affine
transformation was proposed in the reference [6]; the
affine coefficients were calculated by minimizing the
mean of the nearest-neighbor inter-point distance. But
this method was complex and it is easy to bring errors
resulted from image processing, such as outline extrac-
tion. A novel approach was proposed that selects corre-
sponding points in the boundary box of the basic ele-
ments and mapping goal respectively to calculate the
affine coefficients, then optimized by PSO algorithm, but
this method need to remove other components manually
when get the bounding box of mapping goal, so the
workload was heavy.
In this paper, the solution for acquiring basic elements
affine coefficients is as follows, the interested region of
Chinese character image is determined by the structure of
the Chinese character. Scale invariant and location in-
variant of basic element and Chinese character image are
extracted with SIFT features, matching points of the two
images are determined according to the principle of
Euclidean distance of eigenvectors. One group of affine
coefficients can be calculated from three pairs of match-
Basic Elements Knowledge Acquisition Study in the Chinese Character Intelligent Formation System
Copyright © 2009 SciRes JSEA
317
ing points which are chosen randomly. The affine image
is obtained through the transformation from the basic
element image, the corner points of affine image and
Chinese character sub-image are detected, one-way Hau-
sdorff distance between corner points is used as the rule
of measuring the similarity, and the optimal affine coef-
ficients are determined by minimum one-way Hausdorff
distance.
The contents of this paper is arranged as follows, the
principle of Chinese character intelligent information is
presented in Section 2; the geometric transformation mo-
del of the basic elements are proposed in Section 3, the
size, position and shape changes of basic elements can be
realized by affine transformation; in Section 4, a new
method of acquiring the affine coefficients of the basic
elements is proposed; the experiment study is presented
in Section 5, the generation of more than 70000 Chinese
characters is carried out after acquired the transformation
knowledge in different Chinese characters of all basic
elements. Finally, the conclusions are given in Section 6.
2. Theory of the Chinese Character
Intelligent Formation
A Chinese character is a combination of either single
hieroglyphic or self-explanatory symbol, or several of
them based on meaning and echoism rules. Any element
in the set of the character components corresponds to a
certain basic element. The components of Chinese char-
acters are the topological mapping of basic elements in
the character structure, namely, the position, size and
shape of basic elements may be different in different
Chinese characters, but basic elements are topological
invariant. For example, basic elements and
have different size, shape and position in the characters
and . As another example of this, basic element
has different size, shape and position in the charac-
ters , and . In the above-mentioned trans-
formation, the components which are corresponding to
the same basic element in different Chinese characters
are different in size, shape and position, but they have the
same topological structure, namely, they have topologi-
cal invariance.
From the above analysis, the concept of Chinese char-
acter intelligent information was proposed. Basic ele-
ments include single hieroglyphic and self-explanatory
and their symbols; a Chinese character is generated by
putting one or more basic elements together after topo-
logical mapping to the structure of the Chinese character.
Take the generation of the Chinese character ”” as an
example, as shown in Figure 1. In order to facilitate un-
derstanding, the decomposition of structures and basic
elements of the character is illustrated in Figure 2. This
character contains four level structure and five compo-
nents mapped by four basic elements. The symbols ,
, 广, are basic elements, Fi(i=1,2,3,4,5)
represents the knowledge of the topological mappings,
such as the location, size and shape of basic elements in
the Chinese character. The final Chinese character
is formed by mapping the basic elements to the corre-
sponding structure of the Chinese character.
3. Transformation Model of Basic Elements
From the analysis in Section 2, it needs a topological
transformation method which can realize to transform
basic elements to different position, size and shape in
order to realize the mapping from basic elements to
components of the Chinese character, affine transforma-
tion meets this requirement. Suppose W is a basic ele-
ments image and x is a point of the image. Let us define a
geometric transformation of the basic element image by
xx
AAAA
yy
AAAA
Wt
abab
AWtWt
Wt
cdcd


+=+=+




(1)
Figure 1. The theory of Chinese character intelligent for-
mation
Figure 2. Example of the decomposition of structures and
basic elements
Basic Elements Knowledge Acquisition Study in the Chinese Character Intelligent Formation System
Copyright © 2009 SciRes JSEA
318
With A corresponding to a linear transformation and t a
translation vector. Of which, aA and dA, bA and cA, tx and
ty represent scaling factor, rotation factor and translation
along the x, y axes respectively. The Chinese characters
are standard-type-face characters that square in a box,
which determined the geometric transformation of basic
elements involves only non-uniform scale, without rota-
tion, that is, bA = cA = 0.
4. Knowledge Acquisition of Basic Elements
The flow of acquiring the affine coefficients of basic
elements that proposed in this paper is as follows. Firstly,
the interested region of character image is determined by
the structure of the Chinese character. Secondly, the
scale and location invariant of basic element and Chinese
character image are extracted by SIFT algorithm, and the
matching points of two images are determined by nearest
Euclidean distance of eigenvectors. Of all the key mat-
ching points, a few are incorrect, it is not necessary to
remove all the pairs of mismatched points, because it can
calculate affine coefficients with three pairs of correct
matching points. Thirdly, three pairs of non-collinear po-
ints are chosen randomly to calculate one group of affine
coefficients, the basic element image is transformed to
obtain affine image, the corner points of affine and Chi-
nese character sub-image are detected respectively, and
the one-way Hausdorff distance of corner points from
affine image to Chinese character sub-image is calculated.
Finally, the optimal affine coefficients are determined
according to the minimum one-way Hausdorff distance
after limited iterations.
4.1 Region of Interest
The combinational relation of location among the com-
ponents of the Chinese character is considered as the
structure of the Chinese character. The location of dif-
ferent components in a Chinese character is determined,
thus, a rectangular region can be fixed by the structure of
Chinese character, called region of interest, which only
contains one of the components of the Chinese character.
Through the analysis of the components combina-
tional relations of the Chinese character, the structure
types of the Chinese character are summarized. The
structure of the Chinese character can divided into sev-
eral levels, which depend on whether there is combina-
tion of basic elements, the structure without combination
of basic elements is called one-level structure, while
multiple-level structure is with a combination of basic
elements. The structure level of the Chinese characters
obeys the cognitive mechanism that is from left to right,
from top to bottom and from out to inner, for example,
the character contains four level structures, as
shown in Figure 2.
Suppose the region of the Chinese character is T(x,y):
{
0
(,) 0
xw
Txy
yh
≤≤
=
≤≤
(2)
Where w and h are the width and height of the Chinese
character image.
The code structure of the Chinese character is a kind
of hierarchical level, along the level-dividing tree, the
interested region of secondary level structure is contained
in its interested region of previous level structure. Sup-
pose the interested region of the previous level structure
and its sub-level structure can be represented with R(x,y)
and S(x,y) respectively as follows.
1212
1212
(0)
(,)
(0)
xxxx
yyyy
rxrrrw
Rxy
ryrrrh
≤≤
=
≤≤
(3)
1212
1212
(0)
(,)
(0)
xxxx
yyyy
sxsssw
Sxy
sysssh
≤≤
=
≤≤
(4)
From the above analysis, it can be obtained:
(,)(,)
SxyRxy
, 1122
0
xxxx
rssrw
≤≤
,
1122
0
yyyy
rssrh
≤≤
.
In order to calculate the interested region of each basic
element conveniently, the relation between the interested
region of the previous level structure and its sub-level
structure can be defined as follows.
1211
1212
(01)
(,)
(01)
a
ab b
αααα
ββββ
≤≤
=
≤≤
h (5)
Where
1
α
and
2
α
,
1
β
and
2
β
represent the perce-
ntages of the interested region of the sub-level structure
holding that of its previous level structure in X and Y
axis respectively, which can be set by actual situation.
Through the analysis above, the interested region of
sub-structure comparing with its previous structure can
be calculated as follows.
11211221
11211221
()()
(,)
()()
xxxxxx
yyyyyy
rrrxrrr
Sxy
rrryrrr
αα
ββ
+×+×−
=
+×+×−
(6)
The interested region of each basic element in the Chi-
nese character can be calculated along the level-dividing
tree of Chinese characters code structure with the meth-
od of recursive under Equation (6).
4.2 Extract Scale and Location Invariant with
SIFT
The SIFT [7,8] features are local and based on the ap-
pearance of the object at particular interest points, and
are invariant to image scale and rotation.
According the characteristic of the Chinese character,
this article improved the SIFT that remove the rotation
invariant and extract scale and location invariant of SIFT
features. The main steps are included as follows.
Basic Elements Knowledge Acquisition Study in the Chinese Character Intelligent Formation System
Copyright © 2009 SciRes JSEA
319
For this, the image is convolved with Gaussian filters
at different scales, and then the difference of successive
Gaussian-blurred images are taken, Key points are then
taken as maxima/minima of the Difference of Gaussians
that occur at multiple scales. Specifically, a DoG image
(,,)
Gxy
σ
is given by
(,,)((,,)(,,))(,)
(,,)(,,)
DxyGxykGxyIxy
LxykLxy
σσσ
σσ
=−∗
=− (7)
Where
(,,)
Lxyk
σ
is the original image
(,)
Ixy
con-
volved with the Gaussian blur
(,,)
Gxyk
σ
at scale
k
σ
(,,)(,)(,)
LxyGxyIxy
σσ=∗ (8)
This is done by comparing each pixel in the DoG im-
ages to its eight neighbors at the same scale and nine
corresponding neighboring pixels in each of the neigh-
boring scales. If the pixel value is the maximum or
minimum among all compared pixels, it is selected as a
candidate key point.
Scale-space extreme detection produces too many key
point candidates, some of which are unstable. The next
step in the algorithm is to perform a detailed fit to the
nearby data for accurate location, scale, and ratio of prin-
cipal curvatures. This information allows points to be
rejected that have low contrast (and are therefore sensi-
tive to noise) or are poorly localized along an edge.
The Gaussian-smoothed image
(,,)
Lxy
σ
at the key
points scale σ is taken so that all computations are per-
formed in a scale-invariant manner. For an image sample
(,)
Lxy
at scale σ, the gradient magnitude,
(,)
mxy
,
and orientation,
(,)
xy
θ, are computed using pixel dif-
ferences:
22
(,)
((1,)(1,))((,1)(,1))
mxy
LxyLxyLxyLxy
=
+++−−
(9)
(,)arctan(((,1)(,1))
/((1,)(1,)))
xyLxyLxy
LxyLxy
θ
=+−−
+−− (10)
Previous steps found key point locations at particular
scales and assigned orientations to them. This ensured
invariance to image location, scale. Now we want to
compute descriptor vectors for these key points such that
the descriptors are highly distinctive and partially in-
variant to the remaining variations, like illumination, 3D
viewpoint, etc. This step is pretty similar to the Orienta-
tion Assignment step. The feature descriptor is computed
as a set of orientation histograms on (4 x 4) pixel
neighborhoods. Just like before, the contribution of each
pixel is weighted by the gradient magnitude, and by a
Gaussian with σ 1.5 times the scale of the key point.
Histograms contain 8 bins each, and each descriptor
contains a 4x4 array of 16 histograms around the key
point. This leads to a SIFT feature vector with (4 x 4 x 8
= 128 elements). This vector is normalized to enhance
invariance to changes in illumination.
4.3 Key Points Match
When the key point descriptor of two images was built,
take the Euclidean distance between the descriptors of
each feature point as the measurement of the similarity.
Suppose the sets of descriptor are,
()()()
12
{,,}
aaa
a
Na
Ffff=
L
,
()()()
12
{,,}
bbb
b
Nb
Ffff=L respectively in images A and B,
where Na and Nb are the numbers of key points in the
two images respectively, the Euclidean distance d be-
tween any descriptor
()
a
i
f
in image A to any descrip-
tor
()
b
j
f
in image B is presented as below,
1
()()()()2
0
(,)([][])
D
abab
ijij
d
dfffdfd
=
=−
(11)
Take one of the key points in image A, and find the
key point with nearest Euclidean distance in image B. In
the case of the ratio of nearest Euclidean distance and the
second nearest Euclidean distance, it is hard to choose a
threshold, if the threshold value is too small, the correct
matching points pairs may get lost. So, this method
avoids losing correct matching points pairs.
4.4 Hausdorff Distance
In this paper, the corner points is taken as the recognition
clue, the similarity measurement between affine and
character sub-image is realized by using one-way Haus-
dorff distance. Firstly, the boundary box of affine image
is obtained, a region in the character sub-image is deter-
mined corresponding to the boundary box and is ex-
tracted. Secondly, the corner points are detected in affine
image and character sub-image. Common corner points
detection algorithm are mainly Harris, Susan, etc, this
paper adopts the algorithm that proposed by Shi.J and
Tomasi.C [9].
Assumed two finite sets S={s1,s2,,sp} and T={ t
1
,
t2,,tq }, the Hausdorff distance [10] is given as,
H(S,)max(h(,),(,))
TSThTS
= (12)
Where (,)maxmin
tT
sS
hSTst
=−
,
(,)maxmin
sS
tT
hTSts
=−
Where
represents a distance norm between S and T,
and Euclidean distance is used in this paper.
The Hausdorff distance is a measurement which de-
scribes the degree of similarity of two sets. h(S,T) de-
scribes the similarity from S to T, and h(T,S) exactly the
opposite.
Basic Elements Knowledge Acquisition Study in the Chinese Character Intelligent Formation System
Copyright © 2009 SciRes JSEA
320
The Hausdorff distance is vulnerable to the noise. Due
to the existence of disturbance in character sub-image,
the one-way Hausdorff distance means the distance from
affine image to character sub-image, which describes the
similarity from affine image to character sub-image.
5. Experiment and Result Analysis
70244 Chinese characters in National Standards GB18030-
2005 character set are taken as the experimental object.
Through splitting each character, summing up all basic
elements, some of the basic elements are illustrated in
Figure 4. Basic elements and Chinese characters are
chosen in 24 pounds Song Ti font and are manufactured
in gray images with the size of 32*32 pixels.
Experiment is implemented following the flow per-
sented in Figure 3. Basic elements and Chinese charac-
ters are manufactured with the size of 128*128 pixels
during the acquisition of the affine coefficients. A dem-
onstration of acquiring the affine coefficients that trans-
forms the basic element image to the Chinese char-
acter image , is shown in Figure 5.
SIFT algorithm can produce abundant information of
features. There are some mismatched point pairs which
are determined by the nearest distance of eigenvectors.
One-way Hausdorff distance is used as the similarity
measurement between affine image and character sub-
image in this paper, and three pairs of correct matching
points can be identified effectively.
Basic element
image Chinese character
image
Extract scale and location invariant with SIFT
Region of interest
Key points match
Calculate one group of affine coefficients by
choosing three pairs of matching points randomly
Calculate the one-way Hausdorff distance
between corner points
If D<D*, D*=D, then T*=T
Output optimal coefficients T*
End iteration
N
Y
Transform basic element image with affine
transformation
Detect corner points of basic element image and
affine image
Figure 3. The flow of acquiring affine coefficients
Figure 4. Some of the basic elements
Current Distortion Evaluation in Traction 4Q Constant Switching Frequency Converters
Copyright © 2009 SciRes JSEA
321
(a) (b)
(c) (d)
(e)
Figure 5. Example of acquiring affine coefficients. (a) is the
basic element image that adds the SIFT features, (b) is
Chinese character image that adds the SIFT features, the
rectangular box for the interested region, (c) marks one of
the affine images corner points, the rectangular box for the
boundary box of affine image, (d) marks the corner points
of character sub-image that is corresponding to the rectan-
gular box of (c), (e) is the images that find the optimal three
pairs of matching points according to minimum one-way
Hausdorff distance
Figure 6. Some of the formed characters
After obtaining the mapping knowledge of all the ba-
sic elements, the affine transformations are performed on
the basic elements which composed the Chinese charac-
ters, then the characters are generated by bitwise AND of
the images after the transformation, some of the formed
characters are given in Figure 6.
6. Conclusions
In this paper, considering the characteristics of the Chi-
nese character, through the extraction of SIFT features,
the matching point pairs are determined under the princi-
ple of one-way Hausdorff distance, and the affine coeffi-
cients are acquired by solving the linear equations. 70244
Chinese characters in the National Standards GB18030-
2005 character set are studied; all the characters are
formed successfully. The experimental results demon-
strate that the performance of forming characters by ba-
sic elements is feasible; the transformation from basic
elements to Chinese characters can be realized by affine
transformation, and the algorithm that acquires the affine
coefficients of basic elements proposed in this paper is
efficient. Although the Chinese character study is limited
on the size and font of Song Ti in this paper, the pro-
posed method can be extended to other different sizes
and fonts.
7. Acknowledgements
The author is grateful for the helpful discussion with Jian
Huang and Wenzhi Liao.
REFERENCES
[1] Y. G. Pi, The formation of chinese character in
computerization, CN1558314, China, 2004.
[2] Y. G. Pi, H. L. Shu and T. C. Liang, The frame of
cognitive pattern recognition, In Proceedings of the 26th
Chinese Control Conference, Beijing, pp. 694696, 2007.
[3] T. C. Liang, Z. W. Qiu and Y. G. Pi, Simple grid based
on cognitive mechanism and application research on
description for structure of Chinese character, In
Proceedings of the 26th Chinese Control Conference,
BeiJing, pp, 689693, 2007.
[4] T. C. Liang and Y. G. Pi, On Chinese Character
Structure(CCS) recognition based on bayesian learning,
In Proceedings of the 2007 International Conference on
Intelligent Systems and Knowledge Engineering,
ChengDu, pp. 10321037, 2007.
[5] Y. G. Pi and Z. B. Mou, The grid and its description of
Chinese character in computer, CN1558339, China,
2004.
[6] L. W. Jin and X. N. Zu, Synthesis of Chinese character
using affine transformation, In Proceedings of Ninth
International Conference on Document Analysis and
Recognition, pp. 218222, 2007.
Basic Elements Knowledge Acquisition Study in the Chinese Character Intelligent Formation System
Copyright © 2009 SciRes JSEA
322
[7] D. G. Lowe, Object recognition from local scale-
invariant features, In Proceedings of international
conference on computer vision, Corfu, Greece, pp.
11501157, 1999.
[8] D. G. Lowe, Distinctive image features from scale-
invariant key-points, International Journal of Computer
Vision, Vol. 60,No. 2, pp. 91110, 2004.
[9] J. B. Shi and C. Tomasi, Good featrues to track, IEEE
Conference on Computer Visoion and Pattern
Recognition, Seattle, pp. 593600, 1994.
[10] B. F. Guo, K. M. Lan and K. H. Lin, Human face
recognition based on spatially weighted Hausdorff
distance, Pattern Recognition Letters, Vol. 24, No. 1-3,
pp. 499507, 2003.