Basic Elements Knowledge Acquisition Study in the Chinese Character Intelligent Formation System

doi:10.4236/jsea.2009.25041

Paper Menu >>

Journal Menu >>

J. Software Engineering & Applications, 2009, 2: 316-322

doi:10.4236/jsea.2009.25041 Published Online December 2009 (http://www.SciRP.org/journal/jsea)

Basic Elements Knowledge Acquisition Study in

the Chinese Character Intelligent Formation

System

Mingyou LIU, Chengsen DUAN, Youguo PI

South China University of Technology, Guangzhou, China.

Email: liumingyou1025@gmail.com

Received July 27th, 2009; revised August 22nd, 2009; accepted September 7th, 2009.

ABSTRACT

In the Chinese character intelligent formation system without Chinese character library, it is possible that the same

basic element in different Chinese characters is different in position, size and shape. The geometry transformation from

basic elements to the components of Chinese characters can be realized by affine transformation, the transformation

knowledge acquisition is the premise of Chinese character intelligent formation. A novel algorithm is proposed to ac-

quire the affine transformation knowledge of basic elements automatically in this paper. The interested region of Chi-

nese character image is determined by the structure of the Chinese character. Scale invariant and location invariant of

basic element and Chinese character image are extracted with SIFT features, the matching points of the two images are

determined according to the principle of Minimum Euclidean distance of eigenvectors. Using corner points as identifi-

cation features, calculating the one-way Hausdorff distance between corner points as the similarity measurement from

the affine image to the Chinese character sub-image, affine coefficients are determined by optimal similarity. 70244

Chinese characters in National Standards GB18030-2005 character set are taken as the experimental object, all the

characters are performed and the experimental courses and results are presented in this paper.

Keywords: Chinese Character Intelligent Formation, Knowledge Acquisition, Affine Transformation, Hausdorff Distance

1. Introduction

The concept of Chinese character intelligent information

that reuse basic elements to form Chinese characters was

proposed based on prototype authentication mechanism

of cognitive psychology in the reference [1], the author

identified that replace the Chinese character library by

Chinese character intelligent information. In his further

research, the framework of the Chinese character intelli-

gent formation system which contains basic elements

library, knowledge library and Chinese character intelli-

gent formation model [2] was proposed, and the structure

knowledge of the Chinese character is obtained by sim-

ple grid [3–5].

The spelling of alphabetic character has a one-dimen-

sional characteristic; it is pieced together in order under a

few rules. But the Chinese character has the two-dimen-

sional characteristic, the same basic element in different

Chinese characters is possible different in its size, shape

and position. The mapping from basic elements to the

components of the Chinese character can be realized by

affine transformation. A method that reuses radicals to

generate Chinese characters based on global affine

transformation was proposed in the reference [6]; the

affine coefficients were calculated by minimizing the

mean of the nearest-neighbor inter-point distance. But

this method was complex and it is easy to bring errors

resulted from image processing, such as outline extrac-

tion. A novel approach was proposed that selects corre-

sponding points in the boundary box of the basic ele-

ments and mapping goal respectively to calculate the

affine coefficients, then optimized by PSO algorithm, but

this method need to remove other components manually

when get the bounding box of mapping goal, so the

workload was heavy.

In this paper, the solution for acquiring basic elements’

affine coefficients is as follows, the interested region of

Chinese character image is determined by the structure of

the Chinese character. Scale invariant and location in-

variant of basic element and Chinese character image are

extracted with SIFT features, matching points of the two

images are determined according to the principle of

Euclidean distance of eigenvectors. One group of affine

coefficients can be calculated from three pairs of match-

Basic Elements Knowledge Acquisition Study in the Chinese Character Intelligent Formation System

317

ing points which are chosen randomly. The affine image

is obtained through the transformation from the basic

element image, the corner points of affine image and

Chinese character sub-image are detected, one-way Hau-

sdorff distance between corner points is used as the rule

of measuring the similarity, and the optimal affine coef-

ficients are determined by minimum one-way Hausdorff

distance.

The contents of this paper is arranged as follows, the

principle of Chinese character intelligent information is

presented in Section 2; the geometric transformation mo-

del of the basic elements are proposed in Section 3, the

size, position and shape changes of basic elements can be

realized by affine transformation; in Section 4, a new

method of acquiring the affine coefficients of the basic

elements is proposed; the experiment study is presented

in Section 5, the generation of more than 70000 Chinese

characters is carried out after acquired the transformation

knowledge in different Chinese characters of all basic

elements. Finally, the conclusions are given in Section 6.

2. Theory of the Chinese Character

Intelligent Formation

A Chinese character is a combination of either single

hieroglyphic or self-explanatory symbol, or several of

them based on meaning and echoism rules. Any element

in the set of the character components corresponds to a

certain basic element. The components of Chinese char-

acters are the topological mapping of basic elements in

the character structure, namely, the position, size and

shape of basic elements may be different in different

Chinese characters, but basic elements are topological

invariant. For example, basic elements “氵” and “少”

have different size, shape and position in the characters

“沙” and “莎”. As another example of this, basic element

“寸” has different size, shape and position in the charac-

ters “讨”, “辱” and “褥”. In the above-mentioned trans-

formation, the components which are corresponding to

the same basic element in different Chinese characters

are different in size, shape and position, but they have the

same topological structure, namely, they have topologi-

cal invariance.

From the above analysis, the concept of Chinese char-

acter intelligent information was proposed. Basic ele-

ments include single hieroglyphic and self-explanatory

and their symbols; a Chinese character is generated by

putting one or more basic elements together after topo-

logical mapping to the structure of the Chinese character.

Take the generation of the Chinese character “蘑”” as an

example, as shown in Figure 1. In order to facilitate un-

derstanding, the decomposition of structures and basic

elements of the character is illustrated in Figure 2. This

character contains four level structure and five compo-

nents mapped by four basic elements. The symbols “艹”,

“木”, “广”, “石” are basic elements, Fi(i=1,2,3,4,5)

represents the knowledge of the topological mappings,

such as the location, size and shape of basic elements in

the Chinese character. The final Chinese character “蘑”

is formed by mapping the basic elements to the corre-

sponding structure of the Chinese character.

3. Transformation Model of Basic Elements

From the analysis in Section 2, it needs a topological

transformation method which can realize to transform

basic elements to different position, size and shape in

order to realize the mapping from basic elements to

components of the Chinese character, affine transforma-

tion meets this requirement. Suppose W is a basic ele-

ments image and x is a point of the image. Let us define a

geometric transformation of the basic element image by

AAAA

abab

AWtWt

cdcd



+=+=+





(1)

Figure 1. The theory of Chinese character intelligent for-

mation

Figure 2. Example of the decomposition of structures and

basic elements

Basic Elements Knowledge Acquisition Study in the Chinese Character Intelligent Formation System

318

With A corresponding to a linear transformation and t a

translation vector. Of which, aA and dA, bA and cA, tx and

ty represent scaling factor, rotation factor and translation

along the x, y axes respectively. The Chinese characters

are standard-type-face characters that square in a box,

which determined the geometric transformation of basic

elements involves only non-uniform scale, without rota-

tion, that is, bA = cA = 0.

4. Knowledge Acquisition of Basic Elements

The flow of acquiring the affine coefficients of basic

elements that proposed in this paper is as follows. Firstly,

the interested region of character image is determined by

the structure of the Chinese character. Secondly, the

scale and location invariant of basic element and Chinese

character image are extracted by SIFT algorithm, and the

matching points of two images are determined by nearest

Euclidean distance of eigenvectors. Of all the key mat-

ching points, a few are incorrect, it is not necessary to

remove all the pairs of mismatched points, because it can

calculate affine coefficients with three pairs of correct

matching points. Thirdly, three pairs of non-collinear po-

ints are chosen randomly to calculate one group of affine

coefficients, the basic element image is transformed to

obtain affine image, the corner points of affine and Chi-

nese character sub-image are detected respectively, and

the one-way Hausdorff distance of corner points from

affine image to Chinese character sub-image is calculated.

Finally, the optimal affine coefficients are determined

according to the minimum one-way Hausdorff distance

after limited iterations.

4.1 Region of Interest

The combinational relation of location among the com-

ponents of the Chinese character is considered as the

structure of the Chinese character. The location of dif-

ferent components in a Chinese character is determined,

thus, a rectangular region can be fixed by the structure of

Chinese character, called region of interest, which only

contains one of the components of the Chinese character.

Through the analysis of the components’ combina-

tional relations of the Chinese character, the structure

types of the Chinese character are summarized. The

structure of the Chinese character can divided into sev-

eral levels, which depend on whether there is combina-

tion of basic elements, the structure without combination

of basic elements is called one-level structure, while

multiple-level structure is with a combination of basic

elements. The structure level of the Chinese characters

obeys the cognitive mechanism that is from left to right,

from top to bottom and from out to inner, for example,

the character “蘑” contains four level structures, as

shown in Figure 2.

Suppose the region of the Chinese character is T(x,y):

{

(,) 0

Txy

≤≤

(2)

Where w and h are the width and height of the Chinese

character image.

The code structure of the Chinese character is a kind

of hierarchical level, along the level-dividing tree, the

interested region of secondary level structure is contained

in its interested region of previous level structure. Sup-

pose the interested region of the previous level structure

and its sub-level structure can be represented with R(x,y)

and S(x,y) respectively as follows.

1212

(0)

(,)

(0)

xxxx

yyyy

rxrrrw

Rxy

ryrrrh

≤≤≤≤≤





≤≤≤≤≤

 (3)

1212

(0)

(,)

(0)

xxxx

yyyy

sxsssw

Sxy

sysssh

≤≤≤≤≤





≤≤≤≤≤

 (4)

From the above analysis, it can be obtained:

(,)(,)

SxyRxy

⊆, 1122

xxxx

rssrw

≤≤≤≤≤

1122

yyyy

rssrh

≤≤≤≤≤

In order to calculate the interested region of each basic

element conveniently, the relation between the interested

region of the previous level structure and its sub-level

structure can be defined as follows.

1211

1212

(01)

(,)

(01)

ab b

αααα

ββββ

≤≤≤≤≤



=

≤≤≤≤≤



h (5)

Where

and

represent the perce-

ntages of the interested region of the sub-level structure

holding that of it’s previous level structure in X and Y

axis respectively, which can be set by actual situation.

Through the analysis above, the interested region of

sub-structure comparing with its previous structure can

be calculated as follows.

11211221

()()

(,)

()()

xxxxxx

yyyyyy

rrrxrrr

Sxy

rrryrrr

αα

ββ

+×−≤≤+×−





+×−≤≤+×−



(6)

The interested region of each basic element in the Chi-

nese character can be calculated along the level-dividing

tree of Chinese characters’ code structure with the meth-

od of recursive under Equation (6).

4.2 Extract Scale and Location Invariant with

SIFT

The SIFT [7,8] features are local and based on the ap-

pearance of the object at particular interest points, and

are invariant to image scale and rotation.

According the characteristic of the Chinese character,

this article improved the SIFT that remove the rotation

invariant and extract scale and location invariant of SIFT

features. The main steps are included as follows.

Basic Elements Knowledge Acquisition Study in the Chinese Character Intelligent Formation System

319

For this, the image is convolved with Gaussian filters

at different scales, and then the difference of successive

Gaussian-blurred images are taken, Key points are then

taken as maxima/minima of the Difference of Gaussians

that occur at multiple scales. Specifically, a DoG image

(,,)

Gxy

is given by

(,,)((,,)(,,))(,)

(,,)(,,)

DxyGxykGxyIxy

LxykLxy

σσσ

σσ

=−∗

=− (7)

Where

(,,)

Lxyk

is the original image

(,)

Ixy

con-

volved with the Gaussian blur

(,,)

Gxyk

at scale

(,,)(,)(,)

LxyGxyIxy

σσ=∗ (8)

This is done by comparing each pixel in the DoG im-

ages to its eight neighbors at the same scale and nine

corresponding neighboring pixels in each of the neigh-

boring scales. If the pixel value is the maximum or

minimum among all compared pixels, it is selected as a

candidate key point.

Scale-space extreme detection produces too many key

point candidates, some of which are unstable. The next

step in the algorithm is to perform a detailed fit to the

nearby data for accurate location, scale, and ratio of prin-

cipal curvatures. This information allows points to be

rejected that have low contrast (and are therefore sensi-

tive to noise) or are poorly localized along an edge.

The Gaussian-smoothed image

(,,)

Lxy

at the key

point’s scale σ is taken so that all computations are per-

formed in a scale-invariant manner. For an image sample

(,)

Lxy

at scale σ, the gradient magnitude,

(,)

mxy

and orientation,

(,)

θ, are computed using pixel dif-

ferences:

(,)

((1,)(1,))((,1)(,1))

mxy

LxyLxyLxyLxy

+−−++−−

(9)

(,)arctan(((,1)(,1))

/((1,)(1,)))

xyLxyLxy

LxyLxy

=+−−

+−− (10)

Previous steps found key point locations at particular

scales and assigned orientations to them. This ensured

invariance to image location, scale. Now we want to

compute descriptor vectors for these key points such that

the descriptors are highly distinctive and partially in-

variant to the remaining variations, like illumination, 3D

viewpoint, etc. This step is pretty similar to the Orienta-

tion Assignment step. The feature descriptor is computed

as a set of orientation histograms on (4 x 4) pixel

neighborhoods. Just like before, the contribution of each

pixel is weighted by the gradient magnitude, and by a

Gaussian with σ 1.5 times the scale of the key point.

Histograms contain 8 bins each, and each descriptor

contains a 4x4 array of 16 histograms around the key

point. This leads to a SIFT feature vector with (4 x 4 x 8

= 128 elements). This vector is normalized to enhance

invariance to changes in illumination.

4.3 Key Points Match

When the key point descriptor of two images was built,

take the Euclidean distance between the descriptors of

each feature point as the measurement of the similarity.

Suppose the sets of descriptor are,

()()()

{,,}

aaa

Ffff=

()()()

{,,}

bbb

Ffff=L respectively in images A and B,

where Na and Nb are the numbers of key points in the

two images respectively, the Euclidean distance d be-

tween any descriptor

()

in image A to any descrip-

tor

()

in image B is presented as below,

()()()()2

(,)([][])

abab

ijij

dfffdfd

−

=−

∑ (11)

Take one of the key points in image A, and find the

key point with nearest Euclidean distance in image B. In

the case of the ratio of nearest Euclidean distance and the

second nearest Euclidean distance, it is hard to choose a

threshold, if the threshold value is too small, the correct

matching points pairs may get lost. So, this method

avoids losing correct matching points pairs.

4.4 Hausdorff Distance

In this paper, the corner points is taken as the recognition

clue, the similarity measurement between affine and

character sub-image is realized by using one-way Haus-

dorff distance. Firstly, the boundary box of affine image

is obtained, a region in the character sub-image is deter-

mined corresponding to the boundary box and is ex-

tracted. Secondly, the corner points are detected in affine

image and character sub-image. Common corner points

detection algorithm are mainly Harris, Susan, etc, this

paper adopts the algorithm that proposed by Shi.J and

Tomasi.C [9].

Assumed two finite sets S={s1,s2,…,sp} and T={ t

t2,…,tq }, the Hausdorff distance [10] is given as,

H(S,)max(h(,),(,))

TSThTS

= (12)

Where (,)maxmin

hSTst

∈

=−

(,)maxmin

hTSts

∈

=−

Where

•

represents a distance norm between S and T,

and Euclidean distance is used in this paper.

The Hausdorff distance is a measurement which de-

scribes the degree of similarity of two sets. h(S,T) de-

scribes the similarity from S to T, and h(T,S) exactly the

opposite.

Basic Elements Knowledge Acquisition Study in the Chinese Character Intelligent Formation System

320

The Hausdorff distance is vulnerable to the noise. Due

to the existence of disturbance in character sub-image,

the one-way Hausdorff distance means the distance from

affine image to character sub-image, which describes the

similarity from affine image to character sub-image.

5. Experiment and Result Analysis

70244 Chinese characters in National Standards GB18030-

2005 character set are taken as the experimental object.

Through splitting each character, summing up all basic

elements, some of the basic elements are illustrated in

Figure 4. Basic elements and Chinese characters are

chosen in 24 pounds Song Ti font and are manufactured

in gray images with the size of 32*32 pixels.

Experiment is implemented following the flow per-

sented in Figure 3. Basic elements and Chinese charac-

ters are manufactured with the size of 128*128 pixels

during the acquisition of the affine coefficients. A dem-

onstration of acquiring the affine coefficients that trans-

forms the basic element image “隹” to the Chinese char-

acter image “衢”, is shown in Figure 5.

SIFT algorithm can produce abundant information of

features. There are some mismatched point pairs which

are determined by the nearest distance of eigenvectors.

One-way Hausdorff distance is used as the similarity

measurement between affine image and character sub-

image in this paper, and three pairs of correct matching

points can be identified effectively.

Basic element

image Chinese character

image

Extract scale and location invariant with SIFT

Region of interest

Key points match

Calculate one group of affine coefficients by

choosing three pairs of matching points randomly

Calculate the one-way Hausdorff distance

between corner points

If D<D*, D*=D, then T*=T

Output optimal coefficients T*

End iteration？

Transform basic element image with affine

transformation

Detect corner points of basic element image and

affine image

Figure 3. The flow of acquiring affine coefficients

Figure 4. Some of the basic elements

Current Distortion Evaluation in Traction 4Q Constant Switching Frequency Converters

321

(a) (b)

(e)

Figure 5. Example of acquiring affine coefficients. (a) is the

basic element image that adds the SIFT features, (b) is

Chinese character image that adds the SIFT features, the

rectangular box for the interested region, (c) marks one of

the affine image’s corner points, the rectangular box for the

boundary box of affine image, (d) marks the corner points

of character sub-image that is corresponding to the rectan-

gular box of (c), (e) is the images that find the optimal three

pairs of matching points according to minimum one-way

Hausdorff distance

Figure 6. Some of the formed characters

After obtaining the mapping knowledge of all the ba-

sic elements, the affine transformations are performed on

the basic elements which composed the Chinese charac-

ters, then the characters are generated by bitwise AND of

the images after the transformation, some of the formed

characters are given in Figure 6.

6. Conclusions

In this paper, considering the characteristics of the Chi-

nese character, through the extraction of SIFT features,

the matching point pairs are determined under the princi-

ple of one-way Hausdorff distance, and the affine coeffi-

cients are acquired by solving the linear equations. 70244

Chinese characters in the National Standards GB18030-

2005 character set are studied; all the characters are

formed successfully. The experimental results demon-

strate that the performance of forming characters by ba-

sic elements is feasible; the transformation from basic

elements to Chinese characters can be realized by affine

transformation, and the algorithm that acquires the affine

coefficients of basic elements proposed in this paper is

efficient. Although the Chinese character study is limited

on the size and font of Song Ti in this paper, the pro-

posed method can be extended to other different sizes

and fonts.

7. Acknowledgements

The author is grateful for the helpful discussion with Jian

Huang and Wenzhi Liao.

REFERENCES

[1] Y. G. Pi, “The formation of chinese character in

computerization,” CN1558314, China, 2004.

[2] Y. G. Pi, H. L. Shu and T. C. Liang, “The frame of

cognitive pattern recognition,” In Proceedings of the 26th

Chinese Control Conference, Beijing, pp. 694–696, 2007.

[3] T. C. Liang, Z. W. Qiu and Y. G. Pi, “Simple grid based

on cognitive mechanism and application research on

description for structure of Chinese character,” In

Proceedings of the 26th Chinese Control Conference,

BeiJing, pp, 689–693, 2007.

[4] T. C. Liang and Y. G. Pi, “On Chinese Character

Structure(CCS) recognition based on bayesian learning,”

In Proceedings of the 2007 International Conference on

Intelligent Systems and Knowledge Engineering,

ChengDu, pp. 1032–1037, 2007.

[5] Y. G. Pi and Z. B. Mou, “The grid and its description of

Chinese character in computer, ” CN1558339, China,

2004.

[6] L. W. Jin and X. N. Zu, “Synthesis of Chinese character

using affine transformation,” In Proceedings of Ninth

International Conference on Document Analysis and

Recognition, pp. 218–222, 2007.

Basic Elements Knowledge Acquisition Study in the Chinese Character Intelligent Formation System

322

[7] D. G. Lowe, “Object recognition from local scale-

invariant features,” In Proceedings of international

conference on computer vision, Corfu, Greece, pp.

1150–1157, 1999.

[8] D. G. Lowe, “Distinctive image features from scale-

invariant key-points,” International Journal of Computer

Vision, Vol. 60,No. 2, pp. 91–110, 2004.

[9] J. B. Shi and C. Tomasi, “Good featrues to track,” IEEE

Conference on Computer Visoion and Pattern

Recognition, Seattle, pp. 593–600, 1994.

[10] B. F. Guo, K. M. Lan and K. H. Lin, “ Human face

recognition based on spatially weighted Hausdorff

distance,” Pattern Recognition Letters, Vol. 24, No. 1-3,

pp. 499–507, 2003.