Open Journal of Applied Sciences, 2012, 2, 184-187
doi:10.4236/ojapps.2012.23027 Published Online September 2012 (http://www.SciRP.org/journal/ojapps)
A New Method for Chinese Character Strokes Recognition
Yan Xu, Xiangnian Huang, Huan Chen, Huizhu Jiang
Xihua University, Chengdu, China
Email: 501625379@qq.com
Received July 6, 2012; revised August 4, 2012; accepted August 16, 2012
ABSTRACT
This paper studies the problem of stroke recognition for Chinese characters and proposes strategies and
algorithms for it. After reviewing several existing methods for Chinese character stroke recognition,
a new method called combining trial is presented. Analysis and experimental results show that the
method is highly stable.
Keywords: Chinese Character Recognition; Stroke Recognition; Stroke Combining
1. Introduction
Unconstrained handwritten Chinese character recognition
remains a difficult area of character recognition, and
stroke recognition is an important part of the structural
analysis of Chinese characters. A stroke is composed of a
number of direction strokes (referred to below simply as
"strokes", in the plural). Studies have shown that, however
much unconstrained handwritten Chinese characters vary,
the stroke remains very stable. The direction strokes
contained in a stroke belong to the same section of one
natural stroke (one written trace) and are generally not
split across two or more natural strokes; this paper uses
that property to perform stroke recognition. Although
definitions of the stroke vary, they share a common
feature: strokes fall into two categories, basic strokes
and composite strokes. A basic stroke is a direction
stroke; the partition of direction space into stroke types
is shown in Figure 1. Direction strokes are easy to
determine and to extract algorithmically, so this article
discusses the composite stroke, which is made up of
multiple direction strokes [1]. For a Chinese character
recognition system based on structural analysis, the
quality of stroke recognition has a major impact on the
recognition of the whole character. Typical stroke
recognition methods include the direct synthesis method
and the dynamic programming method. Based on an analysis
of these methods, this paper presents a new stroke
recognition method for Chinese characters called
combining trial [2].
2. Direct Synthesis
The direct synthesis method takes the extracted direction
strokes of the four types H, S, P and N and synthesizes
strokes from them directly, according to the given stroke
definitions. The idea of the algorithm is as follows: set
a number of decision conditions in the stroke
identification module, combine the serial-number string
of the extracted direction strokes that lie in the same
natural stroke, and then compose them into the different
stroke types according to the preset conditions to
complete stroke recognition. The main merit of this
algorithm is that it is simple and fast. Because
unconstrained handwritten Chinese characters deform
considerably, the algorithm is generally suited to more
standardized writing, such as tightly constrained
handwriting and printed text.
3. Dynamic Programming
This algorithm was proposed to overcome the adverse effect
of stroke deformation on stroke judgment. It uses a
similarity matching strategy: first, define the standard
code strings (standard strokes) and the rule for
calculating the distance between strings; then, match the
unknown code string against each of the defined standard
code strings in turn and calculate the distance between
them; finally, take the standard string with the smallest
distance as the matching result.
Figure 1. Space division of strokes types (zones H, P, S and N, plus an invalid zone).
Copyright © 2012 SciRes. OJAppS
Y. XU ET AL. 185
Algorithm implementation: by gradually increasing or
reducing the length of the code string X, search the set
of standard code strings for the one with the minimum
distance to the unknown code string and take it as the
result. This transforms the stroke matching problem into
the problem of finding the best match between two symbol
strings.
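As a minimal sketch of this matching scheme, the following Python code compares an unknown code string with every standard code string and keeps the nearest one. The unit-cost edit distance is a stand-in chosen for illustration only, since the distance rule actually used in this paper is the fuzzy distance defined in Section 4; the function names and the template set `STANDARDS` are hypothetical.

```python
def edit_distance(x, g):
    """Unit-cost edit distance between code strings x and g (illustrative stand-in)."""
    prev = list(range(len(g) + 1))          # distances from the empty prefix of x
    for i, xs in enumerate(x, start=1):
        curr = [i]                          # distance from x[:i] to the empty string
        for j, gs in enumerate(g, start=1):
            cost = 0 if xs == gs else 1
            curr.append(min(prev[j] + 1,            # delete xs from x
                            curr[j - 1] + 1,        # insert gs into x
                            prev[j - 1] + cost))    # match or substitute
        prev = curr
    return prev[-1]


def nearest_standard(x, standards):
    """Return the standard code string with the smallest distance to x."""
    return min(standards, key=lambda g: edit_distance(x, g))


# Hypothetical standard code strings over the direction codes a, b, c, d.
STANDARDS = ["aca", "cc", "dbd"]

if __name__ == "__main__":
    print(nearest_standard("adcda", STANDARDS))
```

With these assumed templates, the deformed string "adcda" is mapped to "aca", because deleting the two "d" symbols costs 2, which is less than the cost of reaching either of the other templates.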
Free writing produces many connected strokes, and the
code string to be identified is not known in advance, so
stroke splitting is needed. Each split produces a series
of candidate strokes to be recognized, so this method is
iterative, with many steps and a large volume of
computation. We observe that: 1) a long direction stroke
is more stable than a short one; 2) different types of
direction strokes differ in how easily they are confused.
To account for these two differences, this paper proposes
a treatment based on the “fuzzy number of length” and the
“fuzzy number of direction”, described in the following
section.
4. Combining Trial
Unconstrained handwritten Chinese characters contain a
large number of connected strokes, that is, multiple
strokes joined within the same natural stroke, and the
writer cannot be asked to write each stroke separately.
Stroke recognition therefore involves two processes,
stroke splitting and stroke judgment: a split must be
chosen for input of an unknown pattern, and each resulting
piece must then be judged. The two processes complement
each other and are carried out alternately. We therefore
propose the combining trial method to recognize strokes.
The recognition process of combining trial is a tentative
process of repeated splitting and judgment:
Split → Judge → Split again → Judge again → …
Input: the direction strokes code sequence (string) of a
natural stroke;
Output: the stroke code sequence (string) of the natural
stroke;
Rule number one: the larger stroke takes priority
(principle of giving priority to the larger);
Rule number two: folded strokes determine the segments.
4.1. Strategy of Stroke Splitting: Increasing Test
Given the direction strokes code sequence of a natural
stroke (which may contain multiple strokes), start from
the first code and add one code at a time to form a test
stroke code; match it against the dictionary and save the
matching result temporarily; then take the next code to
form a new test stroke code for matching, and so on. If a
predetermined stroke made of several direction strokes
has been extracted, that is, all the direction strokes it
contains have been extracted, the stroke has been
identified. If the code sequence of the natural stroke
has not ended, clear the data structure that holds the
test stroke code (delete the test stroke code), then take
the next code to form a new test stroke code and
determine the next stroke (for a natural stroke with
multiple strokes). Once all the codes of the natural
stroke have been tried in matching, take the matched
stroke that contains the largest number of direction
strokes as the result and return it (the larger takes
priority), so that folded strokes are extracted first.
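The increasing test can be sketched as a greedy longest-match split. The Python code below is one possible reading of the strategy, not the authors' implementation: the stroke dictionary `DICTIONARY`, its labels, and the handling of a code that matches nothing (it is emitted alone with the label `None`) are all assumptions made for illustration.

```python
def split_natural_stroke(codes, dictionary):
    """Split a direction-code string into strokes, preferring the longest match."""
    strokes = []
    start = 0
    while start < len(codes):
        best_end = None
        # Increasing test: grow the test stroke code by one direction code at a time.
        for end in range(start + 1, len(codes) + 1):
            if codes[start:end] in dictionary:
                best_end = end              # a longer match replaces a shorter one
        if best_end is None:
            # Assumption: an unmatched code is rejected on its own and skipped.
            strokes.append((codes[start], None))
            start += 1
        else:
            test_code = codes[start:best_end]
            strokes.append((test_code, dictionary[test_code]))
            start = best_end                # clear the test code, restart after the match
    return strokes


# Hypothetical dictionary: direction-code string -> stroke label.
DICTIONARY = {
    "a": "basic a",
    "c": "basic c",
    "ac": "fold ac",
    "aca": "fold aca",
}

if __name__ == "__main__":
    print(split_natural_stroke("acaa", DICTIONARY))
```

For the input "acaa", the candidates "a", "ac" and "aca" all match at the first position, and the longest one, "aca", is kept before the remaining "a" is judged separately.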
4.2. Strategy of Stroke Judgment: Similarity Matching
Exact matching requires that the feature code string to
be identified be equal to a feature code string stored in
the feature dictionary. The reference templates of a
pattern in the dictionary must therefore have adequate
coverage, i.e., be comprehensive enough to cover the most
common deformations of the pattern. High matching
accuracy and fast decisions are the obvious advantages of
this technique, and recognition performance depends
mainly on the completeness of the reference templates.
Because the range of pattern deformation cannot be
bounded, rejection may occur (the code is not in the
library). In view of this, building a more complete
feature library is an important task [3,4].
Similarity matching is motivated by the unpredictability
of deformation. It usually takes the “distance” or
“degree of similarity” between the pattern to be
recognized and the reference pattern as the decision
criterion, and the definitions of these criteria vary.
The system applies exact identification first and
similarity identification second: when exact matching
produces a rejection, control passes to the similarity
matching module, which attempts to identify the rejected
stroke. String similarity matching techniques vary;
typical ones are the dynamic programming method, the
fuzzy attribute method, the error correction method and
various weighted matching methods. When stroke codes are
direction-code strings of unequal length, this paper
proposes the fuzzy number of length and the fuzzy number
of direction to reflect the differences between direction
strokes in length and in type.
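The two-stage strategy (exact identification first, similarity identification on rejection) can be sketched as follows. This is an illustrative sketch under stated assumptions: the similarity measure is Python's standard `difflib.SequenceMatcher` ratio, used only as a placeholder for the fuzzy distance defined later in this section, and the dictionary contents, the labels and the threshold `min_similarity` are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Placeholder similarity in [0, 1]; NOT the fuzzy measure of this paper."""
    return SequenceMatcher(None, a, b).ratio()


def recognize(code, dictionary, min_similarity=0.6):
    """Exact match first; fall back to similarity matching when exact match rejects."""
    # Stage 1: exact matching against the feature dictionary.
    if code in dictionary:
        return dictionary[code], "exact"
    # Stage 2: similarity matching over all reference code strings
    # (the dictionary is assumed to be non-empty).
    best = max(dictionary, key=lambda g: similarity(code, g))
    if similarity(code, best) >= min_similarity:
        return dictionary[best], "similar"
    return None, "rejected"


# Hypothetical feature dictionary.
FEATURES = {"aca": "fold aca", "b": "basic b"}

if __name__ == "__main__":
    for code in ("aca", "adcda", "bbbb"):
        print(code, recognize(code, FEATURES))
```

The threshold keeps the second stage from accepting arbitrary input: a string that resembles no reference template is still rejected.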
Take several written forms of a folded stroke as an
example (Figure 2). Figure 2(a) is the standard form,
with standard stroke code G = “aca”; the others are
stroke variants. The code set {a, b, c, d} is the
four-direction code of the direction strokes. The shapes
are similar but clearly different: the string to be
identified has been distorted. We call direction-stroke
codes that depart from the standard codes, or that arise
from distortion, deformed strokes. If exact matching is used and
stroke variants are not covered by the dictionary, the
recognition algorithm rejects the input.
In similarity matching of code strings, we take into
account the differences between symbols in length and
direction, and use the fuzzy numbers of length and
direction to define the symbol distance (the distance
between direction-stroke codes):

$$d(x_k, g_i) = \begin{cases} 0, & x_k = g_i \\ f_D(x_k)\, f_L(x_k), & x_k \neq g_i \end{cases} \qquad (1)$$
where $x_k \in X$ denotes the $k$-th symbol of the code
string to be identified; $g_i \in G$ denotes the $i$-th
symbol of the standard code string, the symbol associated
with $x_k$; $f_D(x_k)$ denotes the fuzzy number of
direction associated with symbol $x_k$; and $f_L(x_k)$
denotes the fuzzy number of length associated with symbol
$x_k$. The code string distance (stroke distance) is
defined as the sum of the distances of the symbols
associated with string $X$:

$$D(X, G) = \sum_{k=1}^{m} d(x_k, g_i) \qquad (2)$$

where $X$ denotes the stroke to be recognized; $G$
denotes the standard stroke; and $m$ denotes the total
number of direction strokes in $X$ that have been matched
(some deformed strokes may be deleted).
For stroke matching, first compute the matching distance
between the stroke to be matched and each standard stroke
according to (1) and (2), and then take the standard
stroke with the minimum distance as the judgment for the
stroke $X$ to be recognized. The main task is therefore
to calculate the fuzzy number of direction $f_D(x_k)$ and
the fuzzy number of length $f_L(x_k)$, as follows.
1) Calculate $f_L(x_k)$:
Observation and analysis indicate that an excess
direction stroke is usually shorter than the preceding
one and/or the following one. The fuzzy number of length
of the $k$-th direction stroke $x_k$ in $X$ can therefore
be defined as:

$$f_L(x_k) = 1 - \frac{\mathrm{Len}(x_k)}{\min\{\mathrm{Len}(x_{k-1}),\ \mathrm{Len}(x_{k+1})\}} \qquad (3)$$

where $f_L(x_k) \in [0, 1]$; if $f_L(x_k) < 0$, take
$f_L(x_k) = 0$; $\mathrm{Len}(x_k)$ denotes the point
length of direction stroke $x_k$.

Figure 2. Standard and deformed writings of a folded stroke: (a) standard; (b) deformed 1; (c) deformed 2; (d) deformed 3; (e) deformed 4.
2) Calculate $f_D(x_k)$:
The distance between two different direction strokes is
determined by their degree of similarity, so the fuzzy
number of direction of the $k$-th direction stroke $x_k$
in $X$ can be defined as:

$$f_D(x_k) = 1 - \mathrm{sim}(x_k, g_i) \qquad (4)$$

where $\mathrm{sim}(x_k, g_i)$ denotes the similarity
between direction stroke $x_k$ and standard direction
stroke $g_i$. It can be determined by looking up the
direction-stroke similarity table (see the space division
of strokes types):
Sim = [similarity matrix over the direction-stroke types H, S, P and N] (5)
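The symbol and stroke distances can be sketched in Python as follows. The sketch assumes that the symbol distance in (1) is the product of the two fuzzy numbers, that (3) has the form 1 - Len(x_k) / min(Len(x_{k-1}), Len(x_{k+1})) clipped at 0, and that (4) is 1 - sim(x_k, g_i). The similarity values in `SIM` are invented for illustration and are not the entries of table (5); the handling of the first and last symbol (only the existing neighbour is used) and the explicit `pairs` alignment passed to `stroke_distance` are likewise assumptions.

```python
# Hypothetical similarities between different direction-stroke types.
SIM = {
    frozenset("HP"): 0.5, frozenset("HN"): 0.5,
    frozenset("SP"): 0.5, frozenset("SN"): 0.5,
    frozenset("HS"): 0.0, frozenset("PN"): 0.0,
}


def sim(x, g):
    """Similarity of two direction-stroke types (1.0 when identical)."""
    return 1.0 if x == g else SIM.get(frozenset((x, g)), 0.0)


def fuzzy_length(lengths, k):
    """f_L: close to 1 when stroke k is much shorter than its shortest neighbour."""
    neighbours = [lengths[j] for j in (k - 1, k + 1) if 0 <= j < len(lengths)]
    if not neighbours:                      # single-symbol string: nothing to compare
        return 0.0
    # Lengths are assumed to be positive, so the division is well defined.
    return max(0.0, 1.0 - lengths[k] / min(neighbours))


def fuzzy_direction(x, g):
    """f_D: 1 minus the similarity of the two direction strokes."""
    return 1.0 - sim(x, g)


def symbol_distance(codes, lengths, k, g):
    """d(x_k, g_i): 0 for identical symbols, otherwise f_D * f_L (assumed product)."""
    if codes[k] == g:
        return 0.0
    return fuzzy_direction(codes[k], g) * fuzzy_length(lengths, k)


def stroke_distance(codes, lengths, pairs):
    """D(X, G): sum of symbol distances over the matched (k, g_i) pairs."""
    return sum(symbol_distance(codes, lengths, k, g) for k, g in pairs)


if __name__ == "__main__":
    codes, lengths = ["H", "P", "S"], [40, 10, 30]
    print(stroke_distance(codes, lengths, [(0, "H"), (1, "H"), (2, "S")]))
```

In the usage example the middle stroke is short compared with its neighbours (10 against 30), so its fuzzy number of length is 2/3, and its mismatch with the standard symbol contributes 0.5 × 2/3 to the stroke distance.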
The matching algorithms and matching rules for two
strokes (stroke code strings) are as follows.
For similarity matching of code strings, the starting
point of the match must be selected first, i.e., the
symbol of the code string to be recognized at which
matching begins. Based on an analysis of the regularities
of shape variation, and to keep the amount of computation
as small as possible, we designed two starting-point
selection methods: the fixed starting-point method and
the floating starting-point method. In the fixed
starting-point method, matching starts from the first
symbol regardless of whether the first symbol of the code
string to be identified and the first symbol of the
standard code string are the same. This method has no
problem matching Figures 2(a) and (b), “aca” (standard
code string) and “adcda” (code string to be identified),
because the first symbol of both code strings is “a”.
However, when matching code string (c) “cadcdad” against
standard code string (a) “aca”, the code strings are
misaligned, which leads to matching errors. We therefore
designed a fast floating starting-point matching method.
First, compare the first symbols: if they are equal,
choose that symbol as the starting point; otherwise,
calculate the matching distances of the first symbol and
of the second symbol with the standard string, and start
from the one with the smaller distance (while ensuring
that the remaining string is long enough). Once the
starting point is chosen, the symbols of the string to be
recognized are matched one by one with the standard
symbol. If the two symbols are the same, continue to the
next symbol; otherwise, calculate the matching distances
of the current symbol and of the next symbol with the
standard string, and take the smaller one as the matching
result. Finally, calculate the matching distance between
the two strokes, which completes one similarity matching
of a stroke [5].
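The floating starting-point selection can be sketched as below. This is one interpretation of the rule, not the authors' code: the candidate symbols are compared with the first symbol of the standard string, the "long enough" condition is taken to mean that the string remaining after skipping the first symbol is at least as long as the standard string, and `unit_distance` is a hypothetical stand-in for the symbol distance in (1). Both strings are assumed to be non-empty.

```python
def unit_distance(x, g):
    """Stand-in symbol distance: 0 for identical symbols, 1 otherwise."""
    return 0.0 if x == g else 1.0


def choose_start(unknown, standard, dist=unit_distance):
    """Return the index (0 or 1) in `unknown` at which matching should start."""
    # Equal first symbols: behave exactly like the fixed starting-point method.
    if unknown[0] == standard[0]:
        return 0
    # Skipping the first symbol must leave a string that is long enough.
    if len(unknown) - 1 < len(standard):
        return 0
    d_first = dist(unknown[0], standard[0])
    d_second = dist(unknown[1], standard[0])
    # Start from the symbol with the smaller distance; ties keep the first symbol.
    return 1 if d_second < d_first else 0


if __name__ == "__main__":
    print(choose_start("adcda", "aca"))    # first symbols agree
    print(choose_start("cadcdad", "aca"))  # leading "c" is skipped
```

For the two examples in the text, "adcda" starts at its first symbol, while "cadcdad" starts at its second symbol, which removes the misalignment with "aca".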
5. Conclusion
To cope with the variability of unconstrained handwritten
Chinese characters, this paper presents a new stroke
recognition method for Chinese characters, combining
trial, which is based on structural analysis. The
algorithm uses the increasing test technique to split and
combine strokes, and applies the proposed similarity
matching technique, based on the fuzzy number of length
and the fuzzy number of direction, to complete stroke
identification. The algorithm comprises two processes,
stroke splitting and stroke judgment, which are conducted
alternately: splitting, combining and judging proceed
together, and, following the principle of giving priority
to the larger, the largest possible compound stroke is
extracted. Finally, we collected the first Chinese
characters of the national standard written by 500 people
and ran tests on them. The results demonstrate the
effectiveness of the algorithm and the accuracy of stroke
extraction, which lays a good foundation for developing
the follow-up whole-character recognition algorithm.
6. Acknowledgements
This work was supported by the research fund of the
Sichuan Key Laboratory of Intelligent Network Information
Processing (SGXZD1002-10) and the Key Laboratory of Radio
Signals Intelligent Processing (Xihua University)
(XZD0818-09).
REFERENCES
[1] B. Jia, X. D. Tian and F. Yang, “Off-Line Handwritten
Chinese Character Recognition Based on Double Contour
Feature,” 2009 International Symposium on Intelligent
Information Systems and Applications, Qingdao, 28-30
October 2009, pp. 399-402.
[2] T. H. Su, T. W. Zhang and D. J. Guan, “Off-Line
Recognition of Realistic Chinese Handwriting Using
Segmentation-Free Strategy,” Pattern Recognition, Vol. 42,
No. 1, 2009, pp. 167-182. doi:10.1016/j.patcog.2008.05.012
[3] X. N. Huang, “An Multiple Classifiers Integrated System
of On-Line Natural Handwritten Chinese Characters
Recognition,” Journal of Chinese Information Processing,
Vol. 14, No. 5, 2000, pp. 37-41.
[4] X. N. Huang, “Extraction of Natural Handwritten Chinese
Character Strokes and Roots,” Journal of Chongqing
University (Natural Science), Vol. 23, No. 5, 2000,
pp. 104-107.
[5] Y. F. Sun, Y. Chen and Y. Z. Zhang, “Symmetry-Based
Recognition Method for Similar Chinese Characters,”
Journal of Chinese Information Processing, Vol. 18, No.
2, 2004, pp. 51-57.