endency of our system from any kind of form.
The selected forms can be completed in Arabic, English, or any other language, because the proposed method is completely language independent. We collected 25 copies of each selected form and asked students to fill them in using their natural handwriting. The administrative form is A4 size, whereas the bank form is A5. The bank form is a deposit form taken from a national bank in Saudi Arabia. The administrative form is a student department complaints form.
All these forms contain numeric as well as alphabetic characters; see the samples in “Figure 3”. They were scanned as grayscale images at 300 dpi.
4. Experiments and Results
The proposed method was evaluated in two steps:
4.1. Evaluation of the Form Identification
We presented all 50 filled-in forms from both groups (form_U and form_B) to our system in random order and counted the number of times the system returned the correct form name. All the forms were identified correctly, that is, in 100% of the cases.
4.2. Evaluation of the Handwriting Fields Extraction
To evaluate this method and establish a meaningful performance rate that can be analyzed and discussed, we first counted the number of handwritten words, digits, or numbers present in each group of forms. We then counted how many of them appeared in the output of the system, correctly and completely extracted (“Table 2”).
The performance of our method reaches 84% with form_U and increases to 92% with form_B. Some results are presented in “Figure 4”.
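The rate described above is a simple ratio of correctly extracted handwritten items to the total number of handwritten items. A minimal sketch of this arithmetic follows; the function name `extraction_rate`, the form names, and the counts are hypothetical and chosen for illustration only (they are not the values of Table 2).

```python
def extraction_rate(total_fields: int, extracted_fields: int) -> float:
    """Percentage of handwritten items correctly and completely extracted."""
    if total_fields <= 0:
        raise ValueError("total_fields must be positive")
    if not 0 <= extracted_fields <= total_fields:
        raise ValueError("extracted_fields must lie between 0 and total_fields")
    return 100.0 * extracted_fields / total_fields


# Hypothetical counts (total items, correctly extracted items), illustration only.
counts = {"example_form_1": (200, 150), "example_form_2": (120, 108)}
for name, (total, extracted) in counts.items():
    print(f"{name}: {extraction_rate(total, extracted):.1f}%")
```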
The results obtained by our whole system (detection and extraction) are very promising. Regarding form detection, these results were expected, since the parameters used to distinguish the two forms have no overlapping or ambiguous range between them. However, it seems hard to guarantee that this rate will remain so high with more than two forms, particularly if these forms present a high degree of similarity.
Figure 3. Samples of the filled-in forms.
Table 2. Performance of our system on the two forms.
Regarding the extraction method, the results are also promising, since the overall performance of the method on the two forms is around 90%. These results do not take into account the quality of the extracted text. Indeed, additional post-processing such as dilation or noise removal can be applied to make these fields more exploitable by the word recognition module.
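The post-processing operations mentioned above can be illustrated with a small sketch. It assumes a binary image stored as a list of rows with 1 for ink and 0 for background, removes isolated ink pixels as a simple form of noise removal, and then applies a 3x3 dilation. The function names and the 3x3 neighbourhood are assumptions made for this illustration, not the settings of the evaluated system.

```python
from typing import List

Image = List[List[int]]  # binary image: 1 = ink, 0 = background


def _window(img: Image, y: int, x: int) -> List[int]:
    """Values of the 3x3 neighbourhood around (y, x), clipped at the borders."""
    h, w = len(img), len(img[0])
    return [img[ny][nx]
            for ny in range(max(0, y - 1), min(h, y + 2))
            for nx in range(max(0, x - 1), min(w, x + 2))]


def remove_isolated(img: Image) -> Image:
    """Clear ink pixels that have no ink pixel among their 8 neighbours."""
    out = [row[:] for row in img]
    for y, row in enumerate(img):
        for x, value in enumerate(row):
            # The window sum includes the pixel itself, so 1 means "alone".
            if value and sum(_window(img, y, x)) == 1:
                out[y][x] = 0
    return out


def dilate(img: Image) -> Image:
    """3x3 dilation: a pixel becomes ink if any pixel in its window is ink."""
    return [[int(any(_window(img, y, x))) for x in range(len(row))]
            for y, row in enumerate(img)]


noisy = [
    [1, 0, 0, 0, 0],  # isolated speck at the top-left corner
    [0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1],  # a short stroke
    [0, 0, 0, 0, 0],
]
cleaned = dilate(remove_isolated(noisy))
for row in cleaned:
    print(row)
```

Removing the noise before dilating matters in this sketch: dilating first would enlarge the speck so that it is no longer isolated.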
Figure 4. Some samples of the extracted handwriting fields.
In this paper we present a handwriting field extraction system, which can be seen as a preprocessing module in a complete handwriting recognition system. This module extracts the zones of interest (handwriting fields) from any input form and makes these zones available to the recognition module.
We propose a language-independent and form-independent method. Indeed, this method removes the original form structure; in other words, it “subtracts” the empty form from the filled-in one.
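The subtraction idea can be sketched as follows, assuming that both the filled-in form and the empty form are already binarized (1 for ink, 0 for background), have the same size, and are aligned pixel by pixel. The function name `subtract_template` is hypothetical, and this is only an illustration of the principle, not the implementation evaluated here.

```python
from typing import List

Image = List[List[int]]  # binary image: 1 = ink, 0 = background


def subtract_template(filled: Image, template: Image) -> Image:
    """Keep only the ink present in the filled form but absent from the empty one."""
    if len(filled) != len(template) or any(
        len(f_row) != len(t_row) for f_row, t_row in zip(filled, template)
    ):
        raise ValueError("filled form and template must have the same dimensions")
    return [
        [1 if f and not t else 0 for f, t in zip(f_row, t_row)]
        for f_row, t_row in zip(filled, template)
    ]


# A tiny empty "form": a printed box with a blank interior.
template = [
    [1, 1, 1, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
]
# The same box with one handwritten pixel inside it.
filled = [
    [1, 1, 1, 1],
    [1, 1, 0, 1],
    [1, 1, 1, 1],
]
handwriting = subtract_template(filled, template)
for row in handwriting:
    print(row)
```

In this simple form, handwriting that overlaps printed lines is removed together with them, which is one reason the extracted fields may benefit from post-processing.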
To make this operation possible, a preprocessing step was applied: we binarized all the forms using the Otsu algorithm and then removed some noise. The second step was form identification, since we evaluated our system with two different form types. An orientation step was then necessary to make the matching operation possible; for this purpose we used a Fourier-Mellin transform.
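The binarization step can be illustrated with a minimal sketch of Otsu thresholding, which selects the grey level that maximizes the between-class variance of the histogram. It assumes 8-bit grey levels (0 to 255) and dark ink on a light background; the function names are hypothetical. The orientation step based on the Fourier-Mellin transform is not covered by this sketch.

```python
from typing import List


def otsu_threshold(pixels: List[int]) -> int:
    """Return the grey level t that maximizes the between-class variance,
    where one class holds pixels <= t and the other holds pixels > t."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    weight_low = 0   # number of pixels with value <= t
    sum_low = 0      # sum of grey levels of those pixels
    best_t, best_var = 0, -1.0
    for t in range(256):
        weight_low += hist[t]
        sum_low += t * hist[t]
        if weight_low == 0:
            continue  # no pixel in the low class yet
        weight_high = total - weight_low
        if weight_high == 0:
            break     # every pixel is in the low class
        mean_low = sum_low / weight_low
        mean_high = (sum_all - sum_low) / weight_high
        between_var = weight_low * weight_high * (mean_low - mean_high) ** 2
        if between_var > best_var:
            best_var, best_t = between_var, t
    return best_t


def binarize(img: List[List[int]]) -> List[List[int]]:
    """Map dark pixels (<= Otsu threshold) to 1 (ink) and the rest to 0."""
    t = otsu_threshold([p for row in img for p in row])
    return [[1 if p <= t else 0 for p in row] for row in img]


# Tiny synthetic example: dark ink (10) on a light background (200).
gray = [
    [200, 200, 10],
    [200, 10, 200],
]
print(binarize(gray))
```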
We evaluated our system with 50 forms of two different types (a bank form and a university form). Overall, our method correctly extracts around 90% of the handwriting fields.
In the future, we propose to investigate the extraction of handwriting fields from any form or document without any a priori knowledge about the structure of this document, which is closer to real-world conditions.