endency of our system from any kind of form.

The selected forms can be filled in Arabic, English or any other language, because the proposed method is completely language independent. We collected 25 copies of each selected form and asked students to fill them in using their natural handwriting. The administrative form is A4 size, whereas the bank form is A5. The bank form is a deposit form taken from a national bank in Saudi Arabia; the administrative form is a student department complaints form.

All these forms contain numeric as well as alphabetic characters; see the samples in Figure 3. They were scanned in grayscale at 300 dpi.

4. Experiments and Results

The proposed method was evaluated in two steps:

4.1. Evaluation of the Form Identification

We presented all 50 filled-in forms from both groups (form_U and form_B) to our system in random order and counted the number of times the system returned the correct form name. All the forms were identified correctly (100% of the cases).

4.2. Evaluation of the Handwriting Fields Extraction

To establish a meaningful performance rate that can be analyzed and discussed, we first counted the handwritten words, digits and numbers present in each group of forms, and then counted how many of them appeared correctly and completely extracted in the output of the system (Table 2).
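The counting procedure above amounts to a simple ratio. The following Python sketch shows one way to compute it per form group and globally; the function names and the example counts are hypothetical and are not the values reported in Table 2.

```python
from typing import Dict, Tuple


def extraction_rate(present: int, extracted: int) -> float:
    """Percentage of handwritten items (words, digits, numbers) that were
    correctly and completely extracted."""
    if present <= 0:
        raise ValueError("present must be positive")
    if not 0 <= extracted <= present:
        raise ValueError("extracted must lie between 0 and present")
    return 100.0 * extracted / present


def global_rate(counts: Dict[str, Tuple[int, int]]) -> float:
    """Pool the counts of all form groups before dividing, so that a group
    with more handwritten items weighs more in the global rate."""
    total_present = sum(present for present, _ in counts.values())
    total_extracted = sum(extracted for _, extracted in counts.values())
    return extraction_rate(total_present, total_extracted)


# Illustrative (present, extracted) counts only, NOT the data of Table 2.
example_counts = {"form_U": (200, 170), "form_B": (100, 90)}
for name, (present, extracted) in example_counts.items():
    print(name, extraction_rate(present, extracted))
print("global", global_rate(example_counts))
```

Pooling the counts is only one possible convention; averaging the per-form rates would give a different global figure when the groups contain different numbers of items.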

The performance of our method reaches 84% on form_U and rises to 92% on form_B. Some results are presented in Figure 4.

The results obtained by the whole system (detection and extraction) are very promising. For form detection, these results were expected, since the parameters used to distinguish the two forms have no overlapping or ambiguous range. However, it seems unlikely that this rate would stay as high with more than two forms, particularly if the forms are highly similar.

Figure 3. Samples of the filled-in forms.

Table 2. Performance of our system on the two forms.

Regarding the extraction method, the results are also promising: the global performance over the two forms is around 90%. These results do not take into account the quality of the extracted text. Post-processing such as dilation or noise removal could be applied to make these fields more usable by the word recognition module.
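As an illustration of such post-processing, the sketch below applies a basic noise removal followed by a binary dilation to a small binarized field. It is only a sketch under assumptions: the 3x3 structuring element, the rule of discarding ink pixels without any ink neighbour, and the function names are choices made here and are not specified in the paper.

```python
from typing import List

Image = List[List[int]]  # binarized image: 1 = ink pixel, 0 = background


def dilate(img: Image) -> Image:
    """Binary dilation with a 3x3 square structuring element (assumed size)."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # A pixel becomes ink if any pixel in its 3x3 neighbourhood is ink.
            out[y][x] = int(any(
                img[ny][nx]
                for ny in range(max(0, y - 1), min(h, y + 2))
                for nx in range(max(0, x - 1), min(w, x + 2))
            ))
    return out


def remove_isolated_pixels(img: Image) -> Image:
    """Drop ink pixels that have no ink neighbour (8-connectivity)."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(h):
        for x in range(w):
            if not img[y][x]:
                continue
            ink_around = sum(
                img[ny][nx]
                for ny in range(max(0, y - 1), min(h, y + 2))
                for nx in range(max(0, x - 1), min(w, x + 2))
            ) - 1  # do not count the pixel itself
            if ink_around == 0:
                out[y][x] = 0
    return out


# Toy field: a two-pixel stroke and one isolated speck at the right border.
field = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0],
]
cleaned = remove_isolated_pixels(field)
thickened = dilate(cleaned)
for row in thickened:
    print(row)
```

Removing the noise before dilating matters in this sketch: dilating first would enlarge the speck, which would then no longer be isolated.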

5. Conclusions

In this paper we presented a handwriting field extraction system, which can be seen as a preprocessing module in a complete handwriting recognition system. This module extracts the zones of interest (handwriting fields) from any input form and makes them available to the recognition module.

Figure 4. Some samples of the extracted handwriting fields.

We propose a language-independent and form-independent method. It removes the original form structure; in other words, it “subtracts” the empty form from the filled-in one.
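The subtraction idea can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the paper: it assumes that both images are already binarized, have the same size and are aligned, and the function name is hypothetical.

```python
from typing import List

Image = List[List[int]]  # binarized image: 1 = ink pixel, 0 = background


def subtract_template(filled: Image, empty: Image) -> Image:
    """Keep the ink pixels of the filled-in form that are not ink in the
    empty form. Both images must be aligned and have the same size."""
    if len(filled) != len(empty) or any(
        len(f_row) != len(e_row) for f_row, e_row in zip(filled, empty)
    ):
        raise ValueError("both images must have the same size")
    return [
        [int(f == 1 and e == 0) for f, e in zip(f_row, e_row)]
        for f_row, e_row in zip(filled, empty)
    ]


# Toy example: two printed lines (rows 0 and 2) with handwriting in between.
empty_form = [
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
]
filled_form = [
    [1, 1, 1, 1],
    [0, 1, 1, 0],
    [1, 1, 1, 1],
]
handwriting = subtract_template(filled_form, empty_form)
for row in handwriting:
    print(row)
```

In this pixel-wise form, handwriting that overlaps a printed line is lost together with the line, and any misalignment between the two images leaves residue of the form structure in the output.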

To make this operation possible, a preprocessing step was applied first: we binarized all the forms using the Otsu algorithm and then removed some noise. The second step was form identification, since we evaluated our system with two different form types. An orientation step was then necessary to make the matching operation possible; for this purpose we used a Fourier-Mellin transform.

We evaluated our system with 50 forms of two different types (a bank form and a university form). The global performance of our method is around 90% correct extraction.

In future work, we propose to investigate the extraction of handwriting fields from any form or document without any a priori knowledge of its structure, which is closer to real-world conditions.
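For the binarization step, a minimal pure-Python sketch of Otsu thresholding is given below. It covers only the threshold selection and the binarization, not the noise removal or the Fourier-Mellin orientation step. The function names and the convention that dark pixels are ink are assumptions made for this illustration.

```python
from typing import List, Sequence


def otsu_threshold(pixels: Sequence[int]) -> int:
    """Return the gray level t in [0, 255] that maximizes the between-class
    variance when pixels <= t form one class and pixels > t the other."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_between = 0, -1.0
    w0 = 0    # number of pixels in the dark class (<= t)
    sum0 = 0  # sum of the gray levels in the dark class
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue  # dark class still empty
        w1 = total - w0
        if w1 == 0:
            break  # light class empty from here on
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        # Between-class variance up to the constant factor 1 / total**2.
        between = w0 * w1 * (mean0 - mean1) ** 2
        if between > best_between:
            best_between, best_t = between, t
    return best_t


def binarize(gray: List[List[int]], threshold: int) -> List[List[int]]:
    """Map dark pixels (<= threshold) to 1 (ink) and light pixels to 0."""
    return [[int(p <= threshold) for p in row] for row in gray]


# Toy 2x3 grayscale patch: two dark ink pixels on a light background.
patch = [
    [250, 245, 20],
    [30, 240, 255],
]
t = otsu_threshold([p for row in patch for p in row])
print(t, binarize(patch, t))
```

When several thresholds give the same between-class variance, this sketch keeps the lowest one; other implementations may break such ties differently.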

References

  1. Graves, A., Liwicki, M., Fernandez, S., Bertolami, R., Bunke, H. and Schmidhuber, J. (2009) A Novel Connectionist System for Unconstrained Handwriting Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31, 855-868.
  2. Lorigo, L.M. and Govindaraju, V. (2006) Offline Arabic Handwriting Recognition: A Survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28, 712-724. http://dx.doi.org/10.1109/TPAMI.2006.102
  3. Plamondon, R. and Srihari, S.N. (2000) Online and Off-Line Handwriting Recognition: A Comprehensive Survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22, 63-84. http://dx.doi.org/10.1109/34.824821
  4. Senior, A.W. and Robinson, A.J. (1998) An Off-Line Cursive Handwriting Recognition System. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20, 309-321. http://dx.doi.org/10.1109/34.667887
  5. Koch, G., Heutte, L. and Paquet, T. (2003) Numerical Sequence Extraction in Handwritten Incoming Mail Documents. Seventh International Conference on Document Analysis and Recognition, 1, 369-373.
  6. Chatelain, C., Heutte, L. and Paquet, T. (2004) A Syntax-Directed Method for Numerical Field Extraction Using Classifier Combination. 9th International Workshop on Frontiers in Handwriting Recognition IWFHR-9, 26-29 October 2004, 93-98.
  7. Clawson, R. and Barrett, W. (2012) Extraction of Handwriting in Tabular Document Images. Family History Technology Workshop at Rootstech.
  8. Samoud, F.B., Maddouri, S.S., Abed, H.E. and Ellouze, N. (2008) Comparison of Two Handwritten Arabic Zones Extraction Methods of Complex Documents. Proceedings of International Arab Conference on Information Technology, Hammamet, 1-7.
  9. Otsu, N. (1979) A Threshold Selection Method from Gray-Level Histograms. IEEE Transactions on Systems, Man and Cybernetics, 9, 62-66. http://dx.doi.org/10.1109/TSMC.1979.4310076
  10. Adam, S., Rousseau, F., Ogier, J.M., Cariou, C., Mullot, R., Labiche, J. and Gardes, J. (2001) A Multi-Scale and Multi-Orientation Recognition Technique Applied to Document Interpretation Application to French Telephone Network Maps. IEEE International Conference on Acoustics, Speech, and Signal Processing, 3, 1509-1512.
  11. Liu, Q., Zhu, H.Q. and Li, Q. (2011) Object Recognition by Combined Invariants of Orthogonal Fourier-Mellin Moments. 8th International Conference on Information, Communications and Signal Processing (ICICS), Singapore, 13-16 December 2011, 1-5.
  12. Sharma, V.D. (2010) Generalized Two-Dimensional Fourier-Mellin Transform and Pattern Recognition. 3rd International Conference on Emerging Trends in Engineering and Technology (ICETET), Goa, 19-21 November 2010, 476-481.
