J. Biomedical Science and Engineering, 2010, 3, 85-94
doi:10.4236/jbise.2010.31013 Published Online January 2010 (http://www.scirp.org/journal/jbise/).

Cepstral and linear prediction techniques for improving intelligibility and audibility of impaired speech

G. Ravindran1, S. Shenbagadevi2, V. Salai Selvam3

1Faculty of Information and Communication Engineering, College of Engineering, Anna University, Chennai, India; 2Faculty of Information and Communication Engineering, College of Engineering, Anna University, Chennai, India; 3Department of Electronics & Communication Engineering, Sriram Engineering College, Perumalpattu, India.
Email: vsalaiselvam@yahoo.com

Received 30 October 2009; revised 20 November 2009; accepted 25 November 2009.

ABSTRACT

Human speech becomes impaired, i.e., unintelligible, for a variety of reasons that can be either neurological or anatomical. The objective of this research was to improve the intelligibility and audibility of impaired speech resulting from a disabled human speech mechanism with impairment in the acoustic system, the supra-laryngeal vocal tract. For this purpose three methods are presented in this paper. Method 1 develops an inverse model of the speech degradation using the cepstral technique. Method 2 replaces the degraded vocal tract response with a normal vocal tract response using the cepstral technique. Method 3 replaces the degraded vocal tract response with a normal vocal tract response using the linear prediction technique.

Keywords: Impaired Speech; Speech Disability; Cepstrum; LPC; Vocal Tract

1. INTRODUCTION

Speech impairments or disorders refer to difficulties in producing speech sounds with voice quality [1]. Thus impaired speech is speech sound that lacks voice quality. Speech becomes impaired for a variety of reasons that can be either neurological, e.g., aphasia and dysarthria, or anatomical, e.g., cleft lip and cleft palate [1]. Speech impairment is generally categorized into articulation impairment (e.g., omissions, substitutions or distortions of sounds), voice impairment (e.g., inappropriate pitch, loudness or voice quality), fluency impairment (e.g., abnormal rate of speaking, speech interruptions, or repetition of sounds, words, phrases or sentences interfering with effective communication) and language impairment (e.g., phonological, morphological, syntactic, semantic or pragmatic use of oral language) [2].

The most commonly used techniques to help people with speech impairments are training programmes conducted by speech therapists at home, at hospitals or at a combination of these, sign languages like Makaton, and electronic aids like text-to-speech conversion units.

1.1. General Properties of Speech

Though non-stationary, the speech signal can be considered stationary over short periods, typically 10-50 ms [3,4,5]. The effective bandwidth of speech is 4-7 kHz [4,5]. The elementary linguistic unit of speech is called a phoneme and its acoustic realization is called a phone [7]. A phoneme is classified as either a vowel or a consonant [3,4,5]. The duration of a vowel does not change much and is 70 ms on average, while that of a consonant varies from 5 to 130 ms [3].
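This short-time stationarity is what makes frame-by-frame processing possible. As an illustrative aside (the paper's own implementation used MATLAB 7), a minimal Python sketch of such framing follows; the 20 ms frame length and 5 ms overlap match the values used later in Section 4, and the function name is ours, not the paper's.

```python
import numpy as np

def frame_signal(x, fs, frame_ms=20, overlap_ms=5):
    """Split a signal into the short frames over which speech is
    quasi-stationary (20 ms frames overlapping by 5 ms, as in
    Section 4 of the paper)."""
    frame_len = int(fs * frame_ms / 1000)          # e.g. 160 samples at 8 kHz
    hop = frame_len - int(fs * overlap_ms / 1000)  # frame-to-frame step
    n_frames = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop : i * hop + frame_len]
                     for i in range(n_frames)])

fs = 8000
x = np.random.randn(fs)            # stand-in for one second of recorded speech
print(frame_signal(x, fs).shape)   # -> (66, 160)
```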
1.2. Speech Production

The diaphragm forces air through the system, and the voluntary movements of the anatomical structures of this system shape a wide variety of waveforms, broadly classified into voiced and unvoiced speech [5]. This is depicted in Figure 1.

Figure 1. Block diagram of human speech production.

With voiced speech, the air from the lungs is forced through the glottis (the opening between the vocal cords) and the tension of the vocal cords is adjusted so that they vibrate at a frequency, known as the pitch frequency, which depends on the shape and size of the vocal cords, resulting in a quasi-periodic train of air pulses that excites the resonances of the rest of the vocal tract. The voluntary movements of the muscles of this vocal tract change its shape and hence its resonant frequencies, known as formants, producing different quasi-periodic sounds [7]. Figure 2 shows a sample voiced speech segment and its spectrum with formant peaks.

Figure 2. Voiced speech segment and its spectrum exhibiting four resonant frequencies called 'formants'.

With unvoiced speech, the air from the lungs is forced through the glottis and the tension of the vocal cords is adjusted so that they do not vibrate, resulting in a noise-like turbulence that normally excites a constriction in the rest of the vocal tract. Depending on the shape and size of the constriction, different noise-like sounds are produced. Figure 3 shows a sample unvoiced speech segment and its spectrum with no dominant peaks.

Figure 3. Unvoiced speech segment and its spectrum.

Thus a speech signal can be regarded as the convolution of two signals: 1) a quasi-periodic pulse-like (for voiced speech) or noise-like (for unvoiced speech) glottal excitation signal generated by a combination of the lungs and vocal cords, and 2) a system response represented by the shape of the rest of the vocal tract [4]. The excitation signal generally exhibits the speaker characteristics such as pitch and loudness, while the vocal tract response determines the sound produced [5].

1.3. Source-Filter Model of the Human Speech Mechanism

A speech signal s(n) is the convolution of a fast varying glottal excitation signal e(n) and a slowly varying vocal tract response v(n), i.e., s(n) = e(n)*v(n). For voiced speech, e(n) is a quasi-periodic waveform and v(n) is the combined effect of the glottal wave shape, the vocal tract impulse response and the lip radiation impulse response, while for unvoiced speech, e(n) is random noise and v(n) is the combined effect of the vocal tract impulse response and the lip radiation impulse response [4].

A human speech mechanism can thus be viewed as a source capable of generating a periodic impulse train at the pitch frequency (for voiced speech) or white noise (for unvoiced speech), followed by a linear filter whose impulse response represents the shape of the vocal tract [4]. This is depicted in Figure 4.

Figure 4. Source-filter model of a human speech mechanism.
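The source-filter picture of Figure 4 can be made concrete in a few lines of code. The sketch below (Python, for illustration only; the pitch, pole radius and resonance frequencies are invented values, not measurements from the paper) drives the same all-pole "vocal tract" with an impulse train to obtain a voiced sound and with white noise to obtain an unvoiced one.

```python
import numpy as np
from scipy.signal import lfilter

fs = 8000
n = int(0.04 * fs)                        # one 40 ms segment

# Source: quasi-periodic impulse train at a 100 Hz pitch (voiced) ...
e_voiced = np.zeros(n)
e_voiced[:: fs // 100] = 1.0
# ... or zero-mean white noise (unvoiced), as in Figure 4
e_unvoiced = np.random.randn(n)

# Filter: all-pole vocal tract with three illustrative formant resonances
formants = (500, 1500, 2500)              # Hz, invented for the example
poles = [0.97 * np.exp(2j * np.pi * f / fs) for f in formants]
a = np.real(np.poly(poles + [p.conj() for p in poles]))  # denominator of H(z)

voiced = lfilter([1.0], a, e_voiced)      # vowel-like output s(n) = e(n)*h(n)
unvoiced = lfilter([1.0], a, e_unvoiced)  # fricative-like output
```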
1.4. Speech Processing Techniques

1.4.1. Cepstral Technique

Complex cepstrum: The complex cepstrum $\hat{s}(n)$ of a signal $s(n)$ is defined as the inverse Fourier transform of the logarithm of the signal spectrum [8]:

$\hat{s}(n) = \frac{1}{2\pi} \int_{-\pi}^{\pi} \log S(e^{j\omega}) \, e^{j\omega n} \, d\omega$

where $S(e^{j\omega})$ is the Fourier transform of $s(n)$. Computation of the complex cepstrum requires phase unwrapping, which is difficult for both theoretical and practical reasons [8]. This is depicted in Figures 5(a) and (b).

Figure 5. (a) Complex cepstrum and (b) its inverse (after Oppenheim & Schafer).

The cepstral domain is known as the quefrency (coined from 'frequency') domain [8].

Real cepstrum: The real cepstrum $s_r(n)$ of a signal $s(n)$ is defined as the inverse Fourier transform of the logarithm of the signal magnitude spectrum $|S(e^{j\omega})|$ [8]:

$s_r(n) = \frac{1}{2\pi} \int_{-\pi}^{\pi} \log |S(e^{j\omega})| \, e^{j\omega n} \, d\omega$

The real cepstrum is not invertible but provides a minimum-phase reconstruction of the signal [8]. This is depicted in Figures 6(a) and (b).

Figure 6. (a) Real cepstrum and (b) its inverse (after Oppenheim & Schafer).

Since speech is the convolution of a fast varying glottal excitation signal e(n) and a slowly varying vocal tract response h(n), the cepstrum of a speech segment consists of the vocal tract response, which occupies the low quefrency region, and the glottal excitation signal, which occupies the high quefrency region [8]. Since phase information is not as important as magnitude information in a speech spectrum, the real cepstrum is used for its computational ease [8].

The first M samples of the cepstrum of a speech segment, where M is the number of channels allotted to specifying spectral envelope information [9], typically the first 2.5 ms to 5 ms, represent the vocal tract response, while the remaining samples represent the glottal excitation signal. A simple windowing process using a rectangular window separates the vocal tract response from the glottal excitation signal in the quefrency domain. The inverse process of the cepstrum, involving exponentiation, recovers these signals in the time domain.

1.4.2. Linear Prediction Technique

The linear prediction (LP) technique is a system modeling technique that models the vocal tract response in a given speech segment as an all-pole linear filter with transfer function of the form

$H(z) = \dfrac{G}{1 + \sum_{k=1}^{p} a_p(k) z^{-k}}$

where G is the dc gain of the filter, p is the order of the filter and $a_p(k)$, k = 1, 2, ..., p are the filter coefficients, leaving the glottal excitation as the residual of the process [4]. Thus the LP technique separates the vocal tract response from the glottal excitation.

The various formulations of the linear prediction technique are 1) the covariance method, 2) the autocorrelation method, 3) the lattice method, 4) the inverse filter formulation, 5) the spectral estimation formulation, 6) the maximum likelihood formulation and 7) the inner product formulation [4,14]. In this paper the filter coefficients and the dc gain were estimated from speech samples via the autocorrelation method by solving the so-called Yule-Walker equations with the help of the Levinson-Durbin recursive algorithm [11,14].
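A compact sketch of this autocorrelation/Levinson-Durbin computation is given below (Python rather than the authors' MATLAB; the Hamming window and the gain convention are our choices, not specified in the paper).

```python
import numpy as np

def lpc_autocorr(frame, p):
    """Order-p all-pole model of one frame by the autocorrelation method:
    solve the Yule-Walker equations with the Levinson-Durbin recursion.
    Returns coefficients a (with a[0] = 1) and the gain G."""
    w = frame * np.hamming(len(frame))      # taper to stabilise the estimate
    r = np.correlate(w, w, mode='full')[len(w) - 1 : len(w) + p]
    a = np.zeros(p + 1)
    a[0], err = 1.0, r[0]
    for i in range(1, p + 1):
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err   # reflection coeff.
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]                 # update a_1..a_{i-1}
        a[i] = k
        err *= 1.0 - k * k                                  # prediction error
    return a, np.sqrt(err)

# Order p = fs/1000 = 8 for a typical male utterance at 8 kHz (Section 3.3)
a, G = lpc_autocorr(np.random.randn(160), p=8)
```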
2. DATA ACQUISITION & SIGNAL PRE-PROCESSING

Adult subjects were selected: 52 subjects (11 females and 41 males) with distorted sounds, 41 subjects (3 females and 38 males) with prolonged sounds, 12 subjects (all males) with stammering, 9 subjects (1 female and 8 males) with omissions and 5 subjects (all males) with substitutions. They were asked to utter the phonemes "a" as in "male", "ee" as in "speech", "p" as in "pet", "aa" as in "Bob" and "o" as in "boat". These speech signals were recorded using a Pentium-IV computer with 2 GB RAM, a 160 GB HDD, a PC-based microphone, a 16-bit sound card, and free audio recording and editing software at a sampling rate of 8 kHz. These signals will hereafter be referred to as the impaired speech signals or utterances.

The same number (119) of normal subjects of similar age and sex were selected and asked to utter the same set of phonemes. These signals were recorded under similar conditions and will hereafter be referred to as the normal speech signals or utterances.

These signals were then lowpass-filtered at 4 kHz to band-limit them to the effective speech band. The arithmetic mean of each filtered signal was subtracted from it to remove the DC offset, an artifact of the recording process [12]. The speech portion of each signal was extracted from its background using the endpoint detection algorithm explained in [13].
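For illustration, the sketch below mirrors this pre-processing in Python: mean subtraction for the DC offset, and a crude short-time-energy gate standing in for the endpoint detector of [13], whose details are not reproduced here. The 20 ms window and the 10% threshold are invented values.

```python
import numpy as np

def preprocess(x, fs):
    """Remove the DC offset, then trim leading/trailing background using a
    simple energy threshold (a stand-in for the algorithm of [13]).
    Assumes the recording actually contains speech."""
    x = x - np.mean(x)                     # DC-offset removal [12]
    win = int(0.02 * fs)                   # 20 ms energy windows
    energy = np.array([np.sum(x[i:i + win] ** 2)
                       for i in range(0, len(x) - win, win)])
    speech = np.flatnonzero(energy > 0.1 * energy.max())  # invented threshold
    return x[speech[0] * win : (speech[-1] + 1) * win]
```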
3. METHODS

Three methods were developed, all based on the source-filter model of the human speech mechanism. The first two methods were based on the cepstral technique and the third on the Linear Prediction Coding (LPC) technique. In all three methods, speech was assumed to be the linear convolution of the slowly varying vocal tract response and the fast varying glottal excitation [4,5,8,9].

3.1. Method 1

This method was based on the following facts: 1) though non-stationary, the speech signal can be considered stationary for a short period of 20-40 ms [3,4]; 2) speech is the convolution of two signals, the glottal excitation signal and the vocal tract response [4]; 3) the excitation signal generally exhibits the speaker characteristics such as pitch and loudness, while the vocal tract response determines the sound produced [5]; and 4) the cepstrum transforms a convolution into an addition [8].

Facts 1) and 2) make the short-term analysis of the speech signal possible and model a normal human speech mechanism as a linear filter excited by a source, as shown in Figure 7. Similarly, a disabled human speech mechanism with an impaired vocal tract is modeled as shown in Figure 8. If the degraded vocal tract is modeled as the normal vocal tract followed by a degrading system, the above source-filter model can be equivalently represented as shown in Figure 9.

Figure 7. Source-filter model of a normal human speech mechanism: s(n) = e(n)*h(n).
Figure 8. Source-filter model of a degraded human speech mechanism: s_d(n) = e(n)*h_d(n).
Figure 9. Source-filter model of a degraded human speech mechanism with the degradation as a separate system: s_d(n) = e(n)*h(n)*g(n).

As suggested by 4), the cepstrum of a normal speech segment is the sum of the cepstrum of the normal vocal tract response and the cepstrum of the excitation, as shown in Figure 10 (a hat, e.g., $\hat{h}(n)$, denotes the cepstrum). The corresponding cepstral deconvolution of a degraded speech segment is shown in Figure 11.

Figure 10. Cepstrum of a normal speech segment: $\hat{s}(n) = \hat{e}(n) + \hat{h}(n)$.
Figure 11. Cepstrum of a degraded speech segment: $\hat{s}_d(n) = \hat{e}(n) + \hat{h}_d(n) = \hat{e}(n) + \hat{h}(n) + \hat{g}(n)$.

As suggested by 3), if the speech segments in both cases represent a similar sound unit (e.g., the same phoneme), then $\hat{h}_d(n)$ can be expressed in terms of $\hat{h}(n)$ from Figures 9-11 as follows:

$h_d(n) = h(n) * g(n)$
$\hat{h}_d(n) = \hat{h}(n) + \hat{g}(n)$

Subtracting the cepstrum of the normal vocal tract from the cepstrum of the degraded vocal tract for a similar sound unit yields the cepstrum of the degradation:

$\hat{g}(n) = \hat{h}_d(n) - \hat{h}(n)$

The inverse cepstrum of $\hat{g}(n)$ yields the degradation g(n) in the time domain. The inverse model of g(n) is obtained as the reciprocal of an autoregressive (all-pole) model of g(n), computed via the Levinson-Durbin algorithm. The speech segment is restored by passing the degraded speech segment through the inverse model of the degradation, as shown in Figure 12. Figure 13 shows a complete block diagram of the entire method of restoring the speech via the inverse model of the degradation.

Figure 12. Restoring the normal speech segment from the degraded speech segment via the inverse model of the degradation.
Figure 13. Block diagram representation of Method 1.

3.2. Method 2

Method 2 was based on the same set of facts as Method 1. In this method, the degraded vocal tract response h_d(n) for a particular phoneme from a disabled speech mechanism is replaced by the normal vocal tract response h(n) for the same phoneme from a normal speech mechanism. The extraction of the vocal tract responses and the reconstruction of speech with improved intelligibility and audibility are done using the cepstral technique. The cepstrum of a speech segment consists of the vocal tract response, which occupies the low quefrency region, and the glottal excitation signal, which occupies the high quefrency region [8]. The first M samples of the cepstrum, where M is the number of channels allotted to specifying spectral envelope information [9], typically the first 2.5 ms to 5 ms, represent the vocal tract response, while the remaining samples represent the glottal excitation signal. A simple windowing process using a rectangular window separates the vocal tract response from the glottal excitation signal in the quefrency domain [8]. The block diagram representation of the second method is shown in Figure 14.

Figure 14. Block diagram representation of Method 2.
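A per-frame sketch of Method 2 in Python follows. The real-cepstrum computation follows Section 1.4.1, the M = 40 samples (5 ms at 8 kHz) anticipate Section 4, and the minimum-phase reconstruction by cepstral folding is our reading of the "inverse cepstrum" block of Figure 14 (the paper notes that only a minimum-phase reconstruction is possible); the small log guard is ours.

```python
import numpy as np

def real_cepstrum(x):
    """IDFT of the log magnitude spectrum (Section 1.4.1)."""
    return np.real(np.fft.ifft(np.log(np.abs(np.fft.fft(x)) + 1e-12)))

def method2_frame(degraded, normal, M=40):
    """Replace the low-quefrency (vocal tract) part of the degraded
    frame's cepstrum with that of the time-aligned normal frame, then
    rebuild a minimum-phase signal (the real cepstrum loses phase)."""
    c = real_cepstrum(degraded)
    c[:M] = real_cepstrum(normal)[:M]       # swap in the normal vocal tract
    n = len(c)
    fold = np.zeros(n)                      # fold the cepstrum so that
    fold[0] = c[0]                          # exp(DFT(fold)) is a
    fold[1 : n // 2] = 2.0 * c[1 : n // 2]  # minimum-phase spectrum
    fold[n // 2] = c[n // 2]
    return np.real(np.fft.ifft(np.exp(np.fft.fft(fold))))
```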
3.3. Method 3

Method 3 was based on the following facts: 1), 2) and 3) were the same as for Method 1, and 4) a speech segment of short duration, e.g., 10-40 ms, can be effectively represented by an all-pole filter of order p [14], which is often chosen to be at least $2fl/c$, where f is the sampling frequency, l is the vocal tract length and c is the speed of sound [7,14]. For a typical male utterance with l = 17 cm and c = 340 m/s, p = f/1000. These filter coefficients were estimated through linear predictive analysis of a speech segment of short duration, e.g., 10-40 ms. The excitation signal is obtained either by passing the speech segment through the inverse of this filter or by synthesizing it from the estimated pitch period, gain and voicing decision for that segment [14,4]. Here the former is utilized. Thus linear predictive analysis splits a speech segment into an excitation and a vocal tract response [14,4].

The first two facts make the short-term analysis of the speech signal possible and model both the normal and disabled human speech mechanisms as described for Methods 1 and 2. As suggested by 4), both the normal and impaired speech can be split into excitation and vocal tract response. As suggested by 3), the LP coefficients of the normal speech segment are used in place of those of the impaired speech segment, while the excitation is obtained either from the LP residual of the impaired speech segment or by synthesis from the pitch period, gain and voicing decision estimated from the impaired speech segment. Here the former is utilized.

Thus the degraded vocal tract response h_d(n) for a particular phoneme from a disabled speech mechanism, obtained via the LPC technique, is replaced by the normal vocal tract response h(n) for the same phoneme from a normal speech mechanism, also obtained via the LPC technique. The glottal excitation of the degraded (impaired) speech is obtained via the LPC technique as the linear prediction residual [14,4]. The block diagram representation of the method is depicted in Figure 15.

Figure 15. Block diagram representation of Method 3.

4. IMPLEMENTATION

In all three methods, the speech portions of both the normal and degraded phonemes were extracted using the algorithm in [13], and the normal utterance was time-scaled to match the length of the impaired utterance using the modified phase vocoder [15]. Then each utterance was segmented into short frames of 20 ms duration overlapping by 5 ms.

In Methods 1 and 2, both frames were preemphasised to cancel the spectral contributions of the larynx and the lips using $H(z) = 1 - \alpha z^{-1}$ with $\alpha = 0.95$ [7,12]. Then the cepstra of both frames were computed [8,9].
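In code, this pre-emphasis filter is a single line; a de-emphasis counterpart (not described explicitly in the paper, so an assumption on our part) would invert it after synthesis.

```python
from scipy.signal import lfilter

ALPHA = 0.95   # pre-emphasis constant from Section 4

def preemphasize(frame):
    """H(z) = 1 - 0.95 z^-1: flattens the larynx/lip spectral tilt."""
    return lfilter([1.0, -ALPHA], [1.0], frame)

def deemphasize(frame):
    """Assumed inverse filter 1/(1 - 0.95 z^-1), applied after restoration."""
    return lfilter([1.0], [1.0, -ALPHA], frame)
```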
In Method 1, the first 40 samples of the cepstrum of the normal speech frame were subtracted from those of the cepstrum of the degraded speech frame to extract the cepstrum of the degrading function. The inverse cepstrum of the result yielded the degrading function, which was then modeled as an all-pole filter. The inverse of this model was then used to restore the speech.

In Method 2, the first 40 samples of the cepstrum of the degraded speech frame were replaced by those of the cepstrum of the normal speech frame.

In Method 3, after segmentation, the frames were preemphasised and their autocorrelations were computed. The resulting autocorrelations were used to compute the LPC coefficients with the Levinson-Durbin recursive algorithm. The LPC coefficients computed from the degraded speech frame, together with the frame itself, were used to compute the LPC residual. This residual and the LPC coefficients computed from the normal speech frame were then used to synthesize the restored speech frame.
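Per frame, Method 3 thus reduces to inverse filtering followed by resynthesis. The Python sketch below reuses the lpc_autocorr helper sketched in Section 1.4.2; the frame pairing and the order p = 8 follow the paper, while the function name and phrasing are ours.

```python
from scipy.signal import lfilter
# lpc_autocorr is the helper sketched in Section 1.4.2

def method3_frame(degraded, normal, p=8):
    """Keep the LP residual (excitation) of the degraded frame but drive
    it through the normal frame's all-pole vocal tract filter."""
    a_deg, _ = lpc_autocorr(degraded, p)         # degraded vocal tract, A_d(z)
    a_norm, _ = lpc_autocorr(normal, p)          # normal vocal tract, A_n(z)
    residual = lfilter(a_deg, [1.0], degraded)   # e(n) by inverse filtering
    return lfilter([1.0], a_norm, residual)      # restored frame
```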
All the above steps were repeated for all frames and for all phonemes. MATLAB 7 was used for programming.

5. COMPARISON OF METHODS

The main advantage of these three methods is that the restored speech retains the speaker characteristics, since the excitation from the distorted sound was used for restoration: the excitation (the glottal impulse) exhibits mainly the speaker characteristics, while the vocal tract response (the articulation) gives rise to the various phonetic realizations.

All three methods worked acceptably well for certain articulation impairments such as distorted and prolonged sounds.

All three methods suffered from a basic problem, phonetic mismatching. That is, the process of matching the respective phonemes in the normal and impaired utterances lacks accuracy, because the duration of a phoneme in a syllable or word from two different speakers may not be equal, and its articulation and temporal-spectral shape vary with the preceding and succeeding phones. Moreover, the dynamic time warping techniques used to match two similar time series cannot be used to match the normal and distorted strings: though they are the same utterances, they are not 'similar', one being normal and the other distorted.

None of the three methods suited all speech impairments. For example, they did not help with certain common impairments such as stammering, omissions and substitutions.

Methods 1 and 2 suffered from the fact that the real cepstrum is not invertible; only a minimum-phase reconstruction is possible, so the phase information was lost.

Method 1 suffered from the problem of extracting the degradation exactly, since the vocal tract responses for a phoneme obtained independently from two speakers do not match sample-wise. Hence subtracting the first 25 samples of the cepstrum of the normal speech segment (representing the normal vocal tract response) from those of the impaired speech segment (representing the degraded vocal tract response) may not exactly give the degradation in the vocal tract response of the impaired subject.

The LP coefficients do not represent the vocal tract response independently of the speaker. Hence the restored sound possessed qualities of both speakers, the normal and the impaired, though more of the impaired speaker and less of the normal one.

6. RESULTS

To assess the results of the above experiments, one thousand observers (500 females and 500 males) with ages varying from 20 to 40 years were selected and asked to listen to the degraded, normal and restored phonemes and to rate them as bad, good or excellent in terms of their intelligibility and audibility. The votes obtained in favour are tabulated in Tables 1-5.

Table 1. Votes in favour for subjects with distorted sounds; total no. of votes = 52 × 1000.

Phoneme            Bad     Good    Excellent
"a" in "male"      2080    46904   3016
"ee" in "speech"   1976    46956   3068
"p" in "pet"       5304    46644   156
"aa" in "Bob"      1144    49452   1404
"o" in "boat"      520     50076   1404

Table 2. Votes in favour for subjects with prolonged sounds; total no. of votes = 41 × 1000.

Phoneme            Bad     Good    Excellent
"a" in "male"      820     39360   820
"ee" in "speech"   2747    37761   492
"p" in "pet"       6150    34850   0
"aa" in "Bob"      2132    37392   1517
"o" in "boat"      369     39360   1271

Table 3. Votes in favour for subjects with stammering; total no. of votes = 12 × 1000.

Phoneme            Bad     Good    Excellent
"a" in "male"      9612    1800    588
"ee" in "speech"   9612    1800    588
"p" in "pet"       11856   144     0
"aa" in "Bob"      10680   1200    120
"o" in "boat"      1200    10800   -

Table 4. Votes in favour for subjects with omissions; total no. of votes = 9 × 1000.

Phoneme            Bad     Good    Excellent
"a" in "male"      8982    18      -
"ee" in "speech"   8955    45      -
"p" in "pet"       9000    -       -
"aa" in "Bob"      8982    18      -
"o" in "boat"      -       -       -

Table 5. Votes in favour for subjects with substitutions; total no. of votes = 5 × 1000.

Phoneme            Bad     Good    Excellent
"a" in "male"      5000    -       -
"ee" in "speech"   5000    -       -
"p" in "pet"       5000    -       -

7. CONCLUSIONS

The future development of this research work will focus on developing 1) a formant-based technique and 2) a homomorphic prediction-based technique using the complex cepstrum, since the real cepstrum lacks phase information [8,9], and on developing a system for continuous speech, i.e., words and sentences. For real-time continuous speech processing, the use of a dedicated digital signal processor would be an apt choice.

REFERENCES

[1] NICHCY (2004) NICHCY disability fact sheet: Speech & language impairments. NICHCY.
[2] Department of Education, State of Michigan (2002) Special education programs and services guide.
[3] Saito, S. and Nakata, K. (1985) Fundamentals of speech signal processing. Academic Press, London.
[4] Rabiner, L.R. and Schafer, R.W. (1978) Digital processing of speech signals. Prentice-Hall, Englewood Cliffs, NJ.
[5] Rabiner, L.R. and Juang, B.H. (1993) Fundamentals of speech recognition. Prentice-Hall, Englewood Cliffs, NJ.
[6] Rabiner, L.R. and Gold, B. (1992) Theory and application of digital signal processing. Prentice-Hall of India, New Delhi, Chapter 12.
[7] Quatieri, T.F. (2004) Discrete-time speech signal processing. Pearson Education, Singapore.
[8] Oppenheim, A.V. and Schafer, R.W. (1992) Discrete-time signal processing. Prentice-Hall of India, New Delhi.
[9] Oppenheim, A.V. (1969) Speech analysis-synthesis based on homomorphic filtering. Journal of the Acoustical Society of America, 45, 458-465.
[10] Oppenheim, A.V. (1976) Signal analysis by homomorphic prediction. IEEE Trans. Acoustics, Speech, and Signal Processing, 24, 327.
[11] Proakis, J.G. and Manolakis, D.G. (2000) Digital signal processing. Prentice-Hall of India, New Delhi.
[12] Robinson, T. (1998) Speech analysis. Lent Term.
[13] Nipul, B., Sara, M., Slavinsky, J.P. and Aamir, V. (2000) A project on speaker recognition. Rice University.
[14] Makhoul, J. (1975) Linear prediction: A tutorial review. Proc. IEEE, 63, 561-580.
[15] Laroche, J. and Dolson, M. (1999) New phase-vocoder techniques for pitch-shifting, harmonizing and other exotic effects. Proc. IEEE WASPAA.