Open Journal of Acoustics, 2011, 1, 34-40
doi:10.4236/oja.2011.12005 Published Online September 2011 (http://www.SciRP.org/journal/oja)
Copyright © 2011 SciRes. OJA

In-Solid Acoustic Source Localization Using Likelihood Mapping Algorithm

Ming Yang, Mostafa Al-Kutubi, Duc Truong Pham
School of Engineering, Cardiff University, Cardiff, UK
E-mail: yangm@hotmail.co.uk
Received July 19, 2011; revised August 8, 2011; accepted August 12, 2011

Abstract

A significant challenge in human-computer interaction is to create tangible interfaces that make the digital world accessible through augmented physical surfaces such as walls and windows. In this paper, several acoustic source localization methods are proposed that have the potential to convert a physical object into a tracking-sensitive interface. The Spatial Likelihood method is used to locate an acoustic source in real time by summing the spatial likelihood from all sensors. The source location is obtained by searching for the maximum in the likelihood map. The data collected from the sensors is pre-processed and filtered to improve the accuracy of source localization. Finally, a sensor fusion algorithm based on least squared error is presented to minimize the error while positioning the source. Promising results have been achieved experimentally for the application of acoustic tangible interfaces.

Keywords: Acoustic Tangible Interfaces, Acoustic Source Localization, Digital Signal Processing

1. Introduction

With ordinary interface devices (e.g. keyboard, mouse and touch screen), the interaction of humans with computers is restricted to a particular device at a certain location within a small movement area. A challenge in human-computer interaction research is to create tangible interfaces that make interaction possible via augmented physical surfaces, graspable objects and ambient media.
In this paper, acoustics-based remote sensing technology is presented, since vibrations are the natural outcome of an interaction and propagate well in most solid materials. This means that the information pertaining to an interaction can be conveyed to a remote location [1-3] using the structure of the object itself as a transmission channel, thereby removing the need for an overlay or any other intrusive device over the area one wishes to make sensitive.

Time Difference of Arrival (TDOA) is commonly used for source localization without a universal timing mechanism, as shown in Figure 1. The principle of TDOA localisation is to measure the time delays between the arrivals of the signals at the various sensors. These delays result from the differences in distance between the acoustic impact source and the sensors at known locations. To determine the time delay of real signals, cross correlation is commonly used. In this application, however, its result usually contains multiple peaks, so there is no guarantee that the highest peak occurs at the correct time difference. This problem of noisy cross correlation can be handled by accumulating the entire cross correlation vector of each sensor pair rather than selecting the single peak of each one. The summation of vectors from multiple sensor pairs yields a likelihood map. The highest likelihood in the map is the estimate of the source location.

The likelihood mapping method has been developed for room acoustics based on two different mathematical formulations. In [3] the processing is performed in the time domain and is known as accumulated correlation, and in [4] the processing is performed in the frequency domain and is known as the spatial likelihood function.

In this paper the likelihood mapping method is enhanced for the Tangible Acoustic Interface and referred to as the Enhanced Likelihood Mapping (ELM).
The ELM is more robust in searching for the most likely source position and less affected by data fluctuation than the TDOA method. Finally, a sensor fusion algorithm based on least squared error is used to minimize the error of the estimated time differences while positioning the source.

Figure 1. Tangible Acoustic Interface (TAI) model diagram. [Diagram: four sensors (1-4) on an interactive solid object with x-y axes; passive human interaction produces vibration, which passes through signal conditioning, data acquisition and signal processing to a display (monitor or projector).]

2. ELM Theory

If the source signal is $s(t)$, then the signal received at the $i$th sensor can be defined as

$$ g_i(t) = h_i(t) * s(t - \tau_i) + n_i(t) \tag{1} $$

where $n_i$ is independent zero-mean white Gaussian noise with variance $\sigma_i^2$. Theoretically, $g_i(t)$ can be regarded as an estimator for $\tau_i$, where $\tau_i = \lVert q - u_i \rVert / v$, $u_i$ is the $i$th sensor location, $q$ is the source location and $v$ is the wave propagation velocity. Based on Bayes' rule, the posterior probability that the source is located at $q$ is

$$ p = P(q, s \mid g_1, \dots, g_N) = \frac{P(g_1, \dots, g_N \mid q, s)\, P(q, s)}{P(g_1, \dots, g_N)} \tag{2} $$

Since the denominator is a normalization constant, if the prior $P(q, s)$ is assumed uniform, then maximizing (2) becomes maximizing the likelihood $P(g_1, \dots, g_N \mid q, s)$. With the $g_i$ considered as independent random variables, it can be shown that

$$ p' = \prod_{i=1}^{N} P(g_i \mid q, s) \propto \prod_{i=1}^{N} \exp\!\left( -\frac{\int_T \left[ g_i(t + \tau_i) - s(t) \right]^2 dt}{2\sigma_i^2} \right) \tag{3} $$

Substituting $s$ with its maximum likelihood estimate

$$ \hat{s}(t) = \frac{1}{N} \sum_{i=1}^{N} g_i(t + \tau_i) $$

in (3), assuming equal $\sigma_i = \sigma$ for all sensors and taking the logarithm yields, up to an additive constant,

$$ \log p' = -\frac{1}{2\sigma^2} \sum_{i=1}^{N} \int_T \left[ g_i(t + \tau_i) - \hat{s}(t) \right]^2 dt = \frac{1}{2\sigma^2} \left( \frac{1}{N} V_C - \frac{N-1}{N} V_E \right) \tag{4} $$

where

$$ V_C = \sum_{i=1}^{N} \sum_{\substack{j=1 \\ j \neq i}}^{N} \int_T g_i(t + \tau_i)\, g_j(t + \tau_j)\, dt \tag{5} $$

is the accumulated correlation and

$$ V_E = \sum_{i=1}^{N} \int_T g_i^2(t + \tau_i)\, dt \tag{6} $$

is a constant representing the combined energy of the signals. Accordingly, the estimated location can be found from the maximum of (5).
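The step from the squared-error sum to the accumulated correlation and energy terms is stated without proof. A possible sketch of the algebra is given below, using the shorthand $x_i = g_i(t + \tau_i)$ (a notation introduced here only for brevity) and $\hat{s} = \frac{1}{N}\sum_i x_i$, with equal variances assumed as in the text.

```latex
\begin{align*}
\sum_{i=1}^{N} \bigl[ x_i - \hat{s} \bigr]^2
  &= \sum_{i=1}^{N} x_i^2 - N \hat{s}^{\,2}
   = \sum_{i=1}^{N} x_i^2 - \frac{1}{N} \Bigl( \sum_{i=1}^{N} x_i \Bigr)^{2} \\
  &= \frac{N-1}{N} \sum_{i=1}^{N} x_i^2
     - \frac{1}{N} \sum_{i=1}^{N} \sum_{j \neq i} x_i x_j .
\end{align*}
```

Integrating over $T$ turns the first sum into the combined energy and the double sum into the accumulated correlation, so multiplying by $-1/(2\sigma^2)$ leaves the accumulated correlation as the only term that grows with the likelihood, provided the energy term is treated as independent of $q$.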
The theory of the algorithm is based on treating the cross correlation as an observational estimate of $P(g_1, \dots, g_N \mid q, s)$, which is related to the posterior estimate given in (2) under the same assumption for the prior probability $P(q, s)$ [4]. Thus, by substituting the time difference $\tau_{ij}(q)$ between sensors $i$ and $j$, given as a function of source location $q$, into the Generalized Cross Correlation (GCC) formula [5], the spatial likelihood function in the frequency domain for a pair of sensors is obtained as

$$ SLF_{ij}(q) = \int \psi(f)\, G_i(f)\, G_j^*(f)\, e^{\,j 2\pi f \tau_{ij}(q)}\, df \tag{7} $$

The advantage of (7) is that it allows the filtering process $\psi(f)$ to be performed inclusively in the frequency domain.

3. ELM Algorithm

Given $N$ sensors, the usable number of time differences is the number $M$ of non-repeated combinations of sensor pairs, given by

$$ M = C_2^N = \frac{N!}{2!\,(N-2)!} \tag{8} $$

With three sensors, three time differences are available. Adding a fourth sensor doubles the number of time differences to six. The ELM algorithm allows for sensor fusion by utilizing the information in all $M$ time differences to improve the accuracy and robustness of the estimated location.

From the above theory, and by introducing the Hilbert envelope detection operator, the proposed ELM algorithm for TAI can be formulated in compact form for both the time domain and the frequency domain as follows; the algorithm architectures are shown in Figure 2.

$$ ELM_t(x, y) = \sum_{i=1}^{N} \sum_{\substack{j=1 \\ j \neq i}}^{N} \int g_i^{BPF}(t)\, g_j^{BPF}\bigl(t - \tau_{ij}(x, y)\bigr)\, dt \tag{9} $$

$$ ELM_f(x, y) = \sum_{i=1}^{N} \sum_{\substack{j=1 \\ j \neq i}}^{N} \int \psi(f)\, G_i(f)\, G_j^*(f)\, e^{\,j 2\pi f \tau_{ij}(x, y)}\, df \tag{10} $$

given

$$ \tau_{ij}(x, y) = \frac{\sqrt{(x - x_i)^2 + (y - y_i)^2} - \sqrt{(x - x_j)^2 + (y - y_j)^2}}{v} \tag{11} $$

where $g^{BPF}(t) = h^{BPF}(t) * g(t)$ is the band-pass filtered signal resulting from the convolution of the signal $g$ with the impulse response $h^{BPF}(t)$ of the band-pass filter, and $\psi(f)$ is the weighting filter. The estimated location $(\hat{x}, \hat{y})$ can then be found by locating the maximum of (9) or (10).
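The time-domain mapping described above (cross-correlate each pair, map the correlation onto x-y coordinates through the pairwise time difference, sum over pairs and pick the peak) can be sketched as follows. This is only an illustration under stated assumptions: the sampling rate, wave speed, sensor layout, grid and the noise-free Gaussian test pulse are all hypothetical values, band-pass filtering is omitted, and each pair is counted once rather than twice, which does not move the maximum.

```python
import math
from itertools import combinations

# All numerical values below are assumptions made for this sketch.
FS = 100_000.0            # sampling rate in Hz (assumed)
V = 1000.0                # wave propagation speed in m/s (assumed)
SENSORS = [(0.0, 0.0), (0.6, 0.0), (0.6, 0.6), (0.0, 0.6)]  # assumed layout, metres
TRUE_SOURCE = (0.2, 0.3)  # simulated impact position, metres
N_SAMPLES = 256           # record length in samples
STEP = 0.02               # grid spacing in metres
GRID_N = 31               # 31 points per axis cover 0 .. 0.6 m


def pulse(t):
    """Synthetic source signal s(t): a Gaussian pulse centred at 0.5 ms."""
    return math.exp(-((t - 0.5e-3) ** 2) / (2 * (0.1e-3) ** 2))


def dist(p, q):
    """Euclidean distance between two points in the plane."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def xcorr(a, b, max_lag):
    """Return R[k + max_lag] = sum_t a[t] * b[t - k] for |k| <= max_lag."""
    n = len(a)
    return [
        sum(a[t] * b[t - k] for t in range(max(0, k), min(n, n + k)))
        for k in range(-max_lag, max_lag + 1)
    ]


def lookup(r, lag, max_lag):
    """Linearly interpolate the correlation vector at a fractional lag."""
    x = lag + max_lag
    i = math.floor(x)
    frac = x - i
    return (1.0 - frac) * r[i] + frac * r[i + 1]


# Simulated sensor signals g_i(t) = s(t - d_i / v), noise-free for clarity.
signals = [
    [pulse(n / FS - dist(TRUE_SOURCE, u) / V) for n in range(N_SAMPLES)]
    for u in SENSORS
]

# M = N(N-1)/2 non-repeated sensor pairs.
pairs = list(combinations(range(len(SENSORS)), 2))

# The largest possible lag is bounded by the largest sensor separation.
max_lag = math.ceil(max(dist(SENSORS[i], SENSORS[j]) for i, j in pairs) / V * FS) + 1
corr = {(i, j): xcorr(signals[i], signals[j], max_lag) for i, j in pairs}

# Spatial mapping: evaluate every pair's correlation at tau_ij(x, y) and sum.
best_score, estimate = -math.inf, None
for ix in range(GRID_N):
    for iy in range(GRID_N):
        q = (ix * STEP, iy * STEP)
        score = 0.0
        for i, j in pairs:
            tau_ij = (dist(q, SENSORS[i]) - dist(q, SENSORS[j])) / V
            score += lookup(corr[(i, j)], tau_ij * FS, max_lag)
        if score > best_score:
            best_score, estimate = score, q

print("pairs:", len(pairs), "estimated source:", estimate)
```

Precomputing one correlation vector per pair and only looking it up on the grid mirrors the cross-correlation, spatial mapping and peak detection stages, and keeps the cost of the map independent of the record length.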
Consider the example of four sensors located on the surface of a tangible object as in Figure 1. The theoretical time difference for all sensor pairs can be computed numerically from the hyperbola defined by (11). However, by summing the spatial likelihood from all pairs, the source location can be obtained more reliably from the maximum in the likelihood map, as shown in Figure 3.

Figure 2. Algorithm diagram for (a) $ELM_t$ and (b) $ELM_f$. [Diagram (a): each sensor signal $g_1, g_2, g_3, \dots, g_i$ passes through signal conditioning, cross-correlation, Hilbert envelope and spatial mapping to x-y coordinates, followed by peak detection giving $(x, y)$. Diagram (b): each signal $g_1, g_2, g_3, \dots, g_N$ is transformed by FFT, multiplied by a conjugate spectrum and the weighting $\psi(f)$, inverse transformed (FFT$^{-1}$) and spatially mapped to x-y coordinates, followed by peak detection giving $(x, y)$.]

Figure 3. Spatial likelihood of the source at (0.2 m, 0.3 m).

4. Filtering Process in ELM

In the previous section the ELM algorithm is verified for the TAI application using raw signals. However, the localization accuracy and robustness can be further improved by the use of various filters. The purpose of pre-filtering is to remove noise due to low-frequency components as well as high-frequency components arising from the non-linear response of the sensors. A standard IIR filter was found to be adequate. The designed digital filter is a 10th-order band-pass elliptic filter with a lower cut-off frequency of 500 Hz and an upper cut-off frequency of 8 kHz.
Pre-filtering significantly improves the reliability of the estimation, as can be seen from the smoothness achieved in the likelihood map in Figure 4, where the maximum becomes more distinct compared with the multiple peaks in Figure 3 obtained using raw signals.

Figure 4. Spatial likelihood of the source at (0.2 m, 0.3 m) using filtered signals.

The second filtering type employed here is the Phase Transform (PHAT), given by

$$ \psi_{PHAT}(f) = \frac{1}{\lvert X_1(f) \rvert\, \lvert X_2(f) \rvert} \tag{12} $$

where $X_1(f)$ and $X_2(f)$ denote the spectra of the two signals of a sensor pair. The PHAT processor performs well in a moderately reverberant room. It has been used extensively to localize acoustic sources in rooms [6] and in robotics applications [7]. Here it is applied by substituting the filtering process $\psi_{PHAT}(f)$ given in (12) into (10). The resulting map of $ELM_f$, produced for the same signals used for generating Figure 3, is shown in Figure 5. A sharper peak is obtained compared with the pre-filtering method in the time domain.

Figure 5. Spatial likelihood map using PHAT process.

A significant advantage of using $ELM_f$ over $ELM_t$ is that the PHAT process does not require any design parameters, while the pre-filtering in $ELM_t$ requires knowledge of the dominant signal components and noise, which is normally obtained by analyzing the signals. This means that if these parameters change considerably, for example as a result of changing the object material, the filter of $ELM_t$ (IIR, FIR or wavelet) has to be redesigned, whereas PHAT needs no redesign.

5. Temporal Smoothing

Further enhancement of the ELM algorithm is achievable by treating the dispersion effect in solids. Theoretically, in a non-dispersive multiple-input system the output of the cross correlation reaches its maximum at the time lag equal to the time difference between the arrivals of the input signals.
On the other hand, in a dispersive system, where the wave propagation velocity is a function of frequency, the peak of the cross correlation envelope occurs at the time lag equal to the group delay of the wave [8]. This fact can be exploited in practice using the Hilbert transform. The analytic signal of a given function $z(t)$ is defined by

$$ Z(t) = z(t) + j\,\hat{z}(t) \tag{13} $$

where the imaginary part in (13) is the Hilbert transform of $z(t)$, given by

$$ \hat{z}(t) = \frac{1}{\pi} \int_{-\infty}^{\infty} \frac{z(\tau)}{t - \tau}\, d\tau \tag{14} $$

The envelope function of $z(t)$ is defined as

$$ \varepsilon(t) = \left[ z^2(t) + \hat{z}^2(t) \right]^{1/2} \tag{15} $$

Equation (15) can be used in (9) or (10) to reduce the error in cross correlation caused by dispersion.

To visualize the difference between the algorithms and the effect of the dispersion treatment, the ELM algorithm is applied to impact signals, shown in Figure 6, and scratch signals, shown in Figure 7. With raw signals the algorithm produces multiple peaks in the likelihood map, with several sharp local maxima comparable to the global maximum, as shown in Figures 6(a) and 7(a). By conditioning the input signals, the local maxima are compressed, as shown in Figures 6(b) and 7(b). When the Hilbert envelope is applied, the peak is enhanced because local maxima are shifted towards the global peak, and the overall ELM surface is smoothed, as shown in Figures 6(c) and 7(c). Figures 6(d) and 7(d) show that the PHAT process produces a sharper peak with lower side lobes. Temporal smoothing has significantly improved the ELM results, as seen from the enhanced global maximum and reduced local maxima.

Although scratch signals produce more local maxima than impact signals, the proposed ELM algorithm significantly improves the results in both cases, as seen from the enhanced global maximum and reduced local maxima. The results show that the Hilbert envelope can be regarded as an effective temporal smoothing filter, and that it reveals the global peak considerably better when used with pre-filtering.
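The envelope detection used for temporal smoothing can be illustrated with a discrete analytic signal. The sketch below is a minimal standard-library version, not the authors' implementation: it builds the analytic signal by zeroing the negative-frequency half of a naive DFT (a common discrete counterpart of the Hilbert transform) and takes its magnitude. The test signal, its length and the function names are assumptions chosen so that the true envelope is known exactly.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (forward or inverse)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[m] * cmath.exp(sign * 2j * math.pi * k * m / n) for m in range(n))
        out.append(acc / n if inverse else acc)
    return out


def hilbert_envelope(z):
    """Envelope |z + j*H{z}| computed through the discrete analytic signal."""
    n = len(z)
    spectrum = dft(z)
    # Keep DC (and Nyquist for even n), double positive bins, zero negative bins.
    gain = [0.0] * n
    gain[0] = 1.0
    if n % 2 == 0:
        gain[n // 2] = 1.0
        positive = range(1, n // 2)
    else:
        positive = range(1, (n + 1) // 2)
    for k in positive:
        gain[k] = 2.0
    analytic = dft([s * g for s, g in zip(spectrum, gain)], inverse=True)
    return [abs(a) for a in analytic]


# Hypothetical test signal: a carrier at bin 8 with a slow amplitude modulation.
N = 64
amplitude = [1.0 + 0.5 * math.cos(2 * math.pi * m / N) for m in range(N)]
signal = [a * math.cos(2 * math.pi * 8 * m / N) for m, a in zip(range(N), amplitude)]

envelope = hilbert_envelope(signal)
max_error = max(abs(e - a) for e, a in zip(envelope, amplitude))
print("largest envelope error:", max_error)
```

Applied to a cross correlation vector instead of this test signal, the same function would return the smooth envelope whose peak the text associates with the group delay.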
Figure 6. ELM of impact signals using (a) raw signals, (b) conditioned signals, (c) as in (b) with Hilbert envelope and (d) PHAT.

Figure 7. ELM of scratch signals using (a) raw signals, (b) conditioned signals, (c) as in (b) with Hilbert envelope and (d) PHAT.

6. Time Difference Based Localisation

The accuracy of time difference based localization depends strongly on the level of error in the time difference values. It is therefore crucial to develop a reliable algorithm that estimates time differences with as little error as possible. An efficient algorithm based on spectral estimation is developed for estimating time differences.

6.1. Linear Cross Spectral Phase

Classical time difference estimation can be improved by pre-filtering the signals or by applying the popular PHAT process using GCC, which involves filtering in the frequency domain and then returning to the time domain to extract the time difference, as in (9) and (10). An alternative method for estimating the time difference is the Linear Cross Spectral Phase (LCSP). The LCSP algorithm estimates the time difference entirely in the frequency domain, making the estimation process more efficient and robust than the time domain algorithms, particularly for tracking a continuous source.

Let the received signal $g(t)$ be assumed to be a wide-sense stationary process. The cross spectral density of signals $g_i(t)$ and $g_j(t)$ can be found from

$$ P_{ij}(f) = G_i(f)\, G_j^*(f) \tag{16} $$

where $G(f)$ is the Fourier transform of $g(t)$. Since $g_j(t)$ is time delayed from $g_i(t)$ by $\tau$, then in terms of the auto spectral density $A_{ii}(f)$ of $g_i(t)$, Equation (16) can be expressed as

$$ P_{ij}(f) = A_{ii}(f)\, e^{\,j 2\pi f \tau} = \lvert P_{ij}(f) \rvert\, e^{\,j \phi(f)} \tag{17} $$

The time difference $\tau$ appears only in the phase angle $\phi(f)$ of (17), as a linear function of the frequency $f$.
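Because the cross-spectral phase is linear in $f$ for a pure delay, the delay can be read off from the slope of that phase. The following sketch illustrates the idea under simplifying assumptions that are not taken from the paper: a single DFT frame stands in for the short-time spectral estimate, the delayed signal is a circular shift of a synthetic Gaussian pulse, the band of bins is chosen by hand, and the slope is fitted with a least-squares line through the origin after a simple phase unwrapping.

```python
import cmath
import math

# All values are assumptions for this sketch.
FS = 100_000.0        # sampling rate in Hz
N = 128               # frame length in samples
DELAY = 5             # true delay of g_j relative to g_i, in samples
BINS = range(1, 21)   # frequency bins used for the fit

# g_i: short Gaussian pulse; g_j: the same pulse circularly delayed by DELAY samples.
g_i = [math.exp(-((m - 30) ** 2) / (2 * 2.0 ** 2)) for m in range(N)]
g_j = [g_i[(m - DELAY) % N] for m in range(N)]


def dft_bin(x, k):
    """Single DFT coefficient of x at bin k."""
    n = len(x)
    return sum(x[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))


# Cross spectrum P_ij(f_k) = G_i(f_k) * conj(G_j(f_k)).
cross = [dft_bin(g_i, k) * dft_bin(g_j, k).conjugate() for k in BINS]

# Unwrap the phase so that it grows linearly instead of jumping by 2*pi.
phase = []
for c in cross:
    p = cmath.phase(c)
    if phase:
        step = p - phase[-1]
        p = phase[-1] + step - 2 * math.pi * round(step / (2 * math.pi))
    phase.append(p)

# Least-squares line through the origin: phase = slope * f, with slope = 2*pi*tau.
freqs = [k * FS / N for k in BINS]
slope = sum(f * ph for f, ph in zip(freqs, phase)) / sum(f * f for f in freqs)
tau_hat = slope / (2 * math.pi)

print("estimated delay (s):", tau_hat, "true delay (s):", DELAY / FS)
```

With real, dispersive signals the phase is only approximately linear, so the choice of band and the robustness of the unwrapping matter far more than in this idealised example.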
Since the group velocity is used to compute the length difference, the group delay must be extracted from the phase function in (17), as given by [9]

$$ \tau = \frac{1}{2\pi} \frac{d\phi(f)}{df} \tag{18} $$

The cross spectrum and auto spectrum functions can be effectively estimated using the Short Time Fourier Transform (STFT). Equation (18) can then be computed numerically, for the quantities given in samples, using linear regression over the discrete frequencies $f_i$ of the form

$$ \hat{\tau} = \frac{1}{2\pi} \frac{\sum_i f_i\, \phi(f_i)}{\sum_i f_i^2} $$

[10,11].

6.2. Maximum Likelihood Positioning

Given $M$ pairs of sensors, the Maximum Likelihood (ML) algorithm proposed here for TAI handles the error by minimizing the difference between the measured time difference $\hat{\tau}_m$ of the $m$th pair and the ideal time difference $\tau_m(q)$ associated with the searched location $q$. If the estimated time difference is modeled by the random variable $\hat{\tau}_m = \tau_m + e_m$, where $e_m$ is zero-mean additive white Gaussian noise with known standard deviation $\sigma_m$, then, assuming the time differences from each pair of sensors are statistically independent, the likelihood function can be expressed by the conditional probability density function given by [12]

$$ p(\hat{\tau}_1, \dots, \hat{\tau}_M \mid q) = \prod_{m=1}^{M} \frac{1}{\sqrt{2\pi \sigma_m^2}} \exp\!\left( -\frac{\left[ \hat{\tau}_m - \tau_m(q) \right]^2}{2\sigma_m^2} \right) \tag{19} $$

Taking the logarithm of both sides of (19) yields

$$ \ln p(\hat{\tau}_1, \dots, \hat{\tau}_M \mid q) = -\frac{1}{2} \sum_{m=1}^{M} \ln\!\left( 2\pi \sigma_m^2 \right) - \frac{1}{2} \sum_{m=1}^{M} \frac{\left[ \hat{\tau}_m - \tau_m(q) \right]^2}{\sigma_m^2} \tag{20} $$

The ML estimate of the location $q$ is the position that maximizes the likelihood function (20) or, equivalently, minimizes the sum in the second term, since the first term is not a function of $q$. This results in the following localization criterion:

$$ J_{ML} = \arg\min_q \sum_{m=1}^{M} \frac{\left[ \hat{\tau}_m - \tau_m(q) \right]^2}{\sigma_m^2} \tag{21} $$

Here $J_{ML}$ is a weighted least error estimator.
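The weighted least-error criterion can be sketched as a brute-force search over candidate positions. Everything numerical below is an assumption made for illustration (wave speed, sensor layout, grid, the per-pair standard deviations and the small deterministic timing errors added to the ideal time differences); the paper does not specify how the minimization is carried out, and an exhaustive grid search is simply the most direct option.

```python
import math
from itertools import combinations

# All numerical values are assumptions for this sketch.
V = 1000.0                                                   # wave speed in m/s
SENSORS = [(0.0, 0.0), (0.6, 0.0), (0.6, 0.6), (0.0, 0.6)]   # metres
TRUE_SOURCE = (0.2, 0.3)
PAIRS = list(combinations(range(len(SENSORS)), 2))           # M = 6 pairs
STEP = 0.01                                                  # grid spacing in metres
GRID_N = 61                                                  # covers 0 .. 0.6 m


def dist(p, q):
    """Euclidean distance between two points in the plane."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def ideal_tdoa(q, pair):
    """Ideal time difference tau_m(q) for sensor pair (i, j)."""
    i, j = pair
    return (dist(q, SENSORS[i]) - dist(q, SENSORS[j])) / V


# "Measured" time differences: ideal values plus small hypothetical errors (seconds).
errors = [2e-6, -1e-6, 1.5e-6, -2e-6, 1e-6, -1.5e-6]
tau_hat = [ideal_tdoa(TRUE_SOURCE, p) + e for p, e in zip(PAIRS, errors)]

# Per-pair standard deviations; equal values reduce the criterion to plain least squares.
sigma = [2e-6] * len(PAIRS)


def cost(q):
    """Weighted squared error between measured and ideal time differences."""
    return sum(
        ((th - ideal_tdoa(q, p)) / s) ** 2
        for th, p, s in zip(tau_hat, PAIRS, sigma)
    )


candidates = [(ix * STEP, iy * STEP) for ix in range(GRID_N) for iy in range(GRID_N)]
q_ml = min(candidates, key=cost)

print("ML position estimate:", q_ml)
```

Pairs with a larger assumed standard deviation contribute less to the cost, which is how unreliable time differences would be down-weighted in the fusion.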
If no statistics are considered, or $\sigma_m$ is the same for all sensor pairs, then the denominator is constant and (21) reduces to

$$ J_{ML} = \arg\min_q \sum_{m=1}^{M} \left[ \hat{\tau}_m - \tau_m(q) \right]^2 \tag{22} $$

A significant difference between ML and ELM is that ML does not suffer from side lobes, but at the cost of sharpness. This means that the ML algorithm provides more stability while the ELM algorithm provides higher accuracy. The similarity between the ML maps of the impact and scratch signals is due to the dependence of the ML algorithm on the already estimated time differences rather than on the signals themselves.

7. Conclusions

In this paper, in-solid acoustic source localization is developed based on measuring the time difference of arrival between spatially separated sensors. For efficient operation of TAI with this approach, two methods are proposed for source localization. In the one-step method, the ELM performs the localization using one of two algorithms: one uses time domain processing with conventional filtering and the other employs PHAT filtering in the frequency domain. In the two-step method, TDOA values are first found using either GCC or LCSP, and the source is then localized from these values using the ML algorithm. The effect of dispersion is treated by introducing Hilbert envelope smoothing. A criterion is proposed to detect outlier estimates that can be caused by domestic noise such as a door shutting.

8. Acknowledgements

This work was financed by the European FP6 IST Project "Tangible Acoustic Interfaces for Computer Human Interaction". The support of the European Commission is gratefully acknowledged.

9. References

[1] W. Rolshofen, D. T. Pham, M. Yang, Z. Wang, Z. Ji and M. Al-Kutubi, "New Approaches in Computer-Human Interaction with Tangible Acoustic Interfaces," IPROMs 2005 Virtual Conference, May 2005.
[2] X. Wang, Z. Wang and B. O'Dea, "A TOA-Based Location Algorithm Due to NLOS Propagation," IEEE Transactions on Vehicular Technology, Vol.
52, No. 1, January 2003, pp. 112-116.
[3] S. Birchfield and D. Gillmor, "Fast Bayesian Acoustic Localization," Proceedings of the IEEE International Conference on Speech and Signal Processing, Florida, Vol. 2, May 2002, pp. 1793-1796.
[4] P. Aarabi and S. Zaky, "Robust Sound Localization Using Multi-Source Audiovisual Information Fusion," Elsevier, Information Fusion, Vol. 2, No. 3, 2001, pp. 209-223. doi:10.1016/S1566-2535(01)00035-5
[5] C. Knapp and G. Carter, "The Generalized Correlation Method for Estimation of Time Delay," IEEE Transactions on Acoustics, Speech, & Signal Processing, Vol. 24, No. 4, 1976, pp. 320-327. doi:10.1109/TASSP.1976.1162830
[6] M. Omologo and P. Svaizer, "Acoustic Localization in Noisy and Reverberant Environment Using CSP Analysis," Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing Conference, Atlanta, Vol. 2, 7-10 May 1996, pp. 921-924.
[7] J. Valin, F. Michaud, J. Rouat and D. Létourneau, "Robust Sound Source Localization Using a Microphone Array on a Mobile Robot," Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, Las Vegas, 27-31 October 2003, Vol. 2, pp. 1228-1233.
[8] J. Bendat and A. Piersol, "Random Data," John Wiley & Sons, 1986.
[9] A. Mertins, "Signal Analysis," John Wiley & Sons, 1999.
[10] L. Danfeng and S. Levinson, "A Linear Phase Unwrapping Method for Binaural Sound Source Localization on a Robot," Proceedings of the IEEE Conference on Robotics and Automation, Washington, May 2002, pp. 19-23.
[11] H. Poor, "An Introduction to Signal Detection and Estimation," 2nd Edition, Springer, New York, Berlin, Heidelberg, Hong Kong, London, Milan, Paris, Tokyo, 1994.
[12] N. Gershenfeld, "The Nature of Mathematical Modeling," Cambridge University Press, Cambridge, 1999.