
Localize Target Source with Complex Watson Mixture Model and Time-Frequency Selection

Dec 05, 2018

Sound source localization (SSL) plays an important role in many signal processing applications, including robot audition, camera surveillance, and source separation. Conventional SSL algorithms localize all active sources in the environment but cannot distinguish a target speaker from competing speakers or directional noise.

Recently, researchers from the Institute of Acoustics (IOA) of the Chinese Academy of Sciences addressed the problem of target source localization from a novel perspective by first performing target source separation before localization. 

The paper entitled "Target Speaker Localization Based on the Complex Watson Mixture Model and Time-Frequency Selection Neural Network" was published online in Applied Sciences. 

Speech spectra are sparse in the time-frequency domain, so some time-frequency regions remain dominated by the target and are sufficient for localization, even when the signal is severely corrupted by noise or interference. The challenge is therefore to separate the target source from the non-target sources.
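The idea of target-dominant regions can be illustrated with an oracle binary mask. The following is a minimal sketch, not the method from the paper: it assumes the target and interference spectrograms are known separately (which is not the case in practice, where the mask has to be estimated), and the function name `oracle_binary_mask` and the 0 dB threshold are illustrative choices.

```python
import numpy as np

def oracle_binary_mask(target_spec, interference_spec, threshold_db=0.0):
    """Mark time-frequency bins where the target magnitude exceeds the
    interference magnitude by more than threshold_db (assumed threshold)."""
    eps = 1e-12  # avoids division by zero and log of zero
    ratio_db = 20.0 * np.log10(
        (np.abs(target_spec) + eps) / (np.abs(interference_spec) + eps)
    )
    return ratio_db > threshold_db

# Toy 2x2 spectrogram magnitudes (rows: frequency bins, columns: frames).
target = np.array([[1.0, 0.1],
                   [0.0, 2.0]])
interference = np.array([[0.1, 1.0],
                         [1.0, 0.5]])

mask = oracle_binary_mask(target, interference)
print(mask)
```

Only the bins where the mask is true would be passed on to the localization stage; the remaining bins are treated as dominated by noise or a competing source.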

The researchers adopted a speaker-aware deep neural network to estimate binary masks for the target. Speaker awareness was achieved by feeding a short adaptation utterance containing only the target speaker to an auxiliary adaptation network.

Moreover, the microphone observations were modeled with a complex Watson mixture model, which made full use of both the inter-channel level difference and the inter-channel phase difference to perform localization.
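To give a feel for how a complex Watson model can score candidate directions, here is a simplified sketch. It relies on the fact that the complex Watson density for a unit-norm observation z with mean direction a is proportional to exp(κ |aᴴz|²), so with a shared concentration κ the masked sum of |aᴴz|² ranks the candidates. Everything else is an assumption for illustration and is not taken from the paper: the uniform linear array, the far-field steering vectors, the candidate grid, and the names `steering_vector` and `watson_direction_scores`. The mixture fitting itself is not shown.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed

def steering_vector(angle_rad, mic_x, freq_hz):
    """Unit-norm far-field steering vector for a linear array on the x axis."""
    delays = mic_x * np.cos(angle_rad) / SPEED_OF_SOUND
    a = np.exp(-2j * np.pi * freq_hz * delays)
    return a / np.linalg.norm(a)

def watson_direction_scores(obs, mask, angles_rad, mic_x, freqs_hz):
    """Sum mask-weighted |a^H z|^2 over all bins for each candidate angle.

    obs:  complex array of shape (freqs, frames, mics)
    mask: array of shape (freqs, frames), 1 for target-dominant bins
    """
    # Normalize each observation to unit length; relative level and phase
    # between channels are preserved.
    z = obs / (np.linalg.norm(obs, axis=-1, keepdims=True) + 1e-12)
    scores = np.zeros(len(angles_rad))
    for i, angle in enumerate(angles_rad):
        for f, freq in enumerate(freqs_hz):
            a = steering_vector(angle, mic_x, freq)
            scores[i] += np.sum(mask[f] * np.abs(z[f] @ a.conj()) ** 2)
    return scores

# Toy scene: target at 60 degrees in masked bins, interferer at 120 degrees elsewhere.
mic_x = np.arange(4) * 0.05                      # 4 microphones, 5 cm spacing
freqs_hz = np.array([500.0, 1000.0, 1500.0])
n_frames = 6
mask = np.zeros((len(freqs_hz), n_frames))
mask[:, ::2] = 1.0                               # every other frame is target-dominant

obs = np.zeros((len(freqs_hz), n_frames, len(mic_x)), dtype=complex)
for f, freq in enumerate(freqs_hz):
    for t in range(n_frames):
        angle = np.deg2rad(60.0) if mask[f, t] else np.deg2rad(120.0)
        obs[f, t] = steering_vector(angle, mic_x, freq)

candidate_deg = np.arange(0, 181, 10)
scores = watson_direction_scores(obs, mask, np.deg2rad(candidate_deg), mic_x, freqs_hz)
best_angle_deg = int(candidate_deg[np.argmax(scores)])
print(best_angle_deg)
```

Because the mask zeroes out the interferer-dominated bins, only the target's direction accumulates evidence, which mirrors the separate-then-localize idea described above.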


Figure 1. Flowchart of the target source mask estimation neural network. (Image by WANG Ziteng)

Simulation experiments showed that the proposed method worked well in various noisy conditions and remained robust at low signal-to-noise ratios and in the presence of a competing speaker.

(Editor: LI Yuan)


WANG Rongquan

Institute of Acoustics

E-mail: media@mail.ioa.ac.cn

