A Linked-HMM for Robust Voicing and Speech Detection
Presented by: Emiliano Miluzzo

Why is the mic important as a sensor for a people-centric sensing approach?

In a few words
- Linked-HMM for simultaneous and robust voicing and speech detection.
- Targets different experimental settings: low sampling rates, far-field mic, ambient noise.
- Features are independent of energy.
- Exploits speech patterns, usually combinations of talking and silence segments.

What's nice about the paper
- The first paper presenting the application of a linked-HMM to speech and voicing detection.
- "Simple" algorithm: forward-backward algorithm, feature extraction.
- Experimental evaluation of some aspects of the proposed algorithms.
- I learned something useful, namely how to get rid of the impact of a constant source contribution (fan, wind blowing, etc.).

How about the cons?
- Fairly dense with concepts for a short paper.
- Consequently, clear explanations are often lacking.
- Is it generally applicable, for example to mobile devices such as cell phones?
- Training with too few different individuals (just 2), and this is a supervised ML method!
- The experimental protocol is not clear: what does "noisy conditions" mean?
- Is the comparison in Fig. 3 enough to show the improvement over the HMM?

Is the noisy autocorrelation always effective?
- What if the noise is generated by a high-energy periodic signal such as a motor?
- Does this suggest that the proposed technique might work better in indoor environments and perform more poorly on mobile devices?
- It is not clear how variations of one of the features (particularly the noisy autocorrelation) would impact the overall classification result.

A few questions
- How does the algorithm differentiate a singer singing a song from an actual conversation?
  Maybe checking whether the spectral content of the voiced part changes over time would indicate that multiple people are talking.
- Does the system distinguish a conversation between speaker pair A from one between speaker pair B?
  Same as above; in addition, knowledge of the device owner's voice spectral pattern would help to filter out outliers.

Overall
- Nice technique that could be applied to a broad set of scenarios, in my opinion mainly where computational resources are available and not many sources of (periodic) noise are present. In these cases the error is small.
- Not sure about its applicability to mobile devices for real-time speech detection. Some of the aspects might be reused, though.
- Can a scheme oriented to mobile devices trade off accuracy against speed?
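
The question about the autocorrelation feature and periodic noise can be made concrete with a small experiment. The slides do not spell out how the feature is computed, so the sketch below rests on an assumption: white Gaussian noise is added to each frame before taking the normalized autocorrelation, and the feature is the largest value over a range of non-zero lags. The function names, the lag range, the frame length, and the noise level are all hypothetical choices made for illustration.

```python
import math
import random


def normalized_autocorr(frame, lag):
    """Biased autocorrelation of `frame` at `lag`, divided by its lag-0 value."""
    energy = sum(x * x for x in frame)
    if energy == 0.0:
        return 0.0
    acc = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
    return acc / energy


def noisy_autocorr_peak(frame, noise_std, min_lag=20, max_lag=160, seed=0):
    """Largest normalized autocorrelation over [min_lag, max_lag] after adding
    white Gaussian noise of standard deviation `noise_std` to the frame.
    (Assumed reading of the feature; not taken from the slides.)"""
    rng = random.Random(seed)
    noisy = [x + rng.gauss(0.0, noise_std) for x in frame]
    return max(normalized_autocorr(noisy, k) for k in range(min_lag, max_lag + 1))


def tone(amplitude, period=40, length=512):
    """Synthetic periodic source: a sine with the given amplitude and period in samples."""
    return [amplitude * math.sin(2 * math.pi * n / period) for n in range(length)]


quiet_hum = tone(0.01)   # stand-in for a fan: periodic but low energy
loud_motor = tone(1.0)   # stand-in for a motor: periodic and high energy

hum_clean = noisy_autocorr_peak(quiet_hum, noise_std=0.0)
hum_noisy = noisy_autocorr_peak(quiet_hum, noise_std=0.1)
motor_noisy = noisy_autocorr_peak(loud_motor, noise_std=0.1)
print(hum_clean, hum_noisy, motor_noisy)
```

Under these assumptions, the added noise swamps the low-energy hum, so its autocorrelation peak collapses, while the high-energy tone keeps a peak close to that of a clean periodic signal. That is the situation the motor question worries about: a loud periodic source could still look "voiced" to this feature.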

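
For readers who have not met the forward-backward algorithm mentioned among the paper's strengths, here is a minimal sketch for a plain, single-chain HMM. It is not the linked-HMM inference from the paper, which couples a voicing chain and a speech chain; the two-state labels, transition probabilities, and per-frame likelihoods below are made-up values for illustration only.

```python
def forward_backward(obs_lik, trans, prior):
    """Posterior state probabilities for a plain (single-chain) HMM.

    obs_lik[t][s] : likelihood of the observation at frame t under state s
    trans[i][j]   : probability of moving from state i to state j
    prior[s]      : probability of starting in state s
    """
    T, S = len(obs_lik), len(prior)

    # Forward pass, normalizing each frame to avoid numerical underflow.
    alpha = []
    prev = None
    for t in range(T):
        if t == 0:
            a = [prior[s] * obs_lik[0][s] for s in range(S)]
        else:
            a = [obs_lik[t][s] * sum(prev[i] * trans[i][s] for i in range(S))
                 for s in range(S)]
        norm = sum(a)
        prev = [x / norm for x in a]
        alpha.append(prev)

    # Backward pass, also normalized per frame.
    beta = [[1.0] * S for _ in range(T)]
    for t in range(T - 2, -1, -1):
        b = [sum(trans[s][j] * obs_lik[t + 1][j] * beta[t + 1][j] for j in range(S))
             for s in range(S)]
        norm = sum(b)
        beta[t] = [x / norm for x in b]

    # Combine both passes and renormalize: P(state at t | all observations).
    posterior = []
    for t in range(T):
        g = [alpha[t][s] * beta[t][s] for s in range(S)]
        norm = sum(g)
        posterior.append([x / norm for x in g])
    return posterior


# Hypothetical example: state 0 = unvoiced, state 1 = voiced.
trans = [[0.9, 0.1], [0.1, 0.9]]          # "sticky" states
prior = [0.5, 0.5]
obs_lik = [[0.8, 0.2], [0.8, 0.2], [0.4, 0.6], [0.2, 0.8], [0.2, 0.8]]
posterior = forward_backward(obs_lik, trans, prior)
for t, p in enumerate(posterior):
    print(t, p)
```

The sticky transition matrix is what smooths the frame-by-frame decisions: an ambiguous frame in the middle is pulled toward the state favored by its neighbors, which is the kind of temporal structure the summary refers to as exploiting speech patterns.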
