Automatic Sound Signals Quality Estimation (1/4)

•   General outline and the basic components of the system – signal, synchronization and analytical components
•   Test signal (including a statistical speech model) and sound perception models
•   Adequacy of the analytical estimates, based on a comparison of the analytical and subjective MOS estimates obtained
•   Further development of the acoustic model
•   Available software
•   System applications

Review of existing quality estimation methods

•   AI (Articulation Index). The whole frequency range of the speech signal is divided into 20 bands whose widths are chosen so that every band contributes equally to speech perception. The signal/noise ratio is determined within each band, and the articulation index is taken to be the weighted sum of the band values. Although the articulation index is oriented toward speech signals, it does not take the properties of hearing and speech production into account.

•   SII (Speech Intelligibility Index) is an evolution of the AI method and is defined in the American National Standard ANSI S3.5-1997. The standard provides four measuring procedures that differ in band grouping: 21 critical bands, 18 one-third-octave bands, 17 equally contributing critical bands and 6 octave bands. The signal/noise ratio is calculated within each band, and a total SII value ranging from 0 to 1 is computed. The speech intelligibility index takes into account only the properties of hearing, not those of speech production.

•   STI (Speech Transmission Index). A speech signal can be approximated as a broadband signal modulated by a low-frequency signal, with the modulation frequency determined by the articulation rate. As the modulation depth decreases, the speech signal becomes more noise-like and its intelligibility drops, so the loss of intelligibility can be estimated from the loss of modulation depth. The speech range is divided into 7 octave bands, and the input is octave-band noise whose intensity distribution matches that of speech. The modulation frequencies range from 0.63 to 12.5 Hz in one-third-octave steps (14 frequencies in all). The STI measuring method is specified in the international standard IEC 268-16 (now numbered IEC 60268-16).

•   RASTI/STIPA (Rapid Speech Transmission Index). The full STI method requires many measurements and calculations. A simplified method was therefore developed that measures in only 2 bands with 5 modulation frequencies, which reduces the number of measurements and calculations. For good intelligibility, the RASTI value must be at least 0.6.

The speech transmission index and the rapid speech transmission index both imitate the speech production process by means of a noise model, but accounting for the properties of speech production and hearing in this way is far from optimal.
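
The link between modulation depth and the transmission index can be illustrated with a short Python sketch. It follows the classic textbook formulation rather than the full IEC procedure: each measured modulation transfer value is converted to an apparent signal/noise ratio, limited to the range from -15 to +15 dB, mapped linearly onto 0..1, averaged within each octave band and then combined across bands. The function names, the input layout and the equal band weights are assumptions made for illustration only; the standard prescribes its own band weights and further corrections that are omitted here.

```python
import math


def transmission_index(m):
    """Map one modulation transfer value m (0..1) to a transmission index (0..1)."""
    # Keep m strictly inside (0, 1) so the logarithm stays finite.
    m = min(max(m, 1e-6), 1.0 - 1e-6)
    # Apparent signal/noise ratio implied by the remaining modulation depth.
    snr_db = 10.0 * math.log10(m / (1.0 - m))
    # Limit to +/-15 dB and rescale linearly to 0..1.
    snr_db = min(max(snr_db, -15.0), 15.0)
    return (snr_db + 15.0) / 30.0


def sti(mtf, band_weights=None):
    """Combine a modulation transfer matrix into a single index.

    mtf: one list per octave band, each holding the modulation transfer
         values measured at the modulation frequencies of that band.
    band_weights: hypothetical per-band weights; equal weights are
         assumed when none are given (NOT the weights of the standard).
    """
    if band_weights is None:
        band_weights = [1.0 / len(mtf)] * len(mtf)
    # Average the transmission indices over modulation frequencies per band.
    band_indices = [
        sum(transmission_index(m) for m in band) / len(band) for band in mtf
    ]
    # Weighted sum over the octave bands.
    return sum(w * ti for w, ti in zip(band_weights, band_indices))


# 7 octave bands x 14 modulation frequencies, all with half the modulation kept.
example = [[0.5] * 14 for _ in range(7)]
print(round(sti(example), 3))
```

With fully preserved modulation the index approaches 1, and with no modulation left it approaches 0, which matches the intuition that a signal stripped of its modulation is indistinguishable from noise.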

•   C50 (clarity factor) characterizes the clearness of sound. It is computed as the ratio of early echo to late echo and is based on the fact that echo reduces signal intelligibility. The ratio is calculated in several frequency bands, with early echo (arriving within 50 ms) treated as useful signal and late echo (arriving after 50 ms) as disturbing signal. Because the clarity factor accounts for only one kind of possible distortion, it is best applied as just one of several speech quality estimates.
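
The echo ratio behind the clarity factor can be sketched in Python as an energy ratio computed from a room impulse response. This is a minimal broadband sketch under stated assumptions: the impulse response is assumed to start at the arrival of the direct sound, the split time is a parameter (50 ms is the default, which is what the "50" in C50 conventionally denotes), and the per-band filtering mentioned above is omitted. The function name `clarity_db` and the synthetic decaying response are hypothetical and serve only as an illustration.

```python
import math


def clarity_db(impulse_response, sample_rate, boundary_ms=50.0):
    """Return the early-to-late energy ratio of an impulse response in dB."""
    # Sample index that separates the early part from the late part.
    split = int(round(sample_rate * boundary_ms / 1000.0))
    early = sum(x * x for x in impulse_response[:split])
    late = sum(x * x for x in impulse_response[split:])
    if late == 0.0:
        return math.inf   # no late energy at all
    if early == 0.0:
        return -math.inf  # no early energy at all
    return 10.0 * math.log10(early / late)


# Synthetic, exponentially decaying response (hypothetical test data).
sample_rate = 8000
decay_s = 0.1
h = [math.exp(-n / (sample_rate * decay_s)) for n in range(sample_rate)]
print(f"{clarity_db(h, sample_rate):.2f} dB")
```

A higher value means that more of the energy arrives early, which corresponds to clearer sound; a slowly decaying response shifts energy into the late part and lowers the value.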

The need to develop new methods and improve existing ones stems from the desire to bring objective and subjective quality estimates closer together and to make explicit use of our knowledge of hearing and speech production in such systems.

Whether an arbitrary or a specialized signal is used as the source signal depends on the purpose of the estimation (speech intelligibility evaluation, sound reproduction quality, quality estimation of speech transmitted through communication channels, etc.). Choosing the source signal to suit that purpose makes the estimate more objective.


E-mail at: info at sevana dot fi
Sevana Oy. IT Solutions and Services. © 2004 - 2010