PSQM, PSQM+

The intrusive algorithm for calculating the Perceptual Speech Quality Measure (PSQM) was devised by Beerends in 1993 [BEER94]. Developed at KPN Research, it is an adapted version of the more general Perceptual Audio Quality Measure (PAQM) [BEER92], optimized for telephony speech signals. The adaptation was motivated by the observation that the psycho-acoustic effects known from masking experiments seem to differ in significance between the perception of speech and the perception of music. One reason might be that, from daily experience, the human brain recalls the reference sound of familiar voices more accurately than it recalls musical sounds. So far, no single homogeneous approach has been presented that achieves high correlation for both speech and music signals without adapting algorithm parameters [BEER95].

> Fundamentals of the PSQM Measurement Algorithm The block diagram shows how PSQM is calculated. In the first step, the time domain representations of both input signals, x and y, are transformed to the frequency domain. This is done by selecting blocks of input samples, applying a Hann window to each block, and passing it to an FFT. The (linear) frequency scale is then transformed to a pitch scale ("frequency warping"); this pitch modeling is often referred to as the "Bark transformation". Both the reference and the test signal are then filtered with the transfer characteristics of the receiving device (e.g. handset, loudspeaker, or headphones). A "Hoth noise" signal is added to simulate the background noise present in a typical office environment. The objective is to account for the masking effects of real-world environmental noise, so that the masked threshold is modeled properly. The subsequent "intensity warping" leads to a representation of compressed loudness as a function of pitch and time. Subtracting the two signal representations yields an estimate of the audible error. This difference signal is still a function of pitch and time.
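The signal-processing chain described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the P.861 reference implementation: the Bark conversion uses the common Zwicker/Terhardt approximation instead of the band table defined in the Recommendation, the compression exponent `gamma` is a placeholder, and the receiving-device filter and the Hoth noise are omitted. All function names are hypothetical.

```python
import cmath
import math


def hann(n):
    """Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def power_spectrum(block):
    """Power spectrum of one Hann-windowed block (naive DFT, adequate for a sketch)."""
    n = len(block)
    x = [s * w for s, w in zip(block, hann(n))]
    spec = []
    for k in range(n // 2 + 1):
        acc = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spec.append(abs(acc) ** 2)
    return spec


def hz_to_bark(f):
    """Zwicker/Terhardt approximation (assumption: P.861 defines its own band table)."""
    return 13.0 * math.atan(0.00076 * f) + 3.5 * math.atan((f / 7500.0) ** 2)


def pitch_power(spec, fs, n_fft):
    """Frequency warping: sum the FFT bin powers into 1-Bark-wide bands."""
    bands = [0.0] * (int(hz_to_bark(fs / 2)) + 1)
    for k, p in enumerate(spec):
        bands[int(hz_to_bark(k * fs / n_fft))] += p
    return bands


def compressed_loudness(bands, gamma=0.23):
    """Intensity warping as a simple power law (gamma is a placeholder value)."""
    return [p ** gamma for p in bands]


def audible_difference(ref_block, test_block, fs):
    """Per-band difference (test minus reference) of the compressed loudness."""
    n = len(ref_block)
    lx = compressed_loudness(pitch_power(power_spectrum(ref_block), fs, n))
    ly = compressed_loudness(pitch_power(power_spectrum(test_block), fs, n))
    return [b - a for a, b in zip(lx, ly)]
```

Calling `audible_difference` on two aligned blocks of equal length returns one value per pitch band; repeating this for successive blocks gives the difference as a function of pitch and time.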

> Block diagram of the PSQM Algorithm

The remaining blocks represent the cognitive part of the model. The "asymmetry processing" accounts for the fact that distortions introduced by the device under test are more easily perceived than signal components left out by the codec. Finally, the "silent interval weighting" treats silent and speech-active intervals differently. This parameter is believed to allow the cognitive processing to be fitted to cultural differences: almost identical subjective tests, carried out at several locations around the world and in different languages, have led to different results, for instance in Europe and in Asia. It was concluded that the discrepancy stems from language differences and the cultural differences that accompany them. For example, a noise floor may be more annoying if a telephone conversation contains more silent intervals.
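The idea behind the asymmetry processing can be illustrated with a small sketch. It assumes a per-band difference signal (test minus reference) is already available and simply weights added components more heavily than omitted ones. The function name and the constant weights are hypothetical; the actual asymmetry factor used by PSQM is defined differently and is not reproduced here.

```python
def frame_disturbance(diff, added_weight=2.0, omitted_weight=1.0):
    """Sum the per-band differences of one frame asymmetrically.

    diff: per-band (test - reference) compressed loudness values.
    Positive entries are components added by the device under test,
    negative entries are components that were left out.
    The two weights are illustrative placeholders, not values from P.861.
    """
    total = 0.0
    for d in diff:
        if d > 0:
            total += added_weight * d       # introduced distortion: more audible
        else:
            total += omitted_weight * (-d)  # omitted component: less audible
    return total
```

With these placeholder weights, a frame in which the codec adds a component is rated as more disturbed than a frame in which a component of the same size is missing.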

> Verification by the ITU The PSQM algorithm was one of several proposals tested by Study Group 12 of the ITU-T in 1995 for the purpose of international verification. Further proposals included the EPR ("Expert Pattern Recognition") algorithm as well as measures based on the "LPC Cepstrum Function", the "Information Index", and the "Coherence Function" (CHF). In a test series conducted by the Japanese telecommunications company NTT, which included listening tests in Japan and Italy, PSQM achieved the highest correlation with the subjective test results. Consequently, the ITU-T recommended PSQM in 1996 (Recommendation P.861) for the objective quality measurement of telephone-band speech codecs. Since then, PSQM has been used intensively in voice quality testing applications.

> Input Parameters to PSQM The PSQM algorithm is defined for sampling rates of 8 kHz and 16 kHz. PSQM always simulates a listening test (referred to below as a "virtual listening test"). For its results to correlate highly with those that subjects would have produced in a real listening test, PSQM must know some parameters of that virtual experiment. The following parameters must therefore be supplied to the algorithm:

The listening condition, indicating if the virtual experiment uses loudspeakers, headphones or typical telephone handsets for listening.
The level of the background masking noise ("Hoth noise") that was present during the virtual experiment. Real life always includes background noise that produces masking effects. Even in quiet environments this noise is in most cases higher than 30 dBA. This effect is modelled by adding background Hoth noise to the reference as well as to the test signal.
The level at which the signal is played to the subject in the virtual listening test.
The upper frequency limit of the measurement.
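As an illustration, the parameters listed above could be collected in one validated settings object. The sketch below is hypothetical throughout: the class name, field names, units, and the example values are assumptions made for this example and do not correspond to the OPERA user interface or to any published API.

```python
from dataclasses import dataclass

VALID_SAMPLE_RATES_HZ = (8000, 16000)  # the rates PSQM is defined for
VALID_LISTENING_CONDITIONS = ("loudspeaker", "headphones", "handset")


@dataclass(frozen=True)
class VirtualListeningTest:
    """Hypothetical container for the settings of the virtual listening test."""
    sample_rate_hz: int
    listening_condition: str
    hoth_noise_level_dba: float  # level of the background masking noise
    listening_level_db: float    # playback level (unit is an assumption)
    upper_frequency_hz: float    # upper frequency limit of the measurement

    def __post_init__(self):
        if self.sample_rate_hz not in VALID_SAMPLE_RATES_HZ:
            raise ValueError("PSQM is defined for 8 kHz and 16 kHz only")
        if self.listening_condition not in VALID_LISTENING_CONDITIONS:
            raise ValueError("unknown listening condition")
        if not 0 < self.upper_frequency_hz <= self.sample_rate_hz / 2:
            raise ValueError("upper frequency must lie below the Nyquist limit")


# Illustrative values only, not recommended settings.
settings = VirtualListeningTest(
    sample_rate_hz=8000,
    listening_condition="handset",
    hoth_noise_level_dba=35.0,
    listening_level_db=79.0,
    upper_frequency_hz=3400.0,
)
```

Validating the settings up front catches a mismatch such as an upper frequency limit above half the sampling rate before any measurement is run.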

> "Raw PSQM value" diagram of OPERA The PSQM value indicates the degree of subjective quality degradation as a result of speech coding. For this reason, when an estimation of subjective quality on a specific scale is not necessary, e.g. in optimizing parameters of a codec or in simply comparing the performance of codecs, the PSQM value itself is quite useful.

In PSQM, silent intervals are taken into account by a weighting factor that depends on the context of the subjective experiment, since the proportion of silent intervals varies from one cultural group to another. The diagram displays three PSQM values that use different weighting factors. For European languages, the PSQM-W2 value is recommended.
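The effect of the silence weighting can be sketched as a weighted average over frames. The code below is a simplified illustration and not the formula from P.861: it assumes each frame already has a disturbance value and a silent/active flag, and it uses the silence weights 0.0, 0.2 and 0.4 that belong to the W0, W2 and W4 variants. The function name is hypothetical.

```python
def weighted_average_disturbance(frames, silence_weight):
    """Average frame disturbance with silent frames scaled by silence_weight.

    frames: iterable of (disturbance, is_silent) pairs.
    Speech-active frames always count with weight 1.0.
    Simplified illustration, not the P.861 definition.
    """
    numerator = 0.0
    denominator = 0.0
    for disturbance, is_silent in frames:
        weight = silence_weight if is_silent else 1.0
        numerator += weight * disturbance
        denominator += weight
    return numerator / denominator if denominator > 0 else 0.0


# Two speech-active frames and two noisy silent frames (made-up numbers).
example_frames = [(1.0, False), (1.0, False), (3.0, True), (3.0, True)]
w0 = weighted_average_disturbance(example_frames, 0.0)
w2 = weighted_average_disturbance(example_frames, 0.2)
w4 = weighted_average_disturbance(example_frames, 0.4)
```

In this made-up example the disturbance sits mainly in the silent frames, so the result grows as the silence weight increases: with a weight of 0.0 the noisy pauses are ignored entirely.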

> Output results of the PSQM Algorithm The OMOS value is the Objective Mean Opinion Score, i.e. the PSQM result mapped to the MOS scale. OMOS+ is the corresponding PSQM+ result. OMOS-I is the PSQM result mapped to the full five-point MOS scale, ranging from 1 to 5. Since subjective tests yield an average between 4.05 and 4.5 even for transparent quality (some listeners always hear some distortion), the MOS scaling of PSQM ranges from 1.0 to 4.05 rather than up to 5.0. The only exception is OMOS-I: because it represents the behavior of an individual, ideal listener, it covers the full range of the ITU scale (1 to 5).
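The two output ranges can be illustrated with a short sketch. The mapping curve from the raw PSQM value to MOS is not given here, so the code takes it as a caller-supplied function and only shows how the result is limited to the respective scale. The names `map_to_mos` and `toy_curve` are hypothetical, and `toy_curve` is a deliberately simplistic placeholder, not the mapping used by OPERA or defined by the ITU.

```python
STANDARD_RANGE = (1.0, 4.05)   # OMOS scale as described above
INDIVIDUAL_RANGE = (1.0, 5.0)  # OMOS-I covers the full ITU scale


def map_to_mos(psqm, curve, mos_range):
    """Apply a PSQM-to-MOS curve and limit the result to mos_range."""
    low, high = mos_range
    return min(high, max(low, curve(psqm)))


def toy_curve(psqm):
    """Placeholder curve: higher PSQM (more degradation) gives lower MOS."""
    return 5.0 - psqm


omos_transparent = map_to_mos(0.0, toy_curve, STANDARD_RANGE)
omos_i_transparent = map_to_mos(0.0, toy_curve, INDIVIDUAL_RANGE)
```

For an undistorted signal the standard scale saturates at 4.05, while the individual-listener scale can reach 5.0.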

> Results calculated by the OPERA™ PSQM version, and their interpretation
| Model Output Variable | Description |
|---|---|
| Silence | The percentage of silent intervals during a measurement |
| Sev. Distorted | The percentage of severely distorted frames during measurement |
| Time Clipped | The percentage of time clipped frames during measurement |
| OMOS-W0 | MOS according to P.861, silence weight = 0.0 |
| OMOS-W2 | MOS according to P.861, silence weight = 0.2 |
| OMOS-W4 | MOS according to P.861, silence weight = 0.4 |
| OMOS+ | MOS according to PSQM+ |
| OMOS-I | MOS of the individual listener |

The OPERA results window summarizes the most important results at the end of the measurement.

Common Mistakes

> There are clearly Audible Distortions, but PSQM scores around 5.0 When all of your measurement results show a MOS of 5.0, while at the same time clearly audible distortions exist, please check the following:

Are the correct files used?
Are the listening level and the upper frequency limit set up properly?
Check whether Delay Compensation is enabled and working properly. If it is not, OPERA/PSQM will discard all frames for which it could not detect a reliable delay. In extreme cases almost all frames are invalid, which results in the default score of 5.0.

> I always measure a MOS near 1.0 If you always measure a MOS of 1.0, the most frequent reason (apart from the trivial case of mixed-up files) is that either the Delay Compensation algorithm is not set up properly, or the system cannot compensate the delay because of its length or variation. In such a case, check the OPERA delay status. If it is very low (<40), examine the time signals and verify that the waveforms of the reference and the test signal do indeed match. If the delay is simply too long, you can try setting a static delay offset. If it varies too much, the scope of P.861 is exceeded in any case, and measurements should be performed with PESQ instead.
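When checking by hand whether the reference and test waveforms match, a simple cross-correlation search can suggest a static delay offset. The sketch below is an independent illustration and not OPERA's Delay Compensation algorithm; the function names are hypothetical, the search only covers the case where the test signal lags the reference, and the returned score is a plain normalized correlation, unrelated to the OPERA delay status value.

```python
import math


def estimate_delay(reference, test, max_lag):
    """Return (lag, score): the lag in samples by which test trails reference
    that maximizes the normalized cross-correlation, and that correlation."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        n = min(len(reference), len(test) - lag)
        if n <= 0:
            break
        ref_seg = reference[:n]
        test_seg = test[lag:lag + n]
        dot = sum(r * t for r, t in zip(ref_seg, test_seg))
        energy = math.sqrt(sum(r * r for r in ref_seg) *
                           sum(t * t for t in test_seg))
        score = dot / energy if energy > 0 else 0.0
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag, best_score


def apply_static_delay(test, delay):
    """Drop the first `delay` samples so the test signal lines up with the reference."""
    return test[delay:]
```

A score close to 1.0 at the best lag suggests that the two files contain the same material and that a static offset may be sufficient. A low score at every lag points to mixed-up files or to a delay that varies over time.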

> Further Details... For more details on PSQM measurements, we recommend referring to our Literature section or taking a look at OPERA in our Products section.