Selection of the Reference Stimuli

One of the biggest advantages of perceptual measurement techniques is that real voice and music signals serve as the reference stimuli for the test. In other words, you test the performance of the device under test under 'real life' conditions rather than with artificial test signals such as sinusoidal tones or noise.

> So which voice or music signals ensure suitable tests?

As a rule of thumb, the reference file should be a signal that comes as close as possible to the kind of signal that will be applied to the device under test in real life. For example, if you design a special headset for female call center agents, you should use a test stimulus that contains mostly female speech. If the device is intended for male and female users as well as children, you should perform separate tests with typical stimuli for each of these cases.

For the assessment of MPEG audio codecs that are used for the transmission of high quality music between broadcast studios, real music should be used.

> Worst Case Items

In the course of international standardization, thousands of test sequences must have been used, but most of them did not really prove to be worst case items. Some signals do, however, because they particularly expose a certain type of artefact when passed through non-linear devices such as codecs. Since this critical character varies, you should include a representative set of reference signals in your test. Especially with wide band music codecs, at least six to ten different test samples should be selected, since the performance of audio codecs differs widely depending on the test material.

> Duration of Signals

The duration of the test sequence should be in the range of approximately four to eight seconds. Longer tests lead to averaging effects (short distortions may be averaged down by a long but almost perfect transmission), while shorter sequences may not be long enough to contain representative parts of the signal. If very long reference files are desired for any reason, OPERA's feature of measuring just a short sequence out of the entire input signal can help. You will find details regarding this feature in chapter 4 of the OPERA user manual.
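The duration guideline can be checked before a test run with a few lines of script. The following is a minimal sketch in Python, assuming the reference file is an uncompressed WAV file readable by the standard `wave` module. The function names are hypothetical and are not part of OPERA. The excerpt helper only illustrates the idea of cutting a short sequence out of a long file; it is not a description of how OPERA does this internally.

```python
import wave

MIN_SECONDS = 4.0  # lower bound of the range recommended in the text
MAX_SECONDS = 8.0  # upper bound of the range recommended in the text


def wav_duration_seconds(path):
    """Return the duration of a WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def duration_is_recommended(seconds):
    """True if the duration lies within the four to eight second range."""
    return MIN_SECONDS <= seconds <= MAX_SECONDS


def extract_excerpt(src_path, dst_path, start_seconds, length_seconds):
    """Copy a short excerpt of a long WAV file into a new WAV file."""
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        src.setpos(int(start_seconds * rate))  # jump to the excerpt start
        frames = src.readframes(int(length_seconds * rate))
    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)    # same channels, sample width and rate
        dst.writeframes(frames)  # the header frame count is fixed up on close
```

For example, a ten second file fails the check, while a five second excerpt taken from it passes.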

> Sample Rates

The sample rate of the reference file is frequently already defined by the algorithm that will be used to evaluate the recorded data. PEAQ according to ITU-R BS.1387, for example, requires a sample rate of 48 kHz, although the implementation in OPERA will deliver reliable results at 44.1 kHz, too. Most speech quality measures are defined for sample rates of 8 kHz and 16 kHz only. For more details, refer to the description of the individual algorithms or the applicable standard documents.
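A simple lookup can catch a mismatched sample rate before any measurement is started. The sketch below only encodes the rates named in this section. The dictionary keys and the function name are hypothetical labels chosen for illustration; for a specific algorithm, the applicable standard remains the authority.

```python
# Sample rates in Hz, taken from the text above.
# The keys are illustrative labels, not OPERA identifiers.
ACCEPTED_RATES_HZ = {
    # 48 kHz is required by ITU-R BS.1387; per the text, the OPERA
    # implementation also delivers reliable results at 44.1 kHz.
    "peaq": (48000, 44100),
    # Most speech quality measures are defined for 8 and 16 kHz only.
    "speech": (8000, 16000),
}


def rate_is_accepted(measure, sample_rate_hz):
    """True if the sample rate is one of those listed for the measure."""
    return sample_rate_hz in ACCEPTED_RATES_HZ[measure]
```

With this table, `rate_is_accepted("speech", 44100)` is false, which flags a music-rate file that was selected for a speech measure by mistake.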

> Sample Format

The selection of the sample format should mainly be driven by the capabilities of the underlying hardware. When using, for example, the audio interfaces provided by OPERA, it makes sense to select '16bit linear'. Since all measures currently use 16bit linear internally, any higher resolution, even if supported by the hardware, will not result in more accurate measurements. When performing test calls with the OPERA voice board, the sample format should be 8bit u-law or 8bit A-law (G.711). Otherwise, the measurement will include at least one more step of encoding, since the DSP on the voice board will convert all input data back to G.711.
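To see why an additional encoding step matters, it helps to look at what a single pass through an 8 bit companded format does to a sample value. The sketch below uses the continuous mu-law characteristic with mu = 255 and a uniform 8 bit quantizer on the compressed value. This is an assumption made for illustration: it is not the bit-exact segmented encoder specified in G.711, and it does not model the voice board DSP. All function names are hypothetical.

```python
import math

MU = 255.0  # companding parameter of the mu-law characteristic


def mu_law_compress(x):
    """Continuous mu-law curve for a sample x in the range [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mu_law_expand(y):
    """Inverse of mu_law_compress for y in the range [-1, 1]."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


def round_trip_8bit(x):
    """Compress a sample, quantize it to 256 levels and expand it again."""
    y = mu_law_compress(x)
    code = round((y + 1.0) / 2.0 * 255)  # 8 bit code in the range 0..255
    y_quantized = code / 255 * 2.0 - 1.0
    return mu_law_expand(y_quantized)
```

Without the quantizer, compression followed by expansion returns the original value up to floating point precision. With the quantizer, a value such as 0.5 comes back slightly changed, and every further pass through such a conversion is another opportunity for this kind of error to enter the measurement.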

A set of typical wide band audio examples is recommended in ITU-R recommendation BS.1387 and is delivered with OPERA products. Suitable multi-lingual speech samples are also provided with OPERA, and some are available from the ITU-T in Series P Supplement 23.