Short-time objective intelligibility measure

Since R2024a

Syntax

``metric = stoi(processed,reference,fs)``

Description

`metric = stoi(processed,reference,fs)` returns the short-time objective intelligibility (STOI) measurement. STOI is a speech intelligibility metric that compares the processed speech signal with a clean reference signal.

Examples

Read in an audio file containing a clean speech signal. Add pink noise to create a noisy speech signal.

```
[cleanSpeech,fs] = audioread("Rainbow-16-8-mono-114secs.wav");
noisySpeech = cleanSpeech + 0.1*pinknoise(size(cleanSpeech));
```

Use `stoi` to measure the intelligibility of the noisy speech signal with the clean speech as the reference signal.

```
metric = stoi(noisySpeech,cleanSpeech,fs)
```
```
metric = 0.9811
```

Recreate the noisy speech signal with more pink noise and measure the intelligibility again. The noisier signal scores lower on the STOI metric, indicating reduced intelligibility.

```
noisySpeech = cleanSpeech + 3*pinknoise(size(cleanSpeech));
metric = stoi(noisySpeech,cleanSpeech,fs)
```
```
metric = 0.5943
```
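The two measurements above can be extended into a small sweep. The following is a minimal sketch, assuming the same audio file is available; the gain values and the variable names `gains` and `scores` are illustrative choices and are not part of the original example.

```matlab
% Sketch: sweep the pink-noise gain and record the STOI score at each level.
[cleanSpeech,fs] = audioread("Rainbow-16-8-mono-114secs.wav");

% Reuse one noise realization so that only the gain changes between runs.
noise = pinknoise(size(cleanSpeech));

gains = [0.1 0.5 1 3];          % illustrative gains (assumption)
scores = zeros(size(gains));
for k = 1:numel(gains)
    noisySpeech = cleanSpeech + gains(k)*noise;
    scores(k) = stoi(noisySpeech,cleanSpeech,fs);
end

% Display the gain/score pairs side by side.
table(gains(:),scores(:),VariableNames=["NoiseGain","STOI"])
```

Because the pink noise is random, the exact scores vary from run to run, so no expected values are listed here.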

Read in an audio file containing speech and noise. Also read in an audio file containing the original clean speech to use as a reference signal.

```
[noisySpeech,fs] = audioread("NoisySpeech-16-mono-3secs.ogg");
reference = audioread("CleanSpeech-16-mono-3secs.ogg");
```

Calculate the STOI metric for the noisy speech signal using `stoi`.

```
noisySpeechSTOI = stoi(noisySpeech,reference,fs)
```
```
noisySpeechSTOI = 0.8370
```

Use `enhanceSpeech` to enhance the speech signal. Evaluate the enhanced signal using the STOI metric and see the improvement compared to the STOI of the noisy signal.

```
enhancedSpeech = enhanceSpeech(noisySpeech,fs);
enhancedSpeechSTOI = stoi(enhancedSpeech,reference,fs)
```
```
enhancedSpeechSTOI = single
    0.8808
```

Input Arguments

`processed`: Processed speech signal, specified as a column vector (single channel) with the same size as `reference`. STOI measures the intelligibility of this processed signal.

Data Types: `single` | `double`

`reference`: Reference speech signal, specified as a column vector (single channel) with the same size as `processed`. The STOI metric compares the processed signal with this reference signal to measure the intelligibility.

Data Types: `single` | `double`
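Because both inputs must be single-channel column vectors of equal size, recordings often need some preparation first. The following is a minimal sketch under stated assumptions: the signals are synthetic stand-ins (random noise, not speech), the processed signal is stereo, and the two signals are already time-aligned so that trimming the tail is safe.

```matlab
% Hypothetical inputs: a mono reference and a shorter, stereo processed signal.
fs = 16e3;
reference = randn(3*fs + 100,1);                 % mono, 100 samples longer
processed = repmat(reference(1:3*fs),1,2) ...
    + 0.05*randn(3*fs,2);                        % stereo, noisy copy

% Downmix the processed signal to a single channel.
processed = mean(processed,2);

% Trim both signals to the shorter length (assumes they are time-aligned).
n = min(size(processed,1),size(reference,1));
processed = processed(1:n);
reference = reference(1:n);

% Both inputs now satisfy the documented shape requirements.
assert(iscolumn(processed) && isequal(size(processed),size(reference)))
metric = stoi(processed,reference,fs)
```

Averaging the channels is only one possible downmix. Selecting a single channel, for example `processed(:,1)`, is an equally valid choice depending on the recording.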

`fs`: Sample rate of both the processed and reference signals in Hz, specified as a positive scalar.

Data Types: `single` | `double`
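Since a single `fs` applies to both signals, recordings captured at different rates need to be brought to a common rate before calling `stoi`. The following is a minimal sketch using `resample` from Signal Processing Toolbox; the 48 kHz and 16 kHz rates and the synthetic stand-in signals are assumptions made for illustration.

```matlab
% Hypothetical scenario: the reference is at 48 kHz, the processed signal at 16 kHz.
fsRef = 48e3;
fsProc = 16e3;
reference48 = randn(3*fsRef,1);          % stand-in signal, not real speech

% resample(x,p,q) changes the sample rate by the factor p/q.
reference = resample(reference48,fsProc,fsRef);

% Build a processed signal at 16 kHz with the same size as the resampled reference.
processed = reference + 0.05*randn(size(reference));

metric = stoi(processed,reference,fsProc)
```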

Output Arguments

`metric`: STOI metric, returned as a scalar in the range [-1,1]. STOI measures the intelligibility of the processed input signal by comparing it with the clean reference signal. A higher value for the metric corresponds to a more intelligible speech signal.
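The [-1,1] range follows from how the measure is defined in [2]. The following is a brief sketch of that published definition, not a statement about the internals of the `stoi` function. Here $\mathbf{x}_{j,m}$ is the short-time temporal envelope of the clean signal in one-third octave band $j$ ending at frame $m$, $\bar{\mathbf{y}}_{j,m}$ is the corresponding normalized and clipped envelope of the processed signal, and $\mu_{(\cdot)}$ is the sample mean of a vector.

$$
d_{j,m} = \frac{\left(\mathbf{x}_{j,m} - \mu_{\mathbf{x}_{j,m}}\right)^{T}\left(\bar{\mathbf{y}}_{j,m} - \mu_{\bar{\mathbf{y}}_{j,m}}\right)}{\left\lVert \mathbf{x}_{j,m} - \mu_{\mathbf{x}_{j,m}} \right\rVert \, \left\lVert \bar{\mathbf{y}}_{j,m} - \mu_{\bar{\mathbf{y}}_{j,m}} \right\rVert},
\qquad
d = \frac{1}{JM} \sum_{j,m} d_{j,m}
$$

Each $d_{j,m}$ is a sample correlation coefficient and therefore lies in $[-1,1]$. The final score $d$ is the average over all $J$ bands and $M$ frames, so it lies in the same range.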

References

[1] C. H. Taal, R. C. Hendriks, R. Heusdens and J. Jensen, "A short-time objective intelligibility measure for time-frequency weighted noisy speech," 2010 IEEE International Conference on Acoustics, Speech and Signal Processing, Dallas, TX, USA, 2010, pp. 4214-4217, doi: 10.1109/ICASSP.2010.5495701.

[2] C. H. Taal, R. C. Hendriks, R. Heusdens and J. Jensen, "An Algorithm for Intelligibility Prediction of Time–Frequency Weighted Noisy Speech," in IEEE Transactions on Audio, Speech, and Language Processing, vol. 19, no. 7, pp. 2125-2136, Sept. 2011, doi: 10.1109/TASL.2011.2114881.

Version History

Introduced in R2024a