

Extract audio features

Since R2019b



features = extract(aFE,audioIn) returns an array containing features of the audio input.



Read in an audio signal.

[audioIn,fs] = audioread("Counting-16-44p1-mono-15secs.wav");

Create an audioFeatureExtractor to extract the centroid of the Bark spectrum, the kurtosis of the Bark spectrum, and the pitch of an audio signal.

aFE = audioFeatureExtractor("SampleRate",fs, ...
    "SpectralDescriptorInput","barkSpectrum", ...
    "spectralCentroid",true, ...
    "spectralKurtosis",true, ...
    "pitch",true)

aFE = 
  audioFeatureExtractor with properties:

                     Window: [1024x1 double]
              OverlapLength: 512
                 SampleRate: 44100
                  FFTLength: []
    SpectralDescriptorInput: 'barkSpectrum'
        FeatureVectorLength: 3

   Enabled Features
     spectralCentroid, spectralKurtosis, pitch

   Disabled Features
     linearSpectrum, melSpectrum, barkSpectrum, erbSpectrum, mfcc, mfccDelta
     mfccDeltaDelta, gtcc, gtccDelta, gtccDeltaDelta, spectralCrest, spectralDecrease
     spectralEntropy, spectralFlatness, spectralFlux, spectralRolloffPoint, spectralSkewness, spectralSlope
     spectralSpread, harmonicRatio, zerocrossrate, shortTimeEnergy

   To extract a feature, set the corresponding property to true.
   For example, obj.mfcc = true, adds mfcc to the list of enabled features.

Call extract to extract the features from the audio signal. Normalize the features by their mean and standard deviation.

features = extract(aFE,audioIn);
features = (features - mean(features,1))./std(features,[],1);
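The normalization above works column by column: each feature is shifted by its own mean and scaled by its own standard deviation across hops. A minimal sketch of that step follows. It uses a random stand-in matrix rather than real extracted features, so the sizes and values are assumptions for illustration only.

```matlab
% Stand-in feature matrix: 200 hops by 3 features (random data, not real audio features).
rng(0)
features = [5 + 2*randn(200,1), 100 + 30*randn(200,1), 0.1*randn(200,1)];

mu = mean(features,1);        % 1-by-3 row of per-feature means
sigma = std(features,[],1);   % 1-by-3 row of per-feature standard deviations

% Implicit expansion applies mu and sigma to every row (hop).
normalized = (features - mu)./sigma;

disp(mean(normalized,1))      % each entry is approximately 0
disp(std(normalized,[],1))    % each entry is approximately 1
```

Note that a feature that is constant across all hops has zero standard deviation, so this form of normalization would produce NaN for that column.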

Plot the normalized features over time.

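idx = info(aFE);
duration = size(audioIn,1)/fs;

% Top axes: the audio waveform.
subplot(2,1,1)
t = linspace(0,duration,size(audioIn,1));
plot(t,audioIn)

% Bottom axes: the normalized features, one line per feature.
subplot(2,1,2)
t = linspace(0,duration,size(features,1));
plot(t,features(:,idx.spectralCentroid), ...
     t,features(:,idx.spectralKurtosis), ...
     t,features(:,idx.pitch));
legend("Spectral Centroid","Spectral Kurtosis", "Pitch")
xlabel("Time (s)")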

Figure contains 2 axes objects. Axes object 1 contains an object of type line. Axes object 2 with xlabel Time (s) contains 3 objects of type line. These objects represent Spectral Centroid, Spectral Kurtosis, Pitch.
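The plotting code relies on info(aFE) to map feature names to columns of the extracted matrix. The sketch below shows that lookup on its own. The sample rate, the noise input, and the choice of mfcc and spectralCentroid are assumptions made for illustration; they are not part of the example above.

```matlab
fs = 16000;                       % assumed sample rate for this sketch
audioIn = randn(2*fs,1);          % two seconds of noise as stand-in audio

aFE = audioFeatureExtractor("SampleRate",fs, ...
    "mfcc",true, ...
    "spectralCentroid",true);

features = extract(aFE,audioIn);
idx = info(aFE);                  % struct with one field per enabled feature

mfccs = features(:,idx.mfcc);                 % may span several columns
centroid = features(:,idx.spectralCentroid);  % a single column

disp(size(mfccs))
disp(size(centroid))
```

Indexing through idx instead of hard-coded column numbers keeps the code valid if features are later enabled or disabled.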

Input Arguments

audioIn –– Input audio, specified as a column vector or a matrix. The columns of a matrix are treated as independent channels.

Data Types: single | double

Output Arguments

features –– Extracted audio features, returned as an L-by-M-by-N array, where:

  • L –– Number of feature vectors (hops)

  • M –– Number of features extracted per analysis window

  • N –– Number of channels

Data Types: single | double
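To make the output dimensions concrete, the following sketch extracts two features from a two-channel signal and inspects the size of the result. The noise input and the sample rate are assumptions chosen for illustration.

```matlab
fs = 16000;                       % assumed sample rate for this sketch
audioIn = randn(2*fs,2);          % two independent channels of noise

aFE = audioFeatureExtractor("SampleRate",fs, ...
    "spectralCentroid",true, ...
    "spectralKurtosis",true);

features = extract(aFE,audioIn);
[numHops,numFeatures,numChannels] = size(features);

disp(numHops)        % L: depends on signal length, window, and overlap
disp(numFeatures)    % M: 2, one column per enabled scalar feature
disp(numChannels)    % N: 2, one page per input channel
```

Here numFeatures should also match aFE.FeatureVectorLength, the property shown in the object display earlier on this page.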

Version History

Introduced in R2019b