VGGish

VGGish embeddings extraction network

Since R2022a

Libraries:
Audio Toolbox / Deep Learning

Description

The VGGish block uses a pretrained convolutional neural network, trained on the AudioSet data set, to extract feature embeddings from audio signals.

Examples

Compare VGGish Embeddings Block with Equivalent VGGish Blocks

Show that the VGGish Embeddings block is equivalent to the cascade of the VGGish Preprocess block and the VGGish block.

Ports

Input

Port_1 — Mel spectrograms
96-by-64 matrix | 96-by-64-by-1-by-N array

Mel spectrograms, specified as a 96-by-64 matrix or a 96-by-64-by-1-by-N array, where:

96: the number of 25 ms frames in each mel spectrogram
64: the number of mel bands spanning 125 Hz to 7.5 kHz
N: the number of mel spectrograms

You can use the VGGish Preprocess block to generate the mel spectrograms. Each spectrogram has dimensions 96-by-64.

Data Types: single | double

Output

Port_1 — Embeddings
N-by-128 matrix

VGGish feature embeddings, returned as an N-by-128 matrix, where N is the number of mel spectrograms in the input. The feature embeddings are a compact representation of audio data.

Data Types: single
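
The port dimensions described above amount to a simple shape rule. The following Python sketch is illustrative only and is not part of the block or of Audio Toolbox; the function name `embedding_shape` is hypothetical. It takes the input size as a tuple and returns the size of the embeddings output, assuming that a single 96-by-64 matrix counts as one spectrogram (N = 1).

```python
def embedding_shape(input_shape):
    """Return the (N, 128) output size for a given input size.

    Hypothetical helper for illustration; it only encodes the dimensions
    stated in the port descriptions.
    """
    shape = tuple(input_shape)
    if shape == (96, 64):
        # A single 96-by-64 mel spectrogram (assumed to give N = 1).
        n = 1
    elif len(shape) == 4 and shape[:3] == (96, 64, 1) and shape[3] >= 1:
        # A 96-by-64-by-1-by-N batch of mel spectrograms.
        n = shape[3]
    else:
        raise ValueError(
            f"expected 96-by-64 or 96-by-64-by-1-by-N, got {shape}"
        )
    return (n, 128)


print(embedding_shape((96, 64)))         # (1, 128)
print(embedding_shape((96, 64, 1, 10)))  # (10, 128)
```

Any other input size, such as a transposed 64-by-96 matrix, raises a `ValueError` in this sketch.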

Parameters

Mini-batch size — Size of mini-batches
`128` (default) | positive integer

Size of the mini-batches to use for prediction, specified as a positive integer. Larger mini-batch sizes require more memory but can lead to faster predictions.
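
To make the memory and speed trade-off concrete, the Python sketch below shows how N spectrograms would be split into mini-batches of a given size. The partitioning shown (full batches followed by one smaller remainder batch) is the usual scheme and is an assumption here, not a description of the block's implementation; `mini_batch_sizes` is a hypothetical name.

```python
def mini_batch_sizes(num_spectrograms, mini_batch_size=128):
    """Split num_spectrograms into consecutive mini-batches.

    Assumed partitioning for illustration: full batches first, then one
    smaller batch holding the remainder.
    """
    if num_spectrograms < 0:
        raise ValueError("num_spectrograms must be nonnegative")
    if mini_batch_size < 1:
        raise ValueError("mini_batch_size must be a positive integer")
    full, remainder = divmod(num_spectrograms, mini_batch_size)
    sizes = [mini_batch_size] * full
    if remainder:
        sizes.append(remainder)
    return sizes


print(mini_batch_sizes(300))       # [128, 128, 44]
print(mini_batch_sizes(300, 512))  # [300]
```

Under this assumption, 300 spectrograms take three prediction passes with the default size of 128, while a size of 512 handles them in one pass at the cost of holding all 300 in memory at once.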

Block Characteristics

Data Types	`double` | `single`
Direct Feedthrough	`no`
Multidimensional Signals	`no`
Variable-Size Signals	`no`
Zero-Crossing Detection	`no`

References

[1] Gemmeke, Jort F., Daniel P. W. Ellis, Dylan Freedman, Aren Jansen, Wade Lawrence, R. Channing Moore, Manoj Plakal, and Marvin Ritter. “Audio Set: An Ontology and Human-Labeled Dataset for Audio Events.” In 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 776–80. New Orleans, LA: IEEE, 2017. https://doi.org/10.1109/ICASSP.2017.7952261.

[2] Hershey, Shawn, Sourish Chaudhuri, Daniel P. W. Ellis, Jort F. Gemmeke, Aren Jansen, R. Channing Moore, Manoj Plakal, et al. “CNN Architectures for Large-Scale Audio Classification.” In 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 131–35. New Orleans, LA: IEEE, 2017. https://doi.org/10.1109/ICASSP.2017.7952132.

Extended Capabilities

C/C++ Code Generation
Generate C and C++ code using Simulink® Coder™.

Usage notes and limitations:

To generate generic C code that does not depend on third-party libraries, in the Configuration Parameters > Code Generation general category, set the Language parameter to C.
To generate C++ code, in the Configuration Parameters > Code Generation general category, set the Language parameter to C++. To specify the target library for code generation, in the Code Generation > Interface category, set the Target Library parameter. Setting this parameter to None generates generic C++ code that does not depend on third-party libraries.
For a list of networks and layers supported for code generation, see Networks and Layers Supported for Code Generation (MATLAB Coder).

Version History

Introduced in R2022a
