Deep Pitch Estimator
Estimate pitch with CREPE deep learning neural network
Audio Toolbox / Deep Learning
The Deep Pitch Estimator block uses a CREPE pretrained neural network to estimate the pitch from audio signals. The block combines necessary audio preprocessing, network inference, and postprocessing of network output to return pitch estimations in Hz. This block requires Deep Learning Toolbox™.
Estimate Pitch Using Deep Pitch Estimator Block
This example shows how to use the Deep Pitch Estimator block to estimate the pitch of an audio signal in Simulink®. See Estimate Pitch Using CREPE Blocks for an example that uses the CREPE Preprocess, CREPE, and CREPE Postprocess blocks to perform the same task.
Adjust the block parameters to speed up computation and see the pitch estimations in real time as the audio plays.
Set the Overlap percentage (%) parameter to 50. With a lower overlap percentage, the block computes and outputs pitch estimations less frequently.
Set the Number of buffered pitch estimations parameter to 5. A higher value for this parameter allows the block to improve computational efficiency by operating on multiple audio frames in parallel. However, a higher value also increases latency because the block returns pitch estimations in batches instead of one at a time.
Set the Model capacity parameter to Large. This model has fewer parameters than the full-size model, leading to faster computation at the cost of slightly lower accuracy.
Run the model to listen to a singing voice and view the estimated pitch in real time.
Port_1 — Audio input
Audio input, specified as a one-channel signal (vector). If Sample rate of input signal (Hz) is 16e3, there are no restrictions on the input frame length. If Sample rate of input signal (Hz) is different from 16e3, then the input frame length must be a multiple of the decimation factor of the resampling operation that the block performs. If the input frame length does not satisfy this condition, the block generates an error message with information on the decimation factor.
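The frame-length condition can be checked ahead of time. The following Python sketch is illustrative only: it assumes the block resamples to 16 kHz by the reduced rational ratio 16000/Fs, so that the decimation factor is the denominator of that ratio. The block may derive the factor differently (for example, with a tolerance on the ratio), and the error message it generates remains the authoritative source. The function names are hypothetical.

```python
from fractions import Fraction

# Rate at which the text above states there is no frame-length restriction.
TARGET_RATE_HZ = 16_000


def decimation_factor(input_rate_hz: int) -> int:
    """Assumed decimation factor: denominator of the reduced ratio 16000/Fs."""
    if input_rate_hz <= 0:
        raise ValueError("input_rate_hz must be positive")
    return Fraction(TARGET_RATE_HZ, input_rate_hz).denominator


def frame_length_ok(frame_length: int, input_rate_hz: int) -> bool:
    """True if the frame length is a multiple of the assumed decimation factor."""
    return frame_length % decimation_factor(input_rate_hz) == 0


if __name__ == "__main__":
    for rate, frame in ((16_000, 1000), (48_000, 1024), (44_100, 882)):
        print(rate, frame, decimation_factor(rate), frame_length_ok(frame, rate))
```

Under this assumption, a 44.1 kHz input needs a frame length that is a multiple of 441, and a 48 kHz input needs a multiple of 3.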
Sample rate of input signal (Hz) — Sample rate of input signal in Hz
16e3 (default) | positive scalar
Sample rate of the input signal in Hz, specified as a positive scalar.
Overlap percentage (%) — Overlap percentage between consecutive frames
85 (default) | [0, 100)
Specify the overlap percentage between consecutive input frames as a scalar in the range [0, 100).
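To see how the overlap percentage affects how often the block produces an estimate, the following Python sketch converts an overlap percentage into a hop length and an estimate rate. It assumes 1024-sample analysis frames at 16 kHz, as described in the CREPE paper, and assumes that the hop is rounded to the nearest sample. The rounding the block actually applies is not stated here, so treat the numbers as approximate.

```python
FRAME_LENGTH = 1024        # assumed CREPE analysis frame length, in samples
NETWORK_RATE_HZ = 16_000   # assumed sample rate of the audio fed to the network


def hop_length(overlap_percent: float) -> int:
    """Samples between the starts of consecutive frames (rounding is assumed)."""
    if not 0 <= overlap_percent < 100:
        raise ValueError("overlap_percent must be in [0, 100)")
    return max(1, round(FRAME_LENGTH * (1 - overlap_percent / 100)))


def estimates_per_second(overlap_percent: float) -> float:
    """Approximate number of pitch estimations per second of audio."""
    return NETWORK_RATE_HZ / hop_length(overlap_percent)


if __name__ == "__main__":
    for overlap in (85, 50, 0):
        print(overlap, hop_length(overlap), estimates_per_second(overlap))
```

Lowering the overlap from 85 to 50 increases the hop length, so the network runs on fewer frames per second of audio.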
Number of buffered pitch estimations — Number of pitch estimations in output
1 (default) | positive integer
Number of pitch estimations in output, specified as a positive integer.
A higher value allows the block to improve computational efficiency by operating on multiple audio frames in parallel. However, it also increases latency because the block buffers the specified number of pitch estimations before returning them.
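The latency cost of buffering can be estimated roughly. The Python sketch below assumes 1024-sample frames at 16 kHz and a hop rounded to the nearest sample, and it counts only the extra audio the block must wait for before a batch is complete. It ignores inference time and any resampling delay, and the function name is hypothetical.

```python
def buffering_delay_seconds(num_buffered: int,
                            overlap_percent: float,
                            frame_length: int = 1024,   # assumed frame length
                            rate_hz: int = 16_000) -> float:  # assumed rate
    """Extra audio time needed to fill a batch, beyond a single estimate."""
    if num_buffered < 1:
        raise ValueError("num_buffered must be a positive integer")
    if not 0 <= overlap_percent < 100:
        raise ValueError("overlap_percent must be in [0, 100)")
    hop = max(1, round(frame_length * (1 - overlap_percent / 100)))
    # The first estimate needs one frame; each further one needs one more hop.
    return (num_buffered - 1) * hop / rate_hz


if __name__ == "__main__":
    print(buffering_delay_seconds(1, 85))
    print(buffering_delay_seconds(5, 50))
```

With one buffered estimate the added delay is zero. It grows linearly with the number of buffered estimates and with the hop length.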
Confidence threshold — Pitch confidence threshold
0.5 (default) | scalar in the range [0, 1)
Pitch confidence threshold, specified as a scalar in the range [0, 1). In postprocessing, the block suppresses fundamental frequencies where the network confidence is below the threshold.
If the maximum value of the network output is less than the confidence threshold, the block returns
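The thresholding step can be illustrated with a simplified decoder. The Python sketch below assumes the 360-bin CREPE output layout (20-cent spacing, 10 Hz reference) and the bin offset used by the open-source CREPE reference implementation. The block may use different constants and a more refined decoding, such as a weighted average around the peak. Returning `None` for a suppressed frame is a placeholder chosen for this sketch, not the documented behavior of the block.

```python
NUM_BINS = 360                         # assumed number of network outputs
CENTS_PER_BIN = 20.0                   # assumed spacing between bins
FIRST_BIN_CENTS = 1997.3794084376191   # offset from the reference implementation (assumed)
REF_HZ = 10.0                          # reference frequency for the cents scale


def bin_to_hz(index: int) -> float:
    """Convert a bin index to a frequency in Hz under the assumed layout."""
    cents = FIRST_BIN_CENTS + CENTS_PER_BIN * index
    return REF_HZ * 2 ** (cents / 1200)


def decode_pitch(activations, threshold: float = 0.5):
    """Argmax decoding with confidence suppression (simplified sketch)."""
    if len(activations) != NUM_BINS:
        raise ValueError(f"expected {NUM_BINS} activations")
    peak = max(range(NUM_BINS), key=lambda i: activations[i])
    if activations[peak] < threshold:
        return None  # placeholder for a suppressed estimate
    return bin_to_hz(peak)


if __name__ == "__main__":
    frame = [0.0] * NUM_BINS
    frame[180] = 0.9
    print(decode_pitch(frame))        # confident frame: a frequency in Hz
    print(decode_pitch(frame, 0.95))  # below threshold: suppressed
```

Raising the threshold suppresses more frames. This is useful for silence or unvoiced audio, where the network output has no clear peak.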
Model capacity — Size of trained neural network
Full (default) |
Model capacity, specified as
The smaller sizes correspond to fewer parameters in the model, leading to faster computation but lower accuracy.
 Kim, Jong Wook, Justin Salamon, Peter Li, and Juan Pablo Bello. “Crepe: A Convolutional Representation for Pitch Estimation.” In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 161–65. Calgary, AB: IEEE, 2018. https://doi.org/10.1109/ICASSP.2018.8461329.
C/C++ Code Generation
Generate C and C++ code using Simulink® Coder™.
Usage notes and limitations:
To generate generic C code that does not depend on third-party libraries, in the Configuration Parameters > Code Generation general category, set the Language parameter to
To generate C++ code, in the Configuration Parameters > Code Generation general category, set the Language parameter to C++. To specify the target library for code generation, in the Code Generation > Interface category, set the Target Library parameter. Setting this parameter to None generates generic C++ code that does not depend on third-party libraries.
For ERT-based targets, the Support: variable-size signals parameter in the Code Generation > Interface pane must be enabled.
For a list of networks and layers supported for code generation, see Networks and Layers Supported for Code Generation (MATLAB Coder).
Introduced in R2023a