How to filter breath noise in audio?

Question

wei sun on 12 Jul 2022

https://ch.mathworks.com/matlabcentral/answers/1757870-how-to-filter-breath-noise-in-audio

Commented: Mathieu NOE on 15 Jul 2022

audio.zip

The attachment contains the original audio files and the MATLAB filter files I used. I tried low-pass and band-pass filtering, but neither had an obvious effect. The noise is mainly heavy breathing. How can I filter out the breathing while keeping the speech (Chinese or English) intact?

5 Comments

Jonas on 13 Jul 2022

do you want to remove it only in this recording, or do you want to do this automatically for multiple files?

wei sun on 13 Jul 2022

remove or attenuate this noise.


Answer 1

Mathieu NOE on 13 Jul 2022

https://ch.mathworks.com/matlabcentral/answers/1757870-how-to-filter-breath-noise-in-audio#answer_1006045

Edited: Mathieu NOE on 13 Jul 2022

hello

I opted for a strategy based on the spectrogram content. I noticed that the "breathing" sections are characterized by strong spectrogram levels below 100 Hz (red dots), which is not the case for the "speaking" sections.

I worked on channel 1 because channel 2 is clipped (distorted).

So I simply reduced the volume (here by 30 dB) for the segments that run from the local minimum just before each red dot to the local minimum just after it.

(You can also set those segments directly to zero if you prefer; see the options in the code.)
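As a side note on the 30 dB figure: a level change in dB maps to a linear amplitude factor of 10^(dB/20). The short sketch below checks this on a synthetic tone; the sample rate and the 80 Hz tone are assumptions made up for illustration and are not taken from the attached files.

```matlab
% Convert an attenuation in dB to a linear amplitude factor.
atten_dB = 30;                      % attenuation quoted in the answer
gain = 10^(-atten_dB/20);           % amplitude ratio for -30 dB (about 0.0316)

% Hypothetical test tone standing in for one breath segment
Fs_demo = 8000;                     % assumed sample rate, illustration only
t = (0:Fs_demo-1)'/Fs_demo;         % one second of samples
x = 0.5*sin(2*pi*80*t);             % 80 Hz tone, amplitude 0.5
y = gain*x;                         % attenuated copy

% Measured level change in dB between the two signals
level_change_dB = 20*log10(max(abs(y))/max(abs(x)));
fprintf('gain = %.4f, level change = %.1f dB\n', gain, level_change_dB);

% For comparison, a plain division by 30 corresponds to about -29.5 dB
fprintf('1/30 corresponds to %.1f dB\n', 20*log10(1/30));
```

The difference between the two factors is only about half a dB, so it is hardly audible, but using 10^(-dB/20) keeps the code consistent with the dB value quoted in the text.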

   
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% options 
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% spectrogram dB scale
spectrogram_dB_scale = 80;  % dB range scale (means , the lowest displayed level is XX dB below the max level)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% load signal
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
[signal,Fs] = audioread('original.wav');
dt = 1/Fs;
[samples,channels] = size(signal);
% select channel (if needed)
channels = 1;
signal = signal(:,channels);
signal_filtered = signal;
% time vector 
time = (0:samples-1)*dt;
%% decimate (if needed)
% NB : decim = 1 will do nothing (output = input)
decim = 40;
if decim>1
    signal_decim = decimate(signal,decim);
    Fs_decim = Fs/decim;
else
    % no decimation : work on the signal as is
    signal_decim = signal;
    Fs_decim = Fs;
end
samples_decim = length(signal_decim);
time_decim = (0:samples_decim-1)*1/Fs_decim;
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% FFT parameters
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
NFFT = 512;    % 
OVERLAP = 0.75;
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
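% display : time / frequency analysis : spectrogram 
% NB : specgram is a legacy function; if your release lacks it,
% spectrogram(x,window,noverlap,nfft,fs) is the documented replacement
% (note the different argument order)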
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    [sg,fsg,tsg] = specgram(signal_decim,NFFT,Fs_decim,hanning(NFFT),floor(NFFT*OVERLAP));  
    % FFT normalisation and conversion amplitude from linear to dB (peak)
    sg_dBpeak = 20*log10(abs(sg))+20*log10(2/length(fsg));     % NB : X=fft(x.*hanning(N))*4/N; % hanning only
     % saturation of the dB range : 
    min_disp_dB = round(max(max(sg_dBpeak))) - spectrogram_dB_scale;
    sg_dBpeak(sg_dBpeak<min_disp_dB) = min_disp_dB;
    % plots spectrogram
    figure(2);
    imagesc(tsg,fsg,sg_dBpeak);colormap('jet');
    axis('xy');colorbar('vert');grid on
    df = fsg(2)-fsg(1); % freq resolution 
    title(['Spectrogram / Fs = ' num2str(Fs) ' Hz / Delta f = ' num2str(df,3) ' Hz ']);
    xlabel('Time (s)');ylabel('Frequency (Hz)');
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% extract SG (dB) values from 0 to 100 Hz (a loud level in this freq range is
% breath sound)
ind = find(fsg<=100);
fsg_breath = fsg(ind);
sg_dB_breath = sg_dBpeak(ind,:);
    max_dB = max(sg_dB_breath,[],1);
    max_dB = max_dB-min(max_dB); % shift the dB values to positive values for good working islocalmax
    % select peaks with at least 25 dB prominence and neighboring local mins
    % find local maxima
    [tf, P] = islocalmax(max_dB,'MinProminence',25);
    x_peak = tsg(tf);
    y_peak = max_dB(tf);
    % find local minima
    [tm, P] = islocalmin(max_dB);
    x_min = tsg(tm);
    y_min  = max_dB(tm);
    figure(3);plot(tsg,max_dB,x_peak,y_peak,'dr',x_min,y_min,'dk');
    title('Spectrogram max dB value vs Time');
    xlabel('Time (s)');ylabel('Max dB value');
    
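    % attenuate (or zero) the data between the local mins just before
    % and after each high peak
    gain = 10^(-30/20); % 30 dB attenuation as a linear factor (approx. 0.0316)
    for ck = 1:numel(x_peak)
        % search x_min just before and just after the peak
        dist = x_min - x_peak(ck);
        ind_bef = find(dist<0,1,'last');
        ind_aft = find(dist>0,1,'first');
        % if there is no local min on one side, fall back to the signal start / end
        if isempty(ind_bef), x_min_bef = time(1); else, x_min_bef = x_min(ind_bef); end
        if isempty(ind_aft), x_min_aft = time(end); else, x_min_aft = x_min(ind_aft); end
        
        % now process the time signal between these two time values
        ind = find(time>=x_min_bef & time<=x_min_aft);
        % signal_filtered(ind) = 0;  % option 1 : zero 
        signal_filtered(ind) = signal_filtered(ind)*gain ;  % option 2 :  30 dB attenuation
    end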
    
    
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% display : time domain plot
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
figure(1),
subplot(2,1,1),plot(time,signal,'b');grid on
title(['Time plot  / Fs = ' num2str(Fs) ' Hz / raw data ']);
xlabel('Time (s)');ylabel('Amplitude');
subplot(2,1,2),plot(time,signal_filtered,'b');grid on
title(['Time plot  / Fs = ' num2str(Fs) ' Hz / filtered data ']);
xlabel('Time (s)');ylabel('Amplitude');
    
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% export signal
audiowrite('filtered.wav',signal_filtered,Fs); % audiowrite(filename,y,Fs,varargin)

8 Comments

Mathieu NOE on 13 Jul 2022

ok so this is a new attempt for the phone (first) case

again I tested the first channel only

here the logic is inverted, as the speaker's voice segments contain more energy below 400 Hz than the breathing sound

also, as the speaker starts right away, I padded some random noise at the beginning to let the code detect the first voice segment

hope it helps !

clc
clearvars
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% FFT parameters
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
NFFT = 512;    % 
OVERLAP = 0.75;
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% options 
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% spectrogram dB scale
spectrogram_dB_scale = 80;  % dB range scale (means , the lowest displayed level is XX dB below the max level)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% load signal
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
[signal,Fs] = audioread('phone.wav');
dt = 1/Fs;
[samples,channels] = size(signal);
% select channel (if needed)
channels = 1;
signal = signal(:,channels);
signal = signal(:); % make sure it's a col vector
% pad some dummy noise at the beginning to make spectrogram nicer (and get
% the first peak of fft data)
signal = [0.01*randn(100*NFFT,1);signal];
samples = numel(signal);
signal_filtered = zeros(size(signal));
% time vector 
time = (0:samples-1)*dt;
%% decimate (if needed)
% NB : decim = 1 will do nothing (output = input)
decim = 40;
if decim>1
    signal_decim = decimate(signal,decim);
    Fs_decim = Fs/decim;
elseif decim ==1 
    signal_decim = signal;
    Fs_decim = Fs;
end
samples_decim = length(signal_decim);
time_decim = (0:samples_decim-1)*1/Fs_decim;
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% display 3 : time / frequency analysis : spectrogram demo
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    [sg,fsg,tsg] = specgram(signal_decim,NFFT,Fs_decim,hanning(NFFT),floor(NFFT*OVERLAP));  
    % NB specgram time is offset so we must compensate for that 
    tsg = tsg + NFFT/(2*Fs_decim);
    
    % FFT normalisation and conversion amplitude from linear to dB (peak)
    sg_dBpeak = 20*log10(abs(sg))+20*log10(2/length(fsg));     % NB : X=fft(x.*hanning(N))*4/N; % hanning only
     % saturation of the dB range : 
    min_disp_dB = round(max(max(sg_dBpeak))) - spectrogram_dB_scale;
    sg_dBpeak(sg_dBpeak<min_disp_dB) = min_disp_dB;
    % plots spectrogram
    figure(2);
    subplot(2,1,1),plot(time,signal,'b');grid on
    title(['Time plot  / Fs = ' num2str(Fs) ' Hz / raw data ']);
    xlabel('Time (s)');ylabel('Amplitude');
    xlim([min(time) max(time)]);
   subplot(2,1,2),imagesc(tsg,fsg,sg_dBpeak);colormap('jet');grid on
    xlim([min(time) max(time)]);
    axis('xy');
    df = fsg(2)-fsg(1); % freq resolution 
    title(['Spectrogram / Fs = ' num2str(Fs) ' Hz / Delta f = ' num2str(df,3) ' Hz ']);
    xlabel('Time (s)');ylabel('Frequency (Hz)');
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% extract SG (dB) values from 0 to 400 Hz (a loud level in this freq range is
% speaker sound)
ind = find(fsg<=400);
fsg_speak = fsg(ind);
sg_dB_speak = sg_dBpeak(ind,:);
    max_dB = max(sg_dB_speak,[],1);
    max_dB = max_dB-min(max_dB); % shift the dB values to positive values for good working islocalmax
    % select peaks above +25 dB
    % find local maxima
    [tf, P] = islocalmax(max_dB);
    x_peak = tsg(tf);
    y_peak = max_dB(tf);
    ii = (y_peak>25);
    x_peak = x_peak(ii);
    y_peak = y_peak(ii);    
    % find local minima
    [tm, P] = islocalmin(max_dB);
    x_min = tsg(tm);
    y_min  = max_dB(tm);
    figure(3);plot(tsg,max_dB,x_peak,y_peak,'dr',x_min,y_min,'dk');
    title('Spectrogram max dB value vs Time');
    xlabel('Time (s)');ylabel('Max dB value');
    
    % KEEP the data that are defined by the local mins just before
    % and after the high peaks
    for ck = 1:numel(x_peak)
        % search x_min just before and just after the peak
        dist = x_min - x_peak(ck);
        ind_bef = find(dist<0,1,'last');
        ind_aft = find(dist>0,1,'first');
        % if there is no local min on one side, fall back to the signal start / end
        if isempty(ind_bef), x_min_bef = time(1); else, x_min_bef = x_min(ind_bef); end
        if isempty(ind_aft), x_min_aft = time(end); else, x_min_aft = x_min(ind_aft); end
        
        % now keep the time signal between these two time values
        ind = find(time>=x_min_bef & time<=x_min_aft);
        signal_filtered(ind) = signal(ind) ;  % keep that portion of signal 
    end
    
    
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% display 1 : time domain plot
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
figure(1),
subplot(2,1,1),plot(time,signal,'b');grid on
title(['Time plot  / Fs = ' num2str(Fs) ' Hz / raw data ']);
xlabel('Time (s)');ylabel('Amplitude');
subplot(2,1,2),plot(time,signal_filtered,'b');grid on
title(['Time plot  / Fs = ' num2str(Fs) ' Hz / filtered data ']);
xlabel('Time (s)');ylabel('Amplitude');
    
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% export signal
audiowrite('phone_filtered.wav',signal_filtered,Fs); % audiowrite(filename,y,Fs,varargin)
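One thing to be aware of in the script above: the random noise is prepended to `signal`, and `signal_filtered` has the same padded length, so the exported file starts with `100*NFFT` extra samples. If the output should line up with the original recording, the padding can be dropped before export. A minimal sketch on made-up data follows; the sample rate and the stand-in signals are assumptions, only the padding length comes from the script.

```matlab
% Toy example: remove the samples that were prepended as padding.
NFFT = 512;
n_pad = 100*NFFT;                          % number of noise samples prepended in the script
Fs_demo = 8000;                            % assumed sample rate for this toy example
original = 0.1*randn(2*Fs_demo,1);         % stand-in for the samples read from the wav file
padded = [0.01*randn(n_pad,1); original];  % same padding step as in the script
processed = padded;                        % stand-in for signal_filtered

trimmed = processed(n_pad+1:end);          % drop the padded samples
fprintf('padded: %d samples, trimmed: %d samples\n', numel(processed), numel(trimmed));
```

In the script itself this would amount to passing `signal_filtered(100*NFFT+1:end)` to `audiowrite` instead of `signal_filtered`.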

Mathieu NOE on 15 Jul 2022

hello Wei

1/ as I said: in the "phone" wav file there is no silence or breath sound before the speaker starts to speak, so the spectrogram has energy in the speaker frequency range (used for detection later in the code) right from the start. When I compute the max of the spectrogram in the low frequency range (=> max_dB), there is no first minimum followed by an increase and then a peak, so islocalmax would not detect that first peak. If I don't pad this random noise, we start with a high value followed by a decrease, so we lose that first voice segment.

2/ the decimated data is used only for the spectrogram computation and for detecting the time blocks to be kept (speaker). The algorithm that decides whether we have speaker sound or something else is based on a low frequency threshold (400 Hz), so it's computationally more efficient to decimate the data and compute shorter FFT spectrograms than to keep the original sampling rate, run long FFT computations and use only the very low end of the data.

The data that is filtered is at the original sampling frequency; that's why we have two data sets, the original sampling rate data (signal) and the decimated one (signal_decim).

The output wav file therefore has the same sampling rate as the original file, so there is no distortion. You can hear that the voice segments are not altered by the code.
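The padding argument in point 1/ can be reproduced on a tiny example. The level curve below is invented for illustration (it is not taken from the real file), and the 25 dB threshold mirrors the one used in the script. It is a sketch of the behaviour described above, not a replacement for the detection code.

```matlab
% Hypothetical low-frequency level curve in dB (not taken from the real file).
% It starts directly on a voice segment, as in the "phone" recording.
max_dB = [30 28 12 3 5 26 31 27 6 2];

% Without padding: sample 1 is an endpoint with no rise before it
tf_raw = islocalmax(max_dB) & (max_dB > 25);

% With padding: a few low values stand in for the prepended random noise
pad = [1 2 1.5 2.5];
padded = [pad max_dB];
tf_pad = islocalmax(padded) & (padded > 25);

peaks_raw = find(tf_raw);                 % peak indices without padding
peaks_pad = find(tf_pad) - numel(pad);    % peak indices with padding, original numbering

disp(peaks_raw);
disp(peaks_pad);
```

With the padding, the segment that starts at sample 1 is reported as a peak alongside the later one at sample 7. Without it, only the peak at sample 7 is found, which is the lost first voice segment mentioned above. The small bump inside the padding itself is rejected by the 25 dB threshold.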

wei sun on 15 Jul 2022

OK, thank you, I have learned something. An FFT of the entire segment does take a lot of computing power, and it brings in a lot of irrelevant information.

Mathieu NOE on 15 Jul 2022

the saving in computation is proportional to the applied decimation factor (here 40), so I don't think it's negligible, especially if you want to apply the code to longer wav files

but of course you can remove the decimation operation if you feel bad about it
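To put rough numbers on the decimation trade-off, here is a small worked calculation. The sample rate of 44100 Hz is an assumption (the thread does not state the Fs of the attached files); `decim`, `NFFT` and the 400 Hz threshold are the values used in the script.

```matlab
% Worked numbers for the decimation step (Fs is an assumed value).
Fs = 44100;          % assumed sample rate in Hz, not stated in the thread
decim = 40;          % decimation factor used in the answer
NFFT = 512;          % FFT length used in the answer
f_max = 400;         % highest frequency needed for detection (Hz)

Fs_decim = Fs/decim;                 % sample rate after decimation
nyquist_decim = Fs_decim/2;          % must stay above f_max
% decimate's default anti-aliasing filter has its cutoff at 0.8 of the new Nyquist
usable_bw = 0.8*nyquist_decim;
df_decim = Fs_decim/NFFT;            % frequency resolution with decimation
df_full = Fs/NFFT;                   % frequency resolution without decimation
NFFT_full = NFFT*decim;              % FFT length needed at full rate for the same resolution

fprintf('Fs_decim = %.1f Hz, Nyquist = %.2f Hz, usable band = %.1f Hz\n', ...
    Fs_decim, nyquist_decim, usable_bw);
fprintf('df with decimation = %.2f Hz, without = %.2f Hz\n', df_decim, df_full);
fprintf('equivalent full-rate NFFT = %d\n', NFFT_full);
```

Under these assumptions the decimated signal still covers the 0 to 400 Hz detection band, and a 512-point FFT resolves about 2 Hz per bin. At the full rate the same FFT length would give roughly 86 Hz per bin, so only a handful of bins would fall below 400 Hz.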
