# unsupervised thresholding of signal amplitudes: MLE?

6 views (last 30 days)
hxen on 29 Nov 2023
Commented: Star Strider on 6 Dec 2023
I am hoping someone with knowledge of unsupervised signal processing can advise how best to handle the following problem. I have a long time series signal whose noisy baseline fluctuates over time:.
From the image above, you can see that there are low and high amplitude components. The small amplitudes are a mixture or noise plus some very low-amplitude real background signal. The small amplitude events compose the majority of events. However, I am interested in extracing the time occurrences and ampltiudes of the larger amplitude events only.
One problem is the unstable baseline on top of which the large amiplitudes ride (top subplot, black trace). Gven the fluctuating baseline, what would be best to determine the height of large amplitudes is not obvious. Relatedly, what method would be appropriate for this problem to unbiasedly determine a threshold to detect the lage ampltudes?
An approach I have tried is to first obtain the minimums of the troughs and use these values as query points (red circles in blown up image below) for fitting an interpolated line as a moving base (red dashed line). Substracting the moving base from the raw signal gives a flatten baseline (bottom subplot, blue trace):
From the flatten signal, I use a findpeaks and threshold for peak detection for anything above 3 medians. This has worked well as the majority of events are small ampltudes and three medians is selecting out the large peaks well (red and black ticks). With this methods, I can extract the time points of the large peak events and their amplitudes (as the difference of the moving basline from the raw signal at the peak points). However this is not unbiased of course and for other recordings this approach can be less effective.
To have a completely unbiased analysis, I am entertaining using an MLE approach by first using findpeaks (or peakfinder) to get the amplitudes of all the peak amplitudes (small and large) and then using this to get an unbiased threshold for selecting only the large amplitudes. Here's a histogram of the peak amplitudes:
While you can (arguably) discern two overlapping histograms (one centered around 0.5 and the other at around 2), I was hoping what was intuitively a bimodal distribution would have been more apparent. Nonetheless is there a way to use MLE to get an amplitude value at the intersection of these two overlaps corresponding to the more frequent small-amplitude and less frequent larger-amplitude events? Or is there perhaps a better way to approach this problem.
A related question to be asked is if the interpolated moving baseline subtraction is the best way to handle the noisy nature of the signal. But since there are portions that go below starting "zeroed" reference level, it is not at all clear how else to handle this fluctuating baseline.
Much apprecation and thanks for all who have read through this and can give advise/help on running this analysis in an unbiased and well-justified manner.
Cheers.
P. S. If it helps, the ampltude data in the histogram above has been included as well.
Mathieu NOE on 1 Dec 2023
sorry , I forgot to provide it
here in attachment
hxen on 1 Dec 2023
Thanks Mathieu!

Star Strider on 30 Nov 2023
I encourage you to investigate the ‘prominence’ values of the peaks. This is not a straightforward concept to define (see the findpeaks documentation for a discussion) however it generally takes into account the ‘context’ of each peak to define how it relates to the signal values around it. This does not require any significant smoothing or data pre-processing, and it may be more useful than other peak metrics. (I generally define peaks by setting a 'MinPeakProminence' value in findpeaks calls.) I did a brief analysis of your signal with respect to the peak values and prominence values.
A ‘prominence threshold’ might do what you want —
tpb = LD.tpb;
L = numel(tpb);
x = linspace(0, L-1, L).';
figure
plot(x, tpb)
grid
xlabel('Index')
ylabel('Signal Amplitude')
[pks,locs,wdth,prom] = findpeaks(tpb);
[pkcounts,pkedges,pkbins] = histcounts(pks, 50);
figure
bar(pkedges(1:end-1)+mean(diff(pkedges))/2, pkcounts)
xlabel('Peak Amplitudes')
ylabel('Counts')
[prcounts,predges,prbins] = histcounts(prom, 50);
figure
bar(predges(1:end-1)+mean(diff(predges))/2, prcounts)
xlabel('Prominence Amplitudes')
ylabel('Counts')
pkpr = sortrows([pks,prom],1);
figure
plot(pkpr(:,1), pkpr(:,2))
grid
xlabel('Peak Amplitude')
ylabel('Prominence Value')
promThrshld = 3;
[pks2,locs2] = findpeaks(tpb, 'MinPeakProminence',promThrshld);
figure
plot(x, tpb, 'DisplayName','Data')
hold on
plot(x(locs2), pks2, 'sr', 'DisplayName','Peaks with Prominence Values >= 3')
hold off
grid
xlabel('Index')
ylabel('Signal Amplitude')
legend('Location','best')
.
hxen on 6 Dec 2023
"I thought I saw a Comment from you similar to the last one you posted here as well, however when I went back to look for it, I was not able to find it. "
That is what I recalled too. Weird. Perhaps I accidentally delted somehow as had the thread going on two comps. Anyway, thank you so much Star! I am accepting this as the prctile feature of findpeaks is good. Thank you so much for your very helpful feedback.
Star Strider on 6 Dec 2023
As always, my pleasure!