# Fit a transform to remove piecewise discontinuities

Matt J on 9 Aug 2021
Edited: Matt J on 11 Aug 2021
I have a sequence of data points (see image below) in which a section has undergone an unknown corrupting transformation, resulting in jump discontinuities with respect to the rest of the plot. The locations of the jump discontinuities are known, and the corrupted data samples span an interval of about 15 points. I would like to undo this effect and restore continuity by applying an approximate corrective transform to the corrupted data samples. In other words, if Y is the given vector of data samples and I is a logical index of the corrupted interval, I would like a continuous function P() such that P(Y(I)) uncorrupts the bad samples, resulting in a discontinuity-free curve. I believe a low-order polynomial (degree <= 3) transform would be a good model for the required correction, P().
I am looking for suggestions on how to go about efficiently estimating the polynomial coefficients (I have many such curves to process). If there is some standard signal processing tool in MATLAB that routinely does this or something similar (this seems to me like it might be a very standard problem), that would be ideal. Otherwise, I am wondering what estimation cost function might be a sensible choice to minimize for the purposes of estimating the polynomial coefficients. Minimizing the sum of the jump discontinuity magnitudes seems like a naive choice, since this cost function relies on just two data points, which are influenced by noise. Ideally, one would define some continuity metric that takes into account the trends in the data in a somewhat larger neighborhood of the jumps. Any ideas?
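To make the setup concrete, here is a minimal sketch of the naive jump-magnitude cost described above (the toy data, the interval I, and the use of fminsearch are all illustrative assumptions, not my actual data or method):

```
% Toy data with a known corrupted interval (illustrative only).
x = linspace(0, 3, 100);
Y = sin(x) + randn(size(x))/80;     % clean-ish signal
I = x >= 1 & x <= 1.3;              % KNOWN corrupted interval
Y(I) = 1.1*Y(I) - 0.5;              % unknown corrupting transform

% Naive cost: sum of the two boundary jump magnitudes after applying a
% cubic P with coefficients p = [p3 p2 p1 p0] to the corrupted samples.
iL = find(I, 1, 'first');  iR = find(I, 1, 'last');
cost = @(p) abs(polyval(p, Y(iL)) - Y(iL-1)) + ...
            abs(Y(iR+1) - polyval(p, Y(iR)));
p = fminsearch(cost, [0 0 1 0]);    % start from the identity transform
Yfixed = Y;  Yfixed(I) = polyval(p, Y(I));
```

As noted, this cost uses only the two boundary pairs, so it is underdetermined for a cubic and sensitive to noise; a better metric would pull in a neighborhood around each jump.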
Sean de Wolski on 10 Aug 2021
I'll bet there's a clever way to do this with a wavelet. @Wayne King

John D'Errico on 9 Aug 2021
Edited: John D'Errico on 9 Aug 2021
Honestly, I don't see the problem, at least if the discontinuity is as large as you show it. That will make it clear where the breaks live. So then your model will be just a base curve, plus a couple of Heaviside functions with estimated coefficients. I'll make up some data...
X = sort(rand(30,1))*1.5;
Bloc = [0.5, 1.2];
Bval = [-.5 0.75];
H = @(X,origin) +(X >= origin);
Y = sin(X) + H(X,Bloc(1))*Bval(1) + H(X,Bloc(2))*Bval(2) + randn(size(X))/40;
plot(X,Y,'o')
So we have a curve with two breaks in it. We can never know exactly where a break occurs, except that it MUST occur between two points. So just search for the two largest jumps in value between any pair of adjacent points. Then we can assume each break happened at the midpoint between the located points. Since I've sorted the X values, we can just use diff.
[~,Jind] = maxk(abs(diff(Y)),2);
Blocest = sort((X(Jind) + X(Jind+1))/2)
Blocest = 2×1
0.5353 1.1808
So not too bad. Again, we cannot know exactly where that break really occurs, but we can assume it falls midway between the indicated points.
But now we can estimate a model. For example, a cubic polynomial will probably be adequate for a curve this simple.
polydeg = 3;
A = [X(:).^(polydeg:-1:0),H(X(:),Blocest(1)),H(X(:),Blocest(2))];
coef = A\Y(:)
coef = 6×1
-0.0543 -0.1810 1.0153 0.0174 -0.4612 0.7519
So a cubic polynomial, plus the two discontinuity terms. See, it did a pretty good job of estimating the jumps.
Ypred = polyval(coef(1:end-2),X) + coef(end-1)*H(X,Blocest(1)) + coef(end)*H(X,Blocest(2));
plot(X,Y,'o',X,Ypred,'-r')
xline(Blocest)
It seems ok to me.
The curve, with breaks removed, is just:
Ycorrected = Y - coef(end-1)*H(X,Blocest(1)) - coef(end)*H(X,Blocest(2));
plot(X,polyval(coef(1:end-2),X),'r-',X,Ycorrected,'bo')
xline(Blocest)
If your data are far noisier than you show, or the discontinuities are far smaller, then you may have a problem. But then you may just be better off fitting a curve with perhaps only one jump discontinuity. If you can't see a jump, then is it a jump at all? You might use a tolerance to decide how significant a jump must be for it to be recognized as such.
Finally, if the width of the jogged segment is known in advance, then you can even get trickier about how you find the locations of the breaks.
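For example, if the corrupted run is known to be w samples wide, one hedged way to exploit that (w and the index bookkeeping below are illustrative) is to score every candidate start index by the combined entry/exit jump, rather than picking the two largest diffs independently:

```
% Assumes Y is the data vector and w the known segment width in samples.
w = 15;                            % assumed known segment width
d = abs(diff(Y(:)));               % jump size between adjacent samples
score = d(1:end-w) + d(1+w:end);   % entry jump + exit jump per start index
[~, iStart] = max(score);          % best candidate start of the segment
I = false(size(Y));
I(iStart+1:iStart+w) = true;       % logical index of the corrupted run
```

This forces the two detected jumps to be exactly w samples apart, which is more robust than treating them as independent maxima.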
Jeff Miller on 10 Aug 2021
@Matt J Maybe the metric could be the sum of two RMSEs, each one based on the fit of a 2nd or 3rd order polynomial locally across a few points near one of the two discontinuity boundaries. If you don't know anything more than that the curve is continuous, I would think that points far from the discontinuities can't really tell you anything.
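A hedged sketch of that metric (the neighborhood size k, fit order polyDeg, and argument names are illustrative; X is the abscissa, Yfix a candidate corrected curve, and iL/iR the first/last indices of the corrupted interval):

```
k = 5;  polyDeg = 2;               % points per side, local fit order
% RMSE of a local polynomial fit over a window of points.
localRMSE = @(xw, yw) sqrt(mean((yw - polyval(polyfit(xw, yw, polyDeg), xw)).^2));
% Sum of the two boundary RMSEs, each fit spanning k points on each side.
metric = @(X, Yfix, iL, iR) ...
    localRMSE(X(iL-k:iL+k-1), Yfix(iL-k:iL+k-1)) + ...   % left boundary
    localRMSE(X(iR-k+1:iR+k), Yfix(iR-k+1:iR+k));        % right boundary
```

Minimizing this over the polynomial coefficients of P would reward corrections that make the curve locally smooth across both boundaries, using 2k points per jump instead of two.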

Image Analyst on 10 Aug 2021
Hi Matt. You may have it already figured out by now, but this is how I'd do it.
1. First smooth the signal with a huge window at least twice as big as the expected bad section.
2. Then find the difference between the smoothed signal and the actual data.
3. Look for sections where the difference is big and identify the bad section.
4. Do a linear interpolation to bridge the gap across the bad section, then update the difference signal with that new interpolated signal.
5. Look at the mean of the difference signal in the bad section and assume that all the bad data should be lifted by the same amount.
6. Lift the bad data by the average lift amount.
Full demo below:
clc; % Clear the command window.
close all; % Close all figures (except those of imtool.)
clear; % Erase all existing variables. Or clearvars if you want.
workspace; % Make sure the workspace panel is showing.
format long g;
format compact;
fontSize = 15;
% Create uncorrupted data as a noisy sine wave.
x = linspace(pi/4,pi/2,60);
Y = sin(x) + randn(size(x))/80;
% plot(x, Y, 'b--s', 'LineWidth', 2);
% hold on
% Lower a chunk of it.
a=1.1; %unknown
b=-0.5; %unknown
I = 1 <= x & x <= 1.3; %KNOWN
Yc = Y;
Yc(I) = a*Y(I)+b; %corrupted signal
% Plot original signal.
subplot(2, 1, 1);
plot(x, Yc, '--s', 'LineWidth', 2);
grid on;
xlabel('x', 'FontSize', fontSize);
ylabel('y', 'FontSize', fontSize);
title('Original Signal', 'FontSize', fontSize);
% Now we have data and we can begin.
%======================================================
% Get a smoothed signal
windowSize = 31;
ym = movmax(Yc, windowSize);
% Plot it.
hold on;
plot(x, ym, '-', 'LineWidth', 2);
% Find the difference between smoothed and actual
diffSignal = abs(ym - Yc);
plot(x, diffSignal, '-', 'LineWidth', 2);
% The lowered section is where the difference exceeds the threshold (0.3).
% Get the good data.
threshold = 0.3;
yline(threshold, 'Color', 'g', 'LineWidth', 3);
goodIndexes = diffSignal < threshold;
xGood = x(goodIndexes);
yGood = Yc(goodIndexes);
% Interpolate over the bad section.
ym = interp1(xGood, yGood, x);
% Plot the interpolated section
plot(x, ym, 'm-');
% Update the difference signal.
diffSignal = abs(ym - Yc);
% Get the average difference over the bad section.
badIndexes = ~goodIndexes;
meanLift = mean(ym(badIndexes) - Yc(badIndexes));
% Lift the bad data by the average lift amount.
yRepaired = Yc; % Initialize
yRepaired(badIndexes) = Yc(badIndexes) + meanLift;
% Plot repaired signal.
subplot(2, 1, 2);
plot(x, yRepaired, 'g.-', 'LineWidth', 2, 'MarkerSize', 20);
grid on;
xlabel('x', 'FontSize', fontSize);
ylabel('y', 'FontSize', fontSize);
title('Repaired Signal', 'FontSize', fontSize);
g = gcf;
g.WindowState = 'maximized';
The initial smoothed signal is in red. The interpolated, improved smooth signal is in magenta. The difference between the magenta line and the original blue data is in yellow. The repaired signal is in the bottom graph in green. Note how the data in the lifted section still has its variations (it's not a smooth or straight line); it's just "lifted" up to where it should have been at the start.
Matt J on 11 Aug 2021
That would be equivalent to the approach I mentioned in my original post that I am looking to improve upon.
Actually, I take that back. It would be equivalent to an additive linear correction
Y_restored(I)=Yc(I)+c1*X(I)+c2
That is interesting, although there is still the issue with this method that it uses data at only 2 points to fit c1 and c2 (and so is sensitive to noise). Still, it is food for thought.
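For reference, a hedged sketch of that additive linear correction (variable names follow the thread; solving for c1, c2 so the corrected values match their good neighbors at both ends of I uses exactly two equations, hence only two data points):

```
% Assumes Yc is the corrupted data, X the abscissa, I the logical index
% of the corrupted interval (illustrative, per the discussion above).
iL = find(I, 1, 'first');  iR = find(I, 1, 'last');
A = [X(iL) 1; X(iR) 1];
b = [Yc(iL-1) - Yc(iL); Yc(iR+1) - Yc(iR)];
c = A \ b;                               % c(1) = c1, c(2) = c2
Yrestored = Yc;
Yrestored(I) = Yc(I) + c(1)*X(I) + c(2);
```

This pins the correction to the two boundary pairs, which is exactly why it inherits their noise.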