4 views (last 30 days)

Show older comments

I have a sequence of data points (see image below) in which a section has undergone an unknown corrupting transformation, resulting in jump discontinuities with respect to the rest of the plot. The locations of the jump discontinuitites are known and the corrupted data samples span an interval of about 15 points. I would like to undo this effect and restore continuity by applying an approximate corrective transform to the corrupted data samples. In other words, if Y is the given vector of data samples and I is a logcal index of the corrupted interval, I would like a continuous function P() such that P(Y(I)) uncorrupts the bad samples resulting in a discontinuity-free curve. I believe a low order polynomial (degree <=3) transform would be a good model for the required correction, P(). I am looking for suggestions as to how to go about efficienctly estimating the polynomial coefficients (I have many such curves to process).

If there is some standard signal processing tool in Matlab that routinely does this or something similar (this seems to me like it might be a very standard problem) that would be ideal. Otherwise, I am wondering what estimation cost function might be a sensible choice to minimize for the purposes of estimating the polynomial coefficients. Minimizing the sum of the jump discontinuity magnitudes seems like a naive choice, since this cost function relies on just two data points, which are influenced by noise. Ideally, one would define some continuity metric that takes into account the trends in the data in a somewhat larger neighborhood of the jumps. Any ideas?

John D'Errico
on 9 Aug 2021

Edited: John D'Errico
on 9 Aug 2021

Honestly, I don't see the problem, at least if the discontinuity is as large as you show it. That will make it clear where the breaks live. So then your model will be just a a base curve, plus a couple of Heaviside functions with estimated coefficients. I'll make up some data...

X = sort(rand(30,1))*1.5;

Bloc = [0.5, 1.2];

Bval = [-.5 0.75];

H = @(X,origin) +(X >= origin);

Y = sin(X) + H(X,Bloc(1))*Bval(1) + H(X,Bloc(2))*Bval(2) + randn(size(X))/40;

plot(X,Y,'o')

So a curve with two breaks in it. We can never know where the break occurs, except that it MUST occur between two points. So just search for the two largest jumps in value between any pair of points. Then we can assume the breaks happened at the mid point between the located points. Since I've sorted the X values, we can just use diff.

[~,Jind] = maxk(abs(diff(Y)),2);

Blocest = sort((X(Jind) + X(Jind+1))/2)

So not too bad. Again, we cannot know exactly where that break really occurs, but we can assume it falls midway between the indicated points.

But now we can estimate a model. For example, a cubic polynomial will probably be adequate for a curve this simple.

polydeg = 3;

A = [X(:).^(polydeg:-1:0),H(X(:),Blocest(1)),H(X(:),Blocest(2))];

coef = A\Y(:)

So a cubic polynomial, plus the two discontinuity terms. See it did a pretty good job of estimating the jumps.

Ypred = polyval(coef(1:end-2),X) + coef(end-1)*H(X,Blocest(1)) + coef(end)*H(X,Blocest(2));

plot(X,Y,'o',X,Ypred,'-r')

xline(Blocest)

It seems ok to me.

The curve, with breaks removed, is just:

Ycorrected = Y - coef(end-1)*H(X,Blocest(1)) - coef(end)*H(X,Blocest(2));

plot(X,polyval(coef(1:end-2),X),'r-',X,Ycorrected,'bo')

xline(Blocest)

If your data is far more noisy than you show, or the discontinuities are far smaller, then you may have a problem. But then you may just be better off fitting a curve with perhaps only one jump discontinuity. If you can't see a jump, then is it a jump at all? You might use a tolerance to decide how significant the jumps must be for it to recognized as such.

Finally, if the width of the jogged segment is known in advance, then you can even get trickier about how you find the locations of the breaks.

Jeff Miller
on 10 Aug 2021

Image Analyst
on 10 Aug 2021

Hi Matt. You may have it already figured out by now, but this is how I'd do it.

- First smooth the signal with a huge window at least twice as big as the expected bad section.
- Then find the difference between the smoothed signal and the actual data.
- Look for sections where the difference is big and identify the bad section.
- Do a linear interpolation to bridge the gap across the bad section, then update the difference signal with that new interpolated signal.
- Look at the mean of the difference signal in the bad section and assume that all bad data is to get lifted by the same amount.
- Lift the bad data by the average lift amount.

Full demo below:

clc; % Clear the command window.

close all; % Close all figures (except those of imtool.)

clear; % Erase all existing variables. Or clearvars if you want.

workspace; % Make sure the workspace panel is showing.

format long g;

format compact;

fontSize = 15;

% Create uncorrupted data as a noisy sine wave.

x = linspace(pi/4,pi/2,60);

Y = sin(x) + randn(size(x))/80;

% plot(X, Y, 'b--s', 'LineWidth', 2);

% hold on

% Lower a chunk of it.

a=1.1; %unknown

b=-0.5; %unknown

I = 1 <= x & x <= 1.3; %KNOWN

Yc = Y;

Yc(I) = a*Y(I)+b; %corrupted signal

% Plot original signal.

subplot(2, 1, 1);

plot(x, Yc, '--s', 'LineWidth', 2);

grid on;

xlabel('x', 'FontSize', fontSize);

ylabel('y', 'FontSize', fontSize);

title('Original Signal', 'FontSize', fontSize);

% Now we have data and we can begin.

%======================================================

% Get a smoothed signal

windowSize = 31;

ym = movmax(Yc, windowSize);

% Plot it.

hold on;

plot(x, ym, '-', 'LineWidth', 2);

% Find the difference between smoothed and actual

diffSignal = abs(ym - Yc);

plot(x, diffSignal, '-', 'LineWidth', 2);

% The lowered signal is where the difference is more than 3

% Get the good data.

threshold = 0.3;

yline(threshold, 'Color', 'g', 'LineWidth', 3);

goodIndexes = diffSignal < threshold;

badIndexes = diffSignal >= threshold;

xGood = x(goodIndexes);

yGood = Yc(goodIndexes);

% Interpolate over the bad section.

ym = interp1(xGood, yGood, x);

% Plot the interpolated section

plot(x, ym, 'm-');

% Update the difference signal.

diffSignal = abs(ym - Yc);

% Get average difference over ths bad section

amountToLift = mean(diffSignal(badIndexes))

% List bad data.

yRepaired = Yc; % Initialize

yRepaired(badIndexes) = Yc(badIndexes) + amountToLift;

% Plot repaired signal.

subplot(2, 1, 2);

plot(x, yRepaired, 'g.-', 'LineWidth', 2, 'MarkerSize', 20);

grid on;

xlabel('x', 'FontSize', fontSize);

ylabel('y', 'FontSize', fontSize);

title('Repaired Signal', 'FontSize', fontSize);

g = gcf;

g.WindowState = 'maximized';

The initial smooth signal is in red. The interpolated, improced smooth signal is in magenta. The difference between the magenta line and the original blue data is in yellow. The repaired signal is in the bottom graph in green. Note how the data in the lifted section still has it's variations (it's not a smooth or straight line), it's just "lifted" up to where it should have been at the start.

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!