spline approximation with constraints

Question

0 votes

xydata.mat

Hey everyone. I have a dense set of points x,y that seem to fall on a single curve:

I would like to find a smooth function to mach the data. Specifically, I would like it to fulfill the following criteria:

-It is the closest to the data (say in a least squares sense)

-Its first and second derivatives vanishes only once (hopefully at the maximum and inflection points of the original data)

I have tried doing that using smoothing spline approximation (not sure if this is the correct approach here with such a dense dataset, I any of you have a better idea i would like to hear). So far my best results were obtained by manually picking ~50 points from the data, using:

[xi,yi] = ginput;

and fitting a spline to them, using

splinetools(xi,yi)

And playing with the parameter in the smoothing spline approximation method until the spline matches.

Obviously that's very crude and doesn't satisfy my criteria. How are things like that typically done? Is there a way to put constraints on the number of critical and inflection points?

0 Comments
Show -2 older comments Hide -2 older comments

Sign in to comment.

Sign in to answer this question.

Follow Question

Answer 1

John D'Errico on 4 Apr 2018

Edited: John D'Errico on 4 Apr 2018

0 votes

There are several problems in what you are trying to do.

The obvious one is that splines abhor singularities, and you have one on the left end of that curve. Made up of polynomial segments, you can expect them to fail there.

As well, a standard least squares spline, or a smoothing spline both assume the error is in y only. But you clearly have a cloud that spreads out along the curve in both directions. So any least squares or smoothing spline must perform poorly. So I would not even bother trying to use my own SLM toolbox on this data.

Orthogonal regressions using splines are difficult.

I see that you have supplied the data, so I'll take a look at it. Perhaps I can make a stab at something...

7 Comments
Show 5 older comments Hide 5 older comments

John D'Errico on 5 Apr 2018

Edited: John D'Errico on 5 Apr 2018

Open in MATLAB Online

Thinking a bit about this. I came up with several ideas that might work.

1. Use ginput to choose points along the curve. Connect them using a tool like my interparc (download from the file exchange), or you can use cscvn (from the curvefitting toolbox I think.) You cannot use a standard spline, due to that vertical region. Since you use ginput here, the curve will be as good as is your choice of those points.

2. Use kmeans to choose the points. Connect using interparc/cscvn. The problem here is the points will come out in no special order.

[IDX, C] = kmeans(xy, 50);
plot(x,y,'b.',C(:,1),C(:,2),'rs')

As you can see, these points do reasonably well, with almost no effort made on my part.

plot(C(:,1),C(:,2),'rx')

Again, the problem is they don't lie in any special order. So a spline will fail. My solution would be to connect the dots using a traveling salesman solver. I can find lots of them on the file exchange. As I recall, the tools by Joseph Kirk were pretty good, so here is one:

https://www.mathworks.com/matlabcentral/fileexchange/21196-open-traveling-salesman-problem-genetic-algorithm

So if I try it out, I get this:

USERCONFIG.xy = C;
tspo_ga(USERCONFIG)
res = tspo_ga(USERCONFIG)
res = 
  struct with fields:
               xy: [50×2 double]
             dmat: [50×50 double]
          popSize: 100
          numIter: 10000
         showProg: 1
       showResult: 1
      showWaitbar: 0
         optRoute: [2 34 48 12 47 33 10 38 6 21 17 41 4 35 49 22 18 31 5 36 19 24 11 44 42 25 8 37 39 23 30 14 27 9 43 29 16 7 15 28 46 13 32 1 50 20 40 3 45 26]
          minDist: 46.257
plot(C(res.optRoute,1),C(res.optRoute,2),'r-',C(:,1),C(:,2),'bs')

That actually did pretty well, and took virtually no thinking on my part to solve, doing so automatically. About 3 lines of code, excluding the plots?

I'm not positive that you need a spline interpolant on this. the 50 points I generated will probably do reasonably well, just using the connect the dots capability of plot.

3. A solution of more complexity might involve an iterative/adaptive approach. Thus, start with two points, one at each end. ginput would suffice here. For example, I picked two endpoints off your plot as:

xycurve = ginput
xycurve =
   -0.47235       4.8397
     20.611     -0.49563
xyci = interparc(100,xycurve(:,1),xycurve(:,2));
plot(x,y,'b.',xyci(:,1),xyci(:,2),'r-')

Now, I might use my distance2curve utility, also found on the file exchange. Find the point from the data set that lies farthest from that curve.

[~,xydist,t] = distance2curve(xycurve,xy,'linear');
[~,xydist,t] = distance2curve(xycurve,xy,'linear');
[~,ind] = max(xydist)
ind =
   700
xy(ind,:)
ans =
     -0.22797       20.257

So the farthest point from the curve was #700. Now, we might try to find a point that lies near there, but is centered inside the point cloud. Add that to the curve in the correct sequence, and repeat, automatically adding new points to the curve wherever it fails to model the data sufficiently well. This would take some more thinking on my part to implement, enough so that I'm not sure it is worth it.

To be honest, the second option was so easy to write and use, I'd go with it. If you want a smooth curve, then use either interparc or cscvn to interpolate the set. As far as finding a smooth function goes here, remember that the left end will be a singularity. Any true functional approximation, such as a spline, will be difficult to build.

John D'Errico on 11 Apr 2018

Edited: John D'Errico on 11 Apr 2018

Open in MATLAB Online

NO. Unique is NOT a solution. Unique just sorts the points, sorting them on increasing x. In fact, interp1 may not even be a good solution.

The problem is, those points on the vertical line may not be in order, even if you sort them on x. There is no assurance that they will be. kmeans will pick points from the cloud. But depending on the noise in that cloud, you should imagine that some of those points will vary back and forth in x. The noise might be large enough that along that line, IF you sort them in x, they will be out of order in y.

So why is that a problem? Interp1 will sort the points on x anyway. So you need not bother using unique, as that is a waste of time, since interp1 is doing an internal sort. But the points won't be in order.

Let me show you an example. I've generated a set of points such as you might find from the m-means procedure. See that the x axis is scaled VERY tightly.

So x varies by only a tiny amount. But it jitters and jogs back and forth somewhat randomly. This is what I would expect to see from your data, when k-means is applied. Here I've allowed MATLAB to autoscale the axes, so we can see what happens if you treat these points improperly.

Now, see what happens when you sort them on x. Again, that is what unique will do. It is what interp1 does anyway.

xint = linspace(0,0.12,1000);
plot(x,y,'b:o',xint,interp1(x,y,xint,'linear'),'r-')
grid on

Look carefully at the differnce between the two lines. The dotted clue line is the one that connects them in the proper order, thus, along there, for increasing y. The red line is what happens when a sort on x intervenes.

Again, all of this is due to the use of k-means on that cloud. It is also why I tried to tell you to use a traveling salesman solver to sort the points properly. The TSP solver will try to sort them in their natural order.

Finally, this is why simple use of interp1 can easily fail, because no matter what, if those points along the vertical stretch do not come out with increasing x, then interp1 will create garbage.

Yoav Shimshi on 13 Apr 2018

You're right. I saw the same thing in my data, and my response was simply to erase those points that did not lie in order. So I guess my answer was a bit misleading.

For my specific problem, the point was that I needed to write the function with x as its argument, so whether I used traveling salesman or not, the jiggling of the points around the singularity was going to be a problem. That is why I also couldn't use interparc to create a parametric curve.

(What I wanted to do was to take the resulting function f(x) and create a sequence from the recursive relation x_n+1 = r*f(x_n), and then to investigate for which values of r chaos emerges)

Sign in to comment.

spline approximation with constraints

0 Comments
Show -2 older comments Hide -2 older comments

Accepted Answer

7 Comments
Show 5 older comments Hide 5 older comments

More Answers (0)

Categories

Tags

Community Treasure Hunt

spline approximation with constraints

0 Comments Show -2 older comments Hide -2 older comments

Accepted Answer

7 Comments Show 5 older comments Hide 5 older comments

More Answers (0)

Categories

Tags

See Also

Community Treasure Hunt

0 Comments
Show -2 older comments Hide -2 older comments

7 Comments
Show 5 older comments Hide 5 older comments