spline approximation with constraints

Hey everyone. I have a dense set of points x,y that seem to fall on a single curve:
I would like to find a smooth function to mach the data. Specifically, I would like it to fulfill the following criteria:
-It is the closest to the data (say in a least squares sense)
-Its first and second derivatives vanishes only once (hopefully at the maximum and inflection points of the original data)
I have tried doing that using smoothing spline approximation (not sure if this is the correct approach here with such a dense dataset, I any of you have a better idea i would like to hear). So far my best results were obtained by manually picking ~50 points from the data, using:
[xi,yi] = ginput;
and fitting a spline to them, using
splinetools(xi,yi)
And playing with the parameter in the smoothing spline approximation method until the spline matches.
Obviously that's very crude and doesn't satisfy my criteria. How are things like that typically done? Is there a way to put constraints on the number of critical and inflection points?

 Accepted Answer

John D'Errico
John D'Errico on 4 Apr 2018
Edited: John D'Errico on 4 Apr 2018
There are several problems in what you are trying to do.
The obvious one is that splines abhor singularities, and you have one on the left end of that curve. Made up of polynomial segments, you can expect them to fail there.
As well, a standard least squares spline, or a smoothing spline both assume the error is in y only. But you clearly have a cloud that spreads out along the curve in both directions. So any least squares or smoothing spline must perform poorly. So I would not even bother trying to use my own SLM toolbox on this data.
Orthogonal regressions using splines are difficult.
I see that you have supplied the data, so I'll take a look at it. Perhaps I can make a stab at something...

7 Comments

Thank you vary much for the fast response! I also noticed these difficulties with spline fitting. The reason I am using them is because I can't think of another way to fit a smooth function to the data, so if you can think of another way of doing it that would also work.
I have an idea that may do reasonably well.
I would love to hear it.
Thinking a bit about this. I came up with several ideas that might work.
1. Use ginput to choose points along the curve. Connect them using a tool like my interparc (download from the file exchange), or you can use cscvn (from the curvefitting toolbox I think.) You cannot use a standard spline, due to that vertical region. Since you use ginput here, the curve will be as good as is your choice of those points.
2. Use kmeans to choose the points. Connect using interparc/cscvn. The problem here is the points will come out in no special order.
[IDX, C] = kmeans(xy, 50);
plot(x,y,'b.',C(:,1),C(:,2),'rs')
As you can see, these points do reasonably well, with almost no effort made on my part.
plot(C(:,1),C(:,2),'rx')
Again, the problem is they don't lie in any special order. So a spline will fail. My solution would be to connect the dots using a traveling salesman solver. I can find lots of them on the file exchange. As I recall, the tools by Joseph Kirk were pretty good, so here is one:
https://www.mathworks.com/matlabcentral/fileexchange/21196-open-traveling-salesman-problem-genetic-algorithm
So if I try it out, I get this:
USERCONFIG.xy = C;
tspo_ga(USERCONFIG)
res = tspo_ga(USERCONFIG)
res =
struct with fields:
xy: [50×2 double]
dmat: [50×50 double]
popSize: 100
numIter: 10000
showProg: 1
showResult: 1
showWaitbar: 0
optRoute: [2 34 48 12 47 33 10 38 6 21 17 41 4 35 49 22 18 31 5 36 19 24 11 44 42 25 8 37 39 23 30 14 27 9 43 29 16 7 15 28 46 13 32 1 50 20 40 3 45 26]
minDist: 46.257
plot(C(res.optRoute,1),C(res.optRoute,2),'r-',C(:,1),C(:,2),'bs')
That actually did pretty well, and took virtually no thinking on my part to solve, doing so automatically. About 3 lines of code, excluding the plots?
I'm not positive that you need a spline interpolant on this. the 50 points I generated will probably do reasonably well, just using the connect the dots capability of plot.
3. A solution of more complexity might involve an iterative/adaptive approach. Thus, start with two points, one at each end. ginput would suffice here. For example, I picked two endpoints off your plot as:
xycurve = ginput
xycurve =
-0.47235 4.8397
20.611 -0.49563
xyci = interparc(100,xycurve(:,1),xycurve(:,2));
plot(x,y,'b.',xyci(:,1),xyci(:,2),'r-')
Now, I might use my distance2curve utility, also found on the file exchange. Find the point from the data set that lies farthest from that curve.
[~,xydist,t] = distance2curve(xycurve,xy,'linear');
[~,xydist,t] = distance2curve(xycurve,xy,'linear');
[~,ind] = max(xydist)
ind =
700
xy(ind,:)
ans =
-0.22797 20.257
So the farthest point from the curve was #700. Now, we might try to find a point that lies near there, but is centered inside the point cloud. Add that to the curve in the correct sequence, and repeat, automatically adding new points to the curve wherever it fails to model the data sufficiently well. This would take some more thinking on my part to implement, enough so that I'm not sure it is worth it.
To be honest, the second option was so easy to write and use, I'd go with it. If you want a smooth curve, then use either interparc or cscvn to interpolate the set. As far as finding a smooth function goes here, remember that the left end will be a singularity. Any true functional approximation, such as a spline, will be difficult to build.
Sorry for the delay. Anyway, Thanks for the suggestions! The kmeans function you suggested was really useful. It still doesn't give a better result than manually picking the points, but it sure is a lot faster. As for the the issue with the points not coming in the right order, a simple solution would be:
[x,Idx] = unique(x); % or sort(x), unique is just to remove repeated points
y = y(Idx);
It's also important to note, kmeans returns slightly different points each time, so the resulting function would not come out exactly the same every time you build it (which was actually very good for my particular project, because it allowed me to check how small perturbations in the function affected the result).
So far my best result was using kmeans, and then connecting the points in between linearly using interp1:
xp = (min(x):0.001:max(x))';
yp = interp1(xk,yk,xp);
Then I could smooth out the edges created by the interpolation using a moving average filter:
yp = smooth(yp,100);
And finally creating a function with mkpp.
NO. Unique is NOT a solution. Unique just sorts the points, sorting them on increasing x. In fact, interp1 may not even be a good solution.
The problem is, those points on the vertical line may not be in order, even if you sort them on x. There is no assurance that they will be. kmeans will pick points from the cloud. But depending on the noise in that cloud, you should imagine that some of those points will vary back and forth in x. The noise might be large enough that along that line, IF you sort them in x, they will be out of order in y.
So why is that a problem? Interp1 will sort the points on x anyway. So you need not bother using unique, as that is a waste of time, since interp1 is doing an internal sort. But the points won't be in order.
Let me show you an example. I've generated a set of points such as you might find from the m-means procedure. See that the x axis is scaled VERY tightly.
So x varies by only a tiny amount. But it jitters and jogs back and forth somewhat randomly. This is what I would expect to see from your data, when k-means is applied. Here I've allowed MATLAB to autoscale the axes, so we can see what happens if you treat these points improperly.
Now, see what happens when you sort them on x. Again, that is what unique will do. It is what interp1 does anyway.
xint = linspace(0,0.12,1000);
plot(x,y,'b:o',xint,interp1(x,y,xint,'linear'),'r-')
grid on
Look carefully at the differnce between the two lines. The dotted clue line is the one that connects them in the proper order, thus, along there, for increasing y. The red line is what happens when a sort on x intervenes.
Again, all of this is due to the use of k-means on that cloud. It is also why I tried to tell you to use a traveling salesman solver to sort the points properly. The TSP solver will try to sort them in their natural order.
Finally, this is why simple use of interp1 can easily fail, because no matter what, if those points along the vertical stretch do not come out with increasing x, then interp1 will create garbage.
You're right. I saw the same thing in my data, and my response was simply to erase those points that did not lie in order. So I guess my answer was a bit misleading.
For my specific problem, the point was that I needed to write the function with x as its argument, so whether I used traveling salesman or not, the jiggling of the points around the singularity was going to be a problem. That is why I also couldn't use interparc to create a parametric curve.
(What I wanted to do was to take the resulting function f(x) and create a sequence from the recursive relation x_n+1 = r*f(x_n), and then to investigate for which values of r chaos emerges)

Sign in to comment.

More Answers (0)

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!