# Trouble using pchip to interpolate

Anisha Varughese on 1 Aug 2019
Hey, I have been trying to interpolate some data using pchip, but it keeps giving the error "The data sites should be distinct." I know this error occurs when there are repeated values, but it is a huge amount of data. Can someone tell me how to find the points that are causing trouble? It's impossible for me to search for them manually. Or is there a way around this?

Joe Vinciguerra on 1 Aug 2019
You'll need to pre-process your data to resolve this. However, the method is up to you and depends on what you're trying to do. It sounds like you understand the root cause is that you have some points that are repeated. Like you mentioned, one tedious option is to manually find and remove the duplicate points.
Below are a couple of examples of how you could resolve this programmatically. I started with the pchip example from the documentation, then appended a duplicate point at x=3.
Option 1 simply removes duplicate values, keeping only one instance. Option 2 computes the average of duplicate values.
% This example is modified from https://www.mathworks.com/help/matlab/ref/pchip.html
x = [-3 -2 -1 0 1 2 3 3];
y = [-1 -1 -1 0 1 1 1 2];
[xUnique, ia, ic] = unique(x); % find location of unique x values
%% Option 1: Remove Duplicates
yRD = y(ia); % keep only the first y value found at each unique x value
%% Option 2: Average Duplicates
yMeans = accumarray(ic, y, [], @mean); % create a new vector of y values by averaging multiple occurrences
%%
xq1 = -3:.01:3;
p1 = pchip(xUnique,yRD,xq1);
p2 = pchip(xUnique,yMeans,xq1);
plot(x,y,'ob',xUnique,yRD,'vr',xUnique,yMeans,'^g',xq1,p1,'-',xq1,p2,'-.')
legend('Sample Points','Remove Duplicates','Average Duplicates','pchip1','pchip2','Location','SouthEast')
Keep in mind that there are many other ways to do this, and it all depends on what you want to do with those non-distinct values.
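If you first want to see which points are the offenders (as the question asks), a minimal sketch along the same lines is below. It reuses the example x from above; the names counts, dupValues and dupIdx are just illustrative, and it assumes duplicates are exact matches rather than values that differ by round-off.

```matlab
x = [-3 -2 -1 0 1 2 3 3];

% ic maps every element of x to its position in xUnique
[xUnique, ~, ic] = unique(x);

% count how many times each unique value occurs
counts = accumarray(ic(:), 1);

% x values that occur more than once
dupValues = xUnique(counts > 1);

% indices into the original x where those repeated values sit
dupIdx = find(ismember(x, dupValues));

disp(dupValues)
disp(dupIdx)
```

For the example vector this should flag x = 3 at the last two positions. With real measurements you may need to round or apply a tolerance before calling unique if nearly equal sites also cause trouble.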

#### 1 Comment

Anisha Varughese on 1 Aug 2019
Thank you so much, it worked.

John D'Errico on 1 Aug 2019
Edited: John D'Errico on 1 Aug 2019
This is a common problem in interpolation using splines. Long ago, I wrote a utility to solve it for you. It has been on the File Exchange for as long as the FEX has existed.
Using it to solve your problem is now trivial. For COLUMN vectors oldx and oldy, do this:
[newx,newy] = consolidator(oldx,oldy);
By default, it takes the average value of all values at any replicated x. This is usually the thing you want to do for interpolation, but you can change that operation as you wish. So this explicitly states to use the mean value.
[newx,newy] = consolidator(oldx,oldy,@mean);
If you wanted the minimum or maximum value of the replicates, or just the first value of each set, you would use one of these calls:
% min
[newx,newy] = consolidator(oldx,oldy,@min);
% max
[newx,newy] = consolidator(oldx,oldy,@max);
% the first value of a set
first = @(V) V(1);
[newx,newy] = consolidator(oldx,oldy,first);
For example,
>> x = sort(randi(5,[10,1]));
>> y = rand(10,1);
>> [x,y]
ans =
1 0.10665
1 0.9619
1 0.0046342
2 0.77491
3 0.8173
3 0.86869
3 0.084436
5 0.39978
5 0.25987
5 0.80007
>> [newx,newy] = consolidator(x,y,@mean)
newx =
1
2
3
5
newy =
0.35773
0.77491
0.59014
0.48657
As you can see, it found the replicate x values, then averaged y for each. That set of points can now be used for interpolation, using spline or pchip, or interp1, etc.
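To round this off, here is a rough sketch of that last step using the sample data printed above. So that it runs without the File Exchange download, it swaps in unique and accumarray as a built-in stand-in for the consolidator(x,y,@mean) call; that stand-in is an assumption for illustration, not how consolidator itself is implemented. The query grid xq is also just an arbitrary choice.

```matlab
% Sample data copied from the example above (values as displayed, so rounded)
x = [1 1 1 2 3 3 3 5 5 5]';
y = [0.10665 0.9619 0.0046342 0.77491 0.8173 ...
     0.86869 0.084436 0.39978 0.25987 0.80007]';

% Built-in stand-in for [newx,newy] = consolidator(x,y,@mean)
[newx, ~, ic] = unique(x);            % distinct x values, sorted
newy = accumarray(ic, y, [], @mean);  % average y over each group of equal x

% The consolidated sites are distinct, so pchip no longer complains
xq = linspace(newx(1), newx(end), 101);
yq = pchip(newx, newy, xq);

plot(x, y, 'ob', newx, newy, '^g', xq, yq, '-')
legend('Original points', 'Averaged points', 'pchip', 'Location', 'best')
```

Replacing @mean with @min, @max, or a handle such as @(V) V(1) in the accumarray call gives a per-group minimum, maximum, or single value, in the same spirit as the consolidator calls above.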

#### 1 Comment

Anisha Varughese on 1 Aug 2019
Thank you so much