How can I get splines subject to constraints?

2 views (last 30 days)
Hello. I am trying to use splines in order to get the probability density function from a histogram, something like what can be seen in the picture:
I have to include some constraints, such as:
(Where is the mean of the observed data)
The code that I tryed (with the help of the gpt guy) is completely useless, nonetheless I attach it here:
[counts, edges] = histcounts(data, bins, 'Normalization', 'pdf');
Unrecognized function or variable 'data'.
bin_centers = (edges(1:end-1) + edges(2:end)) / 2;
x_min = min(data)
x_max = max(data)
% mitjana de les dades
x_mean = mean(data)
% Constraints
syms f(x) [1 num_bins] real
assume(f >= 0) % Constraint 2
eq1 = int(f, x_min, x_max) == 1
eq3 = int(x*f, x_min, x_max) == x_mean
eqs = [eq1; eq3];
vars = f;
sol = vpasolve(eqs, vars)
f_opt = double(subs(sol, bin_centers));
One of the problems is the following:
Error using symfun/assume
Assumptions on symbolic functions not supported. Make assumptions on symbolic variables and expressions instead.
Also, the spline function is nowhere to be seen. Can anyone give me a clue?
Thank you very much!
  9 Comments
Torsten
Torsten on 26 Feb 2025
Edited: Torsten on 26 Feb 2025
I gave a link to appropriate code below. Spline approximations are not useful in this case.
Carles
Carles on 26 Feb 2025
@Torsten Sorry, I didn't see your answer before. It is not exactly what I was looking for, but I guess I'll make it work. Thank you very much!

Sign in to comment.

Accepted Answer

Torsten
Torsten on 26 Feb 2025
Edited: Torsten on 26 Feb 2025
This code looks helpful for your purpose:
% Generate 1000 normal random numbers
mu = 0; sigma = 1; nr = 1000;
givenDist = mu + sigma * randn(nr,1);
generatedDist = emprand(givenDist,nr,1);
% %
% % % Plot histogram to check given and generated distribution
[n,xout] = hist(givenDist);
hist(givenDist);
hold on
hist(generatedDist,xout)
% %
% Plot cdf to check given and generated distribution
figure
x = sort(givenDist(:)); % Given distribution
p = 1:length(x);
p = p./length(x);
plot(x,p,'color','r');
hold on
%
xr = sort(generatedDist(:)); % Generated distribution
pr = 1:length(xr);
pr = pr./length(xr);
%
plot(xr,pr,'color','b');
xlabel('x')
ylabel('cdf')
legend('Given Dist.','Generated Dist.')
title('1000 random numbers generated from given normal distribution of data');
function xr = emprand(dist,varargin)
%EMPRAND Generates random numbers from empirical distribution of data.
% This is useful when you do not know the distribution type (i.e. normal or
% uniform), but you have the data and you want to generate random
% numbers form the data. The idea is to first construct cumulative distribution
% function (cdf) from the given data. Then generate uniform random number and
% interpolate from cdf.
%
% USAGE:
% xr = EMPRAND(dist) - one random number
% xr = EMPRAND(dist,m) - m-by-m random numbers
% xr = EMPRAND(dist,m,n) - m-by-n random numbers
%
% INPUT:
% dist - vector of distribution i.e. data values
% m - generates m-by-m matrix of random numbers
% n - generates m-by-n matrix of random numbers
%
% OUTPUT:
% xr - generated random numbers
%
% EXAMPLES:
% % Generate 1000 normal random numbers
% mu = 0; sigma = 1; nr = 1000;
% givenDist = mu + sigma * randn(nr,1);
% generatedDist = emprand(givenDist,nr,1);
% %
% % % Plot histogram to check given and generated distribution
% [n,xout] = hist(givenDist);
% hist(givenDist);
% hold on
% hist(generatedDist,xout)
% %
% Plot cdf to check given and generated distribution
% figure
% x = sort(givenDist(:)); % Given distribution
% p = 1:length(x);
% p = p./length(x);
% plot(x,p,'color','r');
% hold on
%
% xr = sort(generatedDist(:)); % Generated distribution
% pr = 1:length(xr);
% pr = pr./length(xr);
%
% plot(xr,pr,'color','b');
% xlabel('x')
% ylabel('cdf')
% legend('Given Dist.','Generated Dist.')
% title('1000 random numbers generated from given normal distribution of data');
%
% HISTORY:
% version 1.0.0, Release 05-Jul-2005: Initial release
% version 1.1.0, Release 16-Oct-2007: Some bug fixes and improvement of help text
% 1. Can handle NaN values in dist
% 2. Extraplolate for out of range
% 3. Calling function EMPCDF is included within this function
%
% See also:
% Author: Durga Lal Shrestha
% UNESCO-IHE Institute for Water Education, Delft, The Netherlands
% eMail: durgals@hotmail.com
% Website: http://www.hi.ihe.nl/durgalal/index.htm
% Copyright 2004-2007 Durga Lal Shrestha.
% $First created: 05-Jul-2005
% $Revision: 1.1.0 $ $Date: 16-Oct-2007 21:47:47 $
% ***********************************************************************
%% INPUT ARGUMENTS CHECK
error(nargchk(1,3,nargin));
if ~isvector(dist)
error('Invalid data size: input data must be vector')
end
if nargin == 2
m = varargin{1};
n = m;
elseif nargin == 3
m = varargin{1};
n = varargin{2};
else
m = 1;
n = 1;
end
%% COMPUTATION
x = dist(:);
% Remove missing observations indicated by NaN's.
t = ~isnan(x);
x = x(t);
% Compute empirical cumulative distribution function (cdf)
xlen = length(x);
x = sort(x);
p = 1:xlen;
p = p./xlen;
% Generate uniform random number between 0 and 1
ur = rand(m,n);
% Interpolate ur from empirical cdf and extraplolate for out of range
% values.
xr = interp1(p,x,ur,[],'extrap');
end

More Answers (1)

Walter Roberson
Walter Roberson on 25 Feb 2025
You are trying to get vpasolve() to invent functions that satisfy certain purposes, trying various forms of f until it satisfies constraints on integrals.
However, solve() and vpasolve() never invent functions. The only routine that invents functions is dsolve()
syms f(x) [1 num_bins] real
It is not permitted to set assumptions on abstract functions.
assume(f >= 0) % Constraint 2
It is not permitted to set assumptions on abstract functions.
For that matter, if you had had something like
syms x
f(x) = x^3 - x^2 + 1
f(x) = 
then it would be permitted to set assumptions on the function turned into an expression,
assume(f(x) > 0)
assumptions
ans = 
but it would not be permitted to set assumptions on the symbolic function form
assume(f > 0)
Error using symfun/assume (line 12)
Assumptions on symbolic functions not supported. Make assumptions on symbolic variables and expressions instead.
  1 Comment
Walter Roberson
Walter Roberson on 25 Feb 2025
syms f(x) [1 num_bins] real
That attempts to define f(x) as a single function that returns an array of length num_bins and which is real-valued.
eq1 = int(f, x_min, x_max) == 1
eq3 = int(x*f, x_min, x_max) == x_mean
The integral of a function returning a vector of length num_bins is a vector of length num_bins, so eq1 would be a vector of num_bins equations; likewise eq3 would be a vector of num_bins equations.
eqs = [eq1; eq3];
You [;] between two vectors 1 x num_bins, so eqns would be 2 x num_bins of equations.
vars = f;
If that were valid, it would be a function that returned a vector 1 x num_bins
sol = vpasolve(eqs, vars)
and there you would be trying to vpasolve() 2 x num_bins equations in a vector of 1 x num_bins pseudo-variables.
When vpasolve() works, you either need to specify no variables (in which case it solves over all of the symvar() variables), or else you need to specify the explicit list of all variables. (Specifying an explicit list of all variables is useful for the case where you provide initial values, as the initial value list must either be numvars x 1 or else numvars x 2 and you would want to be sure that each initial value is associated with the correct variable.)

Sign in to comment.

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!