I think I've come to an answer that I'm happy with. As I postulated in the very last question of the original post, there is a way for the [z(:).val] assignment to happen more directly. Instead of using num2cell to break x up into individual cells that can then be expanded into a comma-separated list (as below),
tmp = num2cell(x); % x is a vector whose elements I want as individual AutoDiff.val scalars
[z(:).val] = tmp{:}; % This works, but is SLOW for large vectors
I wrote a MEX function that takes the input vector x directly and deals its elements to individual outputs:
[self(1:numel(x)).val] = fastDealScalar(x);
where fastDealScalar.c is
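#include "mex.h"

/* Each output mxArray is a 1x1 real double */
#define ROWS 1
#define COLUMNS 1

void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[]) {
    mwSize index;
    mwSize nElems;
    double *y, *z;

    /* validate the input: exactly one real, full (non-sparse) double array */
    if (nrhs != 1 || !mxIsDouble(prhs[0]) || mxIsComplex(prhs[0]) || mxIsSparse(prhs[0])) {
        mexErrMsgIdAndTxt("fastDealScalar:badInput",
                          "Expected one real, full double array as input.");
    }

    /* get a pointer to the input data and count its elements */
    y = mxGetPr(prhs[0]);
    nElems = mxGetNumberOfElements(prhs[0]);

    /* plhs only has room for the nlhs requested outputs, so writing
     * more than that would overrun it */
    if ((mwSize)nlhs != nElems) {
        mexErrMsgIdAndTxt("fastDealScalar:badOutputCount",
                          "Number of outputs must equal the number of input elements.");
    }

    /* loop through the elements of the input and assign each one
     * to its own 1x1 output */
    for (index = 0; index < nElems; index++) {
        /* create a new 1x1 double matrix for this output */
        plhs[index] = mxCreateNumericMatrix(ROWS, COLUMNS, mxDOUBLE_CLASS, mxREAL);
        /* get a pointer to the new output's data (not a copy) */
        z = mxGetPr(plhs[index]);
        /* copy one scalar from the input into the output */
        *z = y[index];
    }
}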
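For readers without MATLAB at hand, the dealing logic in the MEX loop can be sketched in plain C. This is only an illustration under stated assumptions: `Scalar`, `deal_scalars`, and `free_scalars` are hypothetical names, with `Scalar` standing in for a 1x1 `mxArray` and `malloc` standing in for `mxCreateNumericMatrix`. The output-count check mirrors the fact that the MATLAB call requests exactly numel(x) outputs.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a 1x1 real double mxArray (illustration only). */
typedef struct {
    double value;
} Scalar;

/* Deal each element of x[0..n-1] into its own newly allocated Scalar.
 * Like the MEX loop: one allocation per element, then a scalar copy.
 * Returns 0 on success, -1 if nout != n (the analogue of nlhs != numel(x)),
 * and -2 if an allocation fails (outputs already created are freed). */
int deal_scalars(const double *x, size_t n, Scalar **out, size_t nout) {
    size_t i;
    if (nout != n) {
        return -1;
    }
    for (i = 0; i < n; i++) {
        out[i] = malloc(sizeof *out[i]);
        if (out[i] == NULL) {
            /* roll back the outputs created so far */
            while (i > 0) {
                i--;
                free(out[i]);
                out[i] = NULL;
            }
            return -2;
        }
        out[i]->value = x[i];
    }
    return 0;
}

/* Release the outputs created by deal_scalars. */
void free_scalars(Scalar **out, size_t n) {
    size_t i;
    for (i = 0; i < n; i++) {
        free(out[i]);
        out[i] = NULL;
    }
}
```

In the real MEX file there is no counterpart to `free_scalars`: arrays assigned to `plhs` are handed back to MATLAB, which manages them from then on.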
(This was hacked together from MATLAB's example MEX files, so forgive the coding style; I'm not very experienced in C.)
This bypasses the num2cell bottleneck, and my code now seems to perform well.