GCC compiled MEX file taking more time than the one compiled by Microsoft Visual Studio.

Question

Ubaid Ullah on 4 Jul 2015


https://ch.mathworks.com/matlabcentral/answers/228412-gcc-compiled-mex-file-taking-more-time-than-the-one-compiled-by-microsoft-visual-studio

Commented: Ubaid Ullah on 8 Jul 2015

Attachment: opts_files.zip

Hello, I have the following loop:

spmd
    dgtilde = zeros(length(denom),d.nexp2);
    for mm = 1:d.nexp2
        dgtilde(:,mm) = sum(g{d.exp2(mm,1)}.*g{d.exp2(mm,2)}.*weight,2) ...
            - gtilde(:,d.exp2(mm,1)).*gtilde(:,d.exp2(mm,2));
    end 
end
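Written out element by element, each iteration m of this loop fills column m of dgtilde as follows. Here g^{(k)} denotes the matrix g{k}, w is the weight matrix (the same size as each g{k}), n_col is their number of columns, and \tilde{g} is gtilde:

```latex
\mathrm{dgtilde}_{i,m} \;=\; \sum_{j=1}^{n_\mathrm{col}} g^{(a_m)}_{ij}\, g^{(b_m)}_{ij}\, w_{ij} \;-\; \tilde{g}_{i,a_m}\, \tilde{g}_{i,b_m},
\qquad a_m = \mathrm{exp2}(m,1),\quad b_m = \mathrm{exp2}(m,2),\quad m = 1,\dots,\mathrm{nexp2}
```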

I converted the inner loop to C code as follows:

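#include <mex.h>    /* mexFunction, mxGetCell, mxGetPr, ...; also includes matrix.h */

/* Inputs: prhs[0] g      cell array of nrow-by-ncol double matrices
 *         prhs[1] mom    number of products to compute (d.nexp2)
 *         prhs[2] weight nrow-by-ncol double matrix
 *         prhs[3] exp2   nexp-by-2 matrix of 1-based cell indices (d.exp2)
 *         prhs[4] gtilde nrow-by-numel(g) double matrix
 * Output: plhs[0] nrow-by-mom double matrix (dgtilde)                      */
void mexFunction(int nlhs, mxArray *plhs[],
                 int nrhs, const mxArray *prhs[])
{
    const mxArray *cellArray1, *cellArray2;
    const double *pr1, *pr2, *weight, *gtilde, *exp2;
    double *sum_gammaXmom;
    mwSize nrow, ncol, nexp, ncell, mom;
    mwIndex i, j, jcell, mm1, mm2, sgIndex;

    if (nrhs != 5) mexErrMsgTxt("Five inputs required: g, mom, weight, exp2, gtilde.");
    if (!mxIsCell(prhs[0])) mexErrMsgTxt("g must be a cell array.");
    if (mxGetScalar(prhs[1]) < 0) mexErrMsgTxt("d.mom must be non-negative.");

    mom    = (mwSize)mxGetScalar(prhs[1]);
    weight = mxGetPr(prhs[2]);
    exp2   = mxGetPr(prhs[3]);
    nexp   = mxGetM(prhs[3]);    /* rows of exp2 = offset to its second column */
    gtilde = mxGetPr(prhs[4]);
    ncell  = mxGetNumberOfElements(prhs[0]);
    if (mom > nexp) mexErrMsgTxt("d.mom variable exceeds the number of rows of exp2.");

    /* All cells are assumed to have the same size as g{1}. */
    cellArray1 = mxGetCell(prhs[0], 0);
    if (cellArray1 == NULL) mexErrMsgTxt("g{1} is empty.");
    nrow = mxGetM(cellArray1);
    ncol = mxGetN(cellArray1);

    plhs[0] = mxCreateDoubleMatrix(nrow, mom, mxREAL);
    sum_gammaXmom = mxGetPr(plhs[0]);
    for (j = 0; j < mom*nrow; j++) sum_gammaXmom[j] = 0;

    for (jcell = 0; jcell < mom; jcell++) {
        /* exp2(jcell+1,1) and exp2(jcell+1,2), converted to 0-based indices */
        if (exp2[jcell] < 1 || exp2[jcell] > ncell ||
            exp2[jcell + nexp] < 1 || exp2[jcell + nexp] > ncell)
            mexErrMsgTxt("exp2 refers to a cell outside g.");
        mm1 = (mwIndex)exp2[jcell] - 1;
        mm2 = (mwIndex)exp2[jcell + nexp] - 1;
        cellArray1 = mxGetCell(prhs[0], mm1);
        cellArray2 = mxGetCell(prhs[0], mm2);
        if (cellArray1 == NULL || cellArray2 == NULL) mexErrMsgTxt("Empty cell in g.");
        pr1 = mxGetPr(cellArray1);
        pr2 = mxGetPr(cellArray2);
        for (i = 0; i < nrow; i++) {
            sgIndex = i + jcell*nrow;
            /* weighted sum across the columns of row i (the sum(...,2) part) */
            for (j = 0; j < ncol; j++) {
                sum_gammaXmom[sgIndex] += pr1[i+j*nrow]*pr2[i+j*nrow]*weight[i+j*nrow];
            }
            sum_gammaXmom[sgIndex] -= gtilde[i+mm1*nrow]*gtilde[i+mm2*nrow];
        }
    }
}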

When I compiled the MEX file with the Microsoft Visual Studio compiler on a Windows machine, it cut the execution time in half. On the other hand, when I compiled the file with the GCC compiler, the execution time did not improve at all. I have three questions:

Why is there such a difference between the performance of the two compilers?
Is there a way to improve the C code so that it performs better?
Should I expect an improvement in speed if I pass g as a 3D matrix instead of a cell array of double matrices?

The variable g is a Composite; on each lab it holds a cell array of double matrices.
The variable weight is a Composite; on each lab it holds a double matrix.
The output sum_gammaXmom computes dgtilde.

Addendum:

Actually, I have a client who works on a Linux/Unix-based system with GCC. When I first delivered the C files to him, he compiled them and told me they were only 2x faster than native MATLAB, whereas I was getting a 3x improvement with Microsoft Visual Studio. So I installed GCC on my computer, tested my C functions, and got the same 3x improvement that I was getting with the MSVC compiler. I asked him to compile with the -O1, -O2 and -O3 options, but had no luck there. I am attaching the mex_C_glnxa64.xml file he uses on his computer and the GCC mexopts.bat file that I use on my local machine. Can you tell me whether we are using different parameters that cause this difference in performance between the two machines?

Thanks.

3 Comments

Ubaid Ullah on 4 Jul 2015

Thanks for your comment, dpb. I have tried GCC with the -O1 to -O3 switches; no difference so far.

dpb on 4 Jul 2015

Surprising; gcc is generally considered quite good. Do you have a recent release? What are you running it under: is it a native installation, or is it running under an emulation layer or something similar, by any chance?


Answer 1

Ivo Houtzager on 4 Jul 2015


https://ch.mathworks.com/matlabcentral/answers/228412-gcc-compiled-mex-file-taking-more-time-than-the-one-compiled-by-microsoft-visual-studio#answer_185025

There is a difference in the default floating-point optimizations between the two compilers.

By default, GCC generates floating-point code that strictly follows IEEE semantics. The optional -ffast-math flag enables optimizations that can break strict IEEE compliance, such as reordering additions. You can try whether this option improves the speed, at the possible cost of accuracy.

The floating-point calculations from the VS compiler do not preserve strict IEEE compliance by default: the default option /fp:precise enables some non-strict optimizations. If you need strict floating-point calculations from the VS compiler, use the /fp:strict option. For the fastest floating-point calculations the VS compiler can offer, use the /fp:fast option.

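The VS compiler also enables SSE2 instructions (option /arch:SSE2) by default on x86 platforms. On 32-bit x86, GCC does not use SSE2 instructions by default; on x86-64, SSE2 is part of the baseline instruction set and is always enabled. Note that -mtune=generic only tunes code scheduling for common processors and does not enable additional instructions; to allow newer instruction sets, use an -march option (for example -msse2, or -march=native to target the build machine).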

4 Comments

Ivo Houtzager on 8 Jul 2015


The following line shows the optimization options from mexopts.bat.

set OPTIMFLAGS=-O3 -funroll-loops -DNDEBUG

The following line shows the optimization options from mex_C_glnxa64.xml.

COPTIMFLAGS="-O -DNDEBUG"

Thus the compiler on the Windows platform optimizes more than the one on the Linux platform (-O3 versus -O, which is equivalent to -O1). Further, loop unrolling is enabled for the Windows compiler. You can copy the compile options from mexopts.bat into mex_C_glnxa64.xml to improve the optimization, and you can try to improve it further by adding the -ffast-math and instruction-set options discussed above to that line.

Ubaid Ullah on 8 Jul 2015

Well, my client tried -O1 to -O3 but didn't see any improvement. I will ask him to try the -ffast-math and -mtune=generic options.


Answer 2

Jan on 4 Jul 2015


https://ch.mathworks.com/matlabcentral/answers/228412-gcc-compiled-mex-file-taking-more-time-than-the-one-compiled-by-microsoft-visual-studio#answer_185022


Why is there such a difference between the performance of the two compilers?

Compilers translate C code into machine instructions. There are many possible translations that produce the same results but run at different speeds. For example, a compiler can emit MMX, SSE, SSE2 or SSE3 instructions; some of these run only on modern processors, while others also support older processors. It is therefore expected that different compilers produce programs that run at different speeds.

Try memset instead of a loop to set sum_gammaXmom to zero. Or, even better, omit this zeroing entirely, because mxCreateDoubleMatrix already fills the array with zeros.

sum_gammaXmom[sgIndex] += pr1[i+j*nrow]*pr2[i+j*nrow]*weight[i+j*nrow];

You could try whether storing i+j*nrow in a variable avoids recomputing the same value, although I would expect smart compilers to recognize this. A more general problem is memory access: it is much cheaper to read and write neighboring elements in memory. Is it possible to run the loop over i on the inside, so that [i+j*nrow] accesses contiguous memory elements?

5 Comments

Jan on 5 Jul 2015

Accessing 25 cells costs less than a millisecond. But I do not understand what "with each cell having an 25-element array of double matrices" means.

Ubaid Ullah on 7 Jul 2015

@Jan: Sorry about that. I have corrected the sentence.
