I did the same analysis, comparing Matlab's built-in `fft` against a variety of FFT algorithms, some of which I wrote myself. As you mentioned, Matlab's `fft` uses FFTW, which is a compiled C library. It is highly optimized, especially for large vectors (more than 1024 elements), and it adapts its execution strategy to the array size.
So the difference you are seeing is definitely down to what happens under the hood.
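
To give an idea of the kind of hand-written comparison I mean, here is a minimal sketch in plain Python (not Matlab, and not how FFTW works internally). It times a direct O(n^2) DFT against a simple recursive radix-2 FFT. The function names `naive_dft`, `radix2_fft` and `time_call` are made up for this illustration, and the sizes are arbitrary. Neither version comes anywhere near a tuned compiled library.

```python
import cmath
import time


def naive_dft(x):
    """Direct O(n^2) evaluation of the DFT definition."""
    n = len(x)
    return [
        sum(x[k] * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n))
        for j in range(n)
    ]


def radix2_fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    if n % 2:
        raise ValueError("length must be a power of two")
    even = radix2_fft(x[0::2])  # FFT of even-indexed samples
    odd = radix2_fft(x[1::2])   # FFT of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]  # twiddle factor
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def time_call(fn, x):
    """Wall-clock time in seconds for a single call of fn(x)."""
    start = time.perf_counter()
    fn(x)
    return time.perf_counter() - start


if __name__ == "__main__":
    for n in (64, 256, 1024):  # arbitrary power-of-two test sizes
        signal = [complex(i % 7, 0) for i in range(n)]
        print(n, time_call(naive_dft, signal), time_call(radix2_fft, signal))
```

Both functions should return the same values up to rounding error, so any timing gap comes purely from the algorithm and the implementation, which is the same effect you see when comparing your own code with the built-in.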