Definitive answer for hyperthreading and the Parallel Computing Toolbox (PCT)?
Assume that the computations are amenable to a parfor loop and are compute-bound rather than memory-bound.
Given my budget, I could buy a 4/8 (4 physical cores / 8 virtual cores with hyperthreading) i7 CPU, a slower 6/12 Xeon, or an 8-core AMD without hyperthreading (which tests suggest is much inferior to the i7 in various non-MATLAB benchmarks). If hyperthreaded cores count the same as physical cores in this context, then the i7 would seem to be the way to go.
From what I've read, I can't figure out what the actual story is regarding the extent to which PCT does or doesn't use hyperthreading. If anyone has an answer, i.e. a roadmap for determining which algorithms would and wouldn't benefit, I'd like to see it.
Frankly, I'd love it if MathWorks would publish a comprehensive benchmark across the various CPUs and the various PCT parallelizations, with appropriate demo algorithms.
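In the absence of a published benchmark, you can measure this on your own machine. The sketch below (hypothetical pool sizes and problem sizes; adjust to your CPU) times a compute-bound parfor loop at different local pool sizes, so you can see directly whether workers beyond the physical core count help:

```matlab
% Sketch: time a compute-bound parfor loop at different pool sizes.
% poolSizes is an assumption -- e.g. [4 8] for a 4-core/8-thread i7.
poolSizes = [2 4 8];
N = 200;                          % matrix size per iteration (arbitrary)
for p = poolSizes
    delete(gcp('nocreate'));      % shut down any existing pool
    parpool('local', p);
    tic
    parfor k = 1:64
        A = rand(N);
        e = eig(A * A');          % compute-bound work per iteration
    end
    fprintf('pool size %d: %.2f s\n', p, toc);
end
delete(gcp('nocreate'));
```

If the timing stops improving once the pool size exceeds your physical core count, hyperthreading is buying you nothing for that workload.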
Walter Roberson on 25 Jun 2013
Hyperthreading causes two processes to share the same CPU core, with fast context switching between them. No additional computational resources are made available in this mode, so if both processes want access to the core, the two will contend for it and, on long-term average, each will probably get a little under half the work done that they would have with dedicated cores.
The circumstances under which this kind of sharing is a gain are those in which a core would otherwise sit idle waiting for a resource while the second process has all the resources it needs available. "Waiting for a resource" often means waiting for I/O to finish; it might also include waiting for an interrupt to occur (I do not know whether it is implemented to do this). On systems where memory is shared among a pool of CPUs, or better yet on integrated clusters with shared memory, waiting for a resource could include waiting for memory to become available from a different CPU.
A sample situation in which there would be a benefit is when one thread is an interrupt handler (e.g., DAQ or GPIB input or output) and the other thread is compute-bound. When the interrupt becomes ready, the core can switch quickly to the second process, service it briefly, and switch back to computing. You would probably do even better devoting a complete core to each thread, but at a higher cost: full context switches add up when two threads are serviced by one core splitting the load. To invent figures, with hyperthreading a single core might spend 96% of its time computing, 3% on I/O, and 1% on switching waste, whereas without hyperthreading the switching cost could be (say) 14%, leaving 83% compute, 3% I/O, and 14% switching waste.
Now, if you are not doing that kind of work on all cores -- if most of your cores are compute-bound rather than waiting for I/O or memory access -- hyperthreading on those cores is not useful and will slow progress; if I understand correctly, leaving hyperthreading on with an inappropriate job mix will slow down computations.
It is difficult to create the kind of guide you mention, because the benefits depend on what else is happening. Generally speaking, file operations, imread(), video decompression and decoding, video encoding, serial, and DAQ can benefit -- but they might not benefit enough to be worthwhile if you have heavy computations. If you want to do video encoding and decoding, an i3 processor can be a better choice than an i7, as the i3 has that capability built in (H.264 is what comes to mind). I have forgotten whether the i5 has the encode/decode hardware; my memory says it is available either way.
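The "compute-bound cores don't benefit" claim is easy to check empirically for MATLAB's implicit multithreading. This sketch (the core counts are assumptions; substitute your machine's) times a multithreaded BLAS operation with the computational thread count limited to physical cores versus allowed onto all logical cores:

```matlab
% Sketch: compare a compute-bound multithreaded operation at the physical
% core count versus the logical (hyperthreaded) core count.
physicalCores = 4;                 % assumed; use your CPU's values
logicalCores  = 8;
origN = maxNumCompThreads;         % remember the original setting
A = rand(3000);
for n = [physicalCores logicalCores]
    maxNumCompThreads(n);
    tic
    B = A * A;                     % compute-bound, implicitly multithreaded
    fprintf('%d threads: %.2f s\n', n, toc);
end
maxNumCompThreads(origN);          % restore the original setting
```

If the 8-thread timing is no better (or is worse) than the 4-thread timing, that matches the behavior described above for a purely compute-bound job mix.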
Edric Ellis on 25 Jun 2013
Edited: Edric Ellis on 13 Oct 2014
By default, MATLAB and Parallel Computing Toolbox consider only real cores, not hyperthreaded cores. You can override this choice in MATLAB using the maxNumCompThreads function. You can override this choice in Parallel Computing Toolbox by modifying the 'local' cluster configuration in the cluster profile manager (you can run up to a maximum of 12 local workers in R2013b and earlier; more in R2014a and later).
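Both overrides described above can be sketched as follows. The worker count of 8 is just an example for a 4-core/8-thread machine; pick the value appropriate to your CPU (and note that R2013b and earlier cap the local cluster at 12 workers):

```matlab
% 1) Implicit multithreading in MATLAB itself:
maxNumCompThreads(8);     % allow up to 8 computational threads

% 2) Parallel Computing Toolbox local workers:
c = parcluster('local');  % get the 'local' cluster profile
c.NumWorkers = 8;         % allow one worker per logical core
saveProfile(c);           % persist the change in the profile
parpool(c, 8);            % open a pool with 8 workers
```

The same NumWorkers change can also be made interactively in the cluster profile manager, as described above.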
Whether hyperthreading provides any benefit depends on the nature of your algorithm. The reason the default is not to consider hyperthreading is that it was found generally not to be beneficial for most numerically intensive workloads.
There isn't currently a single all-encompassing benchmark, but there is a series of benchmarks of Parallel Computing Toolbox functionality here.