Computer to boost MATLAB performance
Hi all!
I am thinking about buying a new PC for my MATLAB work, and I was wondering if there are any recommended CPUs or specific architectures that would reduce calculation time. Until now I have been buying consumer-level i9 processors, but the computational load of my functions is increasing dramatically and I was wondering if there is a better solution.
PS: I always use the CPU, as my functions are often iterative, so transferring those operations to the GPU is not feasible.
Thanks in advance!
EDIT: I will add some information as suggested by @Jason Ross
My budget is approximately 2k, but if it will improve performance I am able to save up and reach 5-7k. I would like to know whether one 5k computer will outperform five 1k computers.
In my case, I have to use nested loops with matrix operations (calculating adjacency matrices and graph parameters on those matrices).
I use the Parallel Computing Toolbox in the last loop.
Right now I have 9 consumer-level computers (~1500€ each) that I use simultaneously, but I feel that I have wasted my money and that a better solution might exist that outperforms the 9 PCs together.
Also, due to restrictions on my budget (the way I am allowed to spend it), I cannot pay for compute time on cloud services.
5 Comments
Jason Ross
on 4 Feb 2021
You will likely get better responses if you can include things like your budget, what types of computation you are doing, if you can use Parallel Computing Toolbox, if you have access to a compute cluster, if you are amenable to paying for compute time on AWS, etc.
Victor R
on 4 Feb 2021
Paul Hoffrichter
on 4 Feb 2021
Currently, on your slow system with consumer-level many-core i9 processors, when you open Task Manager while running your application:
- what CPU % do you see? - Should be very high most of the time if good vectorization is used.
- Do you see most of the logical cores operating at near capacity? - most should be operating near peak capacity.
- Do you see much disk activity? - If so, then I/O is contributing to your slow speed.
- Do you see much ethernet activity? - If so, this could be a bottleneck.
- Do you see much GPU activity? - If not, consider changing the code to make use of the GPU to see if that helps.
Victor R
on 5 Feb 2021
Walter Roberson
on 5 Feb 2021
If I recall correctly, Google does not run most of its services on high-end computers. Instead it uses lots of less expensive commodity computers, having invested a lot of effort in fault-tolerant systems that can automatically yank failing machines offline. Tens of their computers fail every day, but they designed with that in mind. If 3 out of 10 systems fail today, you still get progress from the remaining 7; if you had instead gone for two "5 times as good" computers, then when one fails you have lost half your capacity.
Some of the really big computer challenges have been dealt with by operating tens of thousands of home computers... on screensavers.
When you can get lots of computers together, the limits start to be communication, and finding a way to partition the computation into chunks that take at most a couple of days "after-hours" on lower-end computers but are still meaningful on fast ones. See BOINC, SETI@home, and the various distributed protein-folding challenges...
Accepted Answer
More Answers (3)
Walter Roberson
on 5 Feb 2021
2 votes
If all your cores are at 95-100%, then your code is already well vectorized. In that case your individual core speed might not be the most important factor; the aggregate speed might matter more.
The AMD Ryzen CPUs have lower per-core speeds than some other available systems, but they have excellent aggregate scores because they can have a lot of cores. They would not be the first choice if your tasks mostly sat in single cores, but they can be very nice for tasks that use a lot of linear algebra or straightforward vectorization.
The AMD enterprise CPUs (EPYC) are designed for enterprise-class systems: longer lasting, better power control, higher standards on dies, more overclocking potential (requiring better cooling). However, people have reported that current versions of MATLAB are not able to use the full power of the MKL (Math Kernel Library) on these CPUs, possibly due to the way Intel wrote some CPU-capability tests into the code. There is a hypothesis that performance could be improved dramatically by setting a particular environment flag, but I have not heard back from anyone who has tried setting it.
Investing a lot of money can perhaps double your computing power. Improving the code can gain speedups of a factor of 100 or 1000. That is just the theory, but I have yet to see code that could not be accelerated in some way. Sometimes it is just a question of pre-allocation, or of processing the data column-wise instead of row-wise.
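As a minimal sketch of those two "cheap" fixes (pre-allocation and column-wise access), with made-up sizes rather than anyone's actual code:

```matlab
n = 5000;

% Slow pattern: the array grows on every iteration, forcing
% repeated reallocation and copying.
a = [];
for k = 1:n
    a(k) = k^2; %#ok<SAGROW>
end

% Fast pattern: pre-allocate once, then fill in place.
b = zeros(1, n);
for k = 1:n
    b(k) = k^2;
end

% MATLAB stores matrices column-major, so let the inner loop walk
% down a column to get contiguous memory access.
M = rand(n);
s = 0;
for col = 1:n          % outer loop over columns ...
    for row = 1:n      % ... inner loop moves down one column
        s = s + M(row, col);
    end
end
```

In practice a single vectorized call such as `sum(M(:))` beats both loops, but the loop form shows the memory-order point.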
So before you spend a lot of money, ask a professional programmer to review your code. This can help even if you are an experienced programmer yourself: higher programming skill lets you see the underlying idea when you read code, and that very skill can cause blindness to typos and mistakes.
But of course, sometimes the code is already almost optimal and no mathematical shortcuts remain. Your analysis of the limiting factors of your hardware looks professional. My professor told me: "If you need a faster computer, just wait a year. If you need faster code, do it now."
2 Comments
Victor R
on 6 Feb 2021
Jan
on 6 Feb 2021
MATLAB is not efficient for recursive algorithms. Any recursive algorithm can be converted to an iterative loop, which can yield a massive speedup. See:
- https://stackoverflow.com/questions/159590/way-to-go-from-recursion-to-iteration
- https://www.refactoring.com/catalog/replaceRecursionWithIteration.html
- https://www.cs.odu.edu/~zeil/cs361/latest/Public/recursionConversion/index.html
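As a toy illustration of the conversion (a trivial factorial, not Victor's actual code), the recursive form pays MATLAB's per-call overhead on every element, while the loop form does not:

```matlab
function f = factRecursive(n)
% Recursive version: one MATLAB function call per level,
% each with its own stack frame and call overhead.
if n <= 1
    f = 1;
else
    f = n * factRecursive(n - 1);
end
end

function f = factIterative(n)
% Iterative version: a plain loop, no call overhead per step.
f = 1;
for k = 2:n
    f = f * k;
end
end
```

The same pattern (carry the "pending work" in loop variables or an explicit stack instead of the call stack) applies to graph traversals as well.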
It is worth identifying the bottleneck with the profiler and posting the code here in the forum. You do not lose the option of buying stronger hardware.
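Running the profiler is a two-line exercise; `myAnalysis` below is just a placeholder for your own entry point:

```matlab
profile on          % start collecting per-line timing data
myAnalysis();       % placeholder: run the code you want to measure
profile viewer      % open the report, sorted by time spent per function
```

The report shows which lines dominate the runtime, which is usually far more informative than guessing from Task Manager.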
Paul Hoffrichter
on 5 Feb 2021
0 votes
Where is your time being spent - in the nested loop, or in the last loop that is outside the nested loop?
If, by "last loop", you mean the innermost loop, then that is not as good as using the Parallel Computing Toolbox at a higher level, preferably the highest. If you are using parfor, there is overhead each time you enter the parfor loop, and paying it repeatedly in the innermost loop can subtract a good deal from your performance.
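A minimal sketch of the point, with invented sizes and a stand-in computation: put the single parfor on the outer loop so its entry overhead is paid once, and keep the inner loop as an ordinary serial loop on each worker.

```matlab
nSubjects = 8;       % hypothetical outer dimension (e.g. datasets)
nIter     = 1e4;     % hypothetical inner iteration count
results   = zeros(1, nSubjects);

% parfor at the OUTER level: one parallel region for the whole job.
parfor s = 1:nSubjects
    acc = 0;
    for k = 1:nIter                  % plain loop, runs on one worker
        acc = acc + sin(s * k);      % stand-in for the real computation
    end
    results(s) = acc;
end
```

The anti-pattern is the mirror image: a serial outer loop that enters a fresh parfor on every iteration, paying the scheduling overhead nSubjects times.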
If you think you can break up your program into multiple chunks that can be distributed to PCs without having a large amount of messaging over the LAN, then that is certainly a reasonable option. The LAN runs slowly, so that is a concern. This is something that you can determine in a simulation before purchasing. The fact that you say you cannot use GPUs makes me think that the distributed PC option may not work out for you.
@Jason Ross To access a local (or cloud) compute cluster, having the Parallel Computing Toolbox is not enough. You also need MATLAB Parallel Server.
4 Comments
Jason Ross
on 5 Feb 2021
@Paul Hoffrichter -- with Parallel Computing Toolbox, you can use the "local" cores on your PC (or a local GPU); no Parallel Server license required. The thought here was that Victor could take advantage of the cores in some additional way if his algorithms allowed it, to get a speedup (and of course if the implicit multithreading support was not able to help).
If you have an on-site compute cluster, you are correct -- you do need the requisite Parallel Server licenses on the cluster.
Since Victor's initial question didn't include a budget or what other compute resources were available, I mentioned these as possible options that could help with scaling -- but not everything is amenable to scaling on a compute cluster or on a GPU, and of course we all have
Victor R
on 6 Feb 2021
Walter Roberson
on 6 Feb 2021
GPU use requires an NVIDIA GPU.
GPU use is more efficient under Linux than under Windows: the architecture imposed by Windows requires reserving a chunk of memory and performing extra transfers.
The efficiency of different models of NVIDIA GPU can vary quite a bit depending on whether you are doing single precision or double precision, and on exactly which model you are using. Double precision performance can be 1/32 of single precision (common!), 1/24 of single precision (not rare), 1/8 of single precision (specialized), or 1/2 of single precision (if I recall correctly; very specialized and expensive.) You really have to look very carefully at specifications if you are using double precision: an amazingly fast new-generation GPU can turn out to be slower at double precision than "just the right model" from two generations before. Sometimes you have to dig a lot to find the double precision performance of a particular model.
For various reasons, you would prefer your GPU to have at least roughly 3 times as much memory as your largest input matrix -- with the flip side of that being that you should not count on being able to use input matrices more than roughly 1/3 of available memory.
Synchronization to start or stop a computation is one of the slowest parts, so ask "bigger questions" to amortize the cost over time. But memory transfer is part of that cost too, so ideally you would transfer in as little as possible and transfer out as little as possible, while keeping the computation meaningfully large.
Scalar indexing is an expensive operation on a GPU, so loops that index individual locations are not an effective use of the resource (and will probably turn out slower than the CPU.) Vectorize! Vectorize! Vectorize! Ideally operate on entire arrays.
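A short sketch of those points together, assuming an NVIDIA GPU and the Parallel Computing Toolbox (sizes are made up):

```matlab
% Single precision, since double can be 1/32 the throughput on many GPUs.
A = rand(4000, 'single', 'gpuArray');
B = rand(4000, 'single', 'gpuArray');

% Slow pattern on a GPU: scalar indexing in a loop.
%   for i = 1:numel(A), C(i) = A(i)^2 + B(i); end

% Fast pattern: one vectorized expression over the entire arrays.
C = A.^2 + B;

wait(gpuDevice);     % GPU calls are asynchronous; sync before timing
result = gather(C);  % transfer back only the final answer, once
```

The `gather` at the end reflects the amortization advice above: keep data on the device across as much of the computation as possible and move only the result back.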
Victor R
on 6 Feb 2021