On 7/01/2016 8:13 PM, Xavier Faure-Miller wrote:
I run similar jobs through my HTC cluster and I don't think that it will do what you are after as specified above. And for that matter, neither will a PBS-based HPC cluster. I've also used distributed frameworks such as BOINC. The problem is that no matter how efficient your submission and scheduling is, for jobs that short the management and IO overhead is always larger than the run time.
What I have found works really well, though, is that all my "small" jobs are independent of one another, with no inter-dependence - think GA population computations or Monte Carlo samples over tens of thousands or even millions of individuals. So what I do is wrap them up into bundles of jobs so that the computational load is much higher than the management overhead. After that, HTC, PBS, etc. become very efficient. There is a tailoring process to balance the IO versus the computation, and a bit more work to wrap and unwrap the bundles as well.
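To make the wrapping idea concrete, here is a minimal sketch of the bundling step in Python. All the names (bundle, run_bundle, the squared-number "individuals") are illustrative stand-ins, not the actual cluster submission code - the point is just that each submitted job carries a batch of short tasks rather than one:

```python
# Sketch: group many short, independent tasks into fewer, larger jobs
# so the per-job scheduler/IO overhead is amortised over a whole batch.

def bundle(tasks, bundle_size):
    """Split a flat list of tasks into bundles of at most bundle_size."""
    return [tasks[i:i + bundle_size] for i in range(0, len(tasks), bundle_size)]

def run_bundle(task_batch):
    """One cluster job: run every short task in the batch sequentially."""
    return [task() for task in task_batch]

# Example: 10,000 tiny "individuals", bundled 500 at a time -> 20 jobs
# instead of 10,000, each job doing real work relative to its overhead.
tasks = [(lambda i=i: i * i) for i in range(10_000)]
bundles = bundle(tasks, 500)
results = [r for b in bundles for r in run_bundle(b)]
```

Tuning bundle_size is the "tailoring" mentioned above: larger bundles mean less management overhead per task, at the cost of coarser scheduling and bigger per-job IO.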
Also consider that for my very fast, sub-one-second jobs, the internal IO of each individual process started to outweigh the computation itself, so investing in a machine with as many cores as I could afford, with SSDs or even RAM disks, made more sense. Then I used a simple OpenMP or local batch system on that machine to run jobs until the IO bus became saturated. Finally, there is simply a system overhead required to start processes and reap the zombies at the end, which for really short jobs with many repetitions takes time and system resources.
Hope that helps,