
Re: [HTCondor-users] Caching large executable on worker nodes



If your individual jobs run inside a wrapper script, perhaps there's a common space on the worker node where you can check for the 500 MByte file, e.g. /tmp or /dev/shm.
If it isn't there, fetch it; otherwise use the cached copy.
You could include a version number in the file name and pass the version you want the script to use as an argument on its command line.
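For what it's worth, here is a rough, untested sketch of that idea in Python; the cache directory, file name, and download URL are placeholders, and the rename is done atomically so several jobs landing on the same node at once don't trip over each other:

#!/usr/bin/env python3
# Untested sketch: fetch the large executable into a node-local cache
# only if the requested version is not already there, then run it.
# CACHE_DIR, the file name pattern, and SOURCE_URL are placeholders.

import os
import sys
import urllib.request

CACHE_DIR = "/dev/shm"                                    # or /tmp
VERSION = sys.argv[1] if len(sys.argv) > 1 else "1"
CACHED = os.path.join(CACHE_DIR, "bigexec-v%s" % VERSION)
SOURCE_URL = "http://some-server.example/bigexec-v%s" % VERSION  # placeholder

def ensure_cached():
    """Download the executable unless this version is already cached."""
    if not os.path.exists(CACHED):
        # Download to a private temp name, then rename atomically so that
        # concurrent jobs on the same node never see a half-written file.
        tmp = "%s.part.%d" % (CACHED, os.getpid())
        urllib.request.urlretrieve(SOURCE_URL, tmp)
        os.chmod(tmp, 0o755)
        os.rename(tmp, CACHED)
    return CACHED

if __name__ == "__main__":
    exe = ensure_cached()
    # Replace this wrapper with the real executable, passing along
    # any remaining job arguments.
    os.execv(exe, [exe] + sys.argv[2:])

The wrapper itself is tiny, so HTCondor only transfers the script for each job; the 500 MByte binary is pulled at most once per node per version.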

Best regards,
Don

Don Krieger, Ph.D.
Department of Neurological Surgery
University of Pittsburgh
(412)648-9654 Office
(412)521-4431 Cell/Text


> -----Original Message-----
> From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf
> Of Dimitri Maziuk
> Sent: Wednesday, August 12, 2015 12:51 PM
> To: htcondor-users@xxxxxxxxxxx
> Subject: Re: [HTCondor-users] Caching large executable on worker nodes
> 
> On 08/12/2015 11:11 AM, Jens Schmaler wrote:
> 
> > Still, I must admit that I do not fully understand the concept yet.
> > Even with a SQUID cache for my cluster, my large executable will still
> > be transferred over the network to the execute machine for each job.
> The SQUID server might take the load off the submit machine and would
> ideally have better network bandwidth, but the overall network traffic
> remains the same.
> 
> If you're running the default 1 slot per core setup and have, say, 8 jobs running
> on the same node, you end up with 8 concurrent transfers of the same file to
> the same machine. That'll choke your node's NIC and potentially the switch's
> backplane (not with 500MB files of course) long before that gets to the proxy
> server.
> 
> --
> Dimitri Maziuk
> Programmer/sysadmin
> BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu