
Re: [HTCondor-users] Caching large executable on worker nodes

If your individual jobs are running inside a script wrapper, perhaps there's a common space where you can check for the 500 MByte common file, e.g. /tmp or /dev/shm.
If it doesn't exist, then fetch it.
Otherwise use it.
You could include a version number in the file's name and pass the version number you want the script to use on its command line; a sketch follows below.
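As a minimal sketch of that idea (the URL, tool name, and cache location here are placeholders, not anything HTCondor provides), assuming the file is served over HTTP and that flock is available to keep several jobs landing on the same node from all downloading at once:

    #!/bin/bash
    # Hypothetical cache-and-run wrapper; names and URL are illustrative only.
    VERSION="$1"; shift
    CACHE="/dev/shm/mytool-v${VERSION}"               # or /tmp
    URL="http://squid.example.org/mytool-v${VERSION}"

    # Serialize the fetch so concurrent jobs on this node download it only once.
    (
        flock -x 9
        if [ ! -x "$CACHE" ]; then
            curl -sSf -o "${CACHE}.part" "$URL"
            chmod +x "${CACHE}.part"
            mv "${CACHE}.part" "$CACHE"   # rename is atomic on the same filesystem
        fi
    ) 9> "${CACHE}.lock"

    exec "$CACHE" "$@"

The submit description would then pass the version number as the first argument, followed by the program's own arguments, so switching versions is just a change to the job's argument list.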

Best regards,

Don Krieger, Ph.D.
Department of Neurological Surgery
University of Pittsburgh
(412)648-9654 Office
(412)521-4431 Cell/Text

> -----Original Message-----
> From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf
> Of Dimitri Maziuk
> Sent: Wednesday, August 12, 2015 12:51 PM
> To: htcondor-users@xxxxxxxxxxx
> Subject: Re: [HTCondor-users] Caching large executable on worker nodes
>
> On 08/12/2015 11:11 AM, Jens Schmaler wrote:
> > Still, I must admit that I do not fully understand the concept yet.
> > Even with a SQUID cache for my cluster, my large executable will still
> > be transferred over the network to the execute machine for each job.
> > The SQUID server might take the load from the submit machine and
> > ideally would have a better network bandwidth, but the overall network
> > traffic remains.
> If you're running the default 1 slot per core setup and have, say, 8 jobs running
> on the same node, you end up with 8 concurrent transfers of the same file to
> the same machine. That'll choke your node's NIC and potentially the switch's
> backplane (not with 500MB files of course) long before that gets to the proxy
> server.
> --
> Dimitri Maziuk
> Programmer/sysadmin
> BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu