
Re: [HTCondor-users] Caching large executable on worker nodes



It's my understanding that many OSG HTCondor installations include a Squid caching proxy.

This works for files that are fetched via HTTP.
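
As a rough sketch of what fetching via HTTP through the cache could look like from inside a job wrapper: the Python below routes a download through the site proxy. It assumes the proxy address is published in an environment variable; `OSG_SQUID_LOCATION` and the fallback to `http_proxy` are assumptions about the site setup, and `pick_proxy` and `fetch` are hypothetical helper names, not part of HTCondor.

```python
import os
import shutil
import urllib.request


def pick_proxy(env):
    """Return a proxy URL taken from the environment, or None.

    Assumption: the site exports OSG_SQUID_LOCATION as host:port; otherwise
    fall back to the conventional http_proxy variable.
    """
    loc = env.get("OSG_SQUID_LOCATION") or env.get("http_proxy")
    if not loc:
        return None
    # Add a scheme only when the value is a bare host:port.
    return loc if "://" in loc else "http://" + loc


def fetch(url, dest, env=os.environ):
    """Download url to dest, going through the proxy when one is configured."""
    proxy = pick_proxy(env)
    handlers = [urllib.request.ProxyHandler({"http": proxy})] if proxy else []
    opener = urllib.request.build_opener(*handlers)
    with opener.open(url) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)
```

Only plain `http://` URLs benefit here; the proxy can only cache responses it is able to read.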

We have divided the files that go to each job into three chunks: one that is the same for all jobs (≈20 MB), a second that is the same for blocks of ≈3000 jobs (≈100 MB), and a third that is different for each job (≈50 KB).  Spot checks show that the cache “hits” 80-95% of the time.
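
To put rough numbers on that, here is a back-of-the-envelope estimate in Python of how much data each job still pulls from the origin server. It is only a sketch under stated assumptions: the hit rate is taken to apply to the two shared chunks by volume, the per-job chunk is assumed never to be cached, and `origin_mb_per_job` is a hypothetical helper name.

```python
def origin_mb_per_job(hit_rate, common_mb=20.0, block_mb=100.0, per_job_mb=0.05):
    """Expected MB fetched from the origin server per job.

    Assumptions: the two shared chunks come from the cache with probability
    hit_rate; the per-job chunk (about 50 KB) always misses.
    """
    shared_mb = common_mb + block_mb
    return (1.0 - hit_rate) * shared_mb + per_job_mb


# Compare no caching with the two ends of the observed hit-rate range.
for rate in (0.0, 0.80, 0.95):
    print(f"hit rate {rate:.0%}: ~{origin_mb_per_job(rate):.2f} MB/job from origin")
```

Under these assumptions the origin traffic drops from about 120 MB per job to roughly 24 MB at an 80% hit rate and roughly 6 MB at 95%. If the spot checks counted requests rather than bytes, the real figures will differ.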

 

Best regards,

 

Don

 

Don Krieger, Ph.D.

Department of Neurological Surgery

University of Pittsburgh

(412)648-9654 Office

(412)521-4431 Cell/Text

 

 

> -----Original Message-----

> From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf

> Of Jens Schmaler

> Sent: Tuesday, August 11, 2015 2:39 PM

> To: HTCondor-Users Mail List

> Subject: [HTCondor-users] Caching large executable on worker nodes

>

> Hi all,

>

> we are currently in a situation where transferring the executable to the execute

> machine for each job starts to get a limiting factor. Our case is the following:

>

> - large executable (500MB), which is the same for a large number of jobs within

> one cluster (jobs only differ in input arguments)

>

> - few execute machines, i.e. each execute machine will run many such jobs

> (transferring the executable each time although this would not be

> necessary)

>

> - we are using the file transfer mechanism, but I believe the problem would be

> similar with a shared file system

>

> - we would like to keep the current job structure for various reasons, i.e. we

> would rather not combine multiple jobs into one longer-running one (I can

> provide the arguments for this if needed)

>

>

> My goal would be to reduce the time and network traffic for transferring the

> executable thousands of times.

>

> A very natural idea would be to cache the executable on each execute machine,

> hoping that we can make use of it in case we get another job of the same

> cluster. I probably would be able to hack something that will do the trick,

> although doing it properly might take quite some effort (when and how to clean

> up the cache?, ...)

>

> On the other hand, this seems like a very common problem, so I was wondering

> whether Condor offers some built-in magic to cope with this?

> Maybe I am missing something obvious?

>

> Are there any recommended best practices for my case?

>

> Thank you very much in advance,

>

> Jens

>
