
Re: [HTCondor-users] Caching large executable on worker nodes



Hi Don,

Thank you very much for this hint, which gave me "SQUID" as an additional
keyword. According to this talk, it seems that this kind of caching will
actually become part of HTCondor 8.4.0:

http://research.cs.wisc.edu/htcondor/HTCondorWeek2015/presentations/VuosaloC_FileTransCachingProxy.pdf


Still, I must admit that I do not fully understand the concept yet. Even
with a SQUID cache for my cluster, my large executable will still be
transferred over the network to the execute machine for each job. The
SQUID server might take the load off the submit machine and would ideally
have better network bandwidth, but the overall network traffic remains.
I assume there will not be a slim SQUID proxy on each execute machine
that caches everything locally, right?
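
To put rough numbers on this concern, here is a minimal Python sketch. Only
the 500 MB executable size comes from my original post; the job count (3000),
the number of execute machines (5), and the function name traffic_mb are
assumptions for illustration. The central-proxy case also assumes, ideally,
that the proxy fetches the file from the origin exactly once.

```python
# Rough estimate of network traffic caused by the executable alone.
EXE_MB = 500      # executable size stated in the original post
N_JOBS = 3000     # ASSUMPTION: number of jobs in one cluster
N_MACHINES = 5    # ASSUMPTION: number of execute machines


def traffic_mb(exe_mb, n_jobs, n_machines):
    """Return MB leaving the origin and MB entering execute machines per scheme."""
    return {
        # Plain file transfer: the submit machine sends one copy per job.
        "no_cache": {
            "from_origin": exe_mb * n_jobs,
            "into_execute": exe_mb * n_jobs,
        },
        # Central caching proxy (idealized): the origin is hit once, but every
        # job still pulls its copy from the proxy over the network.
        "central_proxy": {
            "from_origin": exe_mb,
            "into_execute": exe_mb * n_jobs,
        },
        # Per-machine cache (idealized): one copy per execute machine, all
        # later jobs on that machine read it from local disk.
        "per_machine_cache": {
            "from_origin": exe_mb * n_machines,
            "into_execute": exe_mb * n_machines,
        },
    }


if __name__ == "__main__":
    for scheme, mb in traffic_mb(EXE_MB, N_JOBS, N_MACHINES).items():
        print(f"{scheme:18s} origin: {mb['from_origin']:>9d} MB   "
              f"execute machines: {mb['into_execute']:>9d} MB")
```

Under these assumptions, a central proxy reduces the load on the origin to a
single transfer but leaves the traffic into the execute machines unchanged.
Only a cache on each execute machine would reduce that as well.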

Cheers,

Jens




On 11.08.15 at 22:15, Krieger, Donald N. wrote:
> It's my understanding that many OSG HTCondor installations include the
> SQUID caching mechanism.
> 
> This works for files which are fetched via http.
> 
> We have divided the files which go to each job into 1 chunk which is the
> same for all jobs (~20 Mbytes), a 2nd chunk which is the same for blocks
> of ~3000 jobs (~100 Mbytes), and a 3rd chunk which is different for each
> job (~50 Kbytes).  Spot checks show that the caching mechanism "hits"
> 80-95% of the time.
>
> Best regards,
>
> Don
>
> Don Krieger, Ph.D.
> Department of Neurological Surgery
> University of Pittsburgh
> (412)648-9654 Office
> (412)521-4431 Cell/Text
>
>> -----Original Message-----
>> From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf Of Jens Schmaler
>> Sent: Tuesday, August 11, 2015 2:39 PM
>> To: HTCondor-Users Mail List
>> Subject: [HTCondor-users] Caching large executable on worker nodes
>>
>> Hi all,
>>
>> we are currently in a situation where transferring the executable to the
>> execute machine for each job is starting to become a limiting factor. Our
>> case is the following:
>>
>> - large executable (500MB), which is the same for a large number of jobs
>>   within one cluster (jobs only differ in input arguments)
>>
>> - few execute machines, i.e. each execute machine will run many such jobs
>>   (transferring the executable each time although this would not be
>>   necessary)
>>
>> - we are using the file transfer mechanism, but I believe the problem
>>   would be similar with a shared file system
>>
>> - we would like to keep the current job structure for various reasons,
>>   i.e. we would rather not combine multiple jobs into one longer-running
>>   one (I can provide the arguments for this if needed)
>>
>> My goal would be to reduce the time and network traffic for transferring
>> the executable thousands of times.
>>
>> A very natural idea would be to cache the executable on each execute
>> machine, hoping that we can make use of it in case we get another job of
>> the same cluster. I would probably be able to hack something that does the
>> trick, although doing it properly might take quite some effort (when and
>> how to clean up the cache?, ...)
>>
>> On the other hand, this seems like a very common problem, so I was
>> wondering whether Condor offers some built-in magic to cope with this?
>> Maybe I am missing something obvious?
>>
>> Are there any recommended best practices for my case?
>>
>> Thank you very much in advance,
>>
>> Jens
>>
>> _______________________________________________
>> HTCondor-users mailing list
>> To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
>> subject: Unsubscribe
>> You can also unsubscribe by visiting
>> https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
>>
>> The archives can be found at:
>> https://lists.cs.wisc.edu/archive/htcondor-users/