Re: [Condor-users] dagman: Possible to run a "PRE" script on same node as the main program will run?



Hi Kent

On Friday 23 July 2010 17:12:22 R. Kent Wenger wrote:
> 
> I've been following this thread, but I didn't have a chance to comment
> before this.  If I understand the problem correctly, it could be solved by
> being able to force two consecutive DAG node jobs to run on the same
> machine, right?  (In other words, you'd have a bunch of pairs of data
> transfer/process jobs, and each pair would be forced to run on the same
> machine.)  You could throttle the data transfer jobs with the category
> throttles in DAGMan, which would allow you to control the load on the
> server.
> 

Yes, that was my initial idea with dagman's pre scripts; however, that was 
before I discovered that these run on the submit machine (or at least not on 
the same node as the main job).
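
For the archives, this is roughly how I understand the suggested 
transfer/process pairing plus throttling, in DAG syntax (file and category 
names are made up; a SCRIPT PRE line in place of the transfer job would run 
on the submit machine, which is exactly the problem):

  # one transfer/process pair per dataset
  JOB  xfer1  transfer.sub
  JOB  proc1  process.sub
  PARENT xfer1 CHILD proc1
  CATEGORY xfer1 xfer

  JOB  xfer2  transfer.sub
  JOB  proc2  process.sub
  PARENT xfer2 CHILD proc2
  CATEGORY xfer2 xfer

  # at most 4 concurrent transfers, to limit the load on the file server
  MAXJOBS xfer 4

What is still missing is a way to pin each procN to the machine its xferN 
ran on.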

> You can take a look at our thoughts on this (gittrac #572, or
> https://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=572,4.)

Yes, that looks essentially like what I had in mind. But being just an admin, 
I don't know which approach would be best (i.e., with the least performance 
penalty) and least intrusive to Condor itself.

The comment in the ticket about extra JobAds doesn't sound too bad. Would it 
be possible to "join/merge" jobA, jobB, jobC, ... jobX into one virtual job 
(if the user wants it) and let jobA check out one concurrency_limit unit (or 
another limit the user might set), returning the unit to the pool once the 
last job has finished?
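
Just to make clear what I mean by a "unit": with the existing 
concurrency_limits mechanism, each job holds its own unit only while it 
runs, e.g. (the limit name is made up):

  # pool configuration: at most 10 "dataserver" units in use at any time
  DATASERVER_LIMIT = 10

  # submit file: the job consumes one unit while it is running
  concurrency_limits = dataserver

The virtual job would instead hold the unit from jobA's start until jobX 
finishes.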

However, I don't know how hard it would be to make this "virtual" job robust 
enough to cooperate with any Condor universe.

Does this make sense?

Cheers

Carsten