
Re: [condor-users] Job cluster management?



Michael S. Root wrote:

>My question is this:  How do other Condor users manage their job clusters?  
>
We've dealt with the same growing pains. For us, the solution was to 
create a "middleware" layer that includes a database (originally MySQL, 
now PostgreSQL). The database is populated and updated by a 
scheduler-universe job that also acts as a meta-scheduler. This 
meta-scheduler periodically reads the jobs' .log files, parses them, 
updates the database, checks for dependencies, and launches, holds, or 
releases clusters as necessary.
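
For illustration only, here is a rough sketch of one pass of such a 
meta-scheduler in Python. It is not our actual code. It assumes the usual 
user-log layout, in which each event starts with a header line like 
"005 (1234.002.000) 07/10 12:34:56 Job terminated.", and it uses sqlite3 
as a stand-in for the real database. The table name, the state names, and 
the event-to-state mapping are all assumptions made for this example.

```python
import re
import sqlite3

# Assumed event header: "<3-digit code> (<cluster>.<proc>.<subproc>) ..."
EVENT_RE = re.compile(r"^(\d{3}) \((\d+)\.(\d+)\.\d+\)")

# Assumed mapping from event code to a coarse job state.
STATE_BY_EVENT = {
    "000": "waiting",    # submitted
    "001": "running",    # executing
    "004": "waiting",    # evicted, back in the queue
    "005": "completed",  # terminated
    "009": "removed",    # aborted
    "012": "held",
    "013": "waiting",    # released from hold
}


def parse_log(text):
    """Return {(cluster, proc): state}, keeping the last event seen per job."""
    states = {}
    for line in text.splitlines():
        m = EVENT_RE.match(line)
        if m and m.group(1) in STATE_BY_EVENT:
            key = (int(m.group(2)), int(m.group(3)))
            states[key] = STATE_BY_EVENT[m.group(1)]
    return states


def update_db(conn, states):
    """Insert or overwrite one row per job (hypothetical 'jobs' table)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        "cluster INTEGER, proc INTEGER, state TEXT, "
        "PRIMARY KEY (cluster, proc))"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO jobs VALUES (?, ?, ?)",
        [(c, p, s) for (c, p), s in states.items()],
    )
    conn.commit()


def cluster_done(conn, cluster):
    """True if the cluster is known and every one of its jobs completed."""
    total = conn.execute(
        "SELECT COUNT(*) FROM jobs WHERE cluster = ?", (cluster,)
    ).fetchone()[0]
    pending = conn.execute(
        "SELECT COUNT(*) FROM jobs WHERE cluster = ? AND state != 'completed'",
        (cluster,),
    ).fetchone()[0]
    return total > 0 and pending == 0
```

A dependency check could then be as simple as calling cluster_done() for 
each prerequisite cluster and running condor_release on the dependent 
cluster once they all return True.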

I agree that a "condor_q -cluster" tool would be incredibly useful, even 
if it just shows the number of processes for that cluster remaining in 
the queue. A breakdown by process state would also be useful (e.g., "10% 
running, 30% completed, 60% waiting").
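
As a sketch of what that breakdown might look like, the Python snippet 
below turns a list of per-process states into a summary line. It is not 
an existing condor_q option. The function name and the state strings are 
made up for this example, and the states would have to come from 
wherever you track them (a database, parsed logs, and so on).

```python
from collections import Counter


def cluster_breakdown(states):
    """Summarize an iterable of state names as rounded percentages."""
    counts = Counter(states)
    total = sum(counts.values())
    if total == 0:
        return "no processes"
    # most_common() lists the largest group first.
    return ", ".join(
        f"{round(100 * n / total)}% {state}"
        for state, n in counts.most_common()
    )


states = ["running"] * 1 + ["completed"] * 3 + ["waiting"] * 6
print(cluster_breakdown(states))
# prints: 60% waiting, 30% completed, 10% running
```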

Something else you may find helpful is to wrap shake/prman/maya/whatever 
in a Perl script that captures the renderer's stdout and parses it. That 
makes it easy to catch license failures and requeue those frames by 
returning 129 (or 1 for errors, 0 for happy frames, etc.).
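
A minimal sketch of such a wrapper follows, written in Python rather 
than Perl. The regular expressions are placeholders, since every 
renderer words its license and error messages differently, and the 
function names are invented for this example. Only the exit codes 
(129, 1, 0) come from the description above.

```python
import re
import subprocess
import sys

EXIT_OK = 0
EXIT_ERROR = 1
EXIT_REQUEUE = 129  # exit code used to mark a frame for requeueing

# Hypothetical patterns; adjust them to the renderer's real messages.
LICENSE_RE = re.compile(r"licen[sc]e.*(fail|unavailable|denied)", re.IGNORECASE)
ERROR_RE = re.compile(r"^\s*(error|fatal)\b", re.IGNORECASE | re.MULTILINE)


def classify(output, returncode):
    """Map renderer output and its exit status to the wrapper's exit code."""
    if LICENSE_RE.search(output):
        return EXIT_REQUEUE
    if returncode != 0 or ERROR_RE.search(output):
        return EXIT_ERROR
    return EXIT_OK


def run_wrapped(argv):
    """Run the renderer, echo its stdout, and return the exit code to use."""
    proc = subprocess.run(argv, stdout=subprocess.PIPE, text=True)
    sys.stdout.write(proc.stdout)  # keep the output visible in the job's log
    return classify(proc.stdout, proc.returncode)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file name): wrapper.py <renderer> <args...>
    sys.exit(run_wrapped(sys.argv[1:]))
```

How the 129 turns into a requeue depends on your setup. One possibility 
is an on_exit_remove expression in the submit file that keeps the job in 
the queue when ExitCode is 129, but treat that as an assumption, not a 
description of our configuration.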

-Mark
