Re: [condor-users] Job cluster management?
- Date: Tue, 9 Dec 2003 13:57:23 -0600
- From: Mark Visser <mark@xxxxxxxxxx>
- Subject: Re: [condor-users] Job cluster management?
Michael S. Root wrote:
>My question is this: How do other Condor users manage their job clusters?
We've dealt with the same growing pains. For us, the solution was to
create a "middleware" layer that includes a database (originally MySQL,
now PostgreSQL). The database is populated and updated by a
schedd-universe job that also acts as a meta-scheduler. This
meta-scheduler periodically reads .log files, parses them, updates the
database, checks for dependencies, and launches, holds or releases
clusters as necessary.
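To illustrate the log-parsing step, here is a minimal sketch in Python (not our actual meta-scheduler code). It assumes each event in a Condor user log starts with a header line of the form "NNN (cluster.proc.subproc) date time message", and that the event codes map to states as in the table below. The state names and the function name latest_states are made up for this example, so check the mapping against your own .log files before relying on it.

```python
import re

# Assumed header format: "001 (012.003.000) 12/09 13:57:23 Job executing on host: ..."
EVENT_HEADER = re.compile(r"^(\d{3}) \((\d+)\.(\d+)\.\d+\)")

# Assumed mapping from event code to a coarse state; the labels are illustrative.
EVENT_STATE = {
    "000": "waiting",    # submitted
    "001": "running",    # executing
    "004": "waiting",    # evicted, back in the queue
    "005": "completed",  # terminated
    "009": "removed",    # aborted
    "012": "held",
    "013": "waiting",    # released from hold
}


def latest_states(log_text):
    """Return {(cluster, proc): state}, keeping the most recent event per process."""
    states = {}
    for line in log_text.splitlines():
        match = EVENT_HEADER.match(line)
        if not match:
            continue  # body lines of an event and "..." separators
        code, cluster, proc = match.groups()
        if code in EVENT_STATE:
            states[(int(cluster), int(proc))] = EVENT_STATE[code]
    return states
```

The resulting dictionary is the kind of thing that would then be written to the database and checked against dependencies.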
I agree that a "condor_q -cluster" tool would be incredibly useful, even
if it just shows the number of processes for that cluster remaining in
the queue. A breakdown by process state would also be useful (e.g., "10%
running, 30% completed, 60% waiting").
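The breakdown itself is easy to compute once you have one state per process. A rough sketch in Python, where state_breakdown is a hypothetical helper and the list of states is assumed to come from wherever you track them (a database, parsed logs, etc.):

```python
from collections import Counter


def state_breakdown(states):
    """Summarize an iterable of per-process state names as percentages."""
    counts = Counter(states)
    total = sum(counts.values())
    if total == 0:
        return "no processes"
    # Largest group first; percentages are rounded to whole numbers.
    parts = [f"{100 * n / total:.0f}% {state}" for state, n in counts.most_common()]
    return ", ".join(parts)
```

For a cluster with 1 running, 3 completed and 6 waiting processes this gives "60% waiting, 30% completed, 10% running".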
Something else you may find helpful is to wrap shake/prman/maya/whatever
in a Perl script that captures the renderer's stdout and parses it.
It's very useful for catching license failures and requeuing those frames
by returning 129 (or 1 for errors, 0 for happy frames, etc.).
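A minimal sketch of such a wrapper, written here in Python rather than Perl. The strings in LICENSE_MARKERS are hypothetical placeholders (each renderer words its license errors differently), and the function names are made up for the example. Only the exit codes 129, 1 and 0 come from the convention described above.

```python
import subprocess
import sys

# Hypothetical markers; replace with the messages your renderer actually prints.
LICENSE_MARKERS = ("could not get a license", "license error", "no license")

EXIT_OK = 0         # happy frame
EXIT_ERROR = 1      # renderer failed for some other reason
EXIT_REQUEUE = 129  # license failure, frame should be requeued


def classify(returncode, output):
    """Pick the wrapper's exit code from the renderer's exit code and output."""
    lowered = output.lower()
    if any(marker in lowered for marker in LICENSE_MARKERS):
        return EXIT_REQUEUE
    if returncode != 0:
        return EXIT_ERROR
    return EXIT_OK


def run_renderer(argv):
    """Run the renderer command, echo its output, and return the exit code to use."""
    result = subprocess.run(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # fold stderr into the captured output
        text=True,
    )
    sys.stdout.write(result.stdout)  # keep the renderer output in the job's stdout
    return classify(result.returncode, result.stdout)
```

As a script, this would end with sys.exit(run_renderer(sys.argv[1:])) and be submitted in place of the renderer, with the real command line passed as its arguments.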