
Re: [HTCondor-users] Monitoring and Administration

On Dec 12, 2013, at 4:47 AM, Renaud Guezennec <renaud@xxxxxxx> wrote:

> Hi,
> To be honest, I'm in charge of building monitoring software, but a previous software engineer made a proof of concept in Qt/Python, and I'm studying his work. He used e-tree as the XML parser. The software displays the current user's jobs and some status for each cluster. The Python program runs commands such as condor_q -xml -attributes "........" and parses the output. It works, but it displays only jobs from the current user; I'm trying to display all jobs instead.
> We are thinking about switching to Qt/C++. We will probably use the same mechanism, parsing condor_q stdout, but if there is another way, I'm interested to know more. Our goal is to provide a Qt application similar to Qube! or Deadline. Our artists want to see job status and job progress, and managers want the ability to change the priority of one or several jobs.
> Priorities are managed by project in my company. When a new project starts, we have to amend the HTCondor configuration a bit.
> If we forget to do so, the jobs of this new project have no priority, so they don't start.
> I have some questions:
> -Can we modify the configuration on the fly, by sending a command instead of changing the configuration file? I understood it is possible to ask the server (master) to re-read its configuration; am I wrong?

You can set priorities directly with


(or programmatically via the corresponding Python bindings in 8.1.2)

This is not entirely convenient if you use accounting groups.  You can always use condor_config_val to update the config - but that might be a bit clumsy.
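As a rough sketch of the condor_config_val route: the tool accepts -rset "NAME = value" for a runtime-only change, which the target daemon picks up once it is reconfigured with condor_reconfig. This assumes runtime configuration is enabled at your site (ENABLE_RUNTIME_CONFIG plus a matching SETTABLE_ATTRS_* permission), which I can't know from here. The helper names are made up for illustration, and the macro and value in the usage comment are placeholders.

```python
import subprocess


def rset_command(macro, value, daemon_flag="-negotiator"):
    """Build a condor_config_val call that sets one macro at runtime.

    The daemon flag selects which daemon receives the setting; the
    negotiator is assumed here because it handles priorities.
    """
    return ["condor_config_val", daemon_flag, "-rset",
            "%s = %s" % (macro, value)]


def reconfig_command(daemon="negotiator"):
    """Build the condor_reconfig call that makes the daemon re-read its config."""
    return ["condor_reconfig", "-daemon", daemon]


def apply_runtime_setting(macro, value, run=subprocess.check_call):
    """Set the macro, then reconfigure. `run` is injectable for dry runs."""
    run(rset_command(macro, value))
    run(reconfig_command())


# Hypothetical usage (placeholder group name and factor):
# apply_runtime_setting("GROUP_PRIO_FACTOR_newproject", "10.0")
```

Passing run=print gives a dry run that only shows the two commands, which is handy before pointing this at a production pool.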

> -Is there a way to get a job's progress?

What do you mean?

If the job itself can send out the notification, you can do something like:

condor_chirp set_job_attr_delayed ChirpPercentDone 50

This is enabled by default and highly scalable -- but requires 8.1.2.  Otherwise, you can use:

condor_chirp set_job_attr ChirpPercentDone 50

This form allows jobs to modify any attribute except Owner (equivalent to 'condor_qedit', but from the worker node; possibly a security issue at some sites) and is less scalable -- but available on most modern versions of HTCondor.
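For instance, a Python job wrapper could report its progress along these lines. This is only a sketch: chirp_command and report_progress are names I made up, it assumes condor_chirp is reachable on the job's PATH (it normally lives in HTCondor's libexec directory), and the non-delayed form assumes the job was submitted with +WantIOProxy = true.

```python
import subprocess


def chirp_command(percent, delayed=True, chirp="condor_chirp"):
    """Build the condor_chirp call that stores ChirpPercentDone in the job ad."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    # The delayed form needs 8.1.2; the plain form works on older versions.
    verb = "set_job_attr_delayed" if delayed else "set_job_attr"
    return [chirp, verb, "ChirpPercentDone", str(int(percent))]


def report_progress(percent, delayed=True, run=subprocess.call):
    """Best-effort progress report; returns True if condor_chirp exited with 0.

    A missing condor_chirp binary is swallowed so that a failed progress
    update never kills the job itself.
    """
    try:
        return run(chirp_command(percent, delayed)) == 0
    except OSError:
        return False


# Inside the job's main loop, e.g. after each rendered frame:
# report_progress(100 * frames_done // frames_total)
```

The monitoring side can then ask for ChirpPercentDone in the same condor_q -xml -attributes list it already uses.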

> -The Python program polls for data regularly; is there another way, such as getting notifications?

Does the monitoring program run on the same host as the scheduler?

You can use the user log reader API (http://research.cs.wisc.edu/htcondor/manual/v8.1.2/6_3HTCondor_User.html) to follow the status of the individual jobs (or the global job log).  I've had success combining that with an inotify watch for an efficient notification mechanism.
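To give an idea of the log-following side, here is a small Python sketch that pulls event headers out of user-log text. Note that it is not the user log reader API from the manual; it is a plain regex over the classic text format, where each event starts with a three-digit event number followed by "(cluster.proc.subproc)" and ends with a line containing only "...". The event-name table is deliberately partial and the function names are mine. An inotify watch (or plain polling) would decide when to read the newly appended bytes and pass them in.

```python
import re

# Partial map of user-log event numbers to short names.
EVENT_NAMES = {
    0: "submitted",
    1: "executing",
    4: "evicted",
    5: "terminated",
    9: "aborted",
    12: "held",
    13: "released",
}

# Header line: event number, (cluster.proc.subproc), then timestamp and message.
HEADER = re.compile(r"^(\d{3}) \((\d+)\.(\d+)\.(\d+)\) (.*)$")


def parse_events(text):
    """Return one dict per event header found in a chunk of user-log text."""
    events = []
    expecting_header = True
    for line in text.splitlines():
        if line.strip() == "...":
            # End-of-event marker: the next matching line starts a new event.
            expecting_header = True
            continue
        if not expecting_header:
            continue  # detail line inside the current event
        match = HEADER.match(line)
        if match:
            number = int(match.group(1))
            events.append({
                "number": number,
                "name": EVENT_NAMES.get(number, "other"),
                "cluster": int(match.group(2)),
                "proc": int(match.group(3)),
                "rest": match.group(5),
            })
            expecting_header = False
    return events
```

Keeping the parser separate from the file watching means the same function works whether the text comes from a whole log file or from the bytes appended since the last read.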