
Re: [HTCondor-users] How to use condor_chirp?



Hi Thomas,

Note that the labels you get from cAdvisor (or an alternative) probably won't map directly to HTCondor job IDs; they'll look like:

condor_pool_condor_slot1_3@xxxxxxxxxxxxxxxxxxxxxxx

(i.e. how the cgroups are named), which makes the Grafana plots a little hard to use. One way around this could be to run a cron job on each worker node which queries both cAdvisor (using its REST API) and HTCondor, takes all the stats from cAdvisor, and labels them in a more useful way, e.g. with GlobalJobId, Owner, etc. It can then send the appropriately tagged data to InfluxDB, rather than having cAdvisor do it directly. I've done this for a different cluster manager and it seems to work well, but I haven't tried it yet with HTCondor. However, there may be better ways of getting the same result that I haven't thought of :-)
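
The relabelling step described above could look roughly like the following Python sketch. It is only an illustration under several assumptions: that the cgroup name ends in the slot name (as in the example above), that the cAdvisor stats and the slot-to-job mapping have already been fetched into plain dictionaries (the fetching itself is left out), and that the measurement and field names ("job_cgroup", "cpu_usage_ns", "memory_bytes") as well as the host names are made up. The output follows the InfluxDB line protocol.

```python
import re

# Assumption: the cgroup name ends with "<slot>@<host>",
# e.g. "condor_pool_condor_slot1_3@node01".
SLOT_RE = re.compile(r"(slot\d+(?:_\d+)?@[^/]+)$")


def slot_from_cgroup(cgroup_name):
    """Return the slot name embedded in a cgroup name, or None if absent."""
    match = SLOT_RE.search(cgroup_name)
    return match.group(1) if match else None


def escape_tag(value):
    """Backslash-escape commas, equals signs and spaces in a tag value."""
    return re.sub(r"([,= ])", r"\\\1", str(value))


def to_line_protocol(cgroup_name, stats, jobs_by_slot, timestamp_ns):
    """Build one line-protocol record tagged with HTCondor job attributes.

    stats:        hypothetical dict of integer metrics taken from cAdvisor.
    jobs_by_slot: hypothetical dict mapping slot name -> job attributes,
                  collected from HTCondor by the same cron job.
    Returns None when the cgroup cannot be matched to a running job.
    """
    slot = slot_from_cgroup(cgroup_name)
    job = jobs_by_slot.get(slot)
    if not job or not stats:
        return None
    tags = ",".join(f"{key}={escape_tag(job[key])}" for key in sorted(job))
    fields = ",".join(f"{key}={int(stats[key])}i" for key in sorted(stats))
    return f"job_cgroup,{tags} {fields} {timestamp_ns}"
```

A cron job would call to_line_protocol() once per cgroup reported by cAdvisor and write the non-None results to InfluxDB. Keeping the relabelling separate from the fetching means the same function can be reused whichever way the job information is obtained.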

Regards,
Andrew.

________________________________________
From: HTCondor-users [htcondor-users-bounces@xxxxxxxxxxx] on behalf of Thomas Hartmann [thomas.hartmann@xxxxxxx]
Sent: Tuesday, March 15, 2016 2:06 PM
To: HTCondor-Users Mail List
Subject: Re: [HTCondor-users] How to use condor_chirp?

Hi Brian,

thanks for the warning.

Our idea is to update a job's ClassAd with information on the cgroups
context its slot is using.
In the end, we would like to be able to link a job to its cgroups
statistics, i.e., to monitor the local cgroups and send their statistics
into an InfluxDB and, in parallel, accumulate basic job information from
Condor (frequency, scaling, accumulation only? etc. remain to be seen...).

In the end, it would be nice to be able to plot basic job statistics
from InfluxDB with Grafana for at least a subset of jobs (individual
users; for large-scale/institutional users it may not be necessary) over
a range of a few days (with regular pruning of data points/measurements
in InfluxDB; how well does that scale?).

One question for me is whether such Condor job statistics are better
accumulated/sent from the schedd or (somehow?) from the startds.
For cgroup statistics I would try to send them directly from the nodes
(cAdvisor or a similar approach).

Cheers,
  Thomas

On 2016-03-15 12:23, Brian Bockelman wrote:
> Hi Thomas,
>
> Two things to note:
> 1) "condor_chirp set_job_attr" requires +WantIOProxy=true in the ClassAd.  This updates the ClassAd immediately in the schedd, which can be a scalability concern.
> 2) "condor_chirp set_job_attr_delayed" works by default.  However, it only sends the attribute in the next scheduled ClassAd update (for CPU and memory usage), so there's less scalability concern.  Additionally, attributes set with this command must start with the prefix "Chirp" (case-sensitive).
>
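
As an illustration of the two commands above, a job wrapper could publish its own cgroup path roughly as in the following Python sketch. The attribute name "ChirpCgroupPath" and the helper names are hypothetical, taking only the first line of /proc/self/cgroup is a simplification, and the value is wrapped in double quotes on the assumption that condor_chirp expects a ClassAd expression, so a string needs to be a quoted literal.

```python
import subprocess


def chirp_command(attr, value, delayed=True):
    """Build the condor_chirp argument list for setting a job attribute."""
    if delayed and not attr.startswith("Chirp"):
        raise ValueError("delayed attributes must start with 'Chirp'")
    if '"' in value or "\\" in value:
        # Keep the sketch simple: do not attempt ClassAd string escaping.
        raise ValueError("value must not contain quotes or backslashes")
    subcommand = "set_job_attr_delayed" if delayed else "set_job_attr"
    # Double quotes turn the value into a ClassAd string literal.
    return ["condor_chirp", subcommand, attr, '"' + value + '"']


def own_cgroup_path(proc_file="/proc/self/cgroup"):
    """Return the cgroup path from the first line of /proc/self/cgroup."""
    with open(proc_file) as handle:
        first = handle.readline().strip()
    # Each line has the form "hierarchy-ID:controller-list:path".
    return first.split(":", 2)[2]


def publish_cgroup_path():
    """Send the cgroup path with the next scheduled ClassAd update."""
    # Assumption: condor_chirp is on PATH inside the job; on many
    # installations it lives in HTCondor's libexec directory instead.
    subprocess.run(chirp_command("ChirpCgroupPath", own_cgroup_path()),
                   check=True)
```

The delayed variant is the default here because it avoids the immediate schedd update mentioned in point 1; passing delayed=False builds the set_job_attr form, which additionally needs +WantIOProxy=true in the job's ClassAd.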
> Can you give some background on what you’re trying to accomplish?
>
> Brian
>
>> On Mar 14, 2016, at 12:44 PM, Thomas Hartmann <thomas.hartmann@xxxxxxx> wrote:
>>
>> Hi all,
>>
>> I would like to inject some system information into jobs' ClassAds.
>> As I understand condor_chirp, I cannot inject/manipulate ClassAds on a
>> worker for a running job, but only during submission. (I assume that
>> ~/.job.ad is where a job keeps its ClassAd [1] -- but can I inject into
>> it from outside the job 'properly'?)
>>
>> We currently get our grid jobs via an ARC-CE, so I suppose the best place
>> would be there (where?) to enable the communication token by appending
>> +WantIOProxy = TRUE
>> to the job submission scripts, right?
>>
>> I am not sure where to actually call condor_chirp on the worker.
>> My best guess so far is to find the template for the job wrappers, i.e.,
>> /var/lib/condor/execute/dir_*/condor_exec.exe
>> and chirp from it, probably starting a separate thread for updating
>> ClassAd attributes that change.
>> Would that be reasonable or is there a better way?
>> Where would I find the wrapper for condor_exec.exe?
>>
>> Cheers and thanks,
>>  Thomas
>>
>>
>>
>> [1]
>>> cat /proc/`ps axf | grep "/bin/bash -l /var/lib/condor/execute/dir" |
>> tail -n1 | cut -d " " -f 2`/environ  | tr '\0' '\n' | grep "_CONDOR_JOB_AD"
>> _CONDOR_JOB_AD=/var/lib/condor/execute/dir_38259/.job.ad
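
From inside the job, the file named by _CONDOR_JOB_AD can be read without going through /proc. The Python sketch below assumes the file holds one "Name = Expression" pair per line and keeps each expression as a raw string (quoted strings keep their quotes); the function names are made up for illustration.

```python
import os


def parse_job_ad(text):
    """Parse 'Name = Expression' lines into a dict of raw expressions."""
    ad = {}
    for line in text.splitlines():
        # Split on the first "=" only; later ones belong to the expression.
        name, sep, expr = line.partition("=")
        if sep and name.strip():
            ad[name.strip()] = expr.strip()
    return ad


def read_job_ad(environ=os.environ):
    """Read the job ad from the file named by _CONDOR_JOB_AD, if set."""
    path = environ.get("_CONDOR_JOB_AD")
    if path is None:
        return {}
    with open(path) as handle:
        return parse_job_ad(handle.read())
```

This only reads the ad; it does not change what the schedd knows about the job.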
>>
>> _______________________________________________
>> HTCondor-users mailing list
>> To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
>> subject: Unsubscribe
>> You can also unsubscribe by visiting
>> https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
>>
>> The archives can be found at:
>> https://lists.cs.wisc.edu/archive/htcondor-users/
>
>