
Re: [HTCondor-users] HTCondor and condor_ganglia issues



I wonder if the PATH of your interactive shell is unusual. (Are you really running commands as the user roo?)

Try running this command:

     which gstat

What does it return?

You could try setting GANGLIA_GSTAT_COMMAND to use the full path to the gstat command by adding something like this to your condor configuration:

    GANGLIA_GSTAT_COMMAND=/path/to/gstat --all --mpifile --gmond_ip=localhost --gmond_port=8649
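
If it is unclear whether a given environment can find gstat at all, one way to check is to resolve the command against an explicit search path, since errno=2 from exec typically means the executable itself was not found. Below is a minimal Python sketch. The helper name resolve_executable is made up for illustration, and it only approximates a PATH lookup; it is not how condor_gangliad itself locates the command.

```python
import shlex
import shutil


def resolve_executable(command, search_path):
    """Return the full path that the first word of `command` resolves to.

    `command` is a full command line such as the value of
    GANGLIA_GSTAT_COMMAND; `search_path` is a PATH-style string.
    Returns None when nothing executable is found.
    """
    argv = shlex.split(command)
    if not argv:
        return None
    # shutil.which checks a name with a directory component directly,
    # and otherwise searches each directory listed in `search_path`.
    return shutil.which(argv[0], path=search_path)
```

Calling resolve_executable("gstat --all --mpifile", os.environ["PATH"]) from your interactive shell should give the same answer as which gstat. Passing a shorter PATH shows what a process started with a different environment would see.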

-tj


From: Nagaraj Panyam <pn@xxxxxxxxxxx>
Sent: Wednesday, July 28, 2021 8:11 AM
To: John M Knoeller <johnkn@xxxxxxxxxxx>; HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] HTCondor and condor_ganglia issues
 

Hi,


I have the following issues that I need help with.


About my setup:

I have a Ganglia gmetad that handles the regular metrics (cpu, mem, etc.) that are sent by the gmonds on the execute nodes. This part is fine. I now wish to add HTCondor to the same gmetad and I need help. This gmetad is on the same host as the collector, so on this host I enabled condor_gangliad. (gmetad, collector and condor_gangliad are all on the same host.)


A)

GangliadLog has the following set of lines repeating. A clip is pasted below. What is the my_popenv error about?


my_popenv: Failed to exec “gstat, errno=2 (No such file or directory)
Failed to execute “gstat --mpifile --all --gmond_ip=127.0.0.1 --gmond_port=8649”: No such file or directory
Got 329 daemon ads
Heartbeats sent: 0
Starting update...
Heartbeats sent: 0


When I run the gstat command, it shows output as below:


[roo@ce ~]# gstat --all --mpifile --gmond_ip=127.0.0.1 --gmond_port=8649

wn06.my.domain:128
wn05.my.domain:128
wn04.my.domain:128
wn03.my.domain:128
wn02.my.domain:128
wn01.my.domain:128
wn08.my.domain:64
wn07.my.domain:64
localhost.localdomain:8
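
For reference, the --mpifile output above is one host:count pair per line. A minimal Python sketch for reading such output into a mapping is shown here, assuming the format is exactly as pasted and that the number after the colon is a per-host CPU or slot count. The function name parse_mpifile is hypothetical.

```python
def parse_mpifile(text):
    """Parse `gstat --mpifile` style output into {hostname: count}.

    Assumes one "host:count" entry per line; blank lines and lines
    that do not match that shape are skipped.
    """
    hosts = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the last colon so the host part is kept whole.
        host, sep, count = line.rpartition(":")
        if not sep or not host or not count.isdigit():
            continue
        hosts[host] = int(count)
    return hosts
```

Feeding it the nine lines above would give nine hosts, which is a quick way to confirm that the local gmond knows about every execute node.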


B)

Is condor_gangliad a routine "data source" for Ganglia's gmetad? What should the "data_source" declaration in gmetad.conf be?

I have gmond that listens on 8649 for the metrics from the execute nodes. The host running collector itself appears as "localhost" (see above). I tried to understand from this tutorial video at https://research.cs.wisc.edu/htcondor/tutorials/videos/2014/Ganglia.html but I could not read the Ganglia screen shown in the video.


Thanks

Nagaraj




On 7/28/21 3:14 AM, John M Knoeller wrote:

That sounds like something outside of HTCondor is starting one of those condor_gangliad processes.   

What is the parent PID of each? Perhaps we can track back from there.
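
One way to collect the parent PIDs is to filter the output of ps -eo pid,ppid,comm. Below is a minimal Python sketch that does the filtering on text you have already captured. The function name find_parents is made up for illustration, and it assumes the three-column layout that this ps invocation prints.

```python
def find_parents(ps_output, name):
    """Return (pid, ppid) pairs for processes whose command is `name`.

    `ps_output` is the text printed by `ps -eo pid,ppid,comm`.
    The header line and any malformed lines are ignored.
    """
    matches = []
    for line in ps_output.splitlines():
        fields = line.split(None, 2)
        if len(fields) != 3:
            continue
        pid, ppid, comm = fields
        # The header row has non-numeric PID/PPID columns; skip it.
        if not (pid.isdigit() and ppid.isdigit()):
            continue
        if comm.strip() == name:
            matches.append((int(pid), int(ppid)))
    return matches
```

With the ps output saved to a string, find_parents(output, "condor_gangliad") lists each condor_gangliad with its parent. A copy started by HTCondor should have condor_master as its parent, so any other parent PID is the one to trace.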

I don't really know what gstat is; let me ask around and see if any of my colleagues know.

-tj


From: pn <pn@xxxxxxxxxxx>
Sent: Tuesday, July 27, 2021 11:54 AM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Cc: John M Knoeller <johnkn@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] HTCondor and condor_ganglia issues
 
More about condor_gangliad process:

I stopped condor (systemctl stop), and after that condor_gangliad was
still there. I then killed it, and restarted condor after adding
GANGLIAD to DAEMON_LIST. Sure enough, condor_gangliad was one of the
processes. But strangely, less than a second later a second
condor_gangliad appeared.

[root@simclu-ce ~]# ps -ea|grep gangliad
2592326 ?        00:00:00 condor_gangliad
2592334 ?        00:00:00 condor_gangliad

Would it be because I have a wrong configuration?

Secondly, GangliadLog has this error:

07/27/21 21:40:23 my_popenv: Failed to exec “gstat, errno=2 (No such
file or directory)
07/27/21 21:40:23 Failed to execute “gstat --all --mpifile
--gmond_ip=192.168.55.79 --gmond_port=8652”: No such file or directory

What file is it complaining about? I replaced "gstat" with "/bin/gstat"
and the error shows up again "Failed to exec "/bin/gstat, .."

-
Nagaraj




On 2021-07-27 21:15, John M Knoeller wrote:
> I'm not sure why the condor_gangliad would be running if you did not
> add it to your daemon list.   But the error is because you need to put
> GANGLIAD in your daemon list not GANGLIA_D.
>
>  Instructions for how to handle the case where the gmetad is on a
> different machine than the condor_collector are here:
>
>  Monitoring — HTCondor Manual 9.1.0 documentation [1]
>
>  -tj
>
> -------------------------
>
> FROM: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of
> Nagaraj Panyam <pn@xxxxxxxxxxx>
> SENT: Tuesday, July 27, 2021 6:34 AM
> TO: htcondor-users@xxxxxxxxxxx <htcondor-users@xxxxxxxxxxx>
> SUBJECT: [HTCondor-users] HTCondor and condor_ganglia issues
>
> Hi,
>
> I am trying to configure HTCondor's Ganglia monitoring. In that
> context, I see something I do not understand.
>
> Firstly, I see the process condor_gangliad even though it is not in
> the DAEMON_LIST (config_val_dump shows DAEMON_LIST = MASTER COLLECTOR
> NEGOTIATOR SCHEDD). Is this expected?
>
> Secondly, when I specifically add GANGLIA_D to DAEMON_LIST in the condor
> config file, the error given below shows up in MasterLog. Where do I
> add the executable path? We have CONDOR_VERSION = 8.9.13
>
>> GANGLIA_D is in the DAEMON_LIST parameter, but there is no
>> executable path for it defined in the config files!
>> ERROR "Must have the path to GANGLIA_D defined." at line 1606 in
>> file
>>
> /var/lib/condor/execute/slot1/dir_19111/userdir/.tmp9djsO9/BUILD/condor-8.9.13/src/condor_master.V6/masterDaemon.cpp
>
> Thirdly, after resolving above issues, what is the scheme to hookup
> HTCondor's monitoring to existing Ganglia? We will have
> condor_gangliad on same machine as Collector, and Ganglia's metad
> running on a different host.
>
> Thanks
>
> Nagaraj
>
>
>
> Links:
> ------
> [1]
> https://htcondor.readthedocs.io/en/latest/admin-manual/monitoring.html?highlight=gangliad#ganglia