
Re: [HTCondor-users] condor_off broken?



So now I have some more information:

The condor_off command and its friends won't work if the hostname on the machine is set to condorworker02. It has to be set to the fully qualified condorworker02.domain.tld.

The question: why is that?
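One way to picture the symptom, using only the output later in this thread: condor_status -master lists the VM's master under the short name condorworker02, while condor_off reports that it can't find condorworker02.domain.tld. The sketch below is a toy model of that mismatch, not HTCondor's actual lookup code. It assumes the client qualifies the requested name with the domain (as the error message suggests) and then needs an exact match among the advertised master names. The helpers qualify and find_master are hypothetical.

```python
# Toy model (hypothetical, not HTCondor source code): the client qualifies
# the requested host name, then looks for an exact match among the names
# that masters advertised.
from typing import List, Optional


def qualify(name: str, domain: str) -> str:
    """Append the domain to a short name; leave dotted names unchanged."""
    return name if "." in name else f"{name}.{domain}"


def find_master(requested: str, advertised: List[str], domain: str) -> Optional[str]:
    """Return the advertised master name equal to the qualified request, else None."""
    wanted = qualify(requested, domain)
    return next((name for name in advertised if name == wanted), None)


if __name__ == "__main__":
    # Master names as listed by condor_status -master in this thread.
    masters = ["condormaster1", "condormaster2", "condorworker02", "lxbrb1815.domain.tld"]
    for host in ["lxbrb1815", "condorworker02"]:
        hit = find_master(host, masters, "domain.tld")
        if hit is None:
            print(f"Can't find address for master {qualify(host, 'domain.tld')}")
        else:
            print(f"found master {hit}")
```

Under the same assumption, asking for condorworker02.domain.tld directly also finds nothing, which would be consistent with the empty condor_status -startd output later in the thread.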


2013/11/26 Pek Daniel <pekdaniel@xxxxxxxxx>
OK, here is the problem in a bit more detail:

I'm using this version:
[root@lxbrb1815 ~]# condor_version
$CondorVersion: 8.1.2 Oct 19 2013 BuildID: 189797 $
$CondorPlatform: x86_64_RedHat5 $

Here's a snippet from condor_status -master output:
[root@condormaster1 ~]# condor_status -master
Name                

condormaster1       
condormaster2       
condorworker02      
lxbrb1815.domain.tld   
...

I have both physical nodes and VMs as startd nodes. The physical nodes have more than one core, so they have more than one job slot, while the VMs have only one core.

Here's a snippet from condor_status -startd | head:
Name               OpSys      Arch   State     Activity LoadAv Mem   ActvtyTime

condorworker02     LINUX      X86_64 Claimed   Busy      0.000  490  0+00:03:13
slot1@xxxxxxxxxxxx LINUX      X86_64 Unclaimed Idle      0.060 1991  0+00:11:51
slot2@xxxxxxxxxxxx LINUX      X86_64 Unclaimed Idle      0.000 1991  0+00:12:13
...

As you can see, condorworker02 is a VM, while lxbrb1815.domain.tld is a physical node with a lot of cores, and that's the only difference: the config file and the HTCondor version are exactly the same in both cases.
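Since the configuration and version are identical, one thing worth comparing on each machine is how the host reports its own name. A minimal check, assuming Python is available on the nodes; it uses only the standard socket module (not an HTCondor tool) and prints the configured hostname next to the name the resolver returns as fully qualified. The function name describe_hostname is hypothetical.

```python
import socket
from typing import Dict, Union


def describe_hostname() -> Dict[str, Union[str, bool]]:
    """Report the local host name and what the resolver expands it to."""
    short = socket.gethostname()  # name the OS is configured with
    fqdn = socket.getfqdn()       # resolver's idea of the fully qualified name
    return {
        "gethostname": short,
        "getfqdn": fqdn,
        # True when the configured host name already contains a domain part,
        # e.g. condorworker02.domain.tld rather than condorworker02.
        "hostname_is_qualified": "." in short,
    }


if __name__ == "__main__":
    for key, value in describe_hostname().items():
        print(f"{key}: {value}")
```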

Now, my questions:
- Why do I see slotID@xxxxxxxxxxxxxxxxxxxxxx for physical nodes, but just the hostname for VMs?
- Why can't I query the status of a VM, when it works for a physical node?

[root@condormaster1 ~]# condor_status -startd lxbrb1815
Name               OpSys      Arch   State     Activity LoadAv Mem   ActvtyTime

slot1@xxxxxxxxxxxx LINUX      X86_64 Unclaimed Idle      0.060 1991  0+00:11:51
slot2@xxxxxxxxxxxx LINUX      X86_64 Unclaimed Idle      0.000 1991  0+00:12:13
slot3@xxxxxxxxxxxx LINUX      X86_64 Unclaimed Idle      0.000 1991  0+00:12:14
slot4@xxxxxxxxxxxx LINUX      X86_64 Unclaimed Idle      0.000 1991  0+00:12:15
slot5@xxxxxxxxxxxx LINUX      X86_64 Unclaimed Idle      0.000 1991  0+00:12:16
slot6@xxxxxxxxxxxx LINUX      X86_64 Unclaimed Idle      0.000 1991  0+00:12:17
slot7@xxxxxxxxxxxx LINUX      X86_64 Unclaimed Idle      0.000 1991  0+00:12:18
slot8@xxxxxxxxxxxx LINUX      X86_64 Unclaimed Idle      0.000 1991  0+00:12:11
                     Total Owner Claimed Unclaimed Matched Preempting Backfill

        X86_64/LINUX     8     0       0         8       0          0        0

               Total     8     0       0         8       0          0        0
[root@condormaster1 ~]# condor_status -startd lxbrb1815.domain.tld
Name               OpSys      Arch   State     Activity LoadAv Mem   ActvtyTime

slot1@xxxxxxxxxxxx LINUX      X86_64 Unclaimed Idle      0.060 1991  0+00:11:51
slot2@xxxxxxxxxxxx LINUX      X86_64 Unclaimed Idle      0.000 1991  0+00:12:13
slot3@xxxxxxxxxxxx LINUX      X86_64 Unclaimed Idle      0.000 1991  0+00:12:14
slot4@xxxxxxxxxxxx LINUX      X86_64 Unclaimed Idle      0.000 1991  0+00:12:15
slot5@xxxxxxxxxxxx LINUX      X86_64 Unclaimed Idle      0.000 1991  0+00:12:16
slot6@xxxxxxxxxxxx LINUX      X86_64 Unclaimed Idle      0.000 1991  0+00:12:17
slot7@xxxxxxxxxxxx LINUX      X86_64 Unclaimed Idle      0.000 1991  0+00:12:18
slot8@xxxxxxxxxxxx LINUX      X86_64 Unclaimed Idle      0.000 1991  0+00:12:11
                     Total Owner Claimed Unclaimed Matched Preempting Backfill

        X86_64/LINUX     8     0       0         8       0          0        0

               Total     8     0       0         8       0          0        0

[root@condormaster1 ~]# condor_status -startd condorworker02
[root@condormaster1 ~]# condor_status -startd condorworker02.domain.tld
[root@condormaster1 ~]# 

- Why can't I send the condor_off command to VMs, when it works fine for physical nodes?
[root@condormaster1 ~]# condor_off -startd lxbrb1815
Sent "Kill-Daemon" command for "startd" to master lxbrb1815.domain.tld

[root@condormaster1 ~]# condor_off -startd condorworker02
Can't find address for master condorworker02.domain.tld
Perhaps you need to query another pool.
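The listings in this message are consistent with single-slot machines (the one-core VMs) appearing under the bare host name and multi-slot machines appearing as one slotN@host entry per slot. The helper below only reproduces that observed naming pattern; it is a hypothetical model of the output, not HTCondor's code.

```python
# Hypothetical helper reproducing the naming pattern visible in the
# condor_status output in this thread; a model of the output, not HTCondor code.
from typing import List


def startd_ad_names(host: str, num_slots: int) -> List[str]:
    """Return the Name values a machine with num_slots slots appears under."""
    if num_slots < 1:
        raise ValueError("num_slots must be at least 1")
    if num_slots == 1:
        # Single-slot machines (the one-core VMs here) show the bare host name.
        return [host]
    # Multi-slot machines show one slotN@host entry per slot.
    return [f"slot{i}@{host}" for i in range(1, num_slots + 1)]


if __name__ == "__main__":
    print(startd_ad_names("condorworker02", 1))
    print(startd_ad_names("lxbrb1815.domain.tld", 8)[:2])
```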

Thanks,
Daniel



2013/11/26 Zachary Miller <zmiller@xxxxxxxxxxx>
On Tue, Nov 26, 2013 at 11:37:48AM +0100, Pek Daniel wrote:
> I'm trying to "deactivate" some startd machines:
> [root@cm1 ~]# condor_status
> Name               OpSys      Arch   State     Activity LoadAv Mem   ActvtyTime
>
> condorworker01     LINUX      X86_64 Unclaimed Idle      0.000 2006  5+16:16:41
> condorworker03     LINUX      X86_64 Unclaimed Idle      0.000  490  0+00:21:47
> slot1@lxbrl2305    LINUX      X86_64 Unclaimed Idle      1.000 1991  4+18:20:46
> slot2@lxbrl2305    LINUX      X86_64 Unclaimed Idle      1.000 1991  4+18:21:07
> slot3@lxbrl2305    LINUX      X86_64 Unclaimed Idle      1.000 1991  4+18:21:08
> slot4@lxbrl2305    LINUX      X86_64 Unclaimed Idle      1.000 1991  4+18:21:09
> slot5@lxbrl2305    LINUX      X86_64 Unclaimed Idle      1.000 1991  4+18:21:10
> slot6@lxbrl2305    LINUX      X86_64 Unclaimed Idle      0.960 1991 4+18:21:11
> slot7@lxbrl2305    LINUX      X86_64 Unclaimed Idle      0.000 1991  4+18:21:12
> slot8@lxbrl2305    LINUX      X86_64 Unclaimed Idle      0.000 1991  4+18:21:05
>                      Total Owner Claimed Unclaimed Matched Preempting Backfill
>
>         X86_64/LINUX    10     0       0        10       0          0        0
>
>                Total    10     0       0        10       0          0        0
>
> [root@condormaster1 ~]# condor_off -startd -graceful condorworker01
> Can't find address for master condorworker01

Hmmm.  What does "condor_status -master" have to say?


Cheers,
-zach

_______________________________________________
HTCondor-users mailing list

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/