
Re: [Condor-users] New to Condor, Need to RUN MPI



Hi guys,
I have spent the whole day trying to get condor_status to list all the machines in the pool.
It only shows the frontend:

$ condor_status

Name               OpSys      Arch   State     Activity LoadAv Mem   ActvtyTime

slot1@xxxxxxxxxxxx LINUX      X86_64 Owner     Idle     0.000   990  0+00:45:11
slot2@xxxxxxxxxxxx LINUX      X86_64 Unclaimed Idle     0.000   990  0+00:30:05
slot3@xxxxxxxxxxxx LINUX      X86_64 Unclaimed Idle     0.000   990  0+00:30:06
slot4@xxxxxxxxxxxx LINUX      X86_64 Unclaimed Idle     0.000   990  0+00:30:07
slot5@xxxxxxxxxxxx LINUX      X86_64 Unclaimed Idle     0.000   990  0+00:30:08
slot6@xxxxxxxxxxxx LINUX      X86_64 Unclaimed Idle     0.000   990  0+00:30:09
slot7@xxxxxxxxxxxx LINUX      X86_64 Unclaimed Idle     0.000   990  0+00:25:10
slot8@xxxxxxxxxxxx LINUX      X86_64 Unclaimed Idle     0.000   990  0+00:30:03

                     Total Owner Claimed Unclaimed Matched Preempting Backfill

        X86_64/LINUX     8     1       0         7       0          0        0

               Total     8     1       0         7       0          0        0


The daemons running on the frontend are:
$ ps -el | grep condor
5 S   407  3679     1  0  75   0 -  6702 -      ?        00:00:01 condor_master
4 S   407  3694  3679  0  75   0 -  6933 -      ?        00:00:00 condor_collecto
4 S   407  3864  3679  0  75   0 -  6705 -      ?        00:00:00 condor_schedd
4 S   407  3866  3679  0  78   0 -  6604 -      ?        00:00:06 condor_startd
4 S   407  3867  3679  0  75   0 -  6324 -      ?        00:00:00 condor_negotiat
4 S     0  3874  3864  0  78   0 -  4981 -      ?        00:00:00 condor_procd


And those running on a compute node are:

$ ps -el | grep condor
5 S   407  2742     1  0  75   0 -  6568 -      ?        00:00:00 condor_master
4 S   407  2792  2742  0  75   0 -  6658 -      ?        00:00:00 condor_schedd
4 S   407  2795  2742  0  75   0 -  6671 -      ?        00:00:05 condor_startd
4 S     0  2799  2792  0  78   0 -  4914 -      ?        00:00:00 condor_procd

I looked at the CollectorLog and found the following entries. The IP addresses are those of the compute nodes:

2/3 17:28:27 DaemonCore: PERMISSION DENIED to unknown user from host <10.1.255.251:59011> for command 0 (UPDATE_STARTD_AD), access level ADVERTISE_STARTD
2/3 17:28:31 DaemonCore: PERMISSION DENIED to unknown user from host <10.1.255.254:52303> for command 0 (UPDATE_STARTD_AD), access level ADVERTISE_STARTD
2/3 17:28:32 DaemonCore: PERMISSION DENIED to unknown user from host <10.1.255.254:35362> for command 0 (UPDATE_STARTD_AD), access level ADVERTISE_STARTD
2/3 17:28:33 DaemonCore: PERMISSION DENIED to unknown user from host <10.1.255.254:51246> for command 0 (UPDATE_STARTD_AD), access level ADVERTISE_STARTD
2/3 17:28:34 DaemonCore: PERMISSION DENIED to unknown user from host <10.1.255.254:40732> for command 0 (UPDATE_STARTD_AD), access level ADVERTISE_STARTD
2/3 17:29:01 DaemonCore: PERMISSION DENIED to unknown user from host <10.1.255.253:52190> for command 2 (UPDATE_MASTER_AD), access level ADVERTISE_MASTER
2/3 17:29:06 DaemonCore: PERMISSION DENIED to unknown user from host <10.1.255.253:50986> for command 1 (UPDATE_SCHEDD_AD), access level ADVERTISE_SCHEDD
2/3 17:29:07 NegotiatorAd  : Inserting ** "< comet.cs.bgsu.edu >"
2/3 17:29:11 DaemonCore: PERMISSION DENIED to unknown user from host <10.1.255.253:42950> for command 1 (UPDATE_SCHEDD_AD), access level ADVERTISE_SCHEDD
2/3 17:29:14 DaemonCore: PERMISSION DENIED to unknown user from host <10.1.255.253:56511> for command 0 (UPDATE_STARTD_AD), access level ADVERTISE_STARTD
2/3 17:29:14 Got QUERY_STARTD_ADS
2/3 17:29:14 (Sending 0 ads in response to query)
2/3 17:29:15 DaemonCore: PERMISSION DENIED to unknown user from host <10.1.255.253:53686> for command 0 (UPDATE_STARTD_AD), access level ADVERTISE_STARTD
2/3 17:29:16 DaemonCore: PERMISSION DENIED to unknown user from host <10.1.255.253:59716> for command 0 (UPDATE_STARTD_AD), access level ADVERTISE_STARTD
2/3 17:29:17 DaemonCore: PERMISSION DENIED to unknown user from host <10.1.255.253:47375> for command 0 (UPDATE_STARTD_AD), access level ADVERTISE_STARTD
2/3 17:29:19 Got QUERY_STARTD_ADS
2/3 17:29:19 (Sending 0 ads in response to query)
2/3 17:29:34 DaemonCore: PERMISSION DENIED to unknown user from host <10.1.255.252:42804> for command 2 (UPDATE_MASTER_AD), access level ADVERTISE_MASTER
2/3 17:29:39 DaemonCore: PERMISSION DENIED to unknown user from host <10.1.255.252:39568> for command 1 (UPDATE_SCHEDD_AD), access level ADVERTISE_SCHEDD
2/3 17:29:47 DaemonCore: PERMISSION DENIED to unknown user from host <10.1.255.252:44629> for command 0 (UPDATE_STARTD_AD), access level ADVERTISE_STARTD
2/3 17:29:48 DaemonCore: PERMISSION DENIED to unknown user from host <10.1.255.252:57034> for command 0 (UPDATE_STARTD_AD), access level ADVERTISE_STARTD
2/3 17:29:49 DaemonCore: PERMISSION DENIED to unknown user from host <10.1.255.252:36419> for command 0 (UPDATE_STARTD_AD), access level ADVERTISE_STARTD
2/3 17:29:50 DaemonCore: PERMISSION DENIED to unknown user from host <10.1.255.252:50992> for command 0 (UPDATE_STARTD_AD), access level ADVERTISE_STARTD
2/3 17:30:01 DaemonCore: PERMISSION DENIED to unknown user from host <10.1.255.251:40005> for command 2 (UPDATE_MASTER_AD), access level ADVERTISE_MASTER
2/3 17:30:07 DaemonCore: PERMISSION DENIED to unknown user from host <10.1.255.251:51484> for command 1 (UPDATE_SCHEDD_AD), access level ADVERTISE_SCHEDD
2/3 17:30:25 DaemonCore: PERMISSION DENIED to unknown user from host <10.1.255.250:38544> for command 0 (UPDATE_STARTD_AD), access level ADVERTISE_STARTD
2/3 17:30:26 DaemonCore: PERMISSION DENIED to unknown user from host <10.1.255.250:41501> for command 0 (UPDATE_STARTD_AD), access level ADVERTISE_STARTD
2/3 17:30:27 (Sending 13 ads in response to query)
2/3 17:30:27 Got QUERY_STARTD_PVT_ADS
2/3 17:30:27 (Sending 8 ads in response to query)
2/3 17:30:27 DaemonCore: PERMISSION DENIED to unknown user from host <10.1.255.250:60420> for command 0 (UPDATE_STARTD_AD), access level ADVERTISE_STARTD
2/3 17:30:28 DaemonCore: PERMISSION DENIED to unknown user from host <10.1.255.250:54849> for command 0 (UPDATE_STARTD_AD), access level ADVERTISE_STARTD
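
To make the log excerpt above easier to read, here is a rough sketch that tallies which hosts were denied and at which access level. It assumes the denial lines look exactly like the ones pasted above; the function name `summarize_denials` and the regular expression are my own and are not part of Condor.

```python
import re
from collections import defaultdict

# Matches the DaemonCore denial lines in the format shown in the excerpt above.
# Assumption: other Condor versions may word these lines differently.
DENIED = re.compile(
    r"PERMISSION DENIED to .*? from host <(?P<ip>[\d.]+):\d+> "
    r"for command \d+ \((?P<cmd>\w+)\), access level (?P<level>\w+)"
)

def summarize_denials(lines):
    """Return {ip: sorted list of access levels that were denied for that ip}."""
    denied = defaultdict(set)
    for line in lines:
        match = DENIED.search(line)
        if match:  # skip query lines and anything else that is not a denial
            denied[match.group("ip")].add(match.group("level"))
    return {ip: sorted(levels) for ip, levels in denied.items()}

# Hypothetical usage; the log location is site-specific:
#   with open("/path/to/CollectorLog") as log:
#       for ip, levels in sorted(summarize_denials(log).items()):
#           print(ip, ", ".join(levels))
```

Run against the entries above, this would group the denials by compute-node address, which shows at a glance that every compute node is being refused at the ADVERTISE_* levels.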

This should be a pretty easy fix for the experts; I have been banging my head against it all day without a clue.
:-(
Samir