
[Condor-users] Condor-c "Detected Down Grid Resource"



Hi,

I'm trying to get Condor-C to work.
I have two one-node pools each running Condor version 6.7.19 under RH9.

These machines do *not* have any Globus software installed. Am I right in saying that Globus is not necessary for grid_resource = condor?

The pools are set up as per section 5.3.1 of the docs:
submit side entries:
CONDOR_GAHP=$(SBIN)/condor_c-gahp
C_GAHP_LOG=/tmp/CGAHPLog.$(USERNAME)
C_GAHP_WORKER_THREAD_LOG=/tmp/CGAHPWorkerLog.$(USERNAME)

execute side entries:

SEC_DEFAULT_NEGOTIATION = OPTIONAL
SEC_DEFAULT_AUTHENTICATION_METHODS = CLAIMTOBE

When I submit a job I get these events in the job log file:

...
000 (021.000.000) 06/14 14:07:32 Job submitted from host: <10.x.x.21:37339>
...
020 (021.000.000) 06/14 14:07:55 Detected Down Globus Resource
    RM-Contact: mg10.x.y.z
...
026 (021.000.000) 06/14 14:07:55 Detected Down Grid Resource
    GridResource: condor mg10x.y.z mg10x.y.z
...

This is the submit file:
---------------------------------------------------------------------------------------
universe = grid
executable = test.sh
output = test.out
error = test.err
log = test.log

grid_resource = condor mg10.x.y.z mg10.x.y.z
+remote_jobuniverse = 5
+remote_requirements = True
+remote_ShouldTransferFiles = "YES"
+remote_WhenToTransferOutput = "ON_EXIT"
queue
---------------------------------------------------------------------------------------

The documentation for Condor-C
[http://www.cs.wisc.edu/condor/manual/v6.7/5_3Grid_Universe.html#SECTION00631000000000000000]
says, if I understand it correctly, that there should be a "remote_pool" entry in the submit file to tell Condor where to find the collector that connects the submit machine's schedd with the execute machine's schedd.

However the example submit file does not have a remote_pool entry.
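
One possible reading, sketched below, is that the three-field grid_resource line already carries this information: the second field would name the remote schedd and the third the remote pool's central manager (collector), so a separate remote_pool entry would not be needed. This is an assumption that should be checked against the 6.7.19 manual. The host name is the same obfuscated one used in this message, and because each pool has only one node the same machine would fill both roles.
---------------------------------------------------------------------------------------
# Assumed layout: grid_resource = condor <remote schedd name> <remote pool central manager>
# In a one-node pool the schedd and the collector run on the same host,
# so the same (obfuscated) name appears in both fields.
grid_resource = condor mg10.x.y.z mg10.x.y.z
---------------------------------------------------------------------------------------
If that reading holds, the third field has to resolve to a host whose collector actually lists the schedd named in the second field.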

I don't get anything in either side's log files to suggest attempted execution
or even communication, so I guess the "Detected Down" events mean that the pools are not finding each other at all.

I can flock between these pools with no problem.

If anyone has gotten Condor-C to work I'd like to hear from them, thanks.
Also if anyone can tell me if I am interpreting the instructions in 5.3.1 correctly I'd appreciate it.

Cheers,
O.C.
