
[Condor-users] Condor-c "Detected Down Grid Resource"


I'm trying to get Condor-C to work.
I have two one-node pools each running Condor version 6.7.19 under RH9.

These machines do *not* have any Globus software installed. Am I right in saying that Globus is not necessary for grid_resource = condor?

The pools are set up as per section 5.3.1 of the docs:
submit side entries:

execute side entries:

When I submit a job I get these events in the job log file:

000 (021.000.000) 06/14 14:07:32 Job submitted from host: <10.x.x.21:37339>
020 (021.000.000) 06/14 14:07:55 Detected Down Globus Resource
    RM-Contact: mg10.x.y.z
026 (021.000.000) 06/14 14:07:55 Detected Down Grid Resource
    GridResource: condor mg10x.y.z mg10x.y.z
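
For anyone who wants to scan a job log for these events, here is a minimal Python sketch that pulls out the event code, job ID, timestamp, and message, and attaches the indented detail lines to the event above them. The line format is inferred only from the three events shown above, and the names EVENT_RE, parse_events, and SAMPLE are hypothetical, not part of Condor.

```python
import re

# Header format inferred from the events above (an assumption, not a spec):
# "<3-digit code> (<cluster>.<proc>.<subproc>) <MM/DD> <HH:MM:SS> <message>"
EVENT_RE = re.compile(
    r"^(\d{3}) \((\d+)\.(\d+)\.(\d+)\) (\d{2}/\d{2}) (\d{2}:\d{2}:\d{2}) (.*)$"
)


def parse_events(text):
    """Return a list of dicts, one per event header line found in text."""
    events = []
    for line in text.splitlines():
        match = EVENT_RE.match(line)
        if match:
            code, cluster, proc, subproc, date, time, message = match.groups()
            events.append({
                "code": code,
                "job": (int(cluster), int(proc), int(subproc)),
                "date": date,
                "time": time,
                "message": message,
                "details": [],
            })
        elif events and line[:1] in (" ", "\t") and line.strip():
            # Indented lines belong to the most recent event header.
            events[-1]["details"].append(line.strip())
    return events


# The three events from the log above, copied verbatim.
SAMPLE = """\
000 (021.000.000) 06/14 14:07:32 Job submitted from host: <10.x.x.21:37339>
020 (021.000.000) 06/14 14:07:55 Detected Down Globus Resource
    RM-Contact: mg10.x.y.z
026 (021.000.000) 06/14 14:07:55 Detected Down Grid Resource
    GridResource: condor mg10x.y.z mg10x.y.z
"""

events = parse_events(SAMPLE)
for event in events:
    if "Detected Down" in event["message"]:
        print(event["code"], event["time"], event["message"], event["details"])
```

This only filters and prints the "Detected Down" events; it does not explain why the resource is considered down.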

This is the submit file:
universe = grid
executable = test.sh
output = test.out
error = test.err
log = test.log

grid_resource = condor mg10.x.y.z mg10.x.y.z
+remote_jobuniverse = 5
+remote_requirements = True
+remote_ShouldTransferFiles = "YES"
+remote_WhenToTransferOutput = "ON_EXIT"

The documentation for Condor-C says that there should be a "remote_pool" entry in the submit file to tell Condor where to find the collector that will connect the submit machine's schedd with the execute machine's schedd, if I understand it correctly.

However the example submit file does not have a remote_pool entry.

I don't see anything in either side's log files to suggest attempted execution or even communication, so I guess the "Detected Down" events mean that the pools are not finding each other at all.

I can flock between these pools with no problem.

If anyone has gotten Condor-C to work, I'd like to hear from them.
Also, if anyone can tell me whether I am interpreting the instructions in 5.3.1 correctly, I'd appreciate it. Thanks.

