
[Condor-users] "can't find resource with capability" error!



Hi!

I have a simple Java universe job. The class prints "Hello World!" when
I run it directly from the command line.

The submit file is thus:

###########################
#example 1
# Execute a single Java class
#
############################

  universe       = java
  executable     = hello.class
  arguments      = hello
  output         = hello.output
  error          = hello.error
  log            = hello.log
  queue
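
For context, a class matching this submit file could look like the sketch
below. This is only a minimal illustration, not the actual source: it
assumes the class is named "hello" (lowercase, taken from the executable
and arguments lines above) and does nothing except print the greeting.

```java
// hello.java: assumed minimal source for hello.class (illustrative only).
// The lowercase class name follows the submit file, where the first
// argument ("hello") names the class whose main method is run.
public class hello {
    public static void main(String[] args) {
        // The only output the job is expected to produce.
        System.out.println("Hello World!");
    }
}
```

Compiled with "javac hello.java" and run locally with "java hello", this
prints the greeting, which is the behaviour described above.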


I observe the following when I submit this job.

When I force a negotiation cycle by running condor_reschedule on my
central manager (where the schedd runs as "condor"), the log shows:

12:31:19am> palomar:/tmp $
6/15 00:31:45 DaemonCore: Command received via
TCP from host <129.79.246.125:56028>
6/15 00:31:45 DaemonCore: received command 421 (RESCHEDULE), calling
handler (reschedule_negotiator)
6/15 00:31:45 Sent ad to central manager for vdukle@xxxxxxxxxxxxxxxxxxx
6/15 00:31:45 Called reschedule_negotiator()
6/15 00:31:45 Activity on stashed negotiator socket
6/15 00:31:45 Negotiating for owner: vdukle@xxxxxxxxxxxxxxxxxxx
6/15 00:31:45 Checking consistency running and runnable jobs
6/15 00:31:45 Tables are consistent
6/15 00:31:45 Out of jobs - 1 jobs matched, 0 jobs idle, flock level = 0
6/15 00:31:48 match (<129.79.246.123:53162>#2291185139) out of jobs
(cluster id 368); relinquishing
6/15 00:31:48 Sent RELEASE_CLAIM to startd on <129.79.246.123:53162>
6/15 00:31:48 Match record (<129.79.246.123:53162>, 368, 0) deleted
6/15 00:31:48 DC_AUTHENTICATE: attempt to open invalid session
palomar:15231:1087272551:20, failing.
6/15 00:31:50 Sent ad to central manager for vdukle@xxxxxxxxxxxxxxxxxxx


Now, the StartLog on 129.79.246.123 (is this the execute machine?) says:

[0:34] brick:/u/condor/hosts/brick/log % tail -f StartLog
6/15 00:31:48 Changing state and activity: Claimed/Idle ->
Preempting/Vacating
6/15 00:31:48 State change: No preempting claim, returning to owner
6/15 00:31:48 Changing state and activity: Preempting/Vacating ->
Owner/Idle
6/15 00:31:48 State change: IS_OWNER is false
6/15 00:31:48 Changing state: Owner -> Unclaimed
6/15 00:31:48 DaemonCore: Command received via UDP from host
<129.79.246.145:33347>
6/15 00:31:48 DaemonCore: received command 443 (RELEASE_CLAIM), calling
handler (command_handler)
6/15 00:31:48 Error: can't find resource with capability
(<129.79.246.123:53162>#2291185139)
6/15 00:31:48 DaemonCore: Command received via UDP from host
<129.79.246.145:33347>
6/15 00:31:48 DaemonCore: received command 60014 (DC_INVALIDATE_KEY),
calling handler (handle_invalidate_key())

However, 129.79.246.123 is a Java-capable host (in fact, all machines in
the pool are), as confirmed by running "condor_status -java".  Also, the
"owner" attribute has my correct username.  So I am not sure why it gives
the above error ("Error: can't find resource with capability").  What
am I missing here?

The job just sits in the idle state and never runs.

Any pointers would be appreciated.  Thanks!

Regards,
--Vinayak