
[Condor-users] jobs don't run in PVM universe



Hello,
I have a pool of 2 machines and I'm trying to run the PVM
examples (master_sum) on Condor, without success. I
have installed pvm3.4.5 and pvm_condor
with modifications to the sources.
(The programs work well in a PVM environment without Condor.)
When I submit the master_sum job, the log always says
"Error: can't find resource with capability".

/////////////////////////////////////////////////////

Here is the StartLog (executing node):

3/22 23:48:50 DaemonCore: Command received via TCP
from host <192.168.0.1:33210>
3/22 23:48:50 DaemonCore: received command 443
(RELEASE_CLAIM), calling handler (command_handler)
3/22 23:48:50 State change: received RELEASE_CLAIM
command
3/22 23:48:50 Changing state and activity:
Claimed/Suspended -> Preempting/Vacating
3/22 23:48:50 Starter pid 3502 exited with status 0
3/22 23:48:50 State change: starter exited
3/22 23:48:50 State change: No preempting claim,
returning to owner
3/22 23:48:50 Changing state and activity:
Preempting/Vacating -> Owner/Idle
3/22 23:48:50 State change: IS_OWNER is false
3/22 23:48:50 Changing state: Owner -> Unclaimed
3/22 23:48:50 DaemonCore: Command received via TCP
from host <192.168.0.1:33214>
3/22 23:48:50 DaemonCore: received command 443
(RELEASE_CLAIM), calling handler (command_handler)
3/22 23:48:50 Error: can't find resource with
capability (<192.168.0.4:1044>#1005804916)
3/22 23:48:50 DaemonCore: Command received via UDP
from host <192.168.0.1:33329>
3/22 23:48:50 DaemonCore: received command 443
(RELEASE_CLAIM), calling handler (command_handler)
3/22 23:48:50 Error: can't find resource with
capability (<192.168.0.4:1044>#1005804916)
3/22 23:48:50 DaemonCore: Command received via UDP
from host <192.168.0.1:33329>
3/22 23:48:50 DaemonCore: received command 443
(RELEASE_CLAIM), calling handler (command_handler)
3/22 23:48:50 Error: can't find resource with
capability (<192.168.0.4:1044>#1005804916)


///////////////////////////////////////////////////////

And the SchedLog (manager node):

3/22 22:48:38 In parent, shadow pid = 8962
3/22 22:48:38 shadow_fd = 12
3/22 22:48:38 Sending job 192.0 to shadow pid 8962
3/22 22:48:38 First Line: 192 0 1
3/22 22:48:38 sending <192.168.0.1:32916>
<192.168.0.1:32916>#1477902378 0 soumia
3/22 22:48:38 Existing shadow connected on fd 12
3/22 22:48:38 Sending job 192.0 to shadow pid 8962
3/22 22:48:38 First Line: 192 0 1
3/22 22:48:38 sending <192.168.0.4:1044>
<192.168.0.4:1044>#1005804916 0 mama
3/22 22:49:21 DaemonCore: Command received via TCP
from host <192.168.0.1:33207>
3/22 22:49:21 DaemonCore: received command 478
(ACT_ON_JOBS), calling handler (actOnJobs)
3/22 22:49:21 DaemonCore: Command received via TCP
from host <192.168.0.1:33213>
3/22 22:49:21 DaemonCore: received command 443
(VACATE_SERVICE), calling handler (vacate_service)
3/22 22:49:21 Got VACATE_SERVICE from
<192.168.0.1:33213>
3/22 22:49:21 Sent RELEASE_CLAIM to startd on
<192.168.0.1:32916>
3/22 22:49:21 Match record (<192.168.0.1:32916>, 192,
0) deleted
3/22 22:49:21 Shadow pid 8962 for job 192.0 exited
with status 4
3/22 22:49:21 ERROR: Shadow exited with job exception
code!
3/22 22:49:21 match (<192.168.0.4:1044>#1005804916)
out of jobs (cluster id 192); relinquishing
3/22 22:49:21 Sent RELEASE_CLAIM to startd on
<192.168.0.4:1044>
3/22 22:49:21 Match record (<192.168.0.4:1044>, 192,
-1) deleted
3/22 22:49:21 DaemonCore: Command received via TCP
from host <192.168.0.4:1066>
3/22 22:49:21 DaemonCore: received command 443
(VACATE_SERVICE), calling handler (vacate_service)
3/22 22:49:21 Got VACATE_SERVICE from
<192.168.0.4:1066>
3/22 22:49:28 DaemonCore: Command received via UDP
from host <192.168.0.1:33329>
3/22 22:49:28 DaemonCore: received command 421
(RESCHEDULE), calling handler (reschedule_negotiator)
3/22 22:49:28 Sent ad to central manager for
condor@soumia
3/22 22:49:28 Called reschedule_negotiator()
3/22 22:49:28 Activity on stashed negotiator socket
3/22 22:49:28 Negotiating for owner: condor@soumia
3/22 22:49:28 Checking consistency running and
runnable jobs
3/22 22:49:28 Tables are consistent
3/22 22:49:28 Out of jobs - 1 jobs matched, 0 jobs
idle, flock level = 0
3/22 22:49:30 About to Create_Process(
/usr/local/condor/sbin/condor_shadow.pvm,
condor_shadow.pvm <192.168.0.1:32918>, ... )
3/22 22:49:30 In parent, shadow pid = 9001
3/22 22:49:30 shadow_fd = 12
3/22 22:49:30 Sending job 193.0 to shadow pid 9001
3/22 22:49:30 First Line: 193 0 1
3/22 22:49:30 sending <192.168.0.1:32916>
<192.168.0.1:32916>#6765994899 0 soumia
3/22 22:49:30 Shadow pid 9001 for job 193.0 exited
with status 4
3/22 22:49:30 ERROR: Shadow exited with job exception
code!
3/22 22:49:30 About to Create_Process(
/usr/local/condor/sbin/condor_shadow.pvm,
condor_shadow.pvm <192.168.0.1:32918>, ... )
3/22 22:49:30 In parent, shadow pid = 9007
3/22 22:49:30 shadow_fd = 12
3/22 22:49:30 Sending job 193.0 to shadow pid 9007
3/22 22:49:30 First Line: 193 0 1
3/22 22:49:30 sending <192.168.0.1:32916>
<192.168.0.1:32916>#6765994899 0 soumia
3/22 22:49:30 DaemonCore: Command received via TCP
from host <192.168.0.1:33231>
3/22 22:49:30 DaemonCore: received command 443
(VACATE_SERVICE), calling handler (vacate_service)
3/22 22:49:30 Got VACATE_SERVICE from
<192.168.0.1:33231>
3/22 22:49:30 Sent RELEASE_CLAIM to startd on
<192.168.0.1:32916>
3/22 22:49:30 Match record (<192.168.0.1:32916>, 193,
0) deleted
3/22 22:49:33 Sent ad to central manager for
condor@soumia
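
To make the two excerpts easier to compare, here is a minimal sketch (plain Python, not part of Condor) that pulls the claim identifiers of the form <ip:port>#number out of each log and lists the ones that appear in both. The function name claim_ids and the variable names are made up for illustration; the sample lines are copied from the logs above. It only shows that the identifiers coincide and does not by itself explain the error.

```python
import re

# Claim identifiers look like <ip:port>#number in the excerpts above.
CLAIM_RE = re.compile(r"<\d+\.\d+\.\d+\.\d+:\d+>#\d+")

def claim_ids(log_text):
    """Return the set of '<ip:port>#number' strings found in log_text."""
    # The mail client wrapped long log entries, so collapse all
    # whitespace (including newlines) to single spaces before matching.
    joined = " ".join(log_text.split())
    return set(CLAIM_RE.findall(joined))

# A few lines copied from the StartLog excerpt (executing node).
start_log = """
3/22 23:48:50 Error: can't find resource with
capability (<192.168.0.4:1044>#1005804916)
"""

# A few lines copied from the SchedLog excerpt (manager node).
sched_log = """
3/22 22:48:38 sending <192.168.0.1:32916>
<192.168.0.1:32916>#1477902378 0 soumia
3/22 22:48:38 sending <192.168.0.4:1044>
<192.168.0.4:1044>#1005804916 0 mama
3/22 22:49:21 match (<192.168.0.4:1044>#1005804916)
out of jobs (cluster id 192); relinquishing
"""

# Identifiers mentioned on both sides.
common = claim_ids(start_log) & claim_ids(sched_log)
for claim in sorted(common):
    print(claim)
```

With these sample lines, the identifier in the StartLog error is the same one that the SchedLog reports as relinquished for cluster 192.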

///////////////////////////////////////////////
Please help me.
Thanks in advance.
soumia







	

	
		


	

	
		