
[Condor-users] Unable to run PVM example



I'm trying to use Condor on a pool of Linux machines running
Fedora Core 4 (FC4). After installing various Condor versions,
including 6.8, I was unable to successfully run the PVM example.
All other examples run smoothly. I also tried various PVM
versions, including 3.4.1 and 3.4.2, with no luck. The output of
condor_q -analyze is:

 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
---
020.000:  Run analysis summary.  Of 3 machines,
      0 are rejected by your job's requirements
      0 reject your job because of their own requirements
      0 match but are serving users with a better priority in the pool
      3 match but reject the job for unknown reasons
      0 match but will not currently preempt their existing job
      0 are available to run your job

1 jobs; 1 idle, 0 running, 0 held

The shadow.log from the submitting machine is included below; it
shows that the PVM processes seem unable to start. The PVM example
runs without problems when started manually under PVM, without
submitting it to Condor. Has anyone come across the same problem?

7/26 12:32:49 (?.?) (8189):********** Multi_Shadow starting up **********
7/26 12:32:49 (?.?) (8189):uid=524, euid=524, gid=524, egid=524
7/26 12:32:49 (?.?) (8189):My_Filesystem_Domain = "ltt.mech.ntua.gr"
7/26 12:32:49 (?.?) (8189):My_UID_Domain = "ltt.mech.ntua.gr"
7/26 12:32:49 (?.?) (8189):Shadow reading via ASCII
7/26 12:32:49 (?.?) (8189):First Line: 20 0 1
7/26 12:32:49 (20.0) (8189):Created class:
7/26 12:32:49 (20.0) (8189):#0: 0 (1, 1) has 0
7/26 12:32:49 (20.0) (8189):New process for proc 0
7/26 12:32:49 (20.0) (8189):AllocProc() returning 0
7/26 12:32:49 (20.0) (8189):Machine from schedd: <192.168.0.253:32860> <192.168.0.253:32860>#1153407804#15 0
7/26 12:32:49 (20.0) (8189):Machine Line: velaki.ltt.mech.ntua.gr 0
7/26 12:32:49 (20.0) (8189):Machines now cur = 1 desire = 1
7/26 12:32:50 (20.0) (8189):Updated class:
7/26 12:32:50 (20.0) (8189):#0: 0 (1, 1) has 1
7/26 12:32:50 (20.0) (8189):Starting pvmd: /opt/condor-6.8.0/sbin/condor_pvmd -d0x11c
7/26 12:32:50 (20.0) (8189):PVM is pid 8190
7/26 12:32:50 (20.0) (8189):pvmd response: /tmp/filekW2qLp
7/26 12:32:50 (20.0) (8189):PVMSOCK=/tmp/filekW2qLp
7/26 12:32:50 (20.0) (8189):pvm_fd = 2, mytid = t40001
7/26 12:32:50 (20.0) (8189):Entered StartWaitingHosts()
7/26 12:32:50 (20.0) (8189):Ok to start waiting hosts
7/26 12:32:50 (20.0) (8189):PVMd message is SM_STHOST from t80000000
7/26 12:32:50 (20.0) (8189):SM_STHOST: 80000 "" "192.168.0.253" "$PVM_ROOT/lib/pvmd -s -d0x11c -nvelaki.ltt.mech.ntua.gr 1 c0a800fd:bb07 4080 2 c0a800fd:0000"
7/26 12:32:50 (20.0) (8189):New process for proc 0
7/26 12:32:50 (20.0) (8189):AllocProc() returning 0
7/26 12:32:50 (20.0) (8189):Shadow: Entering multi_send_job(velaki.ltt.mech.ntua.gr)
7/26 12:32:50 (20.0) (8189):Requesting Alternate Starter 1
7/26 12:32:50 (20.0) (8189):Shadow: Request to run a job was ACCEPTED
7/26 12:32:50 (20.0) (8189):Shadow: RSC_SOCK connected, fd = 4
7/26 12:32:50 (20.0) (8189):Multi_Shadow: CLIENT_LOG connected, fd = 5
7/26 12:32:50 (20.0) (8189):in new_timer()
7/26 12:32:50 (20.0) (8189):Timer List
7/26 12:32:50 (20.0) (8189):^^^^^ ^^^^
7/26 12:32:50 (20.0) (8189):id = 0, when = 180
7/26 12:32:50 (20.0) (8189):Shadow: send_pvm_job_info
7/26 12:32:50 (20.0) (8189):send_pvm_job_info: arg = -s -d0x11c -nvelaki.ltt.mech.ntua.gr 1 c0a800fd:bb07 4080 2 c0a800fd:0000 -f
7/26 12:32:50 (20.0) (8189):On LogSock for host velaki.ltt.mech.ntua.gr:
-> [pvmd pid8192]
7/26 12:32:50 (20.0) (8189):On LogSock for host velaki.ltt.mech.ntua.gr:
-> 07/26 12:32:50
7/26 12:32:50 (20.0) (8189):On LogSock for host velaki.ltt.mech.ntua.gr:
-> version 3.4.2
7/26 12:32:50 (20.0) (8189):On LogSock for host velaki.ltt.mech.ntua.gr:
-> [pvmd pid8192]
7/26 12:32:50 (20.0) (8189):On LogSock for host velaki.ltt.mech.ntua.gr:
-> 07/26 12:32:50
7/26 12:32:50 (20.0) (8189):On LogSock for host velaki.ltt.mech.ntua.gr:
-> ddpro 2316 tdpro 1318
7/26 12:32:50 (20.0) (8189):On LogSock for host velaki.ltt.mech.ntua.gr:
-> [pvmd pid8192]
7/26 12:32:50 (20.0) (8189):On LogSock for host velaki.ltt.mech.ntua.gr:
-> 07/26 12:32:50
7/26 12:32:50 (20.0) (8189):On LogSock for host velaki.ltt.mech.ntua.gr:
-> main() debug mask is 0x11c (tsk,slv,hst,sch)
7/26 12:32:50 (20.0) (8189):In cancel_timer()
7/26 12:32:50 (20.0) (8189):Timer List
7/26 12:32:50 (20.0) (8189):^^^^^ ^^^^
7/26 12:32:50 (20.0) (8189):Received PVM info from velaki.ltt.mech.ntua.gr
7/26 12:32:50 (20.0) (8189):Adding host velaki.ltt.mech.ntua.gr to STARTACK msg.
7/26 12:32:50 (20.0) (8189):Num Hosts to pack = 1
7/26 12:32:50 (20.0) (8189):Packing tid t80000 with reply ddpro<2316> arch<LINUX> ip<c0a800fd:bb09> mtu<4080> dsig<4229185>
7/26 12:32:50 (20.0) (8189):Sending SM_STHOSTACK to PVMd
7/26 12:32:50 (20.0) (8189):PVMd message is SM_ADDACK from t80000000
7/26 12:32:50 (20.0) (8189):Host #0(velaki.ltt.mech.ntua.gr) has been added to PVM, pvmd_tid = 80080000
7/26 12:32:50 (20.0) (8189):SendNotification(kind = 3, tid = t80080000)
7/26 12:32:50 (20.0) (8189):pvm_machines_starting = 0(should be 0)
7/26 12:32:50 (20.0) (8189):StartLocalProcess: = /home/condor/examples6.8/PVM/master_sum < in_sum > out_sum >& err_sum
7/26 12:32:50 (20.0) (8189):open_max = 1024
7/26 12:32:50 (20.0) (8189):Local PVM process pid = 8193
7/26 12:32:50 (20.0) (8189):Entered StartWaitingHosts()
7/26 12:32:50 (20.0) (8189):Ok to start waiting hosts
7/26 12:32:50 (20.0) (8189):deadpid = 8193
7/26 12:32:50 (20.0) (8189):Local process for job 20.0 died with status 0x6e00
7/26 12:32:50 (20.0) (8189):SendNotification(kind = 1, tid = t0)
7/26 12:32:50 (20.0) (8189):Multi_Shadow: Shutting down...
7/26 12:32:50 (20.0) (8189):Updated class:
7/26 12:32:50 (20.0) (8189):#0: 0 (1, 1) has 1
7/26 12:32:50 (20.0) (8189):signal_startd( velaki.ltt.mech.ntua.gr, 443 )
7/26 12:32:50 (20.0) (8189):in new_timer()
7/26 12:32:50 (20.0) (8189):Timer List
7/26 12:32:50 (20.0) (8189):^^^^^ ^^^^
7/26 12:32:50 (20.0) (8189):id = 1, when = 300
7/26 12:32:50 (20.0) (8189):deadpid = 8190
7/26 12:32:50 (20.0) (8189):Lost local pvmd termsig = 9, retcode = 0
7/26 12:32:50 (20.0) (8189):deadpid = -1
7/26 12:32:50 (20.0) (8189):No more dead processes(errno = 10)
7/26 12:32:50 (20.0) (8189):deadpid = -1
7/26 12:32:50 (20.0) (8189):No more dead processes(errno = 10)
7/26 12:32:50 (20.0) (8189):Subproc 32767 exited, termsig = 0, coredump = 0, retcode = 15
7/26 12:32:50 (20.0) (8189):ru_utime = 0.000000
7/26 12:32:50 (20.0) (8189):ru_stime = 0.000000
7/26 12:32:50 (20.0) (8189):Failed to get syscall_code for proc 0 removing..
7/26 12:32:50 (20.0) (8189):Removing Proc 0(t0) from Job
7/26 12:32:50 (20.0) (8189):remove starter for host velaki.ltt.mech.ntua.gr, removing the host too
7/26 12:32:50 (20.0) (8189):RemoveHost: Sending HostDelete notify on t80080000
7/26 12:32:50 (20.0) (8189):SendNotification(kind = 2, tid = t80080000)
7/26 12:32:50 (20.0) (8189):signal_startd( velaki.ltt.mech.ntua.gr, 443 )
7/26 12:32:50 (20.0) (8189):Updated class:
7/26 12:32:50 (20.0) (8189):#0: 0 (1, 1) has 0
7/26 12:32:50 (20.0) (8189):Trying to unlink /opt/condor-6.8.0/local.velaki/spool/cluster20.proc0.subproc0
7/26 12:32:50 (20.0) (8189):All processes completed, job should be deleted
7/26 12:32:50 (20.0) (8189):MultiShadow Exiting!!!
7/26 12:32:50 (20.0) (8189):user_time = 2 ticks
7/26 12:32:50 (20.0) (8189):sys_time = 4 ticks
7/26 12:32:50 (20.0) (8189):Entering multi_update_job_status()
7/26 12:32:50 (20.0) (8189):Shadow: marked job status "COMPLETED"
7/26 12:32:50 (20.0) (8189):multi_update_job_status() returns 0
7/26 12:32:50 (20.0) (8189):Shadow: Job exited normally with status 110
7/26 12:32:50 (20.0) (8189):Notification = "
exited with status 110, and touched 1 machines.
Start-up was unsuccessful on 0 machines."
7/26 12:32:50 (20.0) (8189):********** Shadow Parent Exiting(100) **********
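
A note on reading two kinds of values in the log above. The raw
status 0x6e00 reported for the local process and the later "exited
with status 110" appear to be the same number in two forms. The
sketch below is only an illustration under stated assumptions: it
assumes the conventional Linux wait-status layout (low 7 bits are the
terminating signal, the next byte up is the exit code), and it assumes
that fields such as c0a800fd:bb07 are a hexadecimal IPv4 address and
port. The helper names decode_wait_status and decode_pvm_addr are
made up for this example and are not part of Condor or PVM.

```python
import ipaddress


def decode_wait_status(status):
    """Split a raw wait() status into (kind, value).

    Assumes the conventional Linux encoding: low 7 bits hold the
    terminating signal, bits 8-15 hold the exit code.
    """
    termsig = status & 0x7F
    if termsig == 0:
        return ("exited", (status >> 8) & 0xFF)
    if termsig != 0x7F:
        return ("signaled", termsig)
    return ("other", status)  # stopped/continued states are not decoded here


def decode_pvm_addr(field):
    """Decode a 'hexip:hexport' field into (dotted_ip, port).

    Assumption: the field is an IPv4 address and port, both in hex.
    """
    hex_ip, hex_port = field.split(":")
    ip = str(ipaddress.IPv4Address(int(hex_ip, 16)))
    return (ip, int(hex_port, 16))


if __name__ == "__main__":
    print(decode_wait_status(0x6E00))        # ('exited', 110)
    print(decode_pvm_addr("c0a800fd:bb07"))  # ('192.168.0.253', 47879)
```

Read this way, master_sum itself exited with code 110 almost
immediately after being started, and the decoded address matches the
192.168.0.253 shown in the "Machine from schedd" line. What exit code
110 means depends on master_sum and is not something the log states.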





	

	
		