
Re: [Condor-users] JOB kicked off !



I found the cause. The condor user's home directory was not accessible to other users; running chmod +x on it solved the problem.
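
For anyone who hits the same error, here is a minimal Python sketch (not part of Condor) of one way to find which directories on the way to a path lack the execute/search bit for other users, which is the bit that chmod +x sets. The function name missing_other_search is made up for illustration, and the assumption that the "other" bit is the relevant one depends on which user the job runs as on your system.

```python
import os
import stat


def missing_other_search(path):
    """Return every directory from `path` up to the filesystem root
    that lacks the execute/search bit for 'other' users.

    Hypothetical helper for illustration; not a Condor tool.
    """
    current = os.path.abspath(path)
    blocked = []
    while True:
        mode = os.stat(current).st_mode
        # Without o+x on a directory, other users cannot traverse it,
        # so they cannot reach any file underneath it.
        if stat.S_ISDIR(mode) and not (mode & stat.S_IXOTH):
            blocked.append(current)
        parent = os.path.dirname(current)
        if parent == current:  # reached the root
            break
        current = parent
    return blocked


# Example usage (the path is an assumption; substitute your own):
#   for d in missing_other_search("/path/to/condor/home"):
#       print("no search permission for others:", d)
```

An empty result only means the search bit is set along the path; it does not rule out other causes of a failed exec.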
 

Yufang Zhang
2006-10-19

From: Yufang Zhang
Sent: 2006-10-19 18:39:45
To: Condor-Users Mail List
Cc:
Subject: [Condor-users] JOB kicked off !
 
Hi all:
I am running Condor 6.7.19. My problem is that a job is always kicked off the remote node it is executing on, so it can only run on the node from which it was submitted. In the StarterLog of the remote node I can see: "EXEC of user process failed, probably insufficient swap". Can anybody help with this problem?
Thank you in advance for your help!
Best Wishes!
 
Here is part of the StarterLog from the remote node:
 
10/19 12:21:56 ********** STARTER starting up ***********
10/19 12:21:56 ** $CondorVersion: 6.7.19 May 10 2006 $
10/19 12:21:56 ** $CondorPlatform: I386-LINUX_RH9 $
10/19 12:21:56 ******************************************
.................
 
10/19 16:00:22 Started user job - PID = 21251
10/19 16:00:22 cmd_fp = 0x8382588
10/19 16:00:22 end
10/19 16:00:22  *FSM* Transitioning to state "SUPERVISE"
10/19 16:00:22  *FSM* Executing state func "supervise_all()" [ GET_NEW_PROC SUSPEND VACATE ALARM DIE CHILD_EXIT PERIODIC_CKPT  ]
10/19 16:00:22  *FSM* Got asynchronous event "CHILD_EXIT"
10/19 16:00:22  *FSM* Executing transition function "reaper"
10/19 16:00:22 Process 21251 exited with status 110
10/19 16:00:22 EXEC of user process failed, probably insufficient swap
10/19 16:00:22  *FSM* Transitioning to state "PROC_EXIT"
10/19 16:00:22  *FSM* Executing state func "proc_exit()" [ DIE  ]
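
When a StarterLog is long, a small script can pull out the failed exits. The following Python sketch assumes the "Process <pid> exited with status <n>" wording seen in the excerpt above; other Condor versions may word the line differently, and the name failed_exits is made up for illustration.

```python
import re

# Pattern taken from the log excerpt above; an assumption for other versions.
EXIT_RE = re.compile(r"Process (\d+) exited with status (\d+)")


def failed_exits(log_lines):
    """Return a list of (pid, status) pairs for every nonzero exit status."""
    results = []
    for line in log_lines:
        match = EXIT_RE.search(line)
        if match and int(match.group(2)) != 0:
            results.append((int(match.group(1)), int(match.group(2))))
    return results


# Example usage (the file name is an assumption):
#   with open("StarterLog") as f:
#       for pid, status in failed_exits(f):
#           print(pid, status)
```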
 
 

Yufang Zhang
2006-10-19