
[Condor-users] starter condor_write() failed



 Hi,

I set up a Condor pool with two machines.
The Condor version is 7.6.7 and the OS is Fedora 14.
When I use Condor to run a workflow, it fails with the errors shown in the logs below.

StarterLog
05/15/12 22:17:20 Output file: /home/condor/localcondor/execute/dir_8051/_condor_stdout
05/15/12 22:17:20 Error file: /home/condor/localcondor/execute/dir_8051/_condor_stderr
05/15/12 22:17:20 About to exec /home/condor/localcondor/execute/dir_8051/condor_exec.exe
05/15/12 22:17:20 Create_Process succeeded, pid=8053
05/15/12 22:17:20 Process exited, pid=8053, status=1
05/15/12 22:17:20 ReliSock::put_file_with_permissions(): Failed to stat file '/home/condor/localcondor/execute/dir_8051/diff.000004.000008.fits': No such file or directory (errno: 2, si_error: 1)
05/15/12 22:17:20 DoUpload: (Condor error code 13, subcode 2) STARTER at 192.168.1.105 failed to send file(s) to <192.168.1.105:38037>: error reading from /home/condor/localcondor/execute/dir_8051/diff.000004.000008.fits: (errno 2) No such file or directory; SHADOW failed to receive file(s) from <192.168.1.105:55967>
05/15/12 22:17:20 JICShadow::notifyJobTermination(): Sending mock terminate event.
05/15/12 22:17:20 JIC::transferOutput() failed, waiting for job lease to expire or for a reconnect attempt
05/15/12 22:17:20 Returning from CStarter::JobReaper()
05/15/12 22:17:20 Got SIGQUIT.  Performing fast shutdown.
05/15/12 22:17:20 ShutdownFast all jobs.
05/15/12 22:17:20 condor_read() failed: recv() returned -1, errno = 104 Connection reset by peer, reading 5 bytes from <192.168.1.105:36233>.
05/15/12 22:17:20 IO: Failed to read packet header
05/15/12 22:17:20 condor_write(): Socket closed when trying to write 97 bytes to <192.168.1.105:36233>, fd is 6
05/15/12 22:17:20 Buf::write(): condor_write() failed
05/15/12 22:17:20 Failed to send job exit status to shadow
05/15/12 22:17:20 JobExit() failed, waiting for job lease to expire or for a reconnect attempt
05/15/12 22:17:40 Got SIGTERM. Performing graceful shutdown.
05/15/12 22:17:40 ShutdownGraceful all jobs.
05/15/12 22:17:40 condor_write(): Socket closed when trying to write 97 bytes to <192.168.1.105:36233>, fd is 6
05/15/12 22:17:40 Buf::write(): condor_write() failed
05/15/12 22:17:40 Failed to send job exit status to shadow
05/15/12 22:17:40 JobExit() failed, waiting for job lease to expire or for a reconnect attempt
05/15/12 22:17:40 **** condor_starter (condor_STARTER) pid 8051 EXITING WITH STATUS 0


Matchlog
<192.168.1.105:55934> preempting none <192.168.1.106:45394> xuwei.shanda.com
05/15/12 22:15:59       Matched 114.0 condor@xxxxxxxxxx <192.168.1.105:55934> preempting none <192.168.1.105:49373> yang.shanda.com
05/15/12 22:15:59       Rejected 115.0 condor@xxxxxxxxxx <192.168.1.105:55934>: no match found
05/15/12 22:15:59       Rejected 108.0 condor@xxxxxxxxxx <192.168.1.105:55934>: no match found
05/15/12 22:16:19       Rejected 118.0 condor@xxxxxxxxxx <192.168.1.105:55934>: no match found
05/15/12 22:16:19       Rejected 108.0 condor@xxxxxxxxxx <192.168.1.105:55934>: no match found
05/15/12 22:17:19       Matched 115.0 condor@xxxxxxxxxx <192.168.1.105:55934> preempting none <192.168.1.106:45394> xuwei.shanda.com
05/15/12 22:17:19       Matched 116.0 condor@xxxxxxxxxx <192.168.1.105:55934> preempting none <192.168.1.105:49373> yang.shanda.com
05/15/12 22:17:19       Rejected 118.0 condor@xxxxxxxxxx <192.168.1.105:55934>: no match found
05/15/12 22:17:19       Rejected 108.0 condor@xxxxxxxxxx <192.168.1.105:55934>: no match found

NegotiatorLog
05/15/12 22:17:19 ---------- Started Negotiation Cycle ----------
05/15/12 22:17:19 Phase 1:  Obtaining ads from collector ...
05/15/12 22:17:19   Getting all public ads ...
05/15/12 22:17:19   Sorting 7 ads ...
05/15/12 22:17:19   Getting startd private ads ...
05/15/12 22:17:19 Got ads: 7 public and 2 private
05/15/12 22:17:19 Public ads include 1 submitter, 2 startd
05/15/12 22:17:19 Phase 2:  Performing accounting ...
05/15/12 22:17:19 Phase 3:  Sorting submitter ads by priority ...
05/15/12 22:17:19 Phase 4.1:  Negotiating with schedds ...
05/15/12 22:17:19   Negotiating with condor@xxxxxxxxxx at <192.168.1.105:55934>
05/15/12 22:17:19 0 seconds so far
05/15/12 22:17:19     Request 00115.00000:
05/15/12 22:17:19       Matched 115.0 condor@xxxxxxxxxx <192.168.1.105:55934> preempting none <192.168.1.106:45394> xuwei.shanda.com
05/15/12 22:17:19       Successfully matched with xuwei.shanda.com
05/15/12 22:17:19     Request 00116.00000:
05/15/12 22:17:19       Matched 116.0 condor@xxxxxxxxxx <192.168.1.105:55934> preempting none <192.168.1.105:49373> yang.shanda.com
05/15/12 22:17:19       Successfully matched with yang.shanda.com
05/15/12 22:17:19     Request 00118.00000:
05/15/12 22:17:19       Rejected 118.0 condor@xxxxxxxxxx <192.168.1.105:55934>: no match found
05/15/12 22:17:19     Request 00108.00000:
05/15/12 22:17:19       Rejected 108.0 condor@xxxxxxxxxx <192.168.1.105:55934>: no match found
05/15/12 22:17:19     Got NO_MORE_JOBS;  done negotiating
05/15/12 22:17:19  negotiateWithGroup resources used scheddAds length 1
05/15/12 22:17:19 ---------- Finished Negotiation Cycle ----------
05/15/12 22:18:17 Got SIGTERM. Performing graceful shutdown.
05/15/12 22:18:17 **** condor_negotiator (condor_NEGOTIATOR) pid 7249 EXITING WITH STATUS 0


I'm new to Condor.
I would appreciate any answers or advice.
Thank you for your help.

Yang