
Re: [HTCondor-users] Failed to open '.update.ad'



Hello,

I don't think the '.update.ad' message is the cause of the failure.
Here are a couple of things that look suspicious. First, the
following two lines:

12/07/20 10:36:07 (pid:1802827) Create_Process succeeded, pid=1802841
12/07/20 10:36:09 (pid:1802827) Process exited, pid=1802841, status=1

These indicate that the worker did in fact run your job, but the
executable exited two seconds later with exit status 1. If you
haven't already, set the "output" and "error" files for this job in
your submit file and check them for any messages from the executable.
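
In case it helps to check other starter logs for the same pattern,
here is a rough Python sketch that picks out "Process exited" lines
with a non-zero status. The function name find_failed_exits and the
regular expression are my own, based only on the line format shown
above; other HTCondor versions may word the line differently.

```python
import re

# Assumed line format, copied from the StarterLog excerpt in this thread:
#   12/07/20 10:36:09 (pid:1802827) Process exited, pid=1802841, status=1
EXIT_RE = re.compile(r"Process exited, pid=(\d+), status=(\d+)")


def find_failed_exits(lines):
    """Return (pid, status) pairs for job processes that exited non-zero."""
    failures = []
    for line in lines:
        match = EXIT_RE.search(line)
        if match is None:
            continue
        pid, status = int(match.group(1)), int(match.group(2))
        if status != 0:
            failures.append((pid, status))
    return failures


# Example with the two lines quoted above.
sample = [
    "12/07/20 10:36:07 (pid:1802827) Create_Process succeeded, pid=1802841",
    "12/07/20 10:36:09 (pid:1802827) Process exited, pid=1802841, status=1",
]
for pid, status in find_failed_exits(sample):
    print(f"job process {pid} exited with status {status}")
```

To run it against a real log, pass an open file object instead of the
sample list, since the function accepts any iterable of lines.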

Also, the following two lines seem to indicate you're trying to
transfer a file that doesn't exist:

12/07/20 10:36:09 (pid:1802827) ReliSock::put_file_with_permissions():
Failed to stat file
'/var/lib/condor/execute/dir_1802827/run_mtdna_mito-13-JX-B_L4_1.log':
No such file or directory (errno: 2, si_error: 1)
12/07/20 10:36:09 (pid:1802827) DoUpload: (Condor error code 13,
subcode 2) STARTER at 172.17.23.227 failed to send file(s) to
<172.17.23.141:9618>: error reading from
/var/lib/condor/execute/dir_1802827/run_mtdna_mito-13-JX-B_L4_1.log:
(errno 2) No such file or directory;

Is that .log file something you were expecting to be generated by the job?
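
If you want to list every file the starter could not find, a small
Python sketch along these lines might do it. It assumes each log entry
is on a single line, as in the StarterLog itself (the lines above are
wrapped by my mail client), and that the path is quoted after "Failed
to stat file". The name missing_output_files is hypothetical.

```python
import re

# Assumed wording, taken from the ReliSock line in the log excerpt above.
STAT_RE = re.compile(r"Failed to stat file '([^']+)'")


def missing_output_files(lines):
    """Return the distinct paths the starter failed to stat, in log order."""
    seen = []
    for line in lines:
        match = STAT_RE.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


# Example with the (unwrapped) line from the original report.
sample = [
    "12/07/20 10:36:09 (pid:1802827) ReliSock::put_file_with_permissions(): "
    "Failed to stat file "
    "'/var/lib/condor/execute/dir_1802827/run_mtdna_mito-13-JX-B_L4_1.log': "
    "No such file or directory (errno: 2, si_error: 1)",
]
for path in missing_output_files(sample):
    print(path)
```

Any path it prints is a file the job was expected to leave in its
scratch directory but did not, so it is worth comparing against the
output files named in your submit file.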

Mark

On Sun, Dec 6, 2020 at 9:06 PM <kan.wu@xxxxxxxxxxxxx> wrote:
>
> The condor worker failed to run the job.
>
> The contents of /var/log/condor/StarterLog.slot1 are below:
>
>
> 12/07/20 10:36:07 (pid:1802827) Output file: /var/lib/condor/execute/dir_1802827/_condor_stdout
> 12/07/20 10:36:07 (pid:1802827) Error file: /var/lib/condor/execute/dir_1802827/_condor_stderr
> 12/07/20 10:36:07 (pid:1802827) Renice expr "0" evaluated to 0
> 12/07/20 10:36:07 (pid:1802827) Running job as user gtx
> 12/07/20 10:36:07 (pid:1802827) About to exec /var/lib/condor/execute/dir_1802827/condor_exec.exe
> 12/07/20 10:36:07 (pid:1802827) Create_Process succeeded, pid=1802841
> 12/07/20 10:36:09 (pid:1802827) Process exited, pid=1802841, status=1
> 12/07/20 10:36:09 (pid:1802827) Failed to open '.update.ad' to read update ad: No such file or directory (2).
> 12/07/20 10:36:09 (pid:1802827) ReliSock::put_file_with_permissions(): Failed to stat file '/var/lib/condor/execute/dir_1802827/run_mtdna_mito-13-JX-B_L4_1.log': No such file or directory (errno: 2, si_error: 1)
> 12/07/20 10:36:09 (pid:1802827) DoUpload: (Condor error code 13, subcode 2) STARTER at 172.17.23.227 failed to send file(s) to <172.17.23.141:9618>: error reading from /var/lib/condor/execute/dir_1802827/run_mtdna_mito-13-JX-B_L4_1.log: (errno 2) No such file or directory; SHADOW failed to receive file(s) from <172.17.23.227:32543>
> 12/07/20 10:36:09 (pid:1802827) JICShadow::notifyJobTermination(): Sending mock terminate event.
> 12/07/20 10:36:09 (pid:1802827) JIC::transferOutput() failed, waiting for job lease to expire or for a reconnect attempt
> 12/07/20 10:36:09 (pid:1802827) Returning from CStarter::JobReaper()
> 12/07/20 10:36:09 (pid:1802827) Got SIGQUIT.  Performing fast shutdown.
> 12/07/20 10:36:09 (pid:1802827) ShutdownFast all jobs.
> 12/07/20 10:36:09 (pid:1802827) Failed to open '.update.ad' to read update ad: No such file or directory (2).
> 12/07/20 10:36:09 (pid:1802827) condor_read(): Socket closed abnormally when trying to read 21 bytes from <172.17.23.141:58247>, errno=104 Connection reset by peer
> 12/07/20 10:36:09 (pid:1802827) Lost connection to shadow, waiting 2400 secs for reconnect
> 12/07/20 10:36:09 (pid:1802827) Failed to open '.update.ad' to read update ad: No such file or directory (2).
> 12/07/20 10:36:09 (pid:1802827) Failed to send job exit status to shadow
> 12/07/20 10:36:09 (pid:1802827) All jobs have exited... starter exiting
> 12/07/20 10:36:09 (pid:1802827) **** condor_starter (condor_STARTER) pid 1802827 EXITING WITH STATUS 0
>
> What might the problem be?



-- 
Mark Coatsworth
Systems Programmer
Center for High Throughput Computing
Department of Computer Sciences
University of Wisconsin-Madison