
[HTCondor-users] HTCondor-CE: Staging of jobs files failed



Dear all,

I have a strange issue that only affects the LHCb VO on our HTCondor-CEs.

From time to time (every 10 days or so, for instance), when LHCb submits batches of 300 jobs to the CE, these error messages appear:

03/08/21 06:37:28 condor_write(): Socket closed when trying to write 88 bytes to <188.185.73.26:32787>, fd is 19
03/08/21 06:37:28 Buf::write(): condor_write() failed
03/08/21 06:37:28 Failed to send GoAhead message.
03/08/21 06:37:28 (cid:3805896) generalJobFilesWorkerThread(): failed to transfer files for job 14832773.230
03/08/21 06:37:28 condor_write(): Socket closed when trying to write 29 bytes to <188.185.73.26:32787>, fd is 19
03/08/21 06:37:28 Buf::write(): condor_write() failed
03/08/21 06:37:28 Scheduler::spoolJobFilesWorkerThread(void *arg, Stream* s) NAP TIME
03/08/21 06:37:29 ERROR - Staging of job files failed!
03/08/21 06:37:29 Job 14832773.0 aborted: Staging of job files failed
03/08/21 06:37:29 Job 14832773.1 aborted: Staging of job files failed
03/08/21 06:37:29 Job 14832773.2 aborted: Staging of job files failed
03/08/21 06:37:29 Job 14832773.3 aborted: Staging of job files failed
[...]

And the 300 jobs are removed from the queue.
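
To see how often this happens and which remote endpoint is on the other side of the closed socket, the messages above could be tallied with something like the following. This is only a minimal sketch in Python: the regular expressions are derived from the log lines quoted above, and the function name and the sample list are hypothetical, not part of HTCondor.

```python
import re
from collections import Counter

# Patterns copied from the log lines quoted above; no other format is assumed.
WRITE_FAIL = re.compile(
    r"condor_write\(\): Socket closed when trying to write \d+ bytes to <([\d.]+):\d+>"
)
ABORTED = re.compile(r"Job (\d+)\.\d+ aborted: Staging of job files failed")


def summarize_staging_failures(lines):
    """Count closed-socket events per peer IP and aborted jobs per cluster ID."""
    peers = Counter()
    aborted = Counter()
    for line in lines:
        match = WRITE_FAIL.search(line)
        if match:
            peers[match.group(1)] += 1
            continue
        match = ABORTED.search(line)
        if match:
            aborted[match.group(1)] += 1
    return peers, aborted


# Hypothetical sample built from the excerpt above.
sample = [
    "03/08/21 06:37:28 condor_write(): Socket closed when trying to write 88 bytes to <188.185.73.26:32787>, fd is 19",
    "03/08/21 06:37:28 Buf::write(): condor_write() failed",
    "03/08/21 06:37:28 condor_write(): Socket closed when trying to write 29 bytes to <188.185.73.26:32787>, fd is 19",
    "03/08/21 06:37:29 Job 14832773.0 aborted: Staging of job files failed",
    "03/08/21 06:37:29 Job 14832773.1 aborted: Staging of job files failed",
]

peers, aborted = summarize_staging_failures(sample)
for ip, count in peers.items():
    print(f"closed-socket writes to {ip}: {count}")
for cluster, count in aborted.items():
    print(f"cluster {cluster}: {count} jobs aborted")
```

In practice the list would be replaced by the open log file these messages were taken from. If the same peer address shows up in every incident, that would point towards the submitting side or the network path rather than the CE itself.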

The sequence is always the same: there are no LHCb jobs queued (all running, all finished, whatever), LHCb submits 300 jobs in one batch, the staging errors appear, and the problem goes away once the condor-ce daemons are restarted. After the restart, the next batch transfers its input files correctly and the jobs are released from the hold state.

I'm really not sure whether this is an HTCondor issue, an issue on the remote machine, or a network issue. In any case, restarting the daemons fixed it, so there is something mysterious here.

Any ideas?

Thank you in advance.

Best regards,

Carles

--
Carles Acosta i Silva
PIC (Port d'Informació Científica)
Campus UAB, Edifici D
E-08193 Bellaterra, Barcelona
Tel: +34 93 581 33 08
Fax: +34 93 581 41 10
Avís - Aviso - Legal Notice: http://legal.ifae.es