
Re: [HTCondor-users] Condor HoldReason: Unspecified gridmanager error



Dear all,

To add more detail on the HTCondor problem, the jobs are being killed with
signal 9:

failure: "LRMS error: (-1) ExitReason: died on signal 9 (Killed)."
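
For anyone triaging similar reports, here is a minimal sketch that pulls the
signal number out of that failure string and maps it to its name. The function
name exit_signal is hypothetical, and the pattern assumes the "died on signal N"
wording shown above. Signal 9 is SIGKILL, which the job itself cannot catch, so
it is sent from outside the payload (possibly a memory limit or the OOM killer,
but only the logs can confirm which).

```python
import re
import signal

# Failure text copied from the error report above.
FAILURE = 'LRMS error: (-1) ExitReason: died on signal 9 (Killed).'

def exit_signal(message):
    """Return the signal number named in an ExitReason string, or None."""
    match = re.search(r"died on signal (\d+)", message)
    return int(match.group(1)) if match else None

signum = exit_signal(FAILURE)
if signum is not None:
    # signal.Signals maps the number to the platform's signal name.
    print(signum, signal.Signals(signum).name)
```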

The ShadowLog and StarterLog contents for one such job are attached.

Regards,
Mihai


>
> Dear all,
>
> I have a cluster dedicated to the ATLAS experiment. It is an ARC-CE
> configured with HTCondor+Docker.
> It is configured to run both single-core and multi-core jobs.
> For the past couple of days, most of the single-core jobs have failed with
> this error message:
>
> The worker was cancelled while the job was starting : Condor HoldReason:
> Unspecified gridmanager error ; Worker canceled by harvester due to held
> too long or not found
>
> Does anyone have any idea what might cause this?
>
> Thanks in advance,
> Mihai


Dr. Mihai Ciubancan
IT Department
National Institute of Physics and Nuclear Engineering "Horia Hulubei"
Str. Reactorului no. 30, P.O. BOX MG-6
077125, Magurele - Bucharest, Romania
http://www.ifin.ro
Work:   +40214042360
Mobile: +40761345687
Fax:    +40214042395

Attachment: condor-ShadowLog-job
Description: Binary data

Attachment: StarterLog.slot3_1-job
Description: Binary data