Re: [HTCondor-users] CondorCE: job submission to Condor-LRMS fails due to stdout/stderr files missing during staging(?)
- Date: Fri, 5 Mar 2021 08:54:14 -0600
- From: Brian Lin <blin@xxxxxxxxxxx>
- Subject: Re: [HTCondor-users] CondorCE: job submission to Condor-LRMS fails due to stdout/stderr files missing during staging(?)
That's quite strange and certainly shouldn't happen! There should be
a plain-text /var/lib/condor/spool/job_queue.log: does that file
look corrupted at all?
As for the local SYSTEM_PERIODIC_REMOVE clauses, even though they may not be
the culprit here, you should move them to your config and append
them to the CE's SYSTEM_PERIODIC_REMOVE to avoid similar issues. And
if you're on a new enough version of HTCondor-CE, you should be able
to remove a few of the clauses:
- Since at least HTCondor-CE 3, held CE jobs are removed after 24
- HTCondor-CE 4.0.0 disables job retries by default
- HTCondor-CE 5.0.0 (available as a release candidate) will remove jobs
that exceed the configured value of "ROUTED_JOB_MAX_TIME"
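For illustration, appending a local clause to the CE's existing expression could look roughly like the fragment below. This is only a sketch: the file name, the LOCAL_REMOVE_CLAUSE macro, and the example expression are placeholders I made up, not taken from your setup, so substitute your own clauses.

```
# Hypothetical drop-in file, e.g. /etc/condor-ce/config.d/99-local.conf
# LOCAL_REMOVE_CLAUSE is an illustrative macro name, not a built-in knob.
# Example clause (assumption): remove jobs that have been idle for over 7 days.
LOCAL_REMOVE_CLAUSE = (JobStatus == 1 && (time() - QDate) > 7 * 24 * 3600)

# Extend the value the CE already has instead of overwriting it.
SYSTEM_PERIODIC_REMOVE = $(SYSTEM_PERIODIC_REMOVE) || $(LOCAL_REMOVE_CLAUSE)
```

Referencing $(SYSTEM_PERIODIC_REMOVE) on the right-hand side keeps the CE's default clauses in place, so only the local addition changes.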
On 3/5/21 5:06 AM, Thomas Hartmann wrote:
Maybe related(??): I just noticed that a restart of the condor
unit caused the Schedd to lose all its jobs. Since the
restart was more or less instantaneous, I would have expected the
Schedd to pick up its jobs.
03/05/21 10:55:11 (pid:3997828) WARNING - Cluster 437906 was
deleted with proc ads still attached to it. This should only
happen during schedd shutdown.
Mar 05 10:55:11 grid-htcondorce0.desy.de systemd: Stopping
Condor Distributed High-Throughput-Computing...
-- Subject: Unit condor.service has begun shutting down
-- Defined-By: systemd