Mailing List Archives Public Access	UW Madison Computer Sciences Department Computer Systems Lab


Re: [HTCondor-users] CondorCE: job submission to Condor-LRMS fails due to stdout/stderr files missing during staging(?)

Date: Fri, 5 Mar 2021 08:54:14 -0600
From: Brian Lin <blin@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] CondorCE: job submission to Condor-LRMS fails due to stdout/stderr files missing during staging(?)

Hi Thomas,

That's quite strange and certainly shouldn't happen! There should be a plain-text /var/lib/condor/spool/job_queue.log: does that file look corrupted at all?
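
A minimal sketch of one way to run that check, assuming that corruption would show up as bytes that are not printable UTF-8 text. The function name `find_suspicious_lines` is hypothetical and not part of HTCondor, and the scan does not validate the log's record format, so an empty result does not prove the file is intact.

```python
def find_suspicious_lines(path, max_report=20):
    """Return (line_number, reason) pairs for lines that do not look like plain text."""
    problems = []
    # Read as bytes so that undecodable data is reported instead of raising on open.
    with open(path, "rb") as handle:
        for number, raw in enumerate(handle, start=1):
            line = raw.rstrip(b"\r\n")
            try:
                text = line.decode("utf-8")
            except UnicodeDecodeError:
                problems.append((number, "not valid UTF-8"))
            else:
                # Tabs are tolerated; any other control character (including NUL) is flagged.
                if any(ord(ch) < 32 and ch != "\t" for ch in text):
                    problems.append((number, "control character"))
            if len(problems) >= max_report:
                break
    return problems
```

Calling `find_suspicious_lines("/var/lib/condor/spool/job_queue.log")` on the CE host (with read access to the spool directory) would list the first few lines worth inspecting by hand.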

As for the local SYSTEM_PERIODIC_REMOVE clauses, even though they may not be the culprit here, you should move them to your config and append them to the CE's SYSTEM_PERIODIC_REMOVE to avoid similar issues. And if you're on a new enough version of HTCondor-CE, you should be able to remove a few of the clauses:

- Since at least HTCondor-CE 3, held CE jobs are removed after 24 hrs
- HTCondor-CE 4.0.0 disables job retries by default
- HTCondor-CE 5.0.0 (available as a release candidate [1]) will remove jobs that exceed the configured value of "ROUTED_JOB_MAX_TIME"

Brian

[1] https://research.cs.wisc.edu/htcondor/repo/8.9/el7/rc/

On 3/5/21 5:06 AM, Thomas Hartmann wrote:

Hi again,

maybe related(??) - I just noticed that a restart of the condor unit caused the Schedd to lose all its jobs [1]. Since the restart was more or less instantaneous, I would have expected the Schedd to pick up its jobs again.

Cheers,
Thomas

[1]
03/05/21 10:55:11 (pid:3997828) WARNING - Cluster 437906 was deleted with proc ads still attached to it. This should only happen during schedd shutdown.

[2]
Mar 05 10:55:11 grid-htcondorce0.desy.de systemd[1]: Stopping Condor Distributed High-Throughput-Computing...
-- Subject: Unit condor.service has begun shutting down
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel

