
Re: [HTCondor-users] Inconsistencies: hold versus abort



On Mon, 8 Dec 2014, Brian Candler wrote:

Yes, that would be correct. Let me retest this.

[on OSX personal condor]

Brians-MacBook-Air:tmp $ cat t1.log
1418057606 INTERNAL *** DAGMAN_STARTED 8.0 ***
1418057619 t1 SUBMIT_FAILURE - - - 1
1418057624 t1 SUBMIT_FAILURE - - - 1
1418057629 t1 SUBMIT_FAILURE - - - 1
1418057634 t1 SUBMIT_FAILURE - - - 1
1418057645 t1 SUBMIT_FAILURE - - - 1
1418057662 t1 SUBMIT_FAILURE - - - 1
1418057662 INTERNAL *** DAGMAN_FINISHED 1 ***
Brians-MacBook-Air:tmp $ cat t1.status
[
  Type = "DagStatus";
  DagFiles = {
    "t1.dag"
  };
  Timestamp = 1418057619; /* "Mon Dec  8 16:53:39 2014" */
  DagStatus = 3; /* "STATUS_SUBMITTED ()" */
  NodesTotal = 1;
  NodesDone = 0;
  NodesPre = 0;
  NodesQueued = 0;
  NodesPost = 0;
  NodesReady = 1;
  NodesUnready = 0;
  NodesFailed = 0;
  JobProcsHeld = 0;
  JobProcsIdle = 0;
] [
  Type = "NodeStatus";
  Node = "t1";
  NodeStatus = 1; /* "STATUS_READY" */
  StatusDetails = "";
  RetryCount = 0;
  JobProcsQueued = 0;
  JobProcsHeld = 0;
] [
  Type = "StatusEnd";
  EndTime = 1418057619; /* "Mon Dec  8 16:53:39 2014" */
  NextUpdate = 1418057619; /* "Mon Dec  8 16:53:39 2014" */
]

Hmm, weird. That looks like a bug. I've created a ticket for it, and someone will look into it soon. I wonder whether the bug is triggered by the node having submit failures (as opposed to the job being submitted and then failing).
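
One way to cross-check the two files is to count the SUBMIT_FAILURE events per node in the jobstate log and compare the result with what the status file reports. This is a minimal sketch in Python, assuming the log lines are whitespace-separated as in the t1.log output above (timestamp, node name, event name, ...). The function name `submit_failures` and the inlined sample are illustrative only and are not part of HTCondor.

```python
from collections import Counter

# Sample copied from the t1.log output quoted above.
SAMPLE_LOG = """\
1418057606 INTERNAL *** DAGMAN_STARTED 8.0 ***
1418057619 t1 SUBMIT_FAILURE - - - 1
1418057624 t1 SUBMIT_FAILURE - - - 1
1418057629 t1 SUBMIT_FAILURE - - - 1
1418057634 t1 SUBMIT_FAILURE - - - 1
1418057645 t1 SUBMIT_FAILURE - - - 1
1418057662 t1 SUBMIT_FAILURE - - - 1
1418057662 INTERNAL *** DAGMAN_FINISHED 1 ***
"""


def submit_failures(log_text):
    """Return {node_name: number of SUBMIT_FAILURE events}.

    Assumes each record is "<timestamp> <node> <event> ...", which is
    how the lines in the sample above are laid out.
    """
    counts = Counter()
    for line in log_text.splitlines():
        fields = line.split()
        # Skip blank lines and DAGMan's own INTERNAL records.
        if len(fields) < 3 or fields[1] == "INTERNAL":
            continue
        node, event = fields[1], fields[2]
        if event == "SUBMIT_FAILURE":
            counts[node] += 1
    return dict(counts)


if __name__ == "__main__":
    print(submit_failures(SAMPLE_LOG))  # {'t1': 6}
```

For the run above this reports six failed submit attempts for t1, while t1.status still shows NodesFailed = 0 and the node as STATUS_READY.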

Kent Wenger
CHTC Team