This happens to me too every once in a while. I'll dig into my logs to see if my errors are like yours. I'm using Windows 2000.

R. Kent Wenger wrote:
> Mike,
>
>> Hi. I'm seeing a behavior in DAGMan in 6.8.0 that I never saw in 6.6.10 (our previously installed version). Every once in a while, a DAG will get put on hold automatically, for no apparent reason. After some digging in the logs, I see this in the DAG's dagman.out log:
>>
>> 9/6 12:37:41 BAD EVENT: job (4952.0.0) ended, submit count < 1 (0)
>> 9/6 12:37:41 BAD EVENT is warning only
>> 9/6 12:37:41 ERROR "Assertion ERROR on (job->_queuedNodeJobProcs >= 0)" at line 608 in file dag.C
>>
>> And this in the submit machine's SchedLog:
>>
>> 9/6 12:37:41 (pid:14159) (4561.0) Problem parsing user policy for job: The UNKNOWN (never set) OnExitRemove expression '' evaluated to UNDEFINED. Putting job on hold.
>> 9/6 12:37:41 (pid:14159) Job 4561.0 put on hold: The UNKNOWN (never set) OnExitRemove expression '' evaluated to UNDEFINED
>>
>> When the DAG job is released, it seems to continue on just fine. Is this a bug in 6.8.0? I can send complete logs for the DAG and the submit machine to a developer (~400k) if that'd be helpful.
>
> I don't think we've seen this before. Could you please send me the complete dagman.out file, and the log file(s) for all of the node jobs?
>
> Kent Wenger
> Condor Team
>
> _______________________________________________
> Condor-users mailing list
> The archives can be found at either
> https://lists.cs.wisc.edu/archive/condor-users/
> http://www.opencondor.org/spaces/viewmailarchive.action?key=CONDOR

--
Adam Chrystie
adamc@xxxxxxxxxxxxxx
Infrastructure / Pipeline Supervisor on "Terra"