
[Condor-users] jobs removed automatically by dags?



Hi again,

things are getting more mysterious. A set of my jobs (the set of dags
from the previous email) was hindered by a flaky schedd:

007 (10293330.000.000) 03/17 23:27:45 Shadow exception!
        Failed to connect to schedd!
        1161  -  Run Bytes Sent By Job
        6592404  -  Run Bytes Received By Job
...
007 (10293318.000.000) 03/17 23:27:45 Shadow exception!
        Failed to connect to schedd!
        1161  -  Run Bytes Sent By Job
        6592404  -  Run Bytes Received By Job
...

These I do understand, and the jobs will probably be restartable by the
rescue dags. However, a few minutes later, when I was not near the machines
(nor was the other admin who would have the rights), this happened:

009 (10293330.000.000) 03/17 23:41:07 Job was aborted by the user.
        via condor_rm (by user carsten)
...
009 (10293324.000.000) 03/17 23:41:07 Job was aborted by the user.
        via condor_rm (by user carsten)
...
009 (10293318.000.000) 03/17 23:41:07 Job was aborted by the user.
        via condor_rm (by user carsten)
...
009 (10293342.000.000) 03/17 23:41:07 Job was aborted by the user.
        via condor_rm (by user carsten)
...
009 (10293336.000.000) 03/17 23:41:07 Job was aborted by the user.
        via condor_rm (by user carsten)
...
009 (10293348.000.000) 03/17 23:41:07 Job was aborted by the user.
        via condor_rm (by user carsten)
...
009 (10293055.000.000) 03/17 23:41:07 Job was aborted by the user.
        via condor_rm (by user carsten)
...

Will dagman run condor_rm on jobs on its own?


Puzzled Carsten