
[HTCondor-users] Job Suspended - Stuck



Hi,

Here's the log file from a job that appears to be suspended, and I cannot resume it.
Short of removing the job and resubmitting it, is there another way to force it to restart or continue?

001 (1321.003.000) 01/15 15:57:23 Job executing on host: <128.114.*.*:9944>
...
006 (1321.003.000) 01/15 15:57:32 Image size of job updated: 24704
    2  -  MemoryUsage of job (MB)
    1572  -  ResidentSetSize of job (KB)
...
010 (1321.003.000) 01/15 15:58:56 Job was suspended.
    Number of processes actually suspended: 2
...
006 (1321.003.000) 01/15 16:06:28 Image size of job updated: 213336
    50  -  MemoryUsage of job (MB)
    51116  -  ResidentSetSize of job (KB)
...
007 (1321.003.000) 01/15 16:06:28 Shadow exception!
    Assertion ERROR on (result)
    0  -  Run Bytes Sent By Job
    0  -  Run Bytes Received By Job
...

And here is what it looks like in condor_q -g -r:
 ID      OWNER            SUBMITTED     RUN_TIME HOST(S)
1321.3   user        1/15 14:59   0+00:45:07 [????????????????]

And in condor_q -g:
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
1321.3   user        1/15 14:59   0+00:45:07 S  0   219.7 stage12_condor_run
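To read the ST column above, here is a small hypothetical helper that splits one data row of the default condor_q listing (ID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD) and maps the status letter to a readable state. The letter-to-state table covers the common condor_q codes, with S meaning suspended. The helper assumes the SUBMITTED column prints as two whitespace-separated fields (date and time), as in the row shown.

```python
# Common condor_q ST letters; anything else is reported as unknown.
STATUS = {
    "I": "idle",
    "R": "running",
    "H": "held",
    "S": "suspended",
    "C": "completed",
    "X": "removed",
}


def job_status(condor_q_row):
    """Return (job_id, readable_state) for one data row of default condor_q output."""
    fields = condor_q_row.split()
    # fields: [ID, OWNER, submit_date, submit_time, RUN_TIME, ST, PRI, SIZE, CMD...]
    job_id, st = fields[0], fields[5]
    return job_id, STATUS.get(st, "unknown (" + st + ")")
```

For the row above, this gives ("1321.3", "suspended").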

--
Andrey Kuznetsov <akuznet1@xxxxxxxx>