
[HTCondor-users] How to find out what jobs are held?



I recently ran a batch of jobs, just shy of 4000 in total. When it was done I got this:

condor_q
-- Schedd: jfisher.ingenazure.com : <192.168.1.206:9618?... @ 06/14/17 14:15:14
OWNER    BATCH_NAME     SUBMITTED    DONE   RUN   IDLE   HOLD   TOTAL  JOB_IDS
jfisher  CMD: ngspice   6/7  22:30   1787     _      _      9    1800  261.0 ... 262.4

9 jobs; 0 completed, 0 removed, 0 idle, 0 running, 9 held, 0 suspended

Running condor_release restarts the jobs, but then something crashes and they go back to being held.

Then I ran:

condor_q -hold
-- Schedd: jfisher.myserver : <192.168.1.206:9618?... @ 06/14/17 14:05:55
 ID      OWNER     HELD_SINCE   HOLD_REASON
 261.0   jfisher   6/14 14:03   Error from slot1_1@xxxxxxxxxxxxx: SHADOW at 192.168.1.206 failed to send fi
 261.1   jfisher   6/14 14:03   Error from slot2_1@xxxxxxxxxxxxx: SHADOW at 192.168.1.206 failed to send fi
 261.2   jfisher   6/14 14:03   Error from slot3_1@xxxxxxxxxxxxx: SHADOW at 192.168.1.206 failed to send fi
 261.3   jfisher   6/14 14:03   Error from slot4_1@xxxxxxxxxxxxx: SHADOW at 192.168.1.206 failed to send fi
 262.0   jfisher   6/14 14:03   Error from slot5_1@xxxxxxxxxxxxx: SHADOW at 192.168.1.206 failed to send fi
 262.1   jfisher   6/14 14:03   Error from slot6_1@xxxxxxxxxxxxx: SHADOW at 192.168.1.206 failed to send fi
 262.2   jfisher   6/14 14:03   Error from slot1_1@xxxxxxxxxxxxx: SHADOW at 192.168.1.206 failed to send fi
 262.3   jfisher   6/14 14:03   Error from slot2_1@xxxxxxxxxxxxx: SHADOW at 192.168.1.206 failed to send fi
 262.4   jfisher   6/14 14:03   Error from slot3_1@xxxxxxxxxxxxx: SHADOW at 192.168.1.206 failed to send fi

Alas, the output is truncated right where I suspect the information I need would be.

Any ideas as to how to find out what those jobs are?


--
Kind regards,

Justin Fisher.