
[Condor-users] schedd tripped over?



Hi all,

Last night Condor seems to have hit the ground not so nicely:

2/10 06:05:36 (pid:4294) Rebuilt prioritized runnable job list in 0.039s.
2/10 06:05:36 (pid:4294) match (slot1@xxxxxxxxxxxxxxxxx <10.10.9.77:45551> for A@xxxxxxxxxxx) out of jobs; relinquishing
2/10 06:05:36 (pid:4294) Match record (slot1@xxxxxxxxxxxxxxxxx <10.10.9.77:45551> for A@xxxxxxxxxxx, 9595574.-1) deleted
2/10 06:05:48 (pid:4294) Request was NOT accepted for claim slot2@xxxxxxxxxxxxxxxxx <10.10.7.7:52118> for A@xxxxxxxxxxx 9546551.0
2/10 06:05:48 (pid:4294) Sent REQUEST_CLAIM to startd slot2@xxxxxxxxxxxxxxxxx <10.10.7.7:52118> for A@xxxxxxxxxxx
2/10 06:05:48 (pid:4294) Match record (slot2@xxxxxxxxxxxxxxxxx <10.10.7.7:52118> for A@xxxxxxxxxxx, 9546551.0) deleted
2/10 06:06:05 (pid:4294) Shadow pid 9637 for job 9523675.0 exited with status 100
2/10 06:06:05 (pid:4294) Checking consistency running and runnable jobs
2/10 06:06:05 (pid:4294) Tables are consistent
2/10 06:06:05 (pid:4294) Rebuilt prioritized runnable job list in 0.037s.
2/10 06:06:07 (pid:4294) Shadow pid 4721 for job 9588098.0 exited with status 4
2/10 06:06:07 (pid:4294) ERROR: Shadow exited with job exception code!
2/10 06:06:07 (pid:4294) Checking consistency running and runnable jobs
2/10 06:06:07 (pid:4294) Tables are consistent
2/10 06:06:07 (pid:4294) Rebuilt prioritized runnable job list in 0.035s.
2/10 06:06:07 (pid:4294) Shadow pid 9087 for job 9530199.0 exited with status 4
2/10 06:06:07 (pid:4294) ERROR: Shadow exited with job exception code!
2/10 06:06:07 (pid:4294) Checking consistency running and runnable jobs
2/10 06:06:07 (pid:4294) Tables are consistent
2/10 06:06:07 (pid:4294) Rebuilt prioritized runnable job list in 0.035s.  (Expedited rebuild because no match was found)
2/10 06:06:07 (pid:4294) Shadow pid 18963 for job 9535394.0 exited with status 4
2/10 06:06:07 (pid:4294) ERROR: Shadow exited with job exception code!
2/10 06:06:07 (pid:4294) Match for cluster 9535394 has had 5 shadow exceptions, relinquishing.
2/10 06:06:07 (pid:4294) Match record (slot1@xxxxxxxxxxxxxxxxx <10.10.3.36:56383> for A@xxxxxxxxxxx, 9535394.0) deleted
2/10 06:06:07 (pid:4294) Shadow pid 6099 for job 9595353.0 exited with status 4
2/10 06:06:07 (pid:4294) ERROR: Shadow exited with job exception code!
2/10 06:06:07 (pid:4294) Checking consistency running and runnable jobs
2/10 06:06:07 (pid:4294) Tables are consistent
2/10 06:06:07 (pid:4294) Rebuilt prioritized runnable job list in 0.035s.  (Expedited rebuild because no match was found)
2/10 06:06:52 (pid:4294) Shadow pid 13273 for job 9588136.0 exited with status 107
2/10 06:06:52 (pid:4294) Match record (slot1@xxxxxxxxxxxxxxxxx <10.10.2.54:41223> for A@xxxxxxxxxxx, 9588136.0) deleted
2/10 06:06:52 (pid:4294) Null parameter --- match not deleted
2/10 06:06:53 (pid:4294) Shadow pid 20945 for job 9531825.0 exited with status 107
2/10 06:06:53 (pid:4294) Match record (slot2@xxxxxxxxxxxxxxxxx <10.10.9.20:51256> for A@xxxxxxxxxxx, 9531825.0) deleted
2/10 06:06:53 (pid:4294) Null parameter --- match not deleted
2/10 06:06:55 (pid:4294) Shadow pid 1728 for job 9532073.0 exited with status 107
2/10 06:06:55 (pid:4294) Match record (slot4@xxxxxxxxxxxxxxxxx <10.10.16.5:56306> for A@xxxxxxxxxxx, 9532073.0) deleted
2/10 06:06:55 (pid:4294) Null parameter --- match not deleted

After that, the Condor pool ran dry (only backfill jobs are running now).
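
For anyone wanting to summarize a longer stretch of the schedd log than the excerpt above, here is a minimal sketch that tallies the shadow exit statuses. The regular expression is only matched against the line format shown in this excerpt; the function name `tally_shadow_exits` and the `sample` list are illustrative and not part of Condor.

```python
import re
from collections import Counter

# Matches schedd log lines of the form seen in the excerpt, e.g.
# "2/10 06:06:07 (pid:4294) Shadow pid 4721 for job 9588098.0 exited with status 4"
SHADOW_EXIT = re.compile(
    r"Shadow pid (\d+) for job (\d+\.\d+) exited with status (\d+)"
)


def tally_shadow_exits(lines):
    """Count how often each shadow exit status occurs in the given log lines."""
    counts = Counter()
    for line in lines:
        match = SHADOW_EXIT.search(line)
        if match:
            counts[int(match.group(3))] += 1
    return counts


# A few lines copied from the excerpt above (hypothetical input; in practice
# the lines would be read from the schedd log file).
sample = [
    "2/10 06:06:05 (pid:4294) Shadow pid 9637 for job 9523675.0 exited with status 100",
    "2/10 06:06:07 (pid:4294) Shadow pid 4721 for job 9588098.0 exited with status 4",
    "2/10 06:06:07 (pid:4294) ERROR: Shadow exited with job exception code!",
    "2/10 06:06:07 (pid:4294) Shadow pid 9087 for job 9530199.0 exited with status 4",
    "2/10 06:06:52 (pid:4294) Shadow pid 13273 for job 9588136.0 exited with status 107",
]

if __name__ == "__main__":
    for status, count in sorted(tally_shadow_exits(sample).items()):
        print(f"status {status}: {count} shadow(s)")
```

Run over the full log, a tally like this would show whether the status 4 and status 107 exits are a short burst or continue up to the point where the pool drained.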


Typically, lines like these show that something is not quite right:
2/10 08:45:20 (pid:4294) Activity on stashed negotiator socket
2/10 08:45:20 (pid:4294) Negotiating for owner: A@xxxxxxxxxxx
2/10 08:45:20 (pid:4294) Out of servers - 0 jobs matched, 4844 jobs idle, 0 jobs rejected


condor_status shows:

                     Total Owner Claimed Unclaimed Matched Preempting Backfill

        X86_64/LINUX  6592     0     136        50      25          0     6381

               Total  6592     0     136        50      25          0     6381


Does anyone have an idea what could have caused this?

Cheers

Carsten