
[HTCondor-users] Nodes are not accepting jobs?



Hello,

I've been trying to figure this out for a while, and Google could not help me. Does anyone have a clue?

Well, everything was fine until a power outage shut down all the computers in the cluster.

Then I restarted all machines (condor3/4/5) and ran the same script that had been working before to submit the jobs.

The problem:

- condor_q says that machines condor4 and condor5 are in use, but that does not show up in condor_status.

- each of those two machines runs at most one process, but they should be fully allocated (same condor_config)

As I said, everything worked perfectly for months until the power outage.

I really have no idea where to look for the issue. Does anyone have a clue? Is there something I should look for in any of the log files?

Thank you for your help!!!!

$ condor_status; condor_q -run
Name               OpSys      Arch   State     Activity LoadAv Mem   ActvtyTime

slot1@condor3      LINUX      X86_64 Claimed   Busy     1.000  1991  0+00:12:20
slot2@condor3      LINUX      X86_64 Claimed   Busy     1.000  1991  0+00:01:24
slot3@condor3      LINUX      X86_64 Claimed   Busy     1.000  1991  0+00:05:25
slot4@condor3      LINUX      X86_64 Claimed   Busy     1.000  1991  0+00:11:26
slot5@condor3      LINUX      X86_64 Claimed   Busy     1.000  1991  0+00:03:26
slot6@condor3      LINUX      X86_64 Claimed   Busy     1.000  1991  0+00:08:28
slot7@condor3      LINUX      X86_64 Claimed   Busy     1.000  1991  0+00:11:19
slot8@condor3      LINUX      X86_64 Claimed   Busy     1.000  1991  0+00:09:21
slot1@condor4      LINUX      X86_64 Unclaimed Idle     0.000  1991  0+00:00:04
slot2@condor4      LINUX      X86_64 Unclaimed Idle     0.000  1991  0+00:00:05
slot3@condor4      LINUX      X86_64 Unclaimed Idle     0.000  1991  0+00:00:00
slot4@condor4      LINUX      X86_64 Unclaimed Idle     0.000  1991  0+00:00:00
slot5@condor4      LINUX      X86_64 Unclaimed Idle     0.000  1991  0+00:00:08
slot6@condor4      LINUX      X86_64 Unclaimed Idle     0.000  1991  0+02:05:22
slot7@condor4      LINUX      X86_64 Unclaimed Idle     0.000  1991  0+02:03:03
slot8@condor4      LINUX      X86_64 Unclaimed Idle     0.000  1991  0+02:03:24
slot1@condor5      LINUX      X86_64 Unclaimed Idle     0.000  1990  0+00:00:04
slot2@condor5      LINUX      X86_64 Unclaimed Idle     0.000  1990  0+00:00:05
slot3@condor5      LINUX      X86_64 Unclaimed Idle     0.000  1990  0+00:00:00
slot4@condor5      LINUX      X86_64 Unclaimed Idle     0.000  1990  0+00:00:00
slot5@condor5      LINUX      X86_64 Unclaimed Idle     0.000  1990  0+00:00:07
slot6@condor5      LINUX      X86_64 Unclaimed Idle     0.000  1990  0+00:15:09
slot7@condor5      LINUX      X86_64 Unclaimed Idle     0.000  1990  0+00:15:10
slot8@condor5      LINUX      X86_64 Unclaimed Idle     0.000  1990  0+00:15:03
                     Total Owner Claimed Unclaimed Matched Preempting Backfill

        X86_64/LINUX    24     0       8        16       0          0        0

               Total    24     0       8        16       0          0        0


-- Schedd: dot : <192.168.0.2:24772?...
 ID        OWNER            SUBMITTED     RUN_TIME HOST(S)
786194.0   user            3/30 17:54   0+00:30:20 slot3@condor5
786202.0   user            3/30 17:54   0+00:31:13 slot3@condor4
786226.0   user            3/30 17:54   0+00:13:07 slot1@condor3
786227.0   user            3/30 17:54   0+00:12:20 slot4@condor3
786228.0   user            3/30 17:54   0+00:12:10 slot7@condor3
786229.0   user            3/30 17:54   0+00:10:19 slot8@condor3
786230.0   user            3/30 17:54   0+00:09:20 slot6@condor3
786231.0   user            3/30 17:54   0+00:06:20 slot3@condor3
786232.0   user            3/30 17:54   0+00:04:19 slot5@condor3
786233.0   user            3/30 17:54   0+00:02:20 slot2@condor3
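
In case it helps to make the mismatch explicit, here is a minimal Python sketch that compares the two outputs above and lists the slots that condor_q -run reports as running a job while condor_status does not show them as Claimed. It only parses pasted text; the function names and the shortened sample strings are made up for illustration and are not part of HTCondor, and the parsing assumes the column layout shown above.

```python
# Sketch only: cross-check pasted condor_status and condor_q -run output.
# Function names and sample strings are illustrative, not part of HTCondor.

def claimed_slots(status_text):
    """Return slot names whose State column in condor_status is Claimed."""
    slots = set()
    for line in status_text.splitlines():
        fields = line.split()
        # Slot rows look like: slot1@condor3 LINUX X86_64 Claimed Busy ...
        if len(fields) >= 4 and "@" in fields[0] and fields[3] == "Claimed":
            slots.add(fields[0])
    return slots


def running_hosts(queue_text):
    """Return the HOST(S) entries (last field) of condor_q -run job rows."""
    hosts = set()
    for line in queue_text.splitlines():
        fields = line.split()
        # Job rows start with an ID such as 786194.0 and end with slotN@host.
        if fields and fields[0].replace(".", "", 1).isdigit() and "@" in fields[-1]:
            hosts.add(fields[-1])
    return hosts


def mismatched_slots(status_text, queue_text):
    """Slots listed as running in the queue but not shown as Claimed."""
    return sorted(running_hosts(queue_text) - claimed_slots(status_text))


# Shortened samples taken from the output above (three rows each).
STATUS_SAMPLE = """\
slot1@condor3 LINUX X86_64 Claimed Busy 1.000 1991 0+00:12:20
slot3@condor4 LINUX X86_64 Unclaimed Idle 0.000 1991 0+00:00:00
slot3@condor5 LINUX X86_64 Unclaimed Idle 0.000 1990 0+00:00:00
"""

QUEUE_SAMPLE = """\
786194.0 user 3/30 17:54 0+00:30:20 slot3@condor5
786202.0 user 3/30 17:54 0+00:31:13 slot3@condor4
786226.0 user 3/30 17:54 0+00:13:07 slot1@condor3
"""

if __name__ == "__main__":
    print(mismatched_slots(STATUS_SAMPLE, QUEUE_SAMPLE))
```

With the sample rows, the two slots it prints are slot3@condor4 and slot3@condor5, which are exactly the ones condor_q shows as running for about 30 minutes while condor_status shows them as Unclaimed/Idle.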