
Re: [Condor-users] SCHEDD not running right on upgraded CE with Condor 7.6.6



 On 04/04/2012 06:17 AM, Alain Roy wrote:
We have the following line in /etc/sysconfig/condor to point to the system wide
configuration file:
CONDOR_CONFIG="/share/apps/condor/etc/condor_config_7.6.6"
And it's also in the environment when you run condor_q? The daemons and the tools have to read the same configuration files. If they don't, condor_q and the other tools will fail in the way that you're seeing.

That's it.  It was the environment variable issue.  Setting CONDOR_CONFIG correctly fixed the problem.

We are now able to get output from the condor_q and condor_status commands.
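
For anyone hitting the same symptom, here is a minimal sketch of a check that the shell and the init script point at the same configuration file. It assumes the daemons pick up CONDOR_CONFIG from a sysconfig-style file such as the /etc/sysconfig/condor mentioned above; the function names are made up for illustration and are not part of Condor.

```python
import os
import shlex


def sysconfig_condor_config(text):
    """Return the CONDOR_CONFIG value assigned in sysconfig-style text, or None."""
    value = None
    for raw in text.splitlines():
        try:
            # shlex strips shell quoting and trailing "# comments"
            tokens = shlex.split(raw, comments=True)
        except ValueError:
            continue  # skip lines with unbalanced quotes
        for token in tokens:
            if token.startswith("CONDOR_CONFIG="):
                value = token.split("=", 1)[1]  # last assignment wins
    return value


def compare_config(sysconfig_text, environ):
    """Return (daemon_value, shell_value, same) for CONDOR_CONFIG."""
    daemon_value = sysconfig_condor_config(sysconfig_text)
    shell_value = environ.get("CONDOR_CONFIG")
    return daemon_value, shell_value, daemon_value == shell_value


if __name__ == "__main__":
    path = "/etc/sysconfig/condor"  # assumption: location used on this site
    if os.path.exists(path):
        with open(path) as handle:
            daemon, shell, same = compare_config(handle.read(), os.environ)
        print("daemons:", daemon)
        print("shell:  ", shell)
        print("match" if same else "MISMATCH: tools and daemons may read different files")
```

If the two values differ, or the shell has no CONDOR_CONFIG at all, the tools may be reading a different configuration than the daemons, which is the failure described above.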

Thanks.
Great!

Following problem:

We see that a bunch of jobs have been scheduled but they are not executing:
I'm confused because I don't see the jobs. There's nothing in the queue, so nothing is executing? Can you explain in a bit more detail?


My mistake. Those are leftover jobs from before we did the upgrade. Sorry for the confusion.

We now see some jobs coming in.  Not many, but some:

# condor_status

Name               OpSys      Arch   State     Activity LoadAv Mem   ActvtyTime

slot1@xxxxxxxxxxxx LINUX      X86_64 Owner     Idle     1.000   982  0+16:25:06
slot2@xxxxxxxxxxxx LINUX      X86_64 Owner     Idle     0.300   982  0+16:25:07
slot3@xxxxxxxxxxxx LINUX      X86_64 Owner     Idle     0.000   982  0+16:25:08
slot4@xxxxxxxxxxxx LINUX      X86_64 Owner     Idle     0.000   982  0+16:25:09
slot5@xxxxxxxxxxxx LINUX      X86_64 Owner     Idle     0.000   982  0+16:25:10
slot6@xxxxxxxxxxxx LINUX      X86_64 Owner     Idle     0.000   982  0+16:25:11
slot7@xxxxxxxxxxxx LINUX      X86_64 Owner     Idle     0.000   982  0+16:25:12
slot8@xxxxxxxxxxxx LINUX      X86_64 Owner     Idle     0.000   982  0+16:25:05
slot1@compute-10-1 LINUX      X86_64 Claimed   Busy     1.130  1963  0+07:42:48
slot2@compute-10-1 LINUX      X86_64 Claimed   Busy     1.100  1963  0+07:44:49
slot3@compute-10-1 LINUX      X86_64 Claimed   Busy     1.100  1963  0+07:49:51
slot4@compute-10-1 LINUX      X86_64 Claimed   Busy     1.090  1963  0+07:49:52
slot1@compute-10-1 LINUX      X86_64 Claimed   Busy     1.110  1963  0+07:49:09
slot2@compute-10-1 LINUX      X86_64 Claimed   Busy     1.080  1963  0+07:49:10
slot3@compute-10-1 LINUX      X86_64 Claimed   Busy     1.080  1963  0+07:49:10
slot4@compute-10-1 LINUX      X86_64 Claimed   Busy     1.090  1963  0+07:49:11
slot1@compute-10-1 LINUX      X86_64 Claimed   Busy     1.170  1963  0+07:49:55
slot2@compute-10-1 LINUX      X86_64 Claimed   Busy     1.140  1963  0+07:49:56
slot3@compute-10-1 LINUX      X86_64 Claimed   Busy     1.150  1963  0+07:49:57
slot4@compute-10-1 LINUX      X86_64 Claimed   Busy     1.150  1963  0+07:49:58
slot1@compute-10-1 LINUX      X86_64 Claimed   Busy     1.280  1963  0+07:49:06
slot2@compute-10-1 LINUX      X86_64 Claimed   Busy     1.240  1963  0+07:49:07
slot3@compute-10-1 LINUX      X86_64 Claimed   Busy     1.240  1963  0+07:49:08
slot4@compute-10-1 LINUX      X86_64 Claimed   Busy     1.230  1963  0+07:49:09
slot1@compute-10-1 LINUX      X86_64 Claimed   Busy     1.240  1963  0+07:49:17
slot2@compute-10-1 LINUX      X86_64 Claimed   Busy     1.230  1963  0+07:49:18
slot3@compute-10-1 LINUX      X86_64 Claimed   Busy     1.220  1963  0+07:49:19
slot4@compute-10-1 LINUX      X86_64 Claimed   Busy     1.230  1963  0+07:49:20
slot1@compute-10-1 LINUX      X86_64 Claimed   Busy     1.000  1963  0+07:49:32
slot2@compute-10-1 LINUX      X86_64 Claimed   Busy     1.010  1963  0+07:49:33
slot3@compute-10-1 LINUX      X86_64 Claimed   Busy     1.000  1963  0+07:49:34
slot4@compute-10-1 LINUX      X86_64 Claimed   Busy     1.030  1963  0+07:49:35
slot1@compute-10-5 LINUX      X86_64 Claimed   Busy     1.200  1963  0+08:05:46
slot2@compute-10-5 LINUX      X86_64 Claimed   Busy     1.210  1963  0+08:05:47
slot3@compute-10-5 LINUX      X86_64 Claimed   Busy     1.210  1963  0+08:05:48
slot4@compute-10-5 LINUX      X86_64 Claimed   Busy     1.200  1963  0+08:05:49
slot1@compute-10-9 LINUX      X86_64 Claimed   Busy     1.190  1963  0+07:52:22
slot2@compute-10-9 LINUX      X86_64 Claimed   Busy     1.190  1963  0+07:52:23
slot3@compute-10-9 LINUX      X86_64 Claimed   Busy     1.210  1963  0+07:52:24
slot4@compute-10-9 LINUX      X86_64 Claimed   Busy     1.170  1963  0+07:54:26
                     Total Owner Claimed Unclaimed Matched Preempting Backfill

        X86_64/LINUX    40     8      32         0       0          0        0

               Total    40     8      32         0       0          0        0
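
As a rough cross-check of output like the above, the following sketch tallies slots by State and compares the result with the summary line. It assumes the plain eight-column condor_status layout shown here, with whitespace-separated fields; the function names are hypothetical. A slot in Claimed/Busy indicates a job is running there, but this only counts slots and does not prove that each job is behaving correctly.

```python
from collections import Counter


def tally_slot_states(status_text):
    """Count slots per State from plain condor_status output."""
    counts = Counter()
    for line in status_text.splitlines():
        fields = line.split()
        # Slot rows: Name OpSys Arch State Activity LoadAv Mem ActvtyTime
        if len(fields) == 8 and "@" in fields[0]:
            counts[fields[3]] += 1
    return counts


def summary_totals(status_text):
    """Return the final 'Total' row as a dict keyed by the summary header."""
    header = None
    totals = None
    for line in status_text.splitlines():
        fields = line.split()
        if fields[:2] == ["Total", "Owner"]:
            header = fields  # e.g. Total Owner Claimed Unclaimed ...
        elif header and fields[:1] == ["Total"] and len(fields) == len(header) + 1:
            totals = dict(zip(header, (int(x) for x in fields[1:])))
    return totals


def states_match_summary(status_text):
    """True if the per-slot tally agrees with the summary line."""
    counts = tally_slot_states(status_text)
    totals = summary_totals(status_text)
    if totals is None:
        return False
    states = [name for name in totals if name != "Total"]
    return (sum(counts.values()) == totals["Total"]
            and all(counts.get(name, 0) == totals[name] for name in states))
```

Saving the command output to a file and passing its contents to states_match_summary gives a quick sanity check that the listing was not truncated.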


Another question, if I may. How do I determine whether those jobs are being handled by Condor correctly?

Thanks.

Steven.


-alain

# cd /wntmp/home
# ls
alice        uscms0179  uscms0604  uscms1029  uscms1454  uscms1879  uscms2304
cdf          uscms0180  uscms0605  uscms1030  uscms1455  uscms1880
    .
    .
    .


# condor_status

Name               OpSys      Arch   State     Activity LoadAv Mem   ActvtyTime

slot1@xxxxxxxxxxxx LINUX      X86_64 Owner     Idle     1.000   982  0+04:05:04
slot2@xxxxxxxxxxxx LINUX      X86_64 Owner     Idle     0.760   982  0+04:05:05
slot3@xxxxxxxxxxxx LINUX      X86_64 Owner     Idle     0.000   982  0+04:05:06
slot4@xxxxxxxxxxxx LINUX      X86_64 Owner     Idle     0.000   982  0+04:05:07
slot5@xxxxxxxxxxxx LINUX      X86_64 Owner     Idle     0.000   982  0+04:05:08
slot6@xxxxxxxxxxxx LINUX      X86_64 Owner     Idle     0.000   982  0+04:05:09
slot7@xxxxxxxxxxxx LINUX      X86_64 Owner     Idle     0.000   982  0+04:05:10
slot8@xxxxxxxxxxxx LINUX      X86_64 Owner     Idle     0.000   982  0+04:05:03
slot1@compute-10-5 LINUX      X86_64 Unclaimed Idle     0.000  1963  6+07:14:53
slot2@compute-10-5 LINUX      X86_64 Unclaimed Idle     0.000  1963  6+07:15:19
slot3@compute-10-5 LINUX      X86_64 Unclaimed Idle     0.000  1963  6+07:15:20
slot4@compute-10-5 LINUX      X86_64 Unclaimed Idle     0.000  1963  6+07:15:21
slot10@compute-20- LINUX      X86_64 Unclaimed Idle     0.000  4024  0+07:45:07
slot11@compute-20- LINUX      X86_64 Unclaimed Idle     0.000  4024  0+07:45:08
slot12@compute-20- LINUX      X86_64 Unclaimed Idle     0.000  4024  0+07:45:09
slot1@compute-20-3 LINUX      X86_64 Unclaimed Idle     0.420  4024  0+07:44:43
slot2@compute-20-3 LINUX      X86_64 Unclaimed Idle     0.000  4024  0+07:45:07
slot3@compute-20-3 LINUX      X86_64 Unclaimed Idle     0.000  4024  0+07:45:08
slot4@compute-20-3 LINUX      X86_64 Unclaimed Idle     0.000  4024  0+07:45:09
slot5@compute-20-3 LINUX      X86_64 Unclaimed Idle     0.000  4024  0+07:45:10
slot6@compute-20-3 LINUX      X86_64 Unclaimed Idle     0.000  4024  0+07:45:11
slot7@compute-20-3 LINUX      X86_64 Unclaimed Idle     0.000  4024  0+07:45:12
slot8@compute-20-3 LINUX      X86_64 Unclaimed Idle     0.000  4024  0+07:45:05
slot9@compute-20-3 LINUX      X86_64 Unclaimed Idle     0.000  4024  0+07:45:06
                     Total Owner Claimed Unclaimed Matched Preempting Backfill

        X86_64/LINUX    24     8       0        16       0          0        0

               Total    24     8       0        16       0          0        0


# condor_q


-- Submitter: cithep252.ultralight.org :<10.3.255.253:48116>  : cithep252.ultralight.org
ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD

0 jobs; 0 idle, 0 running, 0 held


Thanks.

Steven.