
[HTCondor-users] noob question about reconfiguring slots



Hello all,
I'm a new HTCondor user and administrator, and I'm building an HTCondor system. Right now I have two physical CentOS 6.6 servers, each with 24 cores and 128 GB of RAM, so they are currently configured as 48 slots (24 per server, one per core):


[cyang@rhw1143 ads]$ condor_status
Name               OpSys   Arch   State     Activity LoadAv Mem  ActvtyTime

slot10@xxxxxxxxxxx LINUX   X86_64 Unclaimed Idle      0.000 5375 0+00:15:05
slot11@xxxxxxxxxxx LINUX   X86_64 Unclaimed Idle      0.000 5375 0+00:15:06
slot12@xxxxxxxxxxx LINUX   X86_64 Unclaimed Idle      0.000 5375 0+00:15:07
slot13@xxxxxxxxxxx LINUX   X86_64 Unclaimed Idle      0.000 5375 0+00:15:08
slot14@xxxxxxxxxxx LINUX   X86_64 Unclaimed Idle      0.000 5375 0+00:15:09
slot15@xxxxxxxxxxx LINUX   X86_64 Unclaimed Idle      0.000 5375 0+00:15:10
slot16@xxxxxxxxxxx LINUX   X86_64 Unclaimed Idle      0.000 5375 0+00:15:03
slot17@xxxxxxxxxxx LINUX   X86_64 Unclaimed Idle      0.000 5375 0+00:15:04
slot18@xxxxxxxxxxx LINUX   X86_64 Unclaimed Idle      0.000 5375 0+00:15:05
slot19@xxxxxxxxxxx LINUX   X86_64 Unclaimed Idle      0.000 5375 0+00:15:06
slot1@xxxxxxxxxxxx LINUX   X86_64 Unclaimed Idle      0.690 5375 0+00:14:33
slot20@xxxxxxxxxxx LINUX   X86_64 Unclaimed Idle      0.000 5375 0+00:15:07
slot21@xxxxxxxxxxx LINUX   X86_64 Unclaimed Idle      0.000 5375 0+00:15:08
slot22@xxxxxxxxxxx LINUX   X86_64 Unclaimed Idle      0.000 5375 0+00:15:09
slot23@xxxxxxxxxxx LINUX   X86_64 Unclaimed Idle      0.000 5375 0+00:15:10
slot24@xxxxxxxxxxx LINUX   X86_64 Unclaimed Idle      0.000 5375 0+00:15:03
slot2@xxxxxxxxxxxx LINUX   X86_64 Unclaimed Idle      0.000 5375 0+00:15:05
slot3@xxxxxxxxxxxx LINUX   X86_64 Unclaimed Idle      0.000 5375 0+00:15:06
slot4@xxxxxxxxxxxx LINUX   X86_64 Unclaimed Idle      0.000 5375 0+00:15:07
slot5@xxxxxxxxxxxx LINUX   X86_64 Unclaimed Idle      0.000 5375 0+00:15:08
slot6@xxxxxxxxxxxx LINUX   X86_64 Unclaimed Idle      0.000 5375 0+00:15:09
slot7@xxxxxxxxxxxx LINUX   X86_64 Unclaimed Idle      0.000 5375 0+00:15:10
slot8@xxxxxxxxxxxx LINUX   X86_64 Unclaimed Idle      0.000 5375 0+00:15:03
slot9@xxxxxxxxxxxx LINUX   X86_64 Unclaimed Idle      0.000 5375 0+00:15:04
slot10@xxxxxxxxxxx LINUX   X86_64 Unclaimed Idle      0.000 5375 9+05:51:45
slot11@xxxxxxxxxxx LINUX   X86_64 Unclaimed Idle      0.000 5375 9+05:51:46
slot12@xxxxxxxxxxx LINUX   X86_64 Unclaimed Idle      0.000 5375 9+05:51:47
slot13@xxxxxxxxxxx LINUX   X86_64 Unclaimed Idle      0.000 5375 9+05:51:48
slot14@xxxxxxxxxxx LINUX   X86_64 Unclaimed Idle      0.000 5375 9+05:51:49
slot15@xxxxxxxxxxx LINUX   X86_64 Unclaimed Idle      0.000 5375 9+05:51:50
slot16@xxxxxxxxxxx LINUX   X86_64 Unclaimed Idle      0.000 5375 9+05:51:43
slot17@xxxxxxxxxxx LINUX   X86_64 Unclaimed Idle      0.000 5375 9+05:51:44
slot18@xxxxxxxxxxx LINUX   X86_64 Unclaimed Idle      0.000 5375 9+05:51:45
slot19@xxxxxxxxxxx LINUX   X86_64 Unclaimed Idle      0.000 5375 9+05:51:46
slot1@xxxxxxxxxxxx LINUX   X86_64 Unclaimed Idle      0.000 5375 2+01:07:29
slot20@xxxxxxxxxxx LINUX   X86_64 Unclaimed Idle      0.000 5375 9+05:51:47
slot21@xxxxxxxxxxx LINUX   X86_64 Unclaimed Idle      0.000 5375 9+05:51:48
slot22@xxxxxxxxxxx LINUX   X86_64 Unclaimed Idle      0.000 5375 9+05:51:49
slot23@xxxxxxxxxxx LINUX   X86_64 Unclaimed Idle      0.000 5375 9+05:51:50
slot24@xxxxxxxxxxx LINUX   X86_64 Unclaimed Idle      0.070 5375 9+05:51:43
slot2@xxxxxxxxxxxx LINUX   X86_64 Unclaimed Idle      0.000 5375 2+01:08:40
slot3@xxxxxxxxxxxx LINUX   X86_64 Unclaimed Idle      0.000 5375 2+01:08:58
slot4@xxxxxxxxxxxx LINUX   X86_64 Unclaimed Idle      0.000 5375 2+01:08:37
slot5@xxxxxxxxxxxx LINUX   X86_64 Unclaimed Idle      0.000 5375 2+01:09:15
slot6@xxxxxxxxxxxx LINUX   X86_64 Unclaimed Idle      0.000 5375 2+01:09:23
slot7@xxxxxxxxxxxx LINUX   X86_64 Unclaimed Idle      0.000 5375 9+05:51:50
slot8@xxxxxxxxxxxx LINUX   X86_64 Unclaimed Idle      0.000 5375 9+05:51:43
slot9@xxxxxxxxxxxx LINUX   X86_64 Unclaimed Idle      0.000 5375 9+05:51:44
                   Total Owner Claimed Unclaimed Matched Preempting Backfill

      X86_64/LINUX    48     0       0        48       0          0        0

             Total    48     0       0        48       0          0        0


However, while running some tests I'm getting segfaults, which I believe are caused by our application running out of memory. So I would like to either:

a) configure dynamic partitioning of resources

or

b) create 12 "double-power" slots per server instead of 24

Eventually I think I want to do (a), but I figured it would be good to start with (b).
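In HTCondor, both options are normally expressed in the configuration files of each execute node rather than in a hand-written ClassAd. A minimal sketch of option (a), assuming a local configuration file such as condor_config.local (the file name is an assumption; condor_config_val -config lists the files the daemons actually read), defines one partitionable slot that owns the whole machine:

```
# Hypothetical local config fragment for option (a):
# a single partitionable slot holding 100% of the machine's resources.
NUM_SLOTS = 1
NUM_SLOTS_TYPE_1 = 1
SLOT_TYPE_1 = 100%
SLOT_TYPE_1_PARTITIONABLE = TRUE
```

Each job then gets a dynamic slot carved out of it according to what it declares in its submit description, for example request_cpus = 1 and request_memory = 8192 (in MB; the number is only illustrative), so a memory-hungry job is no longer limited to a fixed 1/24 share of RAM.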

I've been reading up on machine ClassAds, but I can't find any information on how to get HTCondor to read a new ad (condor_advertise, maybe?). I'm also not sure I'm even writing the ad correctly.

So I have what I believe to be a machine ClassAd:

MyType       = "Query"
TargetType   = "Machine"
NUM_CPUS     = 1/12
MEMORY       = auto
disk         = auto
swap         = auto

Does this look right? I would test it out, but I'm not sure what command to run.
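For comparison, a layout like option (b) is usually set with the SLOT_TYPE_<N> and NUM_SLOTS_TYPE_<N> configuration macros; condor_advertise only sends an ad to the collector and does not change how the startd divides up a machine. A minimal sketch expressing the same intent as the ad above (2 of the 24 cores and 1/12 of the RAM per slot), again assuming a local configuration file on each execute node:

```
# Hypothetical local config fragment for option (b): 12 slots per server.
# Each slot gets 2 CPUs and 1/12 of the detected RAM (about twice the
# current 5375 MB); "auto" splits disk and swap evenly among the slots.
SLOT_TYPE_1 = cpus=2, ram=1/12, disk=auto, swap=auto
NUM_SLOTS_TYPE_1 = 12
```

Changes to the slot layout generally need a restart of the startd on that node (for example with condor_restart -startd) rather than just condor_reconfig; afterwards condor_status should list 12 slots per server.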

Any help is appreciated. Thank you.