Re: [HTCondor-users] memory overprovisionning : restart needed ?

Mailing List Archives Public Access	UW Madison Computer Sciences Department Computer Systems Lab

On Feb 12, 2015, at 8:58 AM, SCHAER Frederic <frederic.schaer@xxxxxx> wrote:

Hi,

Because jobs that should use 2GB require 2.5 time more memory, I changed our MEMORY setting to be 2.6 times the real memory of the systemsâ
When I query the daemons after issuing a condor_reconfig, I see the change :

# condor_config_val -name wn272 -startd memory
2.6 * quantize( 64364, 1000 )

But when I look at the condor_status, it tells me this :

Name         Cpu Mem LoadAv KbdIdle    State    StateTime Activ ActvtyTime

slot1@wn272. 11   700 0.680 56+01:00:41 Unclaim 0+02:01:14 Idle   0+02:01:14
slot1_10@wn2   1 5200 0.990 56+01:00:41 Claimed 0+02:01:14 Busy   0+02:01:13
slot1_11@wn2   1 5200 0.930 56+01:00:41 Claimed 0+02:35:48 Busy   0+02:35:47
slot1_12@wn2   1 5200 1.000 56+01:00:41 Claimed 0+02:46:01 Busy   0+02:46:00
slot1_13@wn2   1 5200 1.000 56+01:00:41 Claimed 1+01:53:25 Busy   1+01:53:25
slot1_14@wn2   1 5200 1.000 56+01:00:41 Claimed 1+01:52:24 Busy   1+01:52:24
slot1_15@wn2   1 5200 1.000 56+01:00:41 Claimed 1+01:51:24 Busy   1+01:51:24
slot1_16@wn2   1 5200 1.000 56+01:00:41 Claimed 1+01:50:25 Busy   1+01:50:23
slot1_17@wn2   1 5200 1.000 56+01:00:41 Claimed 0+12:56:19 Busy   0+12:56:18
slot1_18@wn2   1 5200 1.000 56+01:00:41 Claimed 0+12:55:19 Busy   0+12:55:18
slot1_19@wn2   1 5200 1.000 56+01:00:41 Claimed 0+12:54:19 Busy   0+12:54:18
slot1_1@wn27   1 5200 1.000 56+01:00:41 Claimed 1+01:57:35 Busy   1+01:57:34
slot1_20@wn2   1 5200 0.690 56+01:00:41 Claimed 0+12:53:19 Busy   0+12:53:17
slot1_22@wn2   1 5200 1.000 56+01:00:41 Claimed 0+12:50:19 Busy   0+12:50:18
slot1_2@wn27   1 5200 0.810 56+01:00:41 Claimed 0+12:57:19 Busy   0+12:57:18
slot1_3@wn27   1 5200 0.940 56+01:00:41 Claimed 0+03:02:14 Busy   0+03:02:13
slot1_4@wn27   1 2100 1.000 56+01:00:41 Claimed 0+07:30:43 Busy   0+07:30:41
slot1_5@wn27   1 2100 1.000 56+01:00:41 Claimed 0+05:30:24 Busy   0+05:30:24
slot1_6@wn27   1 5200 1.000 56+01:00:41 Claimed 0+02:45:40 Busy   0+02:45:39
slot1_7@wn27   1 2100 1.000 56+01:00:41 Claimed 0+05:12:24 Busy   0+05:12:24
slot1_8@wn27   1 2100 1.000 56+01:00:41 Claimed 0+05:28:24 Busy   0+05:28:22
slot1_9@wn27   1 5200 1.000 56+01:00:41 Claimed 0+02:40:28 Busy   0+02:40:27

As you can see, this is the old 1.5 overcommit factor :
# condor_status -state wn272|grep slot|gawk '{s+=$3} END{print s}'
97500

And not the 2.5 one, which should raise the available virtual memory to 160GBâ
I tried restarting condor on one node (service condor restart) and the correct memory resources appeared, but I also lost all running jobs.

My question therefore is : is there a way that condor takes this new setting ?
The slots are configured like this :

NUM_SLOTS = 1
SLOT_TYPE_1               = cpus=100%,mem=100%,auto
NUM_SLOTS_TYPE_1          = 1
SLOT_TYPE_1_PARTITIONABLE = TRUE

Condor version : condor-8.2.6-287355.x86_64

Thanks
_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/

Mailing List Archives

Public Access

Re: [HTCondor-users] memory overprovisionning : restart needed ?