
Re: [Condor-users] Is VM_MAX_NUMBER a redundant macro?



Rob wrote:
> Jaime Frey wrote:
>> VM_MAX_NUMBER is working properly, though its behavior is a little odd.
>> If its value isn't a positive integer, then there's no limit on the number of VMs
>> (other than the number of slots). No error is written to the log in the event of
>> an invalid value. The number of additional VMs that can be started is advertised
>> as 'VM_AvailNum' in the machine ad. In the unlimited case, the value is set to 10000. 
> 
> Jaime,
> 
> I disagree with your explanation, at least as far as the documentation goes.
> 
> In the condor_config file it is explained like this:
> 
> //snippet ================
> ## In default, the number of possible virtual machines is same as
> ## NUM_CPUS.
> ## Since too many virtual machines can cause the system to be too slow
> ## and lead to unexpected problems, limit the number of running
> ## virtual machines on this machine with
> //end snippet ================
> 
> 
> Also in other condor documentation I can find:
> //===========
>    VM_MAX_NUMBER
>       An integer limit on the number of executing virtual machines.
>       When not defined, the default value is the same NUM_CPUS. 
> //===========
> 
> 
> 
> However, in practice I get the following situations:
> 
> 1) If I don't set   VM_MAX_NUMBER  at all in the config file (relying on the default),
>     then:
>    => condor_config_val.exe  reports that VM_MAX_NUMBER macro is not defined
>    => In the machine's ads: VM_AvailNum = 10000
> 
> 2) If I EXPLICITLY set in the condor_config file:  VM_MAX_NUMBER = $(NUM_CPUS)
> 
>    => condor_config_val.exe  reports that VM_MAX_NUMBER macro is not defined
>    => In the machine's ads: VM_AvailNum = 10000
> 
> 3) If I set in the condor_config file:  VM_MAX_NUMBER = 2
> 
>     => condor_config_val.exe  reports that VM_MAX_NUMBER macro equals 2
>     => In the machine's ads: VM_AvailNum = 2
> 
> 
> If this is not a bug, then at the least the documentation is wrong!
> 
> The value of VM_MAX_NUMBER never defaults to the number of CPUs in the machine,
> unless this macro is explicitly set so with a hardcoded integer (like my third example).
> 
> I assume this inconsistency with the documentation easily leads to buggy situations,
> as I relied on a default behaviour (= number of CPUs), but instead that defaults to 10000 !!!!
> 
> 
> I think the original idea to let it default to the number of CPUs is very neat, but somehow
> the code does not do this.
> 
> 
> Regards,
> Rob.

If you look in src/condor_submit.V6/submit.cpp you'll see how
ATTR_VM_AVAIL_NUM is used on the job ad. Then have a look at
src/condor_startd.V6/vmuniverse_mgr.cpp to see how VM_MAX_NUMBER relates
to ATTR_VM_AVAIL_NUM. Namely...

   m_vm_max_num = 0;
   tmp = param( "VM_MAX_NUMBER");
   if( tmp ) {
      int vmax = (int)strtol(tmp, (char **)NULL, 10);
      if( vmax > 0 ) {
         m_vm_max_num = vmax;
      }
      free(tmp);
   }

 ...

   // publish the number of still executable Virtual machines
   if( m_vm_max_num > 0 ) {
      int avail_vm_num = m_vm_max_num - numOfRunningVM();
      ad->Assign(ATTR_VM_AVAIL_NUM, avail_vm_num);
   }else {
      // no limit of the number of executable VM
      ad->Assign(ATTR_VM_AVAIL_NUM, VM_AVAIL_UNLIMITED_NUM);
   }

VM_AVAIL_UNLIMITED_NUM is defined in
src/condor_includes/condor_vm_universe_types.h

   #define VM_AVAIL_UNLIMITED_NUM   10000

So, the documentation is a little inaccurate. It should say that when
VM_MAX_NUMBER is undefined or not a positive integer, the number of VMs
is not meaningfully limited.

As for your (2) above, the value of VM_MAX_NUMBER is not being evaluated
to an integer, so strtol does not produce a positive value and the
unlimited default is used. This may be related to an earlier thread
suggesting that more params should be evaluated as expressions, in the
context of detected system features (# cpus, amount of memory, time, etc).
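
To make the fall-through concrete, here is a minimal standalone sketch
of the same logic. The helper names parse_vm_max_num and avail_vm_num
are hypothetical (they are not in the Condor source), and 'raw' simply
stands in for whatever param("VM_MAX_NUMBER") hands back. What the
startd actually sees in case (2), whether the literal text, an empty
string, or nothing at all, is not established here; this only shows
that each of those ends up at the unlimited value.

```cpp
#include <cstdlib>

// Same value as the #define quoted from condor_vm_universe_types.h.
static const int VM_AVAIL_UNLIMITED_NUM = 10000;

// Hypothetical helper mirroring the parsing in vmuniverse_mgr.cpp.
// 'raw' is an assumed stand-in for the result of param("VM_MAX_NUMBER");
// NULL means the macro is not defined.
int parse_vm_max_num(const char *raw) {
    int vm_max_num = 0;  // 0 is treated as "no limit"
    if (raw) {
        // strtol returns 0 when no digits can be converted
        int vmax = (int)strtol(raw, (char **)NULL, 10);
        if (vmax > 0) {
            vm_max_num = vmax;
        }
    }
    return vm_max_num;
}

// Hypothetical helper mirroring the value published as VM_AvailNum.
int avail_vm_num(int vm_max_num, int running) {
    if (vm_max_num > 0) {
        return vm_max_num - running;
    }
    return VM_AVAIL_UNLIMITED_NUM;  // no limit configured
}
```

With this sketch, parse_vm_max_num gives 0 for NULL, "" and
"$(NUM_CPUS)", so avail_vm_num reports 10000 in all three cases. Only a
string such as "2" yields a real limit.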

Best,


matt