
[HTCondor-users] maximum of 9 dynamically allocated slots on Windows?



Hello,

We've run into a problem using partitionable slots on large multicore machines running Windows. We have several machines with 32 or more logical processors, but at most 9 of them are ever used to run jobs. Our configuration is:

NUM_SLOTS = 1

NUM_SLOTS_TYPE_1 = 1
SLOT_TYPE_1 = cpus=100%, ram=100%, swap=90%, disk=90%
SLOT_TYPE_1_PARTITIONABLE = true

The logs show that jobs try to start on dynamic slots 10 through 32 but are killed immediately with a 10054 error. It looks like the account HTCondor uses to run these jobs (condor-reuse-slot1_XX) cannot be created, which results in permission errors. Windows usernames appear to be limited to 20 characters, so the 21-character condor username fails (condor-reuse-slot1_X is OK, but condor-reuse-slot1_XX is not).

The workaround we are currently using is to create 4 partitionable slots, each with a 25% share of resources. Of course, this means that the maximum amount of RAM etc. that any single job can use is more limited than it would be with a single partitionable slot. Note that this is not a problem on our Linux machines.
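
For illustration, the workaround could be expressed roughly as follows. This is a sketch rather than a copy of our exact file: the 25% values for swap and disk are assumed here, since four slots cannot each claim 90%.

```
# Four partitionable slots of the same type, each with a quarter of the machine
NUM_SLOTS_TYPE_1 = 4
SLOT_TYPE_1 = cpus=25%, ram=25%, swap=25%, disk=25%
SLOT_TYPE_1_PARTITIONABLE = true
```

On a 32-thread machine each partitionable slot then holds 8 cores, so, assuming the dynamic slots are numbered slotN_1 through slotN_8, the suffix stays at one digit and the usernames stay within 20 characters. On a machine with more than 36 threads, a slot could again reach a two-digit suffix.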

Is this a known bug? Is there a better solution or workaround?

Thanks
Alex

--
Alex M. Chubaty, PhD
Postdoctoral Researcher | Stagiaire postdoctoral (recherche)

Canadian Forest Service
Pacific Forestry Centre
506 Burnside Road W
Victoria, BC  V8Z 1M5

Université Laval
Faculté de foresterie, de géographie et de géomatique
Département des sciences du bois et de la forêt

phone: +1.250.298.2347
email: achubaty@xxxxxxxxxxx

http://alexchubaty.com