[HTCondor-users] Account condor-reuse-slot1_10 creation failed! (err=2202)



Hi!

We are running a Windows Condor cluster configured with dynamic slots. We recently added a new 16-core machine to the pool and immediately ran into problems: Condor is unable to run more than 9 jobs on this new node! Here is what StarterLog.slot1_10 says (the same happens for every slot numbered 10 and above):

StarterLog.slot1_10
===========================
02/01/14 10:05:38 Communicating with shadow <###.###.###.###:61259>
02/01/14 10:05:38 Submitting machine is "###.###.###.###"
02/01/14 10:05:38 setting the orig job name in starter
02/01/14 10:05:38 setting the orig job iwd in starter
02/01/14 10:05:38 Account condor-reuse-slot1_10 creation failed! (err=2202)
02/01/14 10:05:38 update_psid() failed after account creation!
02/01/14 10:05:38 ERROR "Failed to create a user nobody" at line 610 in file c:\condor\execute\dir_29540\userdir\src\condor_utils\uids.cpp
02/01/14 10:05:38 ShutdownFast all jobs.
02/01/14 10:05:38 condor_read() failed: recv(fd=1460) returned -1, errno = 10054 , reading 5 bytes from <147.125.99.159:61298>.
02/01/14 10:05:38 IO: Failed to read packet header
02/01/14 10:05:38 Error disabling account condor-reuse-slot1_10 (INVALID PARAMETER)


The source of the problem is more or less clear. We are not using "run_as_owner" mode, so Condor creates a temporary account on the node that runs the job. The account name follows the template "condor-reuse-slot<X>". Windows limits account names to 20 characters, and "condor-reuse-slot1_10" is 21 characters long, so the account cannot be created. This seems to be a bug in Condor!
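To make the arithmetic concrete, here is a small Python sketch that checks which slot names would overflow the limit. It assumes the account name is simply the prefix "condor-reuse-" followed by the slot name, as seen in the log, and that the limit is 20 characters; the function and constant names are made up for illustration and are not part of Condor. As far as I can tell, err=2202 is the Windows network management code NERR_BadUsername, which would be consistent with a rejected account name.

```python
# Hypothetical helper names; only the prefix and the 20-character limit
# come from the log and the description above.
ACCOUNT_PREFIX = "condor-reuse-"
MAX_ACCOUNT_NAME_LEN = 20  # assumed Windows limit on local account names


def account_name(slot_name: str) -> str:
    """Build the temporary account name for a given slot name."""
    return ACCOUNT_PREFIX + slot_name


def fits_windows_limit(slot_name: str) -> bool:
    """Return True if the account name is within the assumed limit."""
    return len(account_name(slot_name)) <= MAX_ACCOUNT_NAME_LEN


def failing_slots(num_slots: int) -> list:
    """List the dynamic slot names (slot1_1 .. slot1_N) whose account name is too long."""
    names = [f"slot1_{i}" for i in range(1, num_slots + 1)]
    return [name for name in names if not fits_windows_limit(name)]


if __name__ == "__main__":
    # Check a 16-core machine with one dynamic slot per core.
    for name in failing_slots(16):
        print(account_name(name), len(account_name(name)))
```

Under these assumptions, slot1_1 through slot1_9 give names of exactly 20 characters and are accepted, while every slot from slot1_10 upward gives 21 characters, which matches the observed limit of 9 jobs.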

(A similar question was already asked on the Condor mailing list: https://www-auth.cs.wisc.edu/lists/htcondor-users/2012-July/msg00064.shtml... Unfortunately, it went unanswered.)

Any ideas how to proceed?

Thanks,
Alexey