
[HTCondor-users] Failed to find UID and GID for user



Greetings,


I'm trying to submit a job via the SOAP API, but I've run into an issue where the condor_schedd.service.submit() call returns an 'Unknown cluster or job' error and the job doesn't get submitted.


I'm using the following Python code:


########Begin Code#########

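import os
import pwd
from suds.client import Client  # assumption: Client is the suds SOAP client

# Name of the user running this script; passed as the job owner below.
uid = pwd.getpwuid(os.getuid())[0]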
condor_schedd = Client(url='http://localhost:8080/condorSchedd.wsdl', cache=None, location='http://localhost:8080')

transaction = condor_schedd.service.beginTransaction(60)
cluster = condor_schedd.service.newCluster(transaction.transaction)
job = condor_schedd.service.newJob(transaction.transaction, cluster.integer)
jobAd = condor_schedd.service.createJobTemplate(cluster.integer,
     job.integer, uid, 5, "/bin/sleep", "30", "")
result = condor_schedd.service.submit(transaction.transaction,
     cluster.integer, job.integer, jobAd.classAd)
condor_schedd.service.commitTransaction(transaction.transaction)
print result

########End Code#########


Here's the response that I get from the service.submit() API call:

(RequirementsAndStatus){
  status =
   (Status){
     code = "UNKNOWNJOB"
     message = "Unknown cluster or job"
   }
 }
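
Since my script calls commitTransaction() regardless of what submit() returned, here is a minimal sketch of how the reply could be checked first. The helper name submit_failed is hypothetical, the "SUCCESS" code is my assumption about what a good reply carries (I have only seen "UNKNOWNJOB"), and the SimpleNamespace object merely stands in for the real reply shown above.

```python
from types import SimpleNamespace


def submit_failed(result, success_code="SUCCESS"):
    """Return True when the reply's status code differs from success_code.

    success_code defaults to "SUCCESS", which is an assumption; adjust it
    to whatever a successful submit() actually returns.
    """
    return str(result.status.code) != success_code


# Stand-in for the reply printed above (not a real SOAP object).
reply = SimpleNamespace(
    status=SimpleNamespace(code="UNKNOWNJOB", message="Unknown cluster or job")
)

if submit_failed(reply):
    print("submit failed: %s (%s)" % (reply.status.message, reply.status.code))
```

In the real script the check would sit between the submit() and commitTransaction() calls.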


When job submission fails, I see the following errors/warnings in /var/log/condor/ScheddLog file:

02/07/18 19:51:47 (pid:33466) Not enforcing MAX_JOBS_PER_OWNER for submit without owner of cluster 157.
02/07/18 19:51:47 (pid:33466) passwd_cache::cache_uid(): getpwnam("") failed: user not found
02/07/18 19:51:47 (pid:33466) (157.0) Failed to find UID and GID for user . Cannot chown /var/lib/condor/spool/157/0/cluster157.proc0.subproc0 to user.
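
To make those log lines easier to read, here is a minimal sketch (separate from my submission script) that pulls out the name the schedd passed to getpwnam(). The helper name getpwnam_failures is hypothetical, and the regular expression is only inferred from the lines quoted above, so other HTCondor versions may word the message differently. For my log it returns an empty string, which suggests the job has no owner by the time the schedd tries to chown the spool directory.

```python
import re

# Pattern inferred from the ScheddLog lines quoted above (an assumption,
# not an official log format).
GETPWNAM_FAILED = re.compile(r'getpwnam\("(?P<name>[^"]*)"\) failed')


def getpwnam_failures(lines):
    """Return the user names for which a failed getpwnam() was logged."""
    names = []
    for line in lines:
        match = GETPWNAM_FAILED.search(line)
        if match:
            names.append(match.group("name"))
    return names


sample = [
    '02/07/18 19:51:47 (pid:33466) Not enforcing MAX_JOBS_PER_OWNER '
    'for submit without owner of cluster 157.',
    '02/07/18 19:51:47 (pid:33466) passwd_cache::cache_uid(): '
    'getpwnam("") failed: user not found',
]

print(getpwnam_failures(sample))  # prints ['']
```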


Here are the settings I have in condor_config.local:

ALLOW_SOAP = *
ENABLE_SOAP = True
ENABLE_WEB_SERVER = True
WEB_ROOT_DIR=/usr/lib/condor/webservice
USE_SHARED_PORT = FALSE
SCHEDD_ARGS = -p 8080
ALLOW_WRITE = *
HOSTALLOW_WRITE = *
HOSTALLOW_READ = *
QUEUE_ALL_USERS_TRUSTED = TRUE


It seems this issue has come up in the past:

https://www-auth.cs.wisc.edu/lists/htcondor-users/2008-February/msg00139.shtml
https://lists.cs.wisc.edu/archive/htcondor-users/2015-October/msg00123.shtml
https://www-auth.cs.wisc.edu/lists/htcondor-users/2018-January/msg00151.shtml


Does anyone have any pointers on how to fix this problem?

Thank you.