
[HTCondor-users] Unable to submit jobs via the SOAP API



Hello,

I have been running into issues when trying to submit jobs to the Schedd daemon using the SOAP API. My first issue is similar to the problem reported in this thread:
https://lists.cs.wisc.edu/archive/htcondor-users/2015-October/msg00123.shtml

condorSchedd.newJob() fails to create a new job. The scheduler logs show an unset owner:
passwd_cache::cache_uid(): getpwnam("") failed: user not found
(1.0) Failed to find UID and GID for user . Cannot chown /path/to/spool/1/0/cluster0.proc0.subproc0 to user.

Despite the above log message, newJob() returns a successful response. The issue manifests later as an "Unknown cluster or job" error when condorSchedd.submit() is invoked.
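To illustrate why an unset owner breaks the spool setup, here is a minimal Python sketch of the same passwd lookup using only the standard pwd module. The helper lookup_uid_gid is hypothetical and is not HTCondor code (the real passwd_cache is C++); it only mirrors the failure in the log, where an empty owner name reaches getpwnam() and can never match a passwd entry.

```python
import pwd


def lookup_uid_gid(owner):
    """Resolve a job owner name to (uid, gid), or raise LookupError.

    Hypothetical helper for illustration; it mimics the schedd's
    getpwnam() lookup, not HTCondor's actual passwd_cache code.
    """
    if not owner:
        # An unset Owner attribute reaches the lookup as "", which no
        # passwd entry can match ("Failed to find UID and GID for user .").
        raise LookupError("empty owner name; cannot chown the spool directory")
    try:
        entry = pwd.getpwnam(owner)
    except KeyError:
        # pwd.getpwnam raises KeyError when the user does not exist.
        raise LookupError("user not found: %r" % (owner,))
    return entry.pw_uid, entry.pw_gid


if __name__ == "__main__":
    print(lookup_uid_gid("root"))
    try:
        lookup_uid_gid("")
    except LookupError as err:
        print("lookup failed: %s" % err)
```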

I am probably missing something basic. What am I doing wrong, or failing to set up, that causes this problem?

My setup is as follows:
* HTCondor 8.7.7, built from the GitHub source in a CentOS 7 Docker container with the cmake options -DWITH_CREAM:BOOL=FALSE -DWITH_GLOBUS=FALSE -D_DEBUG:BOOL=TRUE

* The same CentOS 7 Docker container, running all daemons as user "condor" with the following permissive configuration:
USE_SHARED_PORT = FALSE
SCHEDD_ARGS = -p 8080
ENABLE_SOAP = TRUE
ENABLE_WEB_SERVER = TRUE
WEB_ROOT_DIR=$(RELEASE_DIR)/lib/webservice
ALLOW_SOAP = */*
QUEUE_ALL_USERS_TRUSTED = TRUE
HOSTALLOW_WRITE = *
ALLOW_WRITE = *
ALL_DEBUG = D_FULLDEBUG

* Python client using suds:
from suds.client import Client
condor_schedd = Client("http://localhost:8080/condorSchedd.wsdl",
                       location="http://localhost:8080")
transaction = condor_schedd.service.beginTransaction(300)
print "Transaction", transaction
cluster = condor_schedd.service.newCluster(transaction.transaction)
print "Cluster", cluster
job = condor_schedd.service.newJob(transaction.transaction, cluster.integer)
print "Job", job
# Prints:
# Job (IntAndStatus){
#    status =
#       (Status){
#          code = "SUCCESS"
#          message = "MESSAGE-NULL"
#       }
#    integer = 0
# }

jobAd = condor_schedd.service.createJobTemplate(cluster.integer,
    job.integer, "condor", 5, "/bin/sleep", "30", "")
result = condor_schedd.service.submit(transaction.transaction,
    cluster.integer, job.integer, jobAd.classAd[0])
print result
# Prints:
# (RequirementsAndStatus){
#    status =
#       (Status){
#          code = "UNKNOWNJOB"
#          message = "Unknown cluster or job"
#       }
#    requirements = None
# }
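Because newJob() reported SUCCESS even though the spool setup failed, it can help to check the status code of every response and abort the transaction on the first failure instead of carrying on. The sketch below assumes the suds response shapes printed above (a status with code and message, plus a transaction or integer payload) and assumes the WSDL's commitTransaction and abortTransaction operations; the names checked, SoapCallError and submit_sleep_job are hypothetical. The service object is passed in so the logic can be exercised without a running schedd.

```python
class SoapCallError(Exception):
    """Raised when a condorSchedd SOAP response has a non-SUCCESS status."""


def checked(response, call_name):
    """Return the response if its status code is SUCCESS, else raise.

    Handles both bare Status replies and *AndStatus replies, which wrap
    the Status in a 'status' attribute (as in the output printed above).
    """
    status = getattr(response, "status", response)
    if status.code != "SUCCESS":
        raise SoapCallError("%s failed: %s (%s)"
                            % (call_name, status.code, status.message))
    return response


def submit_sleep_job(service, owner="condor", seconds="30", timeout=300):
    """Submit one /bin/sleep job in a single transaction; return (cluster, job)."""
    txn = checked(service.beginTransaction(timeout), "beginTransaction").transaction
    try:
        cluster = checked(service.newCluster(txn), "newCluster").integer
        job = checked(service.newJob(txn, cluster), "newJob").integer
        # Universe 5 and the argument order mirror the createJobTemplate call above.
        template = checked(
            service.createJobTemplate(cluster, job, owner, 5,
                                      "/bin/sleep", seconds, ""),
            "createJobTemplate")
        checked(service.submit(txn, cluster, job, template.classAd[0]), "submit")
        checked(service.commitTransaction(txn), "commitTransaction")
    except Exception:
        # Roll back so no half-built cluster is left in the job queue.
        service.abortTransaction(txn)
        raise
    return cluster, job
```

With the client above it would be called as submit_sleep_job(condor_schedd.service). In the failure reported here it would stop at submit() with UNKNOWNJOB and abort the transaction; it cannot catch the spool-directory problem earlier, since newJob() itself still reports SUCCESS.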

My sources:
http://research.cs.wisc.edu/htcondor/manual/current/6_1Web_Service.html
https://htcondor-wiki.cs.wisc.edu/index.cgi/wiki?p=SoapWisdom
http://ben.versionzero.org/wiki/Condor_SOAP_Interface
https://spinningmatt.wordpress.com/2009/11/02/submitting-a-workflow-to-condor-via-soap-using-java/
https://lists.cs.wisc.edu/archive/htcondor-users/2012-November/msg00085.shtml
https://www.npmjs.com/package/soap-htcondor

--------------------
My attempt at debugging:

I tried to follow the log messages back to the source code, without much knowledge of the HTCondor internals :). I noticed that the dummy ClassAd that createJobSpoolDirectory_PRIV_CONDOR() passes into createJobSpoolDirectory() does not have the "Owner" attribute set, which results in the empty user name seen in the getpwnam("") log line. Adding the following line in createJobSpoolDirectory_PRIV_CONDOR(), before the call to createJobSpoolDirectory(), got me past the first issue, but I then ran into another issue when invoking condorSchedd.commitTransaction() later:
dummy_ad.InsertAttr(ATTR_OWNER,get_condor_username());
Did the hack above circumvent some authorization feature or did I stumble into a bug?

Thanks!
Biruk