
Re: [HTCondor-users] submitting jobs with API



This is what was logged in SchedLog during the submit attempt. Note that I
have these security-related settings in my config file. Do I need other
settings for this to work?

use SECURITY : HOST_BASED
ALLOW_WRITE = 192.168.*
ALLOW_READ = 192.168.*
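
As a rough sanity check that the client address falls under those ALLOW_*
patterns, the wildcard can be glob-matched in Python. This is only a sketch:
it uses the standard-library fnmatch as a stand-in for HTCondor's own
pattern matching (an assumption, the real matcher has more rules), and
ip_allowed is a hypothetical helper name.

```python
from fnmatch import fnmatchcase

# Pattern copied from the ALLOW_WRITE setting above.
ALLOW_WRITE = ["192.168.*"]

def ip_allowed(ip, patterns):
    """Return True if ip matches any glob pattern.

    Approximation only: fnmatch is assumed to behave like the
    '*' wildcard in ALLOW_WRITE / ALLOW_READ for simple cases.
    """
    return any(fnmatchcase(ip, pattern) for pattern in patterns)

# 192.168.10.15 is the client address reported in the SchedLog entry.
client_ok = ip_allowed("192.168.10.15", ALLOW_WRITE)
# An address outside the 192.168 range, for contrast.
outsider_ok = ip_allowed("10.0.0.5", ALLOW_WRITE)
print(client_ok, outsider_ok)
```

If the client address matches, the host-based pattern by itself is probably
not what rejects the connection, which is consistent with the log
complaining about authentication and a missing mapped user name.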


12/19/17 11:13:13 (pid:32123) authenticate_self_gss: acquiring self
credentials failed. Please check your Condor configuration file if
this is a server process. Or the user environment variable if this is
a user process.

GSS Major Status: General failure
GSS Minor Status Error Chain:
globus_gsi_gssapi: Error with GSI credential
globus_gsi_gssapi: Error with gss credential handle
globus_credential: Valid credentials could not be found in any of the
possible locations specified by the credential search order.
Valid credentials could not be found in any of the possible locations
specified by the credential search order.
Attempt 1
globus_credential: Error reading host credential
globus_sysconfig: Could not find a valid certificate file: The host
cert could not be found in:
1) env. var. X509_USER_CERT
2) /etc/grid-security/hostcert.pem
3) $GLOBUS_LOCATION/etc/hostcert.pem
4) $HOME/.globus/hostcert.pem

The host key could not be found in:
1) env. var. X509_USER_KEY
2) /etc/grid-security/hostkey.pem
3) $GLOBUS_LOCATION/etc/hostkey.pem
4) $HOME/.globus/hostkey.pem


Attempt 2
globus_credential: Error reading proxy credential
globus_sysconfig: Could not find a valid proxy certificate file location
globus_sysconfig: Error with key filename
globus_sysconfig: File does not exist: /tmp/x509up_u0 is not a valid file
Attempt 3
globus_credential: Error reading user credential
globus_sysconfig: Error with certificate filename: The user cert could
not be found in:
1) env. var. X509_USER_CERT
2) $HOME/.globus/usercert.pem
3) $HOME/.globus/usercred.p12



12/19/17 11:13:13 (pid:32123) DC_AUTHENTICATE: authentication of
<192.168.10.15:45684> did not result in a valid mapped user name,
which is required for this command (1112 QMGMT_WRITE_CMD), so
aborting.
12/19/17 11:13:13 (pid:32123) DC_AUTHENTICATE: reason for
authentication failure: AUTHENTICATE:1003:Failed to authenticate with
any method|AUTHENTICATE:1004:Failed to authenticate using
GSI|GSI:5003:Failed to authenticate.  Globus is reporting error
(851968:152).  There is probably a problem with your credentials.
(Did you run grid-proxy-init?)|AUTHENTICATE:1004:Failed to
authenticate using KERBEROS|AUTHENTICATE:1004:Failed to authenticate
using FS|FS:1004:Unable to lstat(/tmp/FS_XXX4oulm8)
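
The failure reason above is a single string with entries that appear to be
separated by '|'. A minimal sketch for pulling out which authentication
methods were attempted, assuming that separator and the
"AUTHENTICATE:1004:Failed to authenticate using <METHOD>" wording seen in
this log (failed_methods is a hypothetical helper, not part of HTCondor):

```python
import re

# Reason string copied from the SchedLog entry above, joined onto one line.
reason = (
    "AUTHENTICATE:1003:Failed to authenticate with any method|"
    "AUTHENTICATE:1004:Failed to authenticate using GSI|"
    "GSI:5003:Failed to authenticate.  Globus is reporting error "
    "(851968:152).  There is probably a problem with your credentials.  "
    "(Did you run grid-proxy-init?)|"
    "AUTHENTICATE:1004:Failed to authenticate using KERBEROS|"
    "AUTHENTICATE:1004:Failed to authenticate using FS|"
    "FS:1004:Unable to lstat(/tmp/FS_XXX4oulm8)"
)

# Matches only the per-method failure entries, capturing the method name.
METHOD_RE = re.compile(r"AUTHENTICATE:1004:Failed to authenticate using (\w+)")

def failed_methods(reason_text):
    """Return the authentication methods named in the failure entries."""
    methods = []
    for entry in reason_text.split("|"):  # assumed separator
        match = METHOD_RE.match(entry)
        if match:
            methods.append(match.group(1))
    return methods

methods = failed_methods(reason)
print(methods)
```

For this log the methods tried are GSI, KERBEROS and FS, in that order, and
none of them produced a mapped user name.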


On Tue, Dec 19, 2017 at 10:33 AM, Jason Patton <jpatton@xxxxxxxxxxx> wrote:
> I don't have a solution, but hopefully I can help get the ball rolling.
> Without modifying my schedd config, I tried doing a remote submit following
> the same steps, which failed with the same error. The error is a little
> misleading/light on details; it's likely an authentication problem from not
> being on the same system as the schedd. Doing essentially the same thing
> using the client tools gives more info:
>
> >>> schedd.submit(ad)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> RuntimeError: Failed to connect to schedd.
>
> $ condor_submit test.submit -remote condor-el7.test
> Submitting job(s)
> ERROR: Failed to connect to queue manager condor-el7.test
> AUTHENTICATE:1003:Failed to authenticate with any method
> AUTHENTICATE:1004:Failed to authenticate using GSI
> GSI:5003:Failed to authenticate.  Globus is reporting error (851968:50).
> There is probably a problem with your credentials.  (Did you run
> grid-proxy-init?)
> AUTHENTICATE:1004:Failed to authenticate using KERBEROS
> AUTHENTICATE:1004:Failed to authenticate using FS
>
> You should see more details in SchedLog on your submit host.
>
> Hopefully someone more knowledgeable about setting up the schedd to accept
> remote job submissions can chime in. (ENABLE_SOAP and ENABLE_WEB_SERVER are
> probably not needed.)
>
> Jason
>
> On Tue, Dec 19, 2017 at 9:02 AM, Larry Martell <larry.martell@xxxxxxxxx>
> wrote:
>>
>> On Tue, Dec 19, 2017 at 9:29 AM, Larry Martell <larry.martell@xxxxxxxxx>
>> wrote:
>> > I am doing this:
>> >
>> > import htcondor
>> > import classad
>> > condor_host = '192.168.10.2'
>> > coll = htcondor.Collector(condor_host)
>> > schedd_ad = coll.locate(htcondor.DaemonTypes.Schedd)
>> > schedd = htcondor.Schedd(schedd_ad)
>> > ad = classad.ClassAd()
>> >
>> > # set up ad
>> >
>> > id = schedd.submit(ad)
>> >
>> > RuntimeError: 'Failed to connect to schedd.'
>> >
>> > On 192.168.10.2:
>> >
>> > 4 S condor     32054       1  0  80   0 - 18610 poll_s Dec12 ?
>> > 00:00:15 /usr/sbin/condor_master -f
>> > 4 S root       32112   32054  0  80   0 -  6652 poll_s Dec12 ?
>> > 00:07:51 condor_procd -A /var/run/condor/procd_pipe -L
>> > /var/log/condor/ProcLog -R 1000000 -S 60 -C 986
>> > 4 S condor     32113   32054  0  80   0 - 13531 poll_s Dec12 ?
>> > 00:00:44 condor_shared_port -f
>> > 4 S condor     32117   32054  0  80   0 - 20511 poll_s Dec12 ?
>> > 00:07:46 condor_collector -f
>> > 4 S condor     32122   32054  0  80   0 - 15856 poll_s Dec12 ?
>> > 00:31:40 condor_negotiator -f
>> > 4 S condor     32123   32054  0  80   0 - 18808 poll_s Dec12 ?
>> > 00:00:31 condor_schedd -f
>> >
>> > From the machine running the python code:
>> >
>> > $ nmap -p 9618 192.168.10.2
>> >
>> > Starting Nmap 6.40 ( http://nmap.org ) at 2017-12-19 09:28 EST
>> > Nmap scan report for 192.168.10.2
>> > Host is up (0.00018s latency).
>> > PORT     STATE SERVICE
>> > 9618/tcp open  condor
>> >
>> > Am I doing something wrong or missing something?
>>
>> Also let me add I have these settings in the config file:
>>
>> ENABLE_SOAP = True
>> ENABLE_WEB_SERVER = True