
Re: [HTCondor-users] submitting jobs with API



Thanks very much for your reply. Yes, I did do a condor_reconfig after
the change.

I don't know whether this is a factor, but
/etc/condor/config.d/40root.config contains this:

ALLOW_WRITE = $(ALLOW_WRITE), 192.168.10.15, 192.168.10.17

192.168.10.15 is the machine I am running the Python script on.

Sorry for what I am sure is a very basic question, but I am very new
to HTCondor. In your condor_submit command, what do I put in for
<schedd name> and <collector name>?
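
For reference, a minimal sketch of how those two placeholders might be filled in. It assumes the collector is the central manager host already used in the script (192.168.10.2) and that the schedd name is the `Name` attribute of the schedd's ad. `list_schedd_names` uses `Collector.locateAll` from the htcondor bindings; `build_submit_command` is a hypothetical helper that only assembles the argument list and environment, and `schedd.example.com` is a made-up schedd name.

```python
import os


def list_schedd_names(collector_host):
    """Ask the collector for every schedd ad and return the Name attributes.

    Needs the htcondor Python bindings; imported lazily so the rest of
    this sketch works without them.
    """
    import htcondor

    coll = htcondor.Collector(collector_host)
    return [ad["Name"] for ad in coll.locateAll(htcondor.DaemonTypes.Schedd)]


def build_submit_command(schedd_name, collector_name, submit_file, debug=False):
    """Return (argv, env) for a remote condor_submit; nothing is executed."""
    argv = ["condor_submit"]
    env = dict(os.environ)
    if debug:
        # Same debug settings as the command quoted later in this message.
        env["_condor_TOOL_DEBUG"] = "D_SECURITY,D_FULLDEBUG"
        argv.append("-debug")
    argv += ["-remote", schedd_name, "-pool", collector_name, submit_file]
    return argv, env


# Hypothetical schedd name; 192.168.10.2 is the collector host from the script.
argv, env = build_submit_command(
    "schedd.example.com", "192.168.10.2", "submit_file", debug=True
)
print(" ".join(argv))
```

The returned pair could then be handed to `subprocess.run(argv, env=env)` on a machine that has the HTCondor client tools installed.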

Also I just noticed this from the logs:

Unable to lstat(/tmp/FS_XXX4oulm8)

Could this have something to do with it? (But why anyone would have a
permission issue in /tmp I do not know.)
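
A note on the lstat failure: FS authentication has the daemon ask the client to create a file under /tmp and then checks who owns it, so it is only meant for a client and daemon on the same machine. The sketch below is a simplified simulation of that idea in plain Python, not HTCondor's actual code; `server_pick_path`, `client_create` and `server_verify` are hypothetical names, and two temporary directories stand in for /tmp on two different hosts.

```python
import os
import shutil
import tempfile


def server_pick_path(tmp_root):
    """Server side: choose a not-yet-existing name such as FS_XXXabc123."""
    # mkdtemp yields a unique name; remove it so the client has to create it.
    path = tempfile.mkdtemp(prefix="FS_XXX", dir=tmp_root)
    os.rmdir(path)
    return path


def client_create(path, client_tmp_root):
    """Client side: create the requested name inside the client's own /tmp."""
    target = os.path.join(client_tmp_root, os.path.basename(path))
    os.mkdir(target, 0o700)
    return target


def server_verify(path):
    """Server side: lstat the path and return the owner's uid, or None."""
    try:
        return os.lstat(path).st_uid
    except OSError:
        # Comparable to the "Unable to lstat(...)" message in the log.
        return None


server_tmp = tempfile.mkdtemp()  # stands in for /tmp on the schedd host
remote_tmp = tempfile.mkdtemp()  # stands in for /tmp on a different machine

# Same machine: client and server see the same directory.
p1 = server_pick_path(server_tmp)
client_create(p1, server_tmp)
local_uid = server_verify(p1)

# Different machine: the client creates the name in its own /tmp,
# so the server's lstat finds nothing.
p2 = server_pick_path(server_tmp)
client_create(p2, remote_tmp)
remote_uid = server_verify(p2)

print("local:", local_uid, "remote:", remote_uid)

shutil.rmtree(server_tmp)
shutil.rmtree(remote_tmp)
```

If that reading is right, the lstat error would be expected for a submit from 192.168.10.15 to a schedd on 192.168.10.2 and would not point to a permission problem in /tmp.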

Thanks again for any light you can shed on this issue.

On Tue, Dec 19, 2017 at 10:14 PM, Brian Bockelman <bbockelm@xxxxxxxxxxx> wrote:
> Hi Larry,
>
> This is definitely an issue with the security subsystem, not the Python API. I suspect that you can reproduce it via the command-line tools with something like:
>
> condor_submit -remote <schedd name> -pool <collector name> submit_file
>
> Sometimes it's a bit simpler to increase the logging via the CLI (the error messages don't always come back in a usable form through the Python API).
>
> If you can reproduce it with condor_submit, try:
>
> _condor_TOOL_DEBUG=D_SECURITY,D_FULLDEBUG condor_submit -debug -remote <schedd name> -pool <collector name> submit_file
>
> That should provide a full readout of the security handshake.
>
> The puzzling thing is that this line:
>
> use SECURITY : HOST_BASED
>
> in your server config (oh - did you do a condor_reconfig after the change?) should theoretically disable the attempts to do GSI-based security negotiation.  However, the logfiles clearly show it is being attempted.
>
> So -- this suggests something slightly wrong with the schedd configuration, but it's not clear what is wrong yet.
>
> Brian
>
>> On Dec 19, 2017, at 11:25 AM, Larry Martell <larry.martell@xxxxxxxxx> wrote:
>>
>> This is what was logged in SchedLog in the submit attempt. Note I have
>> these security related settings in my config file. Do I need other
>> settings to allow this to work?
>>
>> use SECURITY : HOST_BASED
>> ALLOW_WRITE = 192.168.*
>> ALLOW_READ = 192.168.*
>>
>>
>> 12/19/17 11:13:13 (pid:32123) authenticate_self_gss: acquiring self
>> credentials failed. Please check your Condor configuration file if
>> this is a server process. Or the user environment variable if this is
>> a user process.
>>
>> GSS Major Status: General failure
>> GSS Minor Status Error Chain:
>> globus_gsi_gssapi: Error with GSI credential
>> globus_gsi_gssapi: Error with gss credential handle
>> globus_credential: Valid credentials could not be found in any of the
>> possible locations specified by the credential search order.
>> Valid credentials could not be found in any of the possible locations
>> specified by the credential search order.
>> Attempt 1
>> globus_credential: Error reading host credential
>> globus_sysconfig: Could not find a valid certificate file: The host
>> cert could not be found in:
>> 1) env. var. X509_USER_CERT
>> 2) /etc/grid-security/hostcert.pem
>> 3) $GLOBUS_LOCATION/etc/hostcert.pem
>> 4) $HOME/.globus/hostcert.pem
>>
>> The host key could not be found in:
>> 1) env. var. X509_USER_KEY
>> 2) /etc/grid-security/hostkey.pem
>> 3) $GLOBUS_LOCATION/etc/hostkey.pem
>> 4) $HOME/.globus/hostkey.pem
>>
>>
>> Attempt 2
>> globus_credential: Error reading proxy credential
>> globus_sysconfig: Could not find a valid proxy certificate file location
>> globus_sysconfig: Error with key filename
>> globus_sysconfig: File does not exist: /tmp/x509up_u0 is not a valid file
>> Attempt 3
>> globus_credential: Error reading user credential
>> globus_sysconfig: Error with certificate filename: The user cert could
>> not be found in:
>> 1) env. var. X509_USER_CERT
>> 2) $HOME/.globus/usercert.pem
>> 3) $HOME/.globus/usercred.p12
>>
>>
>>
>> 12/19/17 11:13:13 (pid:32123) DC_AUTHENTICATE: authentication of
>> <192.168.10.15:45684> did not result in a valid mapped user name,
>> which is required for this command (1112 QMGMT_WRITE_CMD), so
>> aborting.
>> 12/19/17 11:13:13 (pid:32123) DC_AUTHENTICATE: reason for
>> authentication failure: AUTHENTICATE:1003:Failed to authenticate with
>> any method|AUTHENTICATE:1004:Failed to authenticate using
>> GSI|GSI:5003:Failed to authenticate.  Globus is reporting error
>> (851968:152).  There is probably a problem with your credentials.
>> (Did you run grid-proxy-init?)|AUTHENTICATE:1004:Failed to
>> authenticate using KERBEROS|AUTHENTICATE:1004:Failed to authenticate
>> using FS|FS:1004:Unable to lstat(/tmp/FS_XXX4oulm8)
>>
>>
>> On Tue, Dec 19, 2017 at 10:33 AM, Jason Patton <jpatton@xxxxxxxxxxx> wrote:
>>> I don't have a solution, but hopefully I can help get the ball rolling.
>>> Without modifying my schedd config, I tried doing a remote submit following
>>> the same steps, which failed with the same error. The error is a little
>>> misleading and light on details; it's likely an authentication problem
>>> from not being on the same system as the schedd. Doing essentially the
>>> same thing using the client tools gives more info:
>>>
>>> >>> schedd.submit(ad)
>>> Traceback (most recent call last):
>>>  File "<stdin>", line 1, in <module>
>>> RuntimeError: Failed to connect to schedd.
>>>
>>> $ condor_submit test.submit -remote condor-el7.test
>>> Submitting job(s)
>>> ERROR: Failed to connect to queue manager condor-el7.test
>>> AUTHENTICATE:1003:Failed to authenticate with any method
>>> AUTHENTICATE:1004:Failed to authenticate using GSI
>>> GSI:5003:Failed to authenticate.  Globus is reporting error (851968:50).
>>> There is probably a problem with your credentials.  (Did you run
>>> grid-proxy-init?)
>>> AUTHENTICATE:1004:Failed to authenticate using KERBEROS
>>> AUTHENTICATE:1004:Failed to authenticate using FS
>>>
>>> You should see more details in SchedLog on your submit host.
>>>
>>> Hopefully someone more knowledgeable about setting up the schedd to accept
>>> remote job submissions can chime in. (ENABLE_SOAP and ENABLE_WEB_SERVER are
>>> probably not needed.)
>>>
>>> Jason
>>>
>>> On Tue, Dec 19, 2017 at 9:02 AM, Larry Martell <larry.martell@xxxxxxxxx>
>>> wrote:
>>>>
>>>> On Tue, Dec 19, 2017 at 9:29 AM, Larry Martell <larry.martell@xxxxxxxxx>
>>>> wrote:
>>>>> I am doing this:
>>>>>
>>>>> import htcondor
>>>>> import classad
>>>>> condor_host = '192.168.10.2'
>>>>> coll = htcondor.Collector(condor_host)
>>>>> schedd_ad = coll.locate(htcondor.DaemonTypes.Schedd)
>>>>> schedd = htcondor.Schedd(schedd_ad)
>>>>> ad = classad.ClassAd()
>>>>>
>>>>> # set up ad
>>>>>
>>>>> id = schedd.submit(ad)
>>>>>
>>>>> RuntimeError: 'Failed to connect to schedd.'
>>>>>
>>>>> On 192.168.10.2:
>>>>>
>>>>> 4 S condor     32054       1  0  80   0 - 18610 poll_s Dec12 ?
>>>>> 00:00:15 /usr/sbin/condor_master -f
>>>>> 4 S root       32112   32054  0  80   0 -  6652 poll_s Dec12 ?
>>>>> 00:07:51 condor_procd -A /var/run/condor/procd_pipe -L
>>>>> /var/log/condor/ProcLog -R 1000000 -S 60 -C 986
>>>>> 4 S condor     32113   32054  0  80   0 - 13531 poll_s Dec12 ?
>>>>> 00:00:44 condor_shared_port -f
>>>>> 4 S condor     32117   32054  0  80   0 - 20511 poll_s Dec12 ?
>>>>> 00:07:46 condor_collector -f
>>>>> 4 S condor     32122   32054  0  80   0 - 15856 poll_s Dec12 ?
>>>>> 00:31:40 condor_negotiator -f
>>>>> 4 S condor     32123   32054  0  80   0 - 18808 poll_s Dec12 ?
>>>>> 00:00:31 condor_schedd -f
>>>>>
>>>>> From the machine running the python code:
>>>>>
>>>>> $ nmap -p 9618 192.168.10.2
>>>>>
>>>>> Starting Nmap 6.40 ( http://nmap.org ) at 2017-12-19 09:28 EST
>>>>> Nmap scan report for 192.168.10.2
>>>>> Host is up (0.00018s latency).
>>>>> PORT     STATE SERVICE
>>>>> 9618/tcp open  condor
>>>>>
>>>>> Am I doing something wrong or missing something?
>>>>
>>>> Also let me add I have these settings in the config file:
>>>>
>>>> ENABLE_SOAP = True
>>>> ENABLE_WEB_SERVER = True
>> _______________________________________________
>> HTCondor-users mailing list
>> To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
>> subject: Unsubscribe
>> You can also unsubscribe by visiting
>> https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
>>
>> The archives can be found at:
>> https://lists.cs.wisc.edu/archive/htcondor-users/