
Re: [HTCondor-users] Using a remote scheduler for job submission (condor_submit -remote)



Hi Javi,

With a remote submit, there is no shadow process running on your local machine to keep track of the job; everything happens on the schedd you submitted to. Your local Condor therefore doesn't have to stay connected to the pool, but it also doesn't know when the job finishes. You have to tell it manually when to fetch the files.

Use condor_transfer_data to have Condor transfer all output files back, as it would for a regular job. It should look something like this:
$ condor_transfer_data -name condor02.hpc.com -all
See the reference for more information [1].
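
For illustration, the whole round trip (submit, check, fetch) could be wrapped in a small Python helper. This is only a sketch under assumptions: the function names are made up for this example, the cluster id 174 comes from the queue listing quoted further down in this thread, and the helpers only build the command lines. Nothing is sent to HTCondor unless you pass the result to run().

```python
import subprocess

SCHEDD = "condor02.hpc.com"  # remote schedd name used in this thread


def submit_cmd(submit_file, schedd=SCHEDD):
    """argv for submitting a job to a remote schedd."""
    return ["condor_submit", "-remote", schedd, submit_file]


def completed_cmd(cluster, schedd=SCHEDD):
    """argv listing the completed jobs (JobStatus == 4) of one cluster."""
    constraint = f"ClusterId == {cluster} && JobStatus == 4"
    return ["condor_q", "-name", schedd, "-constraint", constraint]


def fetch_cmd(cluster, schedd=SCHEDD):
    """argv asking the remote schedd to send back one cluster's output."""
    return ["condor_transfer_data", "-name", schedd, str(cluster)]


def run(argv):
    """Run one command and return its stdout; raises if the tool fails."""
    result = subprocess.run(argv, check=True, capture_output=True, text=True)
    return result.stdout
```

A session would then call run(submit_cmd("montecarlopi.condor")), look at run(completed_cmd(174)) until the jobs show up as completed, and finish with run(fetch_cmd(174)).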

If Condor cleans up your job before you have had a chance to fetch the output, you will have to tell it to keep the job in the queue until the output is retrieved.
The documentation suggests adding this to the submit file:
  leave_in_queue = (JobStatus == 4) && ((StageOutFinish =?= UNDEFINED) || (StageOutFinish == 0))
but for me it kept jobs in the queue indefinitely, because StageOutFinish was not set even after the transfer. That might have been a problem on my side, though.
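
To make the logic of that expression easier to follow, here is a rough Python model of it. It is an illustration only, not how HTCondor evaluates ClassAds: the function name is made up, and None stands in for UNDEFINED.

```python
COMPLETED = 4  # JobStatus value for a completed job


def leave_in_queue(job_status, stage_out_finish=None):
    """Model of the leave_in_queue expression; None stands in for UNDEFINED."""
    # The output counts as "not yet retrieved" while StageOutFinish is
    # undefined or still zero.
    not_staged_out = stage_out_finish is None or stage_out_finish == 0
    return job_status == COMPLETED and not_staged_out
```

A completed job with no StageOutFinish, leave_in_queue(4), evaluates to True and stays in the queue. Only a non-zero StageOutFinish, for example leave_in_queue(4, 1364400000), turns it False. If the attribute is never set, as in my case, the job is kept indefinitely.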

-Max

[1] http://research.cs.wisc.edu/htcondor/manual/current/condor_transfer_data.html

On 03/27/2013 04:54 PM, Javi Roman wrote:
Hi Max,

The job was submitted to the remote scheduler with SEC_DEFAULT_AUTHENTICATION_METHODS = CLAIMTOBE set in the configuration files of both the scheduler host and the client. Nevertheless, the output is never returned, and the jobs remain in the Completed state in the queue forever:

174.46  user01          3/27 16:36   0+00:00:57 C  0   1.0  montecarlo 1000000
174.47  user01          3/27 16:36   0+00:00:53 C  0   1.0  montecarlo 1000000
174.48  user01          3/27 16:36   0+00:00:57 C  0   1.0  montecarlo 1000000
174.49  user01          3/27 16:36   0+00:00:53 C  0   1.0  montecarlo 1000000

This is my job's submit description file:

Universe   = vanilla
Executable = montecarlo
Arguments  = 10000000000
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
transfer_input_files = montecarlo
Log        = montecarlopi.log
Output     = montecarlopi.out
Error      = montecarlopi.error
Queue 50

The Log, Output, and Error files are never sent back!

Javi Roman


On Wed, Mar 27, 2013 at 4:04 PM, Max Fischer <mfischer@xxxxxxxxxxxxxxxxxxxx> wrote:
Hi!

If you want to disable authentication, use
SEC_DEFAULT_AUTHENTICATION_METHODS = CLAIMTOBE
or
SEC_DEFAULT_AUTHENTICATION_METHODS = ANONYMOUS

Please use this ONLY during testing or in a private network. Do not set this on any component other than the schedds you submit to or from.

-Max



On 03/27/2013 03:24 PM, Javi Roman wrote:
Hi!

I'm trying to submit a job through a remote host (a remote condor_schedd), but I'm unable to do it.

I've read a lot of e-mails about similar issues, but I cannot work out the correct configuration for my HTCondor pool.

These are the key points of my current configuration. As you can see, I am trying to disable authentication with "SEC_DEFAULT_NEGOTIATION = NEVER" on the daemons and on the client host:

1. Central Manager:

host name: condor01.hpc.com

DAEMON_LIST = COLLECTOR, MASTER, NEGOTIATOR
SEC_DEFAULT_NEGOTIATION = NEVER

2. Submission Host (only one scheduler host in the pool):

host name: condor02.hpc.com

DAEMON_LIST = MASTER, SCHEDD
SEC_DEFAULT_NEGOTIATION = NEVER

3. Any host in the pool:

for example, host name: condor03.hpc.com

DAEMON_LIST = MASTER, STARTD
SEC_DEFAULT_NEGOTIATION = NEVER

I'm trying to submit a job from host "condor03.hpc.com", logged in as user "user01":

$ condor_status -schedd -l | grep MyAddress
MyAddress = "<10.129.129.33:57108>"

$ condor_submit montecarlopi.condor -remote condor02.hpc.com:57108 -debug
ERROR: Failed to connect to queue manager condor02.hpc.com:57108
AUTHENTICATE:1003:Failed to authenticate with any method
AUTHENTICATE:1004:Failed to authenticate using GSI
GSI:5003:Failed to authenticate.  Globus is reporting error (851968:45).  There is probably a problem with your credentials.  (Did you run grid-proxy-init?)
AUTHENTICATE:1004:Failed to authenticate using KERBEROS
AUTHENTICATE:1004:Failed to authenticate using FS

How do I set the authentication methods properly?

Also, I'm looking up the scheduler's port for the remote submission. Is this necessary?

Many thanks!



Javi Roman


_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/

