Mailing List Archives Public Access	UW Madison Computer Sciences Department Computer Systems Lab


Re: [Condor-users] How To TroubleShoot Flocking

Date: Thu, 06 Jul 2006 09:25:59 -0500
From: Dan Bradley <dan@xxxxxxxxxxxx>
Subject: Re: [Condor-users] How To TroubleShoot Flocking

By the way: the reference to "condor_exec.exe" is expected. This is the name under which Condor runs the user's executable (i.e. argv[0]). Failure to execute the job is most often the result of files not being accessible from the execute node. I assume this is a vanilla universe job. What file-transfer settings are you using?
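Dan's question about file transfer matters because a vanilla universe job that expects a shared filesystem will not find its files in the other pool. Below is a minimal sketch of a submit description that ships the executable and its input to the execute node instead. The file names (myjob.sh, input.dat, flock_test.sub) are hypothetical.

```shell
#!/bin/sh
# Write a minimal vanilla-universe submit description that transfers the
# executable and its input rather than relying on NFS or another shared
# filesystem. All file names here are hypothetical.
cat > flock_test.sub <<'EOF'
universe                = vanilla
executable              = myjob.sh
arguments               = input.dat
transfer_input_files    = input.dat
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
output                  = flock_test.out
error                   = flock_test.err
log                     = flock_test.log
queue
EOF
# Submit it with: condor_submit flock_test.sub
```

With should_transfer_files set to YES the executable is transferred by default; any other file the job reads has to be listed in transfer_input_files.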


--Dan

Kewley, J (John) wrote:

[Don't treat the below as gospel: I haven't flocked in a while, so some things may have changed or I may have misspelled things.]
There are a few subtle things that can stop flocking from working:
* Set FLOCK_TO and FLOCK_FROM at both ends for a two-way flock.
* HOSTALLOW values may need to be changed to include the other pool's machines.
* If you have security enabled, it may need to be made more flexible to include other authentication mechanisms.
* Machines in the other pool may be of a different ARCH or OpSys.
* Your jobs may be set up to use a shared filestore (NFS, for instance) which isn't available from the other pool.
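For the first two bullets, here is a minimal sketch of pool A's half of a two-way flock. The host names are hypothetical: cm.poolB.example.edu stands for pool B's central manager, submit.poolB.example.edu for a pool B submit machine, and *.poolA.example.edu for pool A's own hosts. The fragment is written to a separate file so it can be reviewed before being appended to the local config file and picked up with condor_reconfig. Pool B needs the mirror-image settings.

```shell
#!/bin/sh
# Write pool A's flocking settings to a reviewable fragment file.
# All host names are hypothetical placeholders.
CONFIG_FRAGMENT=${CONFIG_FRAGMENT:-./condor_config.flock}

cat > "$CONFIG_FRAGMENT" <<'EOF'
# Jobs submitted in pool A may flock to pool B's central manager
FLOCK_TO = cm.poolB.example.edu
# Pool B's submit machine may flock jobs into pool A
FLOCK_FROM = submit.poolB.example.edu
# Let that remote submit machine talk to pool A's daemons as well
HOSTALLOW_WRITE = *.poolA.example.edu, submit.poolB.example.edu
EOF
# After review: append to the local config file, then run condor_reconfig.
```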
You can use

condor_config_val -pool NODE_NAME -name NODE_NAME val

where val is one of hostallow_write, hostallow_read, flock_to or flock_from, to see what values are set on the different machines.
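JK's condor_config_val check can be wrapped in a small loop so all four settings are printed in one go. Below is a minimal sketch, assuming a POSIX sh and condor_config_val on the PATH. The helper name show_flock_settings is hypothetical.

```shell
#!/bin/sh
# Print the flocking-related settings condor_config_val reports for one
# machine. show_flock_settings is a hypothetical helper name.
# Usage: show_flock_settings POOL NODE_NAME
show_flock_settings() {
    pool=$1
    node=$2
    for val in hostallow_write hostallow_read flock_to flock_from; do
        # Capture stderr too, so "not defined" errors appear next to the name.
        result=$(condor_config_val -pool "$pool" -name "$node" "$val" 2>&1)
        printf '%s = %s\n' "$val" "$result"
    done
}
```

Running it once against a machine in each pool, for example `show_flock_settings cm.poolB.example.edu cm.poolB.example.edu` (hypothetical host), puts both ends side by side for comparison.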
But the more usual culprits are firewalls.
Are there any firewalls between the pools? (Or is one pool behind a NAT?)

Remember that for jobs to flock, every submit node needs to be able to talk to every execute node and vice versa over the fixed ports and the upper port range, all over both TCP and UDP.

If that is not the case, you'll have to relax the firewalls or use GCB (Generic Connection Brokering).
See also
http://www.allhands.org.uk/2005/proceedings/papers/431.pdf
for more info on firewalls in a Condor Pool
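One quick way to check the firewall point from each side is to attempt TCP connections to the other pool's machines. Below is a rough sketch, assuming an OpenBSD-style netcat that supports -z and -w. The helper name probe_tcp and the host names are hypothetical. 9618 is the usual default collector port; if the admins have pinned Condor's dynamic ports with LOWPORT/HIGHPORT, ports from that range are worth probing too. This only tests TCP, not UDP.

```shell
#!/bin/sh
# Report whether TCP connections from this machine reach the given ports on
# a host. probe_tcp is a hypothetical helper; it assumes a netcat supporting
# -z (connect without sending data) and -w (timeout in seconds).
# Usage: probe_tcp HOST PORT [PORT ...]
probe_tcp() {
    host=$1
    shift
    for port in "$@"; do
        if nc -z -w 5 "$host" "$port" >/dev/null 2>&1; then
            echo "$host:$port open"
        else
            echo "$host:$port blocked or closed"
        fi
    done
}
```

For example, run `probe_tcp cm.poolB.example.edu 9618` on a submit node in pool A, then repeat in the other direction from pool B. A "blocked or closed" result can also simply mean that nothing is listening on that port.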
Cheers
JK

    -----Original Message-----
    *From:* condor-users-bounces@xxxxxxxxxxx
    [mailto:condor-users-bounces@xxxxxxxxxxx] *On Behalf Of* John Alberts
    *Sent:* Wednesday, July 05, 2006 8:41 PM
    *To:* Condor-Users Mail List
    *Subject:* [Condor-users] How To TroubleShoot Flocking

    Hi. I am trying to set up flocking between two Condor pools. One pool I
    have complete control of and access to; on the other pool I can log in
    using ssh and submit jobs. The administrator of the other pool is
    currently on vacation and said he has configured flocking to/from
    our pool. I’m trying to test it, and it seems like flocking isn’t
    working.

    I was wondering how I can troubleshoot flocking to see what the
    culprit is. I already tried to submit a job whose requirements can
    only be fulfilled on the other pool. condor_status -analyze
    <jobid> shows that none of the machines can meet the requirements. I
    have also run condor_status -pool <otherpoolname> and it properly
    displays all available machines on the other pool. I’m not sure
    what to check next.

    Note: There is a firewall between the pools and our network admin
    has already configured the firewall to allow traffic between pools.

    Thanks for any help.

    John Alberts
    Technical Assistant for EMS
    alberts@xxxxxxxxxxxxxxxxxx <mailto:alberts@xxxxxxxxxxxxxxxxxx>
    219-989-2083
    CLO 332
    http://public.xdi.org/=john.alberts

------------------------------------------------------------------------

_______________________________________________
Condor-users mailing list
To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/condor-users

The archives can be found at either
https://lists.cs.wisc.edu/archive/condor-users/
http://www.opencondor.org/spaces/viewmailarchive.action?key=CONDOR
