
Re: [Condor-users] Jobs will not run in another pool



I've tried the same sort of thing before with no success, and I know there was no communications issue. It seems that flocking negotiations do not work in this way. If you specify requirements that can be met only by machines in the other pool, Condor never seems to match them.
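One thing worth ruling out before blaming flocking is the Requirements expression in the quoted submit file itself. As far as I know, ClassAd string literals need double quotes, and the machine ad attribute that carries the fully qualified host name is Machine. I am not aware of a standard HOSTNAME attribute in machine ads, so the expression as written may evaluate to UNDEFINED and match nothing in either pool. A minimal sketch of what I would try instead, assuming the intent is simply to exclude that one host:

```
# Sketch only: assumes Machine holds the execute node's full host name.
# Exclude the single local machine so the job can only match elsewhere.
Requirements = (Machine != "caudate-nh.nsw.cmis.csiro.au")
```

This does not by itself prove that flocking works, but it removes one possible reason for the job to sit idle.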

- dave


Kewley, J (John) wrote:
Does either pool (or both) have a firewall?
Do any of the submitting nodes, or the potentially executing nodes in the flocked pool, have machine firewalls?
Is either pool on a private network?
All are worth checking before expecting flocking to work. Remember that ALL submit nodes must have routes through any intervening firewalls to all potential execute nodes, even in another pool. Such routes include TCP and UDP access across a variety of ports. For more details on this, check out
http://www.allhands.org.uk/2005/proceedings/papers/431.pdf
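For reference, the flocking side of the configuration is small. A rough sketch, assuming a submit host called submit.pool-a.example and a remote central manager called cm.pool-b.example (both made-up names), and an arbitrary port range that would have to be agreed with whoever runs the firewall:

```
# On the submit machine in the local pool (hypothetical host names):
FLOCK_TO = cm.pool-b.example

# On the central manager and execute nodes of the remote pool:
FLOCK_FROM = submit.pool-a.example

# Optionally restrict Condor to a fixed port range so firewall rules
# can be written for it (the range here is an arbitrary example):
LOWPORT  = 9600
HIGHPORT = 9700
```

The remote pool must also authorise the submit host for write access. The exact settings depend on the Condor version, so check the flocking section of the manual for yours.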
If you are happy that there are no firewalls in the way, then you need to post the output of
condor_q -analyze
or the appropriate log files.
Cheers JK

    -----Original Message-----
    From: condor-users-bounces@xxxxxxxxxxx on behalf of Junaid N. Sahibzada
    Sent: Wednesday, April 05, 2006 7:08 AM
    To: Condor-Users Mail List
    Subject: [Condor-users] Jobs will not run in another pool

    Hi,

    I flocked a job to another pool using the following cmd file

    executable      = HADAMeanFilter
    universe        = vanilla

    Requirements   =  HOSTNAME != caudate-nh.nsw.cmis.csiro.au
    should_transfer_files = YES
    when_to_transfer_output = ON_EXIT

    transfer_input_files = ResampledProstate7.out.nostripes.raw,HADAPatient007_DBR_256x349x348_CAutHomgnty.data.plot,HADAPatient007_DBR_256x349x348_CAutHomgnty.out.mhd

    notification    = COMPLETE
    notify_user     = Oscar.AcostaTamayo@xxxxxxxx
    output          = output.HADAMeanFilterI.vanilla.dynamic
    error           = error.HADAMeanFilterI.vanilla.dynamic
    log             = log.HADAMeanFilterI.vanilla.dynamic
    arguments       = -n ResampledProstate7.out.nostripes.raw -x 256 -y 349 -z 348 -r 0 -s 8 -L 64 -q 0 -t 255 -K 2 -u 3 -v 3 -w 3 -o HADAPatient007_DBR_256x349x348_CAutHomgnty -m 2 -X 20 -c 5 -A 1 -T 0.125 -E 0.0001 -I 0 -H 1 -Z 0
    queue

    My job is stuck in the queue and is not executing, although there are
    plenty of resources available.

    You can check that here: http://condorview.csiro.au/

    I added the requirement because I wanted to test flocking.

    It says to run on any machine except caudate-nh, which is the only
    machine in my pool right now (I have turned off the rest of them).

    Any problems?



    Junaid N. Sahibzada
    Cell # (+61) 404 998 494
    International Student MSc Internetworking, UTS, Australia
    Bachelor of Information Technology, NUST, Pakistan



------------------------------------------------------------------------

_______________________________________________
Condor-users mailing list
Condor-users@xxxxxxxxxxx
https://lists.cs.wisc.edu/mailman/listinfo/condor-users