
Re: [Condor-users] Status of nodes in a pool



Fabiano,

I must have missed this on Friday. I think you'll need to do two things:


##--------------------------------------------------------------------
##  Network filesystem parameters:
##--------------------------------------------------------------------
##  Do you want to use NFS for file access instead of remote system
##  calls?
USE_NFS                = False


and in the submit description file,


should_transfer_files = YES
when_to_transfer_output = ON_EXIT
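For illustration, here is a minimal Python sketch that writes out a complete submit description containing the two file-transfer lines suggested above. The helper name and the file names (simple, simple.out, simple.err, simple.log) are hypothetical. universe, executable, output, error, log and queue are standard submit commands, and should_transfer_files / when_to_transfer_output are taken from the advice itself.

```python
def make_submit_description(executable: str,
                            log: str = "simple.log") -> str:
    """Return a vanilla-universe submit description (hypothetical helper)."""
    lines = [
        "universe                = vanilla",
        f"executable              = {executable}",
        "output                  = simple.out",
        "error                   = simple.err",
        # A job log makes later troubleshooting much easier.
        f"log                     = {log}",
        # Ask Condor to ship input/output files instead of expecting
        # a shared filesystem between submit and execute machines.
        "should_transfer_files   = YES",
        "when_to_transfer_output = ON_EXIT",
        "queue",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Redirect this output to a file and pass that file to condor_submit.
    print(make_submit_description("simple"), end="")
```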


See this part of the manual for more information:

http://www.cs.wisc.edu/condor/manual/v6.7/2_5Submitting_Job.html#SECTION00354000000000000000


Fabiano Portella wrote:
Hi Condor Folks!
I sent this message last Friday and still haven't received any reply.
Any tips would be much appreciated!
Thanks in advance.
Regards,
Fabiano.

Fabiano Portella <fabiano_portella@xxxxxxxxxxxx> wrote:

    Thanks a lot for your tips, David!
    In fact, I decided to clean up all jobs in the idle state and test again. I started up the pool with 4 nodes:
    =======================================================
    [globus@cruzi log]$ condor_status

    Name       OpSys   Arch   State      Activity  LoadAv  Mem   ActvtyTime
    crithidia  LINUX   INTEL  Unclaimed  Idle      0.000   503   0+00:18:39
    cruzi      LINUX   INTEL  Unclaimed  Idle      0.030   501   0+00:25:05
    genome     LINUX   INTEL  Unclaimed  Idle      0.000   1519  0+00:27:22
    vivax      LINUX   INTEL  Unclaimed  Idle      0.000   757   0+00:20:07

                  Total  Owner  Claimed  Unclaimed  Matched  Preempting  Backfill
    INTEL/LINUX       4      0        0          4        0           0         0
          Total       4      0        0          4        0           0         0
    =======================================================
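A small Python sketch that tallies machines by their State column, assuming the condor_status layout shown above (a header row, one row per machine, then a blank line before the summary). The function name is hypothetical; it is only a convenience for checking that all four nodes report Unclaimed, not part of Condor.

```python
from collections import Counter


def count_states(status_text: str) -> Counter:
    """Tally machines by the State column of condor_status output."""
    lines = status_text.strip().splitlines()
    header = lines[0].split()
    state_col = header.index("State")
    counts: Counter = Counter()
    for row in lines[1:]:
        if not row.strip():
            break  # a blank line separates machine rows from the summary
        counts[row.split()[state_col]] += 1
    return counts


STATUS = """\
Name       OpSys   Arch   State      Activity  LoadAv  Mem   ActvtyTime
crithidia  LINUX   INTEL  Unclaimed  Idle      0.000   503   0+00:18:39
cruzi      LINUX   INTEL  Unclaimed  Idle      0.030   501   0+00:25:05
genome     LINUX   INTEL  Unclaimed  Idle      0.000   1519  0+00:27:22
vivax      LINUX   INTEL  Unclaimed  Idle      0.000   757   0+00:20:07

              Total  Owner  Claimed  Unclaimed  Matched  Preempting  Backfill
INTEL/LINUX       4      0        0          4        0           0         0
"""

if __name__ == "__main__":
    print(count_states(STATUS))
```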
    After submitting a simple test job
    (http://www.cs.wisc.edu/condor/tutorials/intl-grid-school-3/submit_first.html)
    from the vivax node, I ran "condor_q -better-analyze" and got the following:
=======================================================
    [globus@vivax testcondor]$ condor_q -better-analyze

    -- Submitter: vivax.wwwwww.www : <xxx.x.x.xxx:xxxx> : vivax.wwwwww.www
    ---
    002.000:  Run analysis summary.  Of 4 machines,
          3 are rejected by your job's requirements
          0 reject your job because of their own requirements
          0 match but are serving users with a better priority in the pool
          1 match but reject the job for unknown reasons
          0 match but will not currently preempt their existing job
          0 are available to run your job
    The Requirements expression for your job is:
    ( target.Arch == "INTEL" ) && ( target.OpSys == "LINUX" ) &&
    ( target.Disk >= DiskUsage ) && ( ( target.Memory * 1024 ) >=
    ImageSize ) &&
    ( TARGET.FileSystemDomain == MY.FileSystemDomain )
        Condition                         Machines Matched    Suggestion
        ---------                         ----------------    ----------
    1   ( TARGET.FileSystemDomain == "vivax.wwwwww.www" )  1
    2   ( target.Arch == "INTEL" )        4
    3   ( target.OpSys == "LINUX" )       4
    4   ( target.Disk >= 5 )              4
    5   ( ( 1024 * target.Memory ) >= 5 ) 4
    =======================================================
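To see why three of the four machines are rejected, here is a rough Python model of the Requirements expression above, evaluated one clause at a time. It is a sketch, not Condor's ClassAd matchmaker. Arch, OpSys and Memory come from the condor_status output; the Disk values and the FileSystemDomain strings of crithidia, cruzi and genome are assumptions (each machine's own hostname, which is what an unconfigured FILESYSTEM_DOMAIN typically resolves to).

```python
# Machine ads: Arch, OpSys and Memory are from condor_status above;
# Disk (KB) and FileSystemDomain are ASSUMED values for illustration.
MACHINES = [
    {"Name": "crithidia", "Arch": "INTEL", "OpSys": "LINUX", "Memory": 503,
     "Disk": 1_000_000, "FileSystemDomain": "crithidia.wwwwww.www"},
    {"Name": "cruzi", "Arch": "INTEL", "OpSys": "LINUX", "Memory": 501,
     "Disk": 1_000_000, "FileSystemDomain": "cruzi.wwwwww.www"},
    {"Name": "genome", "Arch": "INTEL", "OpSys": "LINUX", "Memory": 1519,
     "Disk": 1_000_000, "FileSystemDomain": "genome.wwwwww.www"},
    {"Name": "vivax", "Arch": "INTEL", "OpSys": "LINUX", "Memory": 757,
     "Disk": 1_000_000, "FileSystemDomain": "vivax.wwwwww.www"},
]

# Job ad values as shown by condor_q -better-analyze.
JOB = {"FileSystemDomain": "vivax.wwwwww.www", "DiskUsage": 5, "ImageSize": 5}

# One predicate per clause of the Requirements expression.
CLAUSES = {
    "FileSystemDomain": lambda m, j: m["FileSystemDomain"] == j["FileSystemDomain"],
    "Arch": lambda m, j: m["Arch"] == "INTEL",
    "OpSys": lambda m, j: m["OpSys"] == "LINUX",
    "Disk": lambda m, j: m["Disk"] >= j["DiskUsage"],
    # Memory is in MB and ImageSize in KB, hence the factor of 1024.
    "Memory": lambda m, j: m["Memory"] * 1024 >= j["ImageSize"],
}


def machines_matched(machines, job):
    """Count machines satisfying each clause (the 'Machines Matched' column)."""
    return {name: sum(1 for m in machines if ok(m, job))
            for name, ok in CLAUSES.items()}


def candidates(machines, job):
    """Names of machines satisfying every clause at once."""
    return [m["Name"] for m in machines
            if all(ok(m, job) for ok in CLAUSES.values())]


if __name__ == "__main__":
    print(machines_matched(MACHINES, JOB))
    print(candidates(MACHINES, JOB))
```

Only the FileSystemDomain clause narrows the field to vivax. Giving every machine the same FileSystemDomain makes all four pass in this model; the file-transfer settings suggested at the top of this reply are intended to take the clause out of the job's requirements instead.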
    This means that I need to set up TARGET.FileSystemDomain to allow
    other nodes in the pool besides the submitter.
    My question is: how do I do that? My submit file doesn't contain this
    property; I followed all the steps from the URL above.
    This pool doesn't run over NFS; the local config files of all nodes
    point to vivax as the central manager. Note that I don't have a condor
    user either, just the globus user, but it was added as a super user via
    a config file variable on all nodes ($CONDOR_HOME/etc/condor_config).
    Thanks in advance for your help.
    Regards,
                  Fabiano.
    "David A. Kotz" <dkotz@xxxxxxxxxxxxx> wrote:

        Fabiano Portella wrote:
         > Thanks for this tip. I found a commented-out NUM_CPUS variable in
         > the $CONDOR_HOME/etc/condor_config file and enabled it with a value of 1.
         >
         > It seems that worked, because now there's just one entry in
         > condor_status for each active machine in the pool.
         >
         > My problem now is that a simple test takes a long time to run; in fact,
         > one test finished in 30 minutes, but the other took more than an hour
         > and I decided to cancel it. So, I have some questions:
         >
         > 1. Is there any configuration I can change to improve performance?
         > In the URL that I followed, the test took 4 seconds to finish.

        To find out why jobs are not running well, you'll need to read through
        the logfiles for the submit node and the execute node or nodes. You can
        also specify a logfile in your submit description for the job. There's
        no quick and easy answer to this; it will depend on why the job is
        performing poorly. It may be that your machines are seeing keyboard or
        mouse activity that puts them in the Owner state and interrupts your
        jobs. Look at the job log, the ShadowLog, and the StarterLog first.
        They should indicate state transitions and possibly the reasons for them.
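Reading the job log can be partly automated. The Python sketch below assumes the usual user-log event header format (a three-digit event code, the job id in parentheses, then a month/day date and a time) and names only a handful of event codes; the function names are hypothetical, and the fixed year is an assumption because the log omits it. The gap between the 000 (submitted) and 001 (executing) events shows how long a job sat idle before it started.

```python
import re
from datetime import datetime

# A few user-log event codes; see the manual for the full list.
EVENT_NAMES = {
    "000": "submitted",
    "001": "executing",
    "004": "evicted",
    "005": "terminated",
    "009": "aborted",
    "012": "held",
}

# Event header lines look like:
#   001 (002.000.000) 04/10 14:22:40 Job executing on host: <...>
EVENT_RE = re.compile(
    r"^(\d{3}) \((\d+)\.(\d+)\.\d+\) (\d\d/\d\d) (\d\d:\d\d:\d\d)")

# The log omits the year. This assumed year is a leap year so that 02/29
# still parses; intervals spanning a year boundary will come out wrong.
ASSUMED_YEAR = 2004


def parse_events(log_text):
    """Return (code, name, cluster, proc, timestamp) for each event header."""
    events = []
    for line in log_text.splitlines():
        match = EVENT_RE.match(line)
        if not match:
            continue  # event bodies and '...' separators
        code, cluster, proc, day, clock = match.groups()
        when = datetime.strptime(f"{ASSUMED_YEAR}/{day} {clock}",
                                 "%Y/%m/%d %H:%M:%S")
        events.append((code, EVENT_NAMES.get(code, "other"),
                       int(cluster), int(proc), when))
    return events


def seconds_between(events, first_code, later_code):
    """Seconds from the first `first_code` event to the first `later_code` one."""
    start = next(e[4] for e in events if e[0] == first_code)
    end = next(e[4] for e in events if e[0] == later_code)
    return (end - start).total_seconds()
```

Usage: read the file named by the log command in the submit description, pass its text to parse_events, then call seconds_between(events, "000", "001") to get the idle time before execution.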


         > 2. Which command should I type to cancel jobs in a pool?

        condor_rm

        http://www.cs.wisc.edu/condor/manual/v6.7/condor_rm.html
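A hedged Python wrapper around condor_rm, for scripting job cleanup. The argument forms it builds (a cluster such as 2, a cluster.proc such as 2.0 as listed by condor_q, or -all) are ones condor_rm accepts; the wrapper names are hypothetical. condor_rm talks to the local schedd by default, so in this pool it would be run on the submit machine (vivax).

```python
import shutil
import subprocess


def condor_rm_args(job_id=None, remove_all=False):
    """Build a condor_rm command line (hypothetical helper).

    job_id is a cluster ("2") or cluster.proc ("2.0") as listed by condor_q.
    """
    if remove_all:
        return ["condor_rm", "-all"]  # every job you are allowed to remove
    if not job_id:
        raise ValueError("pass a job id or set remove_all=True")
    return ["condor_rm", str(job_id)]


def remove_jobs(job_id=None, remove_all=False):
    """Run condor_rm on the local submit machine."""
    if shutil.which("condor_rm") is None:
        raise RuntimeError("condor_rm is not on PATH")
    subprocess.run(condor_rm_args(job_id, remove_all), check=True)
```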


         > 3. Which command should I type to stop Condor properly on all
         > machines in the pool? I'm trying "condor_off -master", but although
         > I got the message "Sent kill message...", all Condor processes seem
         > to still be alive.

        condor_off should work, but it has to be run by an authorized user,
        such as root or condor, or by a user on a machine where all users are
        allowed admin rights.
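If condor_off has to be issued for every machine, a loop like the Python sketch below can do it. It assumes it is run by an authorized user, as noted above, and that condor_off accepts a target hostname after -master; the host list comes from the condor_status output in this thread, and the helper names are hypothetical. Stopping the central manager last is a design choice in this sketch, so the collector stays up while the other masters are being located.

```python
import subprocess

# Hostnames taken from the condor_status listing in this thread.
POOL_HOSTS = ["crithidia", "cruzi", "genome", "vivax"]


def condor_off_commands(hosts, central_manager="vivax"):
    """One 'condor_off -master <host>' command per host, central manager last."""
    ordered = [h for h in hosts if h != central_manager]
    if central_manager in hosts:
        ordered.append(central_manager)
    return [["condor_off", "-master", host] for host in ordered]


def shutdown_pool(hosts):
    """Run the commands; must be executed as an authorized user (e.g. root)."""
    for cmd in condor_off_commands(hosts):
        subprocess.run(cmd, check=True)
```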

         >
         > Regards,
         > Fabiano.
         >
         > Adam Thorn wrote:
         >
         > On Wed, 5 Apr 2006, David A. Kotz wrote:
         >
         > > I have no answer to the file lock problem, but the question about
         > > VMs is an easy one. Condor will create one virtual machine for each
         > > CPU detected on your machine. Hyperthreaded Pentium IV processors
         > > will also show up as two processors each. There is a setting in the
         > > condor_config to ignore hyperthreading, but it has never had any
         > > effect for me.
         >
         > COUNT_HYPERTHREAD_CPUS = FALSE
         >
         > Works fine for my P4s..
         >
         > Adam
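The VM counting rules discussed in this thread can be summarized as a toy Python model. It encodes only what is said here: one VM per detected CPU, hyperthreaded CPUs counted twice unless COUNT_HYPERTHREAD_CPUS = FALSE (counting them is assumed to be the default, consistent with the two-VMs-per-P4 behavior described), and NUM_CPUS overriding the detected count. It is not Condor's detection code, and the function name is hypothetical.

```python
def advertised_vms(physical_cpus, hyperthreaded=False,
                   count_hyperthread_cpus=True, num_cpus=None):
    """Toy model of how many VMs a machine shows in condor_status.

    Mirrors only the rules mentioned in this thread, not Condor's own code.
    """
    if num_cpus is not None:
        return num_cpus  # NUM_CPUS set in condor_config overrides detection
    if hyperthreaded and count_hyperthread_cpus:
        return physical_cpus * 2  # each HT CPU shows up as two processors
    return physical_cpus
```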
         > _______________________________________________
         > Condor-users mailing list
         > Condor-users@xxxxxxxxxxx
         > https://lists.cs.wisc.edu/mailman/listinfo/condor-users





