
Re: [Condor-users] Status of nodes in a pool



Thanks a lot for your tips David!
 
In fact, I decided to clean up all jobs in idle state and test again.
 
I started up pool with 4 nodes:
 
=======================================================
[globus@cruzi log]$ condor_status
Name       OpSys  Arch   State      Activity  LoadAv   Mem  ActvtyTime
crithidia  LINUX  INTEL  Unclaimed  Idle      0.000    503  0+00:18:39
cruzi      LINUX  INTEL  Unclaimed  Idle      0.030    501  0+00:25:05
genome     LINUX  INTEL  Unclaimed  Idle      0.000   1519  0+00:27:22
vivax      LINUX  INTEL  Unclaimed  Idle      0.000    757  0+00:20:07

                     Total Owner Claimed Unclaimed Matched Preempting Backfill
         INTEL/LINUX     4     0       0         4       0          0        0
               Total     4     0       0         4       0          0        0
=======================================================
 
After submitting the simple test job (http://www.cs.wisc.edu/condor/tutorials/intl-grid-school-3/submit_first.html) from the vivax node, I ran "condor_q -better-analyze" and got the following:
 
=======================================================
[globus@vivax testcondor]$ condor_q -better-analyze

-- Submitter: vivax.wwwwww.www : <xxx.x.x.xxx:xxxx> : vivax.wwwwww.www
---
002.000:  Run analysis summary.  Of 4 machines,
      3 are rejected by your job's requirements
      0 reject your job because of their own requirements
      0 match but are serving users with a better priority in the pool
      1 match but reject the job for unknown reasons
      0 match but will not currently preempt their existing job
      0 are available to run your job
The Requirements expression for your job is:
( target.Arch == "INTEL" ) && ( target.OpSys == "LINUX" ) &&
( target.Disk >= DiskUsage ) && ( ( target.Memory * 1024 ) >= ImageSize ) &&
( TARGET.FileSystemDomain == MY.FileSystemDomain )
    Condition                         Machines Matched    Suggestion
    ---------                         ----------------    ----------
1   ( TARGET.FileSystemDomain == "vivax.wwwwww.www" )
                                      1
2   ( target.Arch == "INTEL" )        4
3   ( target.OpSys == "LINUX" )       4
4   ( target.Disk >= 5 )              4
5   ( ( 1024 * target.Memory ) >= 5 ) 4
=======================================================
This means I need to set up TARGET.FileSystemDomain so that nodes other than the submitter can match the job.
 
My question is: how do I do that? My submit file doesn't contain this property - I followed all the steps from the URL above.
 
This pool doesn't run over NFS - the local config files on all nodes point to vivax as the central manager. Note that I don't have a condor user either - just the globus user, but it was added as a super user in the config file variable on all nodes ($CONDOR_HOME/etc/condor_config).
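From what I've read in the manual, enabling Condor's file transfer mechanism in the submit file should make the matchmaker drop the FileSystemDomain clause, since the job no longer needs a shared filesystem. A sketch, based on the tutorial's submit file (the executable name and arguments are the tutorial's; I haven't tested this on my pool yet):

=======================================================
universe                = vanilla
executable              = simple
arguments               = 4 10
# Ask Condor to transfer the executable and output files itself,
# instead of requiring a matching FileSystemDomain:
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
output = simple.out
error  = simple.error
log    = simple.log
queue
=======================================================

Would that be the right approach, or should I instead set FILESYSTEM_DOMAIN to a common value in each node's condor_config?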
 
Thanks in advance for your help.
Regards,
              Fabiano.
 

"David A. Kotz" <dkotz@xxxxxxxxxxxxx> escreveu:
Fabiano Portella wrote:
> Thanks for this tip. I found a commented-out NUM_CPUS variable in the
> $CONDOR_HOME/etc/condor_config file and enabled it with a value of 1.
>
> That seems to have worked, because now there's just one entry in
> condor_status for each active machine in the pool.
>
> My problem now is that a simple test takes a long time to run - in fact
> one test finished in 30 minutes, but another took more than an hour and
> I decided to cancel it. So, I have some questions:
>
> 1. Is there any configuration I can change to improve performance? In the
> URL I followed for the test, it took only 4 seconds to complete.

To find out why jobs are not running well, you'll need to read through
the logfiles for the submit node and the execute node or nodes. You can
also specify a logfile in your submit description for the job. There's
no quick and easy answer to this. It will depend on why the job is
performing poorly. It may be that your machines are seeing keyboard or
mouse activity that puts them in the Owner state and interrupts your jobs.
Look at the job log, the ShadowLog, and the StarterLog first. They
should indicate state transitions and possibly the reasons for them.
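For example, to get a per-job log, add a line like this to your submit description (the filename is just an example). The daemon logs live wherever LOG points in each machine's condor_config, which condor_config_val can tell you:

=======================================================
# In the submit description file:
log = simple.log
=======================================================

=======================================================
# Locate the log directory on any machine:
condor_config_val LOG

# On the submit machine:
tail -100 `condor_config_val LOG`/ShadowLog

# On the execute machine:
tail -100 `condor_config_val LOG`/StarterLog
=======================================================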


> 2. Which command should I type to cancel jobs in the pool?

condor_rm

http://www.cs.wisc.edu/condor/manual/v6.7/condor_rm.html
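For example (the job IDs here are illustrative - use the cluster.proc numbers that condor_q shows you):

=======================================================
# Remove a single job (cluster 2, process 0):
condor_rm 2.0

# Remove every job in cluster 2:
condor_rm 2

# Remove all of your jobs from the local queue:
condor_rm -all
=======================================================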


> 3. Which command should I type to stop Condor properly on all machines
> in the pool? I'm trying "condor_off -master" but although I get the
> message "Sent kill message...", all Condor processes seem to still be
> alive.

condor_off should work, but it has to be run by an authorized user, such
as root or condor, or from a user on a machine where all users are
allowed admin rights.
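In 6.7 that's controlled by HOSTALLOW_ADMINISTRATOR in condor_config; roughly something like (the value shown is a common default, not necessarily yours):

=======================================================
# Hosts allowed to send administrative commands
# (condor_off, condor_on, condor_restart, ...):
HOSTALLOW_ADMINISTRATOR = $(CONDOR_HOST)
=======================================================

With that default you'd run condor_off from the central manager; otherwise add your submit host to the list and reconfig.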

>
> Regards,
> Fabiano.
>
> Adam Thorn wrote:
>
> On Wed, 5 Apr 2006, David A. Kotz wrote:
>
> > I have no answer to the file lock problem, but the question about
> VMs is
> > an easy one. Condor will create one virtual machine for each CPU
> > detected on your machine. Hyperthreaded Pentium IV processors will
> > also show up as two processors each. There is a setting in the
> > condor_config to ignore hyperthreading, but it has never had any
> effect
> > for me.
>
> COUNT_HYPERTHREAD_CPUS = FALSE
>
> Works fine for my P4s.
>
> Adam
> _______________________________________________
> Condor-users mailing list
> Condor-users@xxxxxxxxxxx
> https://lists.cs.wisc.edu/mailman/listinfo/condor-users
>
>


