
Re: [Condor-users] idling jobs



Thanks, Nick, for your reply.

I've run 'condor_q -better' on the farm as you suggested and am attaching
the output. I'm not expert enough to understand it, though. Can you
glean something from it?

Thanks in advance,
Daniel


-----Original Message-----
From: condor-users-bounces@xxxxxxxxxxx on behalf of Nick LeRoy
Sent: Mon 4/9/2007 10:47 PM
To: Condor-Users Mail List
Subject: Re: [Condor-users] idling jobs

On Wed April 4 2007 11:19 am, Daniel Goldin wrote:
> Hi,
Hello,

> I have submitted 30 jobs to run on a farm with 30 nodes. The "submit"
> file looks like this:

<snip>
>
> I am the only user on the farm, but what I see is only 5-6 jobs are
> running simultaneously and the rest are idling. Can I reconfigure
> something so that all the jobs run simultaneously? Could it be a
> priority issue? (If it can be done, I'd like to do it non-intrusively,
> i.e. keep the running jobs running...)

There's not a lot of information here, and there could be quite a lot of
things going wrong.

First, have you waited at least one negotiation cycle (typically 5
minutes)? I'm assuming that these are all long-running jobs (from your
description above).  Condor doesn't do particularly well when users
submit a lot of short-running jobs.  If that's not the case, then let's
try a couple of debugging exercises:

1. Have you looked at the output of 'condor_status' to verify that all
of the execute machines are reporting to the pool correctly, and that
they're all in the unclaimed / idle state?

2. Have you tried running 'condor_q -analyze' or (even
better) 'condor_q -better' (better analyze) and looked through its
output?

I'd start with the above two exercises...   If they don't help, give us
a little more information to go on (like the output of condor_status and
condor_q or 'condor_q -ana').

Hope this helps

-Nick

--
           <<< Follow the white rabbit. >>>
 /`-_    Nicholas R. LeRoy               The Condor Project
{     }/ http://www.cs.wisc.edu/~nleroy  http://www.cs.wisc.edu/condor
 \    /  nleroy@xxxxxxxxxxx              The University of Wisconsin
 |_*_|   608-265-5761                    Department of Computer Sciences



-- Submitter: smufarm.physics.smu.edu : <192.168.1.1:32780> : smufarm.physics.smu.edu
---
3731.000:  Run analysis summary.  Of 60 machines,
     56 are rejected by your job's requirements
      0 reject your job because of their own requirements
      0 match but are serving users with a better priority in the pool
      4 match but reject the job for unknown reasons
      0 match but will not currently preempt their existing job
      0 are available to run your job
	Last successful match: Mon Apr  9 16:00:38 2007

The Requirements expression for your job is:

( target.Arch == "INTEL" ) && ( target.OpSys == "LINUX" ) &&
( target.Disk >= DiskUsage ) && ( ( target.Memory * 1024 ) >= ImageSize ) &&
( TARGET.FileSystemDomain == MY.FileSystemDomain )

    Condition                         Machines Matched    Suggestion
    ---------                         ----------------    ----------
1   ( ( 1024 * target.Memory ) >= 658636 )
                                      4
2   ( target.Arch == "INTEL" )        60                   
3   ( target.OpSys == "LINUX" )       60                   
4   ( target.Disk >= 1 )              60                   
5   ( TARGET.FileSystemDomain == "physics.smu.edu" )
                                      60                   
---
3731.005:  Run analysis summary.  Of 60 machines,
     56 are rejected by your job's requirements
      0 reject your job because of their own requirements
      0 match but are serving users with a better priority in the pool
      4 match but reject the job for unknown reasons
      0 match but will not currently preempt their existing job
      0 are available to run your job
	Last successful match: Mon Apr  9 16:00:38 2007

The Requirements expression for your job is:

( target.Arch == "INTEL" ) && ( target.OpSys == "LINUX" ) &&
( target.Disk >= DiskUsage ) && ( ( target.Memory * 1024 ) >= ImageSize ) &&
( TARGET.FileSystemDomain == MY.FileSystemDomain )

    Condition                         Machines Matched    Suggestion
    ---------                         ----------------    ----------
1   ( ( 1024 * target.Memory ) >= 649132 )
                                      4
2   ( target.Arch == "INTEL" )        60                   
3   ( target.OpSys == "LINUX" )       60                   
4   ( target.Disk >= 1 )              60                   
5   ( TARGET.FileSystemDomain == "physics.smu.edu" )
                                      60                   
---
3731.006:  Run analysis summary.  Of 60 machines,
     56 are rejected by your job's requirements
      0 reject your job because of their own requirements
      0 match but are serving users with a better priority in the pool
      4 match but reject the job for unknown reasons
      0 match but will not currently preempt their existing job
      0 are available to run your job
	Last successful match: Mon Apr  9 16:00:38 2007

The Requirements expression for your job is:

( target.Arch == "INTEL" ) && ( target.OpSys == "LINUX" ) &&
( target.Disk >= DiskUsage ) && ( ( target.Memory * 1024 ) >= ImageSize ) &&
( TARGET.FileSystemDomain == MY.FileSystemDomain )

    Condition                         Machines Matched    Suggestion
    ---------                         ----------------    ----------
1   ( ( 1024 * target.Memory ) >= 644568 )
                                      4
2   ( target.Arch == "INTEL" )        60                   
3   ( target.OpSys == "LINUX" )       60                   
4   ( target.Disk >= 1 )              60                   
5   ( TARGET.FileSystemDomain == "physics.smu.edu" )
                                      60                   
---
3731.007:  Run analysis summary.  Of 60 machines,
     56 are rejected by your job's requirements
      0 reject your job because of their own requirements
      0 match but are serving users with a better priority in the pool
      4 match but reject the job for unknown reasons
      0 match but will not currently preempt their existing job
      0 are available to run your job
	Last successful match: Mon Apr  9 16:00:38 2007

The Requirements expression for your job is:

( target.Arch == "INTEL" ) && ( target.OpSys == "LINUX" ) &&
( target.Disk >= DiskUsage ) && ( ( target.Memory * 1024 ) >= ImageSize ) &&
( TARGET.FileSystemDomain == MY.FileSystemDomain )

    Condition                         Machines Matched    Suggestion
    ---------                         ----------------    ----------
1   ( ( 1024 * target.Memory ) >= 652028 )
                                      4
2   ( target.Arch == "INTEL" )        60                   
3   ( target.OpSys == "LINUX" )       60                   
4   ( target.Disk >= 1 )              60                   
5   ( TARGET.FileSystemDomain == "physics.smu.edu" )
                                      60                   
---
3731.008:  Run analysis summary.  Of 60 machines,
     56 are rejected by your job's requirements
      0 reject your job because of their own requirements
      0 match but are serving users with a better priority in the pool
      4 match but reject the job for unknown reasons
      0 match but will not currently preempt their existing job
      0 are available to run your job
	Last successful match: Wed Mar 28 13:12:50 2007
	Last failed match: Mon Apr  9 16:00:39 2007
	Reason for last match failure: no match found

The Requirements expression for your job is:

( target.Arch == "INTEL" ) && ( target.OpSys == "LINUX" ) &&
( target.Disk >= DiskUsage ) && ( ( target.Memory * 1024 ) >= ImageSize ) &&
( TARGET.FileSystemDomain == MY.FileSystemDomain )

    Condition                         Machines Matched    Suggestion
    ---------                         ----------------    ----------
1   ( ( 1024 * target.Memory ) >= 661720 )
                                      4
2   ( target.Arch == "INTEL" )        60                   
3   ( target.OpSys == "LINUX" )       60                   
4   ( target.Disk >= 1 )              60                   
5   ( TARGET.FileSystemDomain == "physics.smu.edu" )
                                      60                   
---
3731.009:  Run analysis summary.  Of 60 machines,
     56 are rejected by your job's requirements
      0 reject your job because of their own requirements
      0 match but are serving users with a better priority in the pool
      4 match but reject the job for unknown reasons
      0 match but will not currently preempt their existing job
      0 are available to run your job
	Last successful match: Wed Mar 28 15:07:04 2007
	Last failed match: Fri Mar 30 17:34:08 2007
	Reason for last match failure: no match found

The Requirements expression for your job is:

( target.Arch == "INTEL" ) && ( target.OpSys == "LINUX" ) &&
( target.Disk >= DiskUsage ) && ( ( target.Memory * 1024 ) >= ImageSize ) &&
( TARGET.FileSystemDomain == MY.FileSystemDomain )

    Condition                         Machines Matched    Suggestion
    ---------                         ----------------    ----------
1   ( ( 1024 * target.Memory ) >= 648188 )
                                      4
2   ( target.Arch == "INTEL" )        60                   
3   ( target.OpSys == "LINUX" )       60                   
4   ( target.Disk >= 1 )              60                   
5   ( TARGET.FileSystemDomain == "physics.smu.edu" )
                                      60                   
---
3731.014:  Request is being serviced

---
3812.000:  Request is being serviced
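
---
A note on reading the analysis above: in every idle job, condition 1
(the memory test) is the only condition matched by just 4 of the 60
machines, so it looks like the limiting one. The sketch below simply
redoes that arithmetic for the ImageSize values printed above, assuming
target.Memory is in MB and ImageSize is in KB, as the "* 1024" in the
Requirements expression suggests. The 512 MB and 1024 MB machines at the
end are hypothetical examples, not values taken from this pool.

```python
import math

# ImageSize values (KB) copied from condition 1 of each analyzed job above.
image_size_kb = {
    "3731.000": 658636,
    "3731.005": 649132,
    "3731.006": 644568,
    "3731.007": 652028,
    "3731.008": 661720,
    "3731.009": 648188,
}


def min_memory_mb(image_kb: int) -> int:
    """Smallest whole Memory value (MB) with (Memory * 1024) >= image_kb."""
    return math.ceil(image_kb / 1024)


def memory_matches(memory_mb: int, image_kb: int) -> bool:
    """Evaluate the memory clause of the job's Requirements expression."""
    return memory_mb * 1024 >= image_kb


if __name__ == "__main__":
    for job, kb in image_size_kb.items():
        print(f"{job}: needs Memory >= {min_memory_mb(kb)} MB")

    # Hypothetical machine sizes, for illustration only.
    for mem in (512, 1024):
        ok = memory_matches(mem, image_size_kb["3731.000"])
        print(f"machine with {mem} MB matches 3731.000: {ok}")
```

If the execute nodes advertise less memory than these thresholds, that
alone would explain the 56 rejections; comparing against the Memory
column of 'condor_status' should confirm or rule this out.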