
Re: [Condor-users] Maximizing running parallel jobs



Hi,
I still have the same problem: resources are available and ready to use, but Condor runs only 3 or 4 jobs simultaneously.
I ran "condor_q -better-analyze", but I couldn't work out from its output where the problem is. Here is the result of that command for an idle job:

--------------------------------------------------------------------------------------------------------------------


condor@condor1:~$ condor_q -better-analyze 100.0
-- Submitter: condor1.localadmin : <172.16.110.71:9652> : condor1.localadmin
---
100.000: Run analysis summary. Of 20 machines,
  4 are rejected by your job's requirements
  12 reject your job because of their own requirements
  1 match but are serving users with a better priority in the pool
  3 match but reject the job for unknown reasons
  0 match but will not currently preempt their existing job
  0 are available to run your job
  Last successful match: Mon Aug 17 19:59:50 2009

The Requirements expression for your job is:
( target.Arch == "INTEL" ) && ( target.OpSys == "LINUX" ) &&
( target.Disk >= DiskUsage ) && ( ( target.Memory * 1024 ) >= ImageSize ) &&
( target.HasFileTransfer )

    Condition                               Machines Matched    Suggestion
    ---------                               ----------------    ----------
1   ( target.HasFileTransfer )              16
2   ( target.Arch == "INTEL" )              20
3   ( target.OpSys == "LINUX" )             20
4   ( target.Disk >= 27500 )                20
5   ( ( 1024 * target.Memory ) >= 1000 )    20

The following attributes are missing from the job ClassAd:

CheckpointPlatform

-----------------------------------------------------------------------------------------------------------------

Could you please take a look at it? I hope it shows why Condor doesn't run jobs as soon as they arrive.
Thanks a lot
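
For readers following the thread, the summary counts in the output above can be tallied with a short script. This is only a minimal sketch: the `parse_summary` helper is hypothetical (not part of Condor), and it assumes the summary has exactly the layout pasted in this message.

```python
import re

# Summary lines copied from the condor_q -better-analyze output in this message.
SUMMARY = """\
100.000: Run analysis summary. Of 20 machines,
  4 are rejected by your job's requirements
  12 reject your job because of their own requirements
  1 match but are serving users with a better priority in the pool
  3 match but reject the job for unknown reasons
  0 match but will not currently preempt their existing job
  0 are available to run your job
"""


def parse_summary(text):
    """Return (total_machines, {reason: count}) from an analysis summary.

    Hypothetical helper: it assumes each category line is indented and
    starts with a count, as in the output quoted above.
    """
    header = re.search(r"Of (\d+) machines", text)
    if header is None:
        raise ValueError("no 'Of N machines' header found")
    total = int(header.group(1))

    counts = {}
    for line in text.splitlines():
        # Category lines look like "  12 reject your job because ..."
        m = re.match(r"\s+(\d+)\s+(.*\S)\s*$", line)
        if m:
            counts[m.group(2)] = int(m.group(1))
    return total, counts


total, counts = parse_summary(SUMMARY)
largest = max(counts, key=counts.get)
print(f"{sum(counts.values())} of {total} machines accounted for")
print(f"largest category: {counts[largest]} machines {largest}")
```

In this output the largest group is the 12 machines that reject the job because of their own requirements, which suggests looking at machine-side policy before the job's own Requirements expression.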



--- On Mon 17.8.09, Ian Chesal <ICHESAL@xxxxxxxxxx> wrote:

From: Ian Chesal <ICHESAL@xxxxxxxxxx>
Subject: Re: [Condor-users] Maximizing running parallel jobs
To: "Condor-Users Mail List" <condor-users@xxxxxxxxxxx>
Date: Monday 17 August 2009, 2:38

>       I've installed Condor on 6 PCs:
>            5 quad-core PCs (2 with 8 GB RAM, 1 with 4 GB RAM and 2 with 3 GB RAM)
>            1 PC dual core 2 (3 GB RAM)
>
>       I got 22 slots. My parallel application is split into more than 22
> jobs (no dependencies), so while running it I expected to see all 22 slots
> used simultaneously (each slot running one task). Instead, only 3 or 4
> slots are used simultaneously to run the set of tasks, which makes my
> application run very slowly.
>
>       Is there any configuration or recommendation that would let me
> solve this problem?
>
>       My configuration:
>            Central manager: quad-core PC, 8 GB RAM (manager and submitter)
>            Workers: the other PCs (submitter and executor)
>            Central manager config file: http://meddeb.fr.tc

There needs to be a wiki with a section called "What to do if your job
isn't running" and at the end "How to ask for more help when your
job isn't running". :D

Mohamed,

The first place to start is to ask your Condor system why one of the
jobs you think should be running, isn't. You can do that with:

condor_q -better-analyze <clusterid>.<jobid>

That command can shed a lot of light on what might be keeping your jobs
off some of your machines. If the output doesn't help clear things up,
post it here along with the submission ticket you used to put your jobs
in your system and the output from condor_status, and we can
hopefully give you a little more help with things.

Warm regards,
- Ian
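
The two command outputs requested above can be gathered into a single text report for posting. This is a rough sketch, not an official Condor tool: `diagnostic_commands` and `collect` are hypothetical names, and the only commands used are the ones mentioned in this thread (`condor_q -better-analyze` and `condor_status`). The submit file still has to be attached by hand.

```python
import subprocess


def diagnostic_commands(job_id):
    """Build the argv lists for the commands named in this thread.

    job_id is "<clusterid>.<jobid>", for example "100.0".
    """
    return [
        ["condor_q", "-better-analyze", job_id],
        ["condor_status"],
    ]


def collect(job_id, run=subprocess.run):
    """Run each command and return one text report.

    The run parameter exists only so the function can be exercised
    without a Condor installation; by default it is subprocess.run.
    """
    report = []
    for argv in diagnostic_commands(job_id):
        result = run(argv, capture_output=True, text=True)
        report.append("$ " + " ".join(argv) + "\n" + result.stdout)
    return "\n".join(report)


if __name__ == "__main__":
    # Assumes the Condor command-line tools are on PATH on the submit machine.
    print(collect("100.0"))
```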




