
Re: [Condor-users] Job remains in idle (worked until I increased pool size)




I found a ticket posted on the Condor wiki, and my error sounds similar (https://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=1645). I am not sure why we are now seeing this problem, because we have been using 7.4.3 since it was released and we are still using it. The only change I made was adding 100 cores to the pool.

The ticket states that there have been sporadic reports but nothing has been confirmed, so I believe it remains open.

Any ideas?

thanks,
mike




From: "Michael O'Donnell" <odonnellm@xxxxxxxx>
To: Condor-Users Mail List <condor-users@xxxxxxxxxxx>
Date: 02/09/2011 04:21 PM
Subject: Re: [Condor-users] Job remains in idle (worked until I increased pool size)
Sent by: condor-users-bounces@xxxxxxxxxxx






We added machines to the pool (about 70).


As for the second comment, we did not change anything, and we do require
run_as_owner = true.

thanks

mike



From: Ian Chesal <ichesal@xxxxxxxxxxxxxxxxxx>
To: Condor-Users Mail List <condor-users@xxxxxxxxxxx>
Date: 02/09/2011 03:45 PM
Subject: Re: [Condor-users] Job remains in idle (worked until I increased pool size)
Sent by: condor-users-bounces@xxxxxxxxxxx






Hi Michael,

On Wed, Feb 9, 2011 at 2:15 PM, Michael O'Donnell <odonnellm@xxxxxxxx> wrote:

I have a pool of 200 cores on Windows machines running various OS versions. Yesterday, I expanded the pool from 100 cores to 200 cores, and since then any jobs that I submit remain in the idle state.


How did you expand the cores? By doubling the slot counts on your machines or by adding new machines?

I ask because...
 

1   ( ( ( 1024 * target.Memory ) >= 4500 ) && ( ( 1024 * ceiling(ifThenElse(JobVMMemory isnt undefined,JobVMMemory,4.394531250000000E+000)) ) >= 4500 ) )

                                     0                   REMOVE


The statement above says there are no machines with enough memory to run your jobs. If you doubled your slot count by doubling the slots advertised by your pool's machines, then you have halved the memory Condor allocates per slot and possibly constrained yourself out of slots because of this.
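
To make the arithmetic concrete, here is a minimal Python sketch, assuming a machine's memory is split evenly across its slots. The machine size, slot counts, and job requirement are hypothetical numbers chosen for illustration, not values from this pool, and the function names are made up for this example.

```python
# Illustrative only: all numbers below are hypothetical, not taken from the pool.

def memory_per_slot_mb(total_memory_mb: int, num_slots: int) -> int:
    """Memory each slot advertises if the machine's RAM is divided evenly."""
    return total_memory_mb // num_slots


def slots_matching(total_memory_mb: int, num_slots: int, required_mb: int) -> int:
    """Number of slots on one machine that satisfy a per-job memory requirement."""
    per_slot = memory_per_slot_mb(total_memory_mb, num_slots)
    return num_slots if per_slot >= required_mb else 0


machine_mb = 8192    # hypothetical machine with 8 GB of RAM
required_mb = 1500   # hypothetical per-job memory requirement

for slots in (4, 8):  # before and after doubling the slot count
    print(slots,
          memory_per_slot_mb(machine_mb, slots),
          slots_matching(machine_mb, slots, required_mb))
```

With these assumed numbers, four slots advertise 2048 MB each and all four match, while eight slots advertise 1024 MB each and none match.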

 

2   ( target.HasWindowsRunAsOwner && ( target.LocalCredd is "CM" ) )

                                     0                   REMOVE


This one I cannot explain. Did you change submissions to use run_as_owner = true?

Regards,
- Ian
_______________________________________________
Condor-users mailing list
To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting

https://lists.cs.wisc.edu/mailman/listinfo/condor-users

The archives can be found at:

https://lists.cs.wisc.edu/archive/condor-users/
