
Re: [Condor-users] Job stays idle when using Dynamic Slots



The NegotiatorLog is pretty authoritative on the job not matching any slots.

Does condor_status -direct <node> show you a slot you'd expect the job
to run on?
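
For example, with node01 standing in for one of your execute nodes:

  condor_status -direct node01
  condor_status -l node01 | grep -i -E 'PartitionableSlot|SlotType|Cpus'

The partitionable slot should report PartitionableSlot = TRUE and Cpus = 4
before any dynamic slots have been carved out of it.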

Best,


matt

Colak Birol wrote:
> condor_q -better-analyze says
> 
> 215.000:  Run analysis summary.  Of 7 machines,
>       0 are rejected by your job's requirements
>       7 reject your job because of their own requirements
>       0 match but are serving users with a better priority in the pool
>       0 match but reject the job for unknown reasons
>       0 match but will not currently preempt their existing job
>       0 are available to run your job
>         No successful match recorded.
>         Last failed match: Thu Jun 18 13:32:33 2009
>         Reason for last match failure: no match found
> 
> WARNING:  Be advised:   Request 215.0 did not match any resource's 
> constraints
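>
> Would dumping one machine's own expressions show which requirement is
> failing (node01 standing in for one of the nodes)?
>
>   condor_status -l node01 | grep -E '^(Start|Requirements) ='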
> 
> The logs say:
> 
> MatchLog:6/18 13:32:33 (fd:7) (pid:4735)       Rejected 215.0 
> group_a.[...]: no match found
> NegotiatorLog:6/18 13:32:33 (fd:7) (pid:4735)     Request 00215.00000:
> NegotiatorLog:6/18 13:32:33 (fd:7) (pid:4735)       Rejected 215.0 
> group_a.[...]: no match found
> SchedLog:6/18 13:32:33 (fd:13) (pid:4736) Job 215.0: is runnable
> SchedLog:6/18 13:32:33 (fd:13) (pid:4736) Sent job 215.0 (autocluster=0)
> SchedLog:6/18 13:32:33 (fd:13) (pid:4736) Job 215.0 rejected: no match found
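>
> Would raising the negotiator's debug level on the central manager show
> more detail on why the match fails? E.g.
>
>   NEGOTIATOR_DEBUG = D_FULLDEBUG
>
> followed by a condor_reconfig there.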
> 
> I still don't know why the job is not running.
> 
> Many thanks for helping,
> Birol
> 
> Matthew Farrellee wrote:
> 
>> Colak Birol wrote:
>>  
>>
>>> Hi,
>>>
>>> I am using Condor 7.2.2 on RHEL3. I have compute nodes with two
>>> dual-core CPUs, so I configured one partitionable slot with 4 CPUs on
>>> each node:
>>>
>>> NUM_SLOTS = 1
>>> SLOT_TYPE_1 = cpus=4
>>> NUM_SLOTS_TYPE_1 = 1
>>> SLOT_TYPE_1_PARTITIONABLE = true
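>>>
>>> To confirm the startds picked this up, the setting can be queried
>>> remotely (node01 standing in for one of my nodes):
>>>
>>>   condor_config_val -name node01 -startd SLOT_TYPE_1_PARTITIONABLE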
>>>
>>> When I submit jobs over the SOAP interface with the following ClassAd
>>> attributes added to the job, they run fine (a job starts running after
>>> a few minutes on a dynamically created slot).
>>>
>>> RequestCpus = 1
>>> RequestMemory = ceiling(ImageSize / 1024.000000)
>>> RequestDisk = DiskUsage
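>>>
>>> For comparison, the same requests in a plain submit file would look
>>> roughly like this (executable and values are just placeholders):
>>>
>>>   universe       = vanilla
>>>   executable     = /bin/sleep
>>>   arguments      = 60
>>>   request_cpus   = 1
>>>   request_memory = 512
>>>   queue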
>>>
>>> But when I set RequestCpus = 2, the job stays idle forever. I assume it
>>> has something to do with AutoClusterAttrs, which is
>>>
>>> AutoClusterAttrs = 
>>> "JobUniverse,LastCheckpointPlatform,NumCkpts,RequestCpus,RequestMemory,RequestDisk,Requirements,NiceUser,ConcurrencyLimits"
>>>
>>> Has anyone seen similar problems, or found a solution?
>>>
>>> Best Regards,
>>> Birol
>>>    
>>>
>> The AutoClusterAttrs shouldn't make a difference.
>>
>> Does condor_q -better-analyze tell you anything useful?
>>
>> To get a deeper understanding of why the job is idle, you can look at the
>> StartLog and SchedLog; both should reference the job id.
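>>
>> For example (the log directory is whatever LOG points to; use the job id
>> condor_q reports):
>>
>>   grep '<job id>' `condor_config_val LOG`/SchedLog   # on the submit host
>>   grep '<job id>' `condor_config_val LOG`/StartLog   # on the execute node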
>>
>> Best,
>>
>>
>> matt
> _______________________________________________
> Condor-users mailing list
> To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
> subject: Unsubscribe
> You can also unsubscribe by visiting
> https://lists.cs.wisc.edu/mailman/listinfo/condor-users
> 
> The archives can be found at: 
> https://lists.cs.wisc.edu/archive/condor-users/