
Re: [Condor-users] Jobs interruption in the middle of running cause end results to failed



Douglas,

Thanks for your help,

Alex

 

From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Douglas Clayton
Sent: Wednesday, February 25, 2009 11:04 AM
To: Condor-Users Mail List
Subject: Re: [Condor-users] Jobs interruption in the middle of running cause end results to failed

 

Alex,

 

The changes to preemption should not have caused any difference in matchmaking between jobs and machines.  To find out why jobs are not matching, you can get the list of requirements for all jobs by running:

 

            condor_q -format "%s." ClusterId -format "%s: " ProcId -format " %s\n" Requirements
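To look at one job instead of the whole queue, the same idea can be narrowed to a single job ID. The following is only a rough sketch, assuming a POSIX shell and condor_q on the PATH; the function names and the job ID 123.0 are made up for illustration, and quoting may need adjusting in other shells.

```shell
# Hypothetical helper: print only the Requirements expression of one job.
# $1 is a job ID in cluster.proc form, e.g. 123.0 (an example value).
job_requirements() {
    condor_q -format "%s\n" Requirements "$1"
}

# Hypothetical helper: dump the complete ClassAd of one job, so its
# Requirements can be read next to the job's other attributes.
job_classad() {
    condor_q -long "$1"
}
```

Calling job_requirements 123.0 should print the expression that a machine has to satisfy before that job can match it.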

 

Unfortunately, condor_q -better-analyze (which gives more detailed information about why jobs are not matching) is not yet supported on Windows.

 

Also, restarting all machines does the equivalent of a condor_reconfig -all, so there is no reason to run that command after restarting. 

 

Good luck,

Doug

 

-- 

===================================
Douglas Clayton

phone: 919.647.9648

Cycle Computing, LLC
Leader in Condor Grid Solutions
Enterprise Condor Support and Management Tools

http://www.cyclecomputing.com

 

On Feb 20, 2009, at 5:18 PM, Alas, Alex [FEDI] wrote:



Douglas,

Thanks for replying to my e-mail,

I started testing the pool again after the changes you suggested, but now it looks like some nodes are rejecting the jobs because of the jobs' requirements. These are jobs that I ran before the changes, and they were not rejected by any of the CPU nodes. Before, there were 8 machines rejecting the jobs; after restarting the Condor service on both, only one accepted the jobs, leaving only one still rejecting jobs. Can the changes I made be responsible for this behavior? Is there a way to list the job requirements? Should I issue a condor_reconfig -all? What I did after the changes was to restart the Condor service on each machine, starting from the negotiator machine and continuing through all the condor startd machines.

Thanks in advance for your input,

Sincerely

Alex
