Douglas, Thanks
for replying to my e-mail, I
started testing the pool again after the changes you suggested but now it looks
like some nodes are rejecting the jobs due to the requirements of the jobs.
This are jobs that I ran before the changes and they were no rejected by any of
the CPU nodes. Before there were 8 machines that were rejecting the job after
restarting the condor service on both only one accepted the jobs, leaving only
one with the condition of rejecting jobs. Can the changes I made be responsible
for this behavior? Is there a way to list the job requirement?
Should I issue a condor_reconfig –all, what I did after the changes was
to restart on each machine the condor service, starting from the negotiator
machine to all the condor startd machines, Thanks
in advance for your input, Sincerely Alex From:
condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx] On
Behalf Of Douglas Clayton Alex, At minimum, to simply disable preemption, set PREEMPT =
FALSE and RANK = 0 in the configuration for your startds,
and PREEMPTION_REQUIREMENTS = FALSE for your negotiator. That will
stop preempting based on user activity, better-ranked jobs, and user prio,
respectively. There is a helpful section explaining preemption in section
3.5 of the Condor manual at http://www.cs.wisc.edu/condor/manual/v7.2/3_5Policy_Configuration.html#sec:Disabling_Preemption Regards, Doug On Feb 19, 2009, at 10:33 AM, Alas, Alex [FEDI] wrote:
Hello
to all of you, I
have a situation here. I have a Condor Pool of 22 CPU’s running version
7.05, distributed in 4 Windows 2003 boxes (each win2k3 box has 4 cpu’s)
and 3 Windows XP boxes (each winxp box has 2 cpu’s) . Jobs runs fine
while there is only one user running his jobs at a time. I have to clarify that
we haven’t set any kind of priorities (job or user) when the jobs are
been sent. The issues presents when two users send their jobs at the same time
or one after the other, being more specific if one user sends a jobs first and
minutes later the other one send his jobs while the first ones are running. The
Central manager puts the first jobs on hold to take the jobs of the
second user to run with higher priority but once those jobs of the first users
were placed on idle the integrity of the data is corrupted and from there the
rest of jobs once they come back to running mode is wrong. The same happen with
the jobs of the second user, during that process of putting them on idle and
running the data is mishandle it and it will degenerate the results. Is
there any setting I need to configure at the condor_config level of the central
manager to stipulate that once a jobs is running on one node it should not be
interrupted and need to finish to its completion? Thanks
for your answer\input in advance!!! Respectfully, Alex Alas Systems Administrator Tel.
301-948-8550 x219 Fax 301-963-2064 E-mail: aalas@xxxxxxxxxxxxx 7320
Executive Way, Frederick, MD 21704 Website: http://www.fugroearthdata.com _______________________________________________ -- =================================== phone: 919.647.9648 |