
[Condor-users] Jobs interruption in the middle of running cause end results to failed



Hello to all of you,

I have a situation here. I have a Condor pool of 22 CPUs running version 7.05, distributed across 4 Windows 2003 boxes (4 CPUs each) and 3 Windows XP boxes (2 CPUs each). Jobs run fine while only one user is running jobs at a time. I should clarify that we have not set any priorities (job or user) when the jobs are submitted. The issue appears when two users submit their jobs at the same time or one after the other; more specifically, when one user submits jobs first and, minutes later, the other user submits his jobs while the first ones are still running. The central manager puts the first user’s jobs on hold so that the second user’s jobs can run with higher priority. Once the first user’s jobs have been placed in the idle state, the integrity of their data is corrupted, and from then on the results of those jobs are wrong when they return to the running state. The same happens with the second user’s jobs: while they are switched between idle and running, the data is mishandled and the results degrade.

Is there any setting I need to configure in the condor_config of the central manager to specify that once a job is running on a node it must not be interrupted and must be allowed to run to completion?
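
To make the question concrete, here is a minimal sketch of the kind of configuration this might involve. It assumes the standard Condor preemption settings (PREEMPTION_REQUIREMENTS for the negotiator, PREEMPT and RANK for the execute nodes) are the relevant ones. It has not been tested on this pool, and whether it is sufficient for version 7.05 on Windows is an open question.

```
# Sketch only: untested on this pool; the choice of settings is an assumption.

# Central manager (negotiator): never preempt a running job
# because another user currently has a better priority.
PREEMPTION_REQUIREMENTS = False

# Execute nodes (startd): never preempt a running job for
# machine-policy reasons, and express no preference between jobs,
# so a newly matched job is never ranked above the one already running.
PREEMPT = False
RANK = 0
```

If something along these lines is right, the daemons would presumably need to re-read their configuration (for example with condor_reconfig) before the change takes effect.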

Thanks in advance for your answers and input!

Respectfully,

Alex Alas

Systems Administrator
Fugro EarthData Inc.

Tel. 301-948-8550 x219 Fax 301-963-2064 E-mail: aalas@xxxxxxxxxxxxx

7320 Executive Way, Frederick, MD  21704

Website: http://www.fugroearthdata.com