
Re: [Condor-users] Interactive jobs



On Tue, 28 Feb 2006, john.li@xxxxxxxxxxxxx wrote:

I took the UWCS default values out of the following expressions.

WANT_SUSPEND            =
WANT_VACATE             =
START                   =
SUSPEND                 =

I then used the command condor_reconfig -all to notify all masters of the
new changes.   (Do I need to restart the master on the central manager host?)

No. You only need to reconfig the startd's on the execute hosts. So, unless your central manager is also an execute host (i.e. it also runs jobs), you don't need to do a reconfig on it.

Unfortunately, no difference.   Xterm still goes away after a few seconds.
The StarterLog file shows the same message about suspending all jobs.

It may well be that the condor_reconfig -all has had no effect. From which host did you issue the condor_reconfig command? condor_reconfig will only be accepted from a host that has been granted the administrator's level of access (HOSTALLOW_ADMINISTRATOR); by default the only such machine is your central manager.

If it succeeded you should see a message like "Reconfiguring all running daemons." in the MasterLog of all your execute hosts.
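
One way to check this on an execute host is to search its MasterLog for that message. The following is only a minimal sketch, not a Condor tool: the function name check_reconfig is made up for illustration, and it assumes the log contains the message text exactly as quoted above.

```shell
# Hypothetical helper (not part of Condor): report whether the given
# MasterLog file records a reconfig. Assumes the log line contains the
# text "Reconfiguring all running daemons" as quoted in this message.
check_reconfig() {
    logfile="$1"
    if grep -q "Reconfiguring all running daemons" "$logfile"; then
        echo "reconfig seen"
    else
        echo "no reconfig seen"
        return 1
    fi
}
```

On an execute host you could then run something like check_reconfig "$(condor_config_val LOG)/MasterLog", assuming the master's log is in its usual place under the LOG directory.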

Do I need to assign some value for the SUSPEND expression?

I *think* that if you leave any of the expressions above, like START, blank then the Condor startd will refuse to start up, giving an error message like "Required attribute START is not defined". That would be bad... :(
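
If you want to check whether any of these expressions has been left blank in a config file, something like the following might help. It is a rough sketch only: the function name find_blank_policy is invented for this example, and it simply looks for lines of the form "NAME =" with nothing after the equals sign. It does not understand line continuations or a later line that redefines the same expression.

```shell
# Hypothetical helper (not part of Condor): print the names of the
# policy expressions that are defined with an empty right-hand side
# in the given config file, one name per line.
find_blank_policy() {
    config="$1"
    grep -E '^[[:space:]]*(WANT_SUSPEND|WANT_VACATE|START|SUSPEND|PREEMPT)[[:space:]]*=[[:space:]]*$' "$config" \
        | sed -E 's/[[:space:]]*=.*$//; s/^[[:space:]]*//'
}
```

You would point it at whichever file holds your policy settings, for example the one reported by condor_config_val LOCAL_CONFIG_FILE if that is where you made the changes.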

You should try the following:

WANT_SUSPEND		= False
WANT_VACATE		= False
START			= True
SUSPEND			= False
PREEMPT			= False

(or, if you have left the TESTINGMODE_* expressions as supplied in the
 condor_config file created when you installed Condor, you can use:

WANT_SUSPEND		= $(TESTINGMODE_WANT_SUSPEND)
WANT_VACATE		= $(TESTINGMODE_WANT_VACATE)
START			= $(TESTINGMODE_START)
SUSPEND			= $(TESTINGMODE_SUSPEND)
PREEMPT			= $(TESTINGMODE_PREEMPT)

...which evaluates to the same thing.)

Once you've made those settings, issue a condor_reconfig -all from your central manager, and then try your xterm job again.

The above settings mean that any job whose requirements match the execute host will run to completion without being suspended or preempted [1], regardless of things like keyboard activity, etc. on the execute host.

[1] Actually, this only prevents preemption due to activity on the execute
    host.  If you want to *completely* disable preemption (which I doubt
    is needed to get your "interactive" xterm job working) see:
	http://www.cs.wisc.edu/condor/manual/v6.6.10/3_6Startd_Policy.html#SECTION00469500000000000000

Hope that helps,

	Bruce

--
Bruce Beckles,
e-Science Specialist,
University of Cambridge Computing Service.