
Re: [Condor-users] windows xp log off kills jobs

I'm also confused by this. If a user submits a vanilla job
on a Windows-based machine, what would happen if the user

a) logs off before the job runs remotely
b) logs off while the job is running remotely

Also, how does this fit in with DAGMan, where PRE or POST
scripts need to run on the submit host?



-----------------------------------
Dr Ian C. Smith, e-Science team, University of Liverpool Computing Services Department

--On 09 November 2004 14:24 +1100 bob@xxxxxxxxxxx wrote:


I have condor 6.6.6 installed on the student computer labs at my

The lan machines run Windows XP and are part of a domain.

Jobs run perfectly provided we don't have to share the machine with a
student, because when a student logs off the machine, the condor jobs are

I have posted previously about an unexplained error 143 from the Java VM.
I have now tracked this down: it occurs when either a computer is
reset or a user logs off.
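
As an aside on the number itself: under the common shell-style convention, an exit status above 128 means "terminated by signal (status - 128)", so 143 would read as 128 + 15, and 15 is SIGTERM. Whether the Java VM on Windows XP follows that convention in this setup is an assumption here, not something confirmed in this thread. A minimal sketch of the arithmetic (the function name is made up for illustration):

```python
import signal

def decode_exit_status(status):
    """Interpret a shell-style exit status.

    Assumes the 128+N convention: values above 128 mean the
    process was terminated by signal N. This is a convention,
    not a guarantee for every program or platform.
    """
    if status > 128:
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not known on this platform
            name = "unknown"
        return ("signal", signum, name)
    return ("exit", status, None)

print(decode_exit_status(143))
```

If that reading holds, "return value 143" would mean the VM was asked to terminate rather than that the job finished its work.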

To my understanding, when a user logs off, my job SHOULD keep running
in the background. Instead the job gets killed and, according to
Condor, has completed successfully, so it no longer resides in the queue.

Here is my job's log file:

000 (002.004.000) 11/09 13:31:36 Job submitted from host:
<> ...
001 (002.004.000) 11/09 13:33:11 Job executing on host:
<> ...
005 (002.004.000) 11/09 13:33:12 Job terminated.
	(1) Normal termination (return value 143)
		Usr 0 00:00:00, Sys 0 00:00:00  -  Run Remote Usage
		Usr 0 00:00:00, Sys 0 00:00:00  -  Run Local Usage
		Usr 0 00:00:00, Sys 0 00:00:00  -  Total Remote Usage
		Usr 0 00:00:00, Sys 0 00:00:00  -  Total Local Usage
	0  -  Run Bytes Sent By Job
	15646  -  Run Bytes Received By Job
	0  -  Total Bytes Sent By Job
	15646  -  Total Bytes Received By Job
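
Since Condor records these jobs as a normal termination, one way to spot affected jobs across many log files is to scan for 005 events with a nonzero return value. The sketch below assumes the log layout is exactly as shown above (a three-digit event code, the job id in parentheses, then a "return value N" line); the function and pattern names are hypothetical and this is not a Condor-provided tool.

```python
import re

# Event header, e.g. "005 (002.004.000) 11/09 13:33:12 Job terminated."
EVENT_RE = re.compile(
    r"^(\d{3}) \((\d+)\.(\d+)\.(\d+)\) (\d\d/\d\d) (\d\d:\d\d:\d\d) (.*)$"
)
# Detail line, e.g. "(1) Normal termination (return value 143)"
RETVAL_RE = re.compile(r"return value (\d+)")

def find_terminations(log_text):
    """Return (cluster, proc, return_value) for each 005 event found."""
    results = []
    current = None  # job id of the 005 event being read, if any
    for line in log_text.splitlines():
        m = EVENT_RE.match(line)
        if m:
            code = m.group(1)
            job = (int(m.group(2)), int(m.group(3)))
            current = job if code == "005" else None
            continue
        r = RETVAL_RE.search(line)
        if r and current is not None:
            results.append((current[0], current[1], int(r.group(1))))
            current = None
    return results

sample = """001 (002.004.000) 11/09 13:33:11 Job executing on host:
<> ...
005 (002.004.000) 11/09 13:33:12 Job terminated.
\t(1) Normal termination (return value 143)
"""
for cluster, proc, value in find_terminations(sample):
    if value != 0:
        print(f"job {cluster}.{proc} ended with return value {value}")
```

Filtering on a nonzero return value is only a heuristic: a job that legitimately exits nonzero would be flagged as well.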

The Condor daemons run as a system service, but the actual job runs as condor-reuse-vm1. I'm guessing that when you log off, Windows kills all processes that are not a system service, hence my job is killed.

Is this correct operation? Is this a limitation in Condor, that users
cannot be logging in and out of the machine all the time? Or is this just
some completely strange behaviour that should not be happening?

Is it something to do with the permissions of the condor-reuse-vm1 user? I
have tried it on two different setups: one on the university lab machines,
which are fairly restricted security-wise, and one on a completely
standalone set of computers that has no security implications at all.

Any help would be highly regarded.
