
RE: [Condor-users] how to terminate jobs automatically



Hi Matt,

Many thanks for that; it has really clarified the
difficulty in getting this functionality. A few comments
below:

-ian.

--On 22 July 2004 10:53 +0100 Matt Hope <Matt.Hope@xxxxxxxx> wrote:

If your jobs will only ever terminate normally or in response to a vacate
caused at the end of day, then trap the vacate signal and exit
immediately - this will be treated as vacate succeeded.

I'm using a .bat file which writes to stdout at the moment as a simple example, so I don't know how it should trap the vacate signal. Is there a way of doing this for .exe files?

Then set your jobs up to transfer files on vacate.

I tried


transfer_files=ON_EXIT_OR_EVICT

before but had the same problem - perhaps I need the signal handler as well ?


This is not perfect since the job will remain in the queue.



Yes, a bit of a pain - although it would be useful if I could get them to pick up from where they left off next time they run.

Alternatively have your jobs keep track of the time themselves (making
this time an additional argument perhaps) and have them kill themselves
(nicely if possible, with a message to that effect) a minute or so before
condor would (to allow for clock differences)

Yeah I had thought of that but I'm wary of having users hard code this
kind of implementation detail in their apps in case we change things in the future.
A signal handler would be more future proof.


There are simple ways of doing this, as well as extremely fast but
unpleasant ones if performance is really an issue, with sufficient
granularity to hit a minute, no probs...
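
One of the simple ways could look like the Python sketch below, assuming the deadline is passed to the job as an "HHMM" argument (for example "0830") and that a one-minute margin is enough to cover clock differences. The function names and the argument format are hypothetical, not anything Condor provides.

```python
import datetime as dt

def parse_deadline(hhmm, now):
    """Return the next occurrence of wall-clock time 'HHMM' strictly after now."""
    hour, minute = int(hhmm[:2]), int(hhmm[2:])
    deadline = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if deadline <= now:
        deadline += dt.timedelta(days=1)  # e.g. started at 2300, deadline 0830 tomorrow
    return deadline

def should_stop(now, deadline, margin=dt.timedelta(minutes=1)):
    """True once we are within the safety margin of the deadline."""
    return now >= deadline - margin

def run(work_units, deadline, clock=dt.datetime.now):
    """Do work unit by unit, checking the clock before each; return units completed."""
    done = 0
    for unit in work_units:
        if should_stop(clock(), deadline):
            print("stopping early: deadline %s is near" % deadline.strftime("%H:%M"))
            break
        done += 1  # placeholder for the real work on `unit`
    return done
```

Because the job exits by itself, Condor sees a normal completion and returns the output, at the cost of the job needing to know the schedule. Passing the deadline as an argument rather than hard-coding it keeps that detail in the submit file.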


On Sun Grid Engine, which is UNIX based, it is possible to send the app
a "warning" signal to tell it to clean itself up before it sends the KILL signal.
Is anything like that possible on Condor/Windows?


What you describe is not really very easy with condor since there are
many reasons for jobs to be vacated from a machine, so knowing that it is
due to the time is more the responsibility of the job itself than condor...

Not to mention the question of what to do with jobs that ran for a while,
were vacated due to a better job, and then the night ends...

If you are running in the vanilla (as opposed to standard) universe, which
you have to be on windows, and require the output irrespective of whether
the job completed successfully or not, the simplest solution is to write the
output you care about directly to the network / database and deal with
restarts directly (again a central database for run counters etc).

In this way you can also layer vacate-like functionality on top in future by
serializing sufficient info to restart, either at regular checkpoints or
in response to the vacate signal.

Note that the above solution has some unpleasant security connotations
you may not be able to accept.

We're stuck with the vanilla universe so I guess this precludes having Condor connect the I/O to a shadow running on another machine. I wouldn't fancy writing my own version of this - plus I doubt we could live with the security problems.

A vacate-lite functionality for windows would be nice:
A separate file would be declared in the submit file, which would be copied on
vacate to the shadow and then copied back to the new machine to allow
persistence of data without needing any hacky freely avail network shares
or transmission of db/filesystem passwords...

the user's app still needs to know about the file and manage its own
persistence but with considerable security improvements...

Yes, I agree. Very long running apps are likely to have some restart capability
built in anyway so the code changes should be small.
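
As a rough idea of how small those changes might be, here is a Python sketch of an app managing its own restart file. It assumes the file lives in the job's working directory and that something (file transfer on evict, or the proposed vacate-lite feature) carries it to the next machine; the file name, the state layout and the function names are all hypothetical.

```python
import json
import os
import tempfile

def save_checkpoint(path, state):
    """Write state to a temp file in the same directory, then swap it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)  # readers see either the old or the new file, never half
    except BaseException:
        if os.path.exists(tmp):
            os.remove(tmp)
        raise

def load_checkpoint(path, default):
    """Return the saved state, or default when no checkpoint exists yet."""
    if not os.path.exists(path):
        return default
    with open(path) as f:
        return json.load(f)

def run(path, total, stop_after=None):
    """Sum 0..total-1, checkpointing after every step; resumes from path if present.

    stop_after simulates an eviction after that many steps in this run.
    """
    state = load_checkpoint(path, {"next": 0, "sum": 0})
    steps = 0
    while state["next"] < total:
        if stop_after is not None and steps >= stop_after:
            break
        state["sum"] += state["next"]
        state["next"] += 1
        steps += 1
        save_checkpoint(path, state)
    return state
```

Writing to a temporary file and renaming means a kill in the middle of a save leaves the previous checkpoint intact, which matters if the job gets no warning before it is killed.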



Matt



thanks again,


-ian.
-----Original Message-----
From: condor-users-bounces@xxxxxxxxxxx
[mailto:condor-users-bounces@xxxxxxxxxxx]On Behalf Of Dr Ian C. Smith
Sent: 22 July 2004 10:28
To: condor-users@xxxxxxxxxxx
Subject: [Condor-users] how to terminate jobs automatically


Dear All,


I have a very simple problem but one that I cannot find
a mention of in the Condor docs. We have a pool of Condor
PCs running Win 2k under control of a Solaris master. The
pool should only be available outside office hours
(1730 - 0830). If any jobs are running at the start of the
day (0830) they should be terminated and any output
returned to the user.

How do I set this up in condor_config ?????

At the moment jobs go into the idle state at 0830 and
no output is returned to the user. If jobs finish
before this time everything is hunky dory.

many thanks,

-ian.

_______________________________________________
Condor-users mailing list
Condor-users@xxxxxxxxxxx
http://lists.cs.wisc.edu/mailman/listinfo/condor-users



*****************************************************************
Gloucester Research Limited believes the information
provided herein is reliable. While every care has been
taken to ensure accuracy, the information is furnished
to the recipients with no warranty as to the completeness
and accuracy of its contents and on condition that any
errors or omissions shall not be made the basis for any
claim, demand or cause for action.
*****************************************************************



