Mailing List Archives Public Access	UW Madison Computer Sciences Department Computer Systems Lab


Re: [Condor-users] Defining an exit script for condor jobs

Date: Thu, 6 Oct 2005 15:51:40 -0500
From: Jaime Frey <jfrey@xxxxxxxxxxx>
Subject: Re: [Condor-users] Defining an exit script for condor jobs

On Oct 6, 2005, at 2:13 PM, Terrence Martin wrote:

I asked this question a couple months ago but I wanted to put it out
again because I did not follow up on the one response I got.
My question was whether it is possible to have a script run on job exit that goes beyond what the normal condor exit does in terms of cleaning up areas. This is important in the Open Science Grid clusters I am currently working with, since user files are often stored in a temporary area that condor does not necessarily know about. It would be nice to have this area cleared on exit.
The answer I got was either use a wrapper or Dagman.
The first solution does not work, at least not if I follow the rule in the condor documentation that a USER_JOB_WRAPPER should not fork a child and should only call exec. I could have the wrapper fork, but it is not clear I should. What would be nice is if, in addition to USER_JOB_WRAPPER, there were a USER_JOB_EXIT_SCRIPT that could define a script to perform certain cleanup steps on job exit.

As for DAGMan, I am not sure how that would help. According to the condor documentation, DAGMan is a meta-scheduler that submits to condor. That sounds like it works on the outside, between the user and condor. The grid software I work with is already thick with schedulers that submit to condor, and I cannot enforce what users make use of on that side. All I can control is my condor queue and my worker nodes. Admittedly my knowledge of DAGMan extends only to what I read here, http://www.cs.wisc.edu/condor/dagman/, but it does not sound like what I am looking for.

I guess I have another option, which is to try to be clever. Just before my user wrapper drops to the actual job, I could start a monitoring process that watches for the job to exit and then tries to clean up. It would be simpler and probably less error prone if condor could just trigger a cleanup process, though. The monitor would also end up as an orphan process, since the parent calls exec right after it spawns the monitor.


I see a few options available, none ideal:

1) Have the USER_JOB_WRAPPER clean up the files of the previous job.

2) If you have any control over the submit side, you can set a post-script in the job ad that will be run after the job. You can use SUBMIT_EXPRS to add it automatically to all jobs.

3) Have the USER_JOB_WRAPPER fork the job instead of exec'ing it. We don't mention this in the manual because it can be tricky to get right. The script must not exit before the job, must exit with the same status as the job, and must catch SIGTERM and forward it to the job. If you run any standard universe jobs, there are several more signals the script has to catch and forward to the job. There may be some other details, but those are the ones I can think of.
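
To make option 3 concrete, here is a rough sketch of such a forking wrapper in Python. It is only an illustration of the three requirements listed above (wait for the job, forward SIGTERM, exit with the job's status), not a tested production wrapper. The JOB_SCRATCH_DIR environment variable and the function names are made up for this example, and the extra signals needed for standard universe jobs are not handled.

```python
#!/usr/bin/env python3
"""Sketch of a USER_JOB_WRAPPER that forks the job instead of exec'ing it."""
import os
import shutil
import signal
import subprocess
import sys

# Hypothetical: environment variable naming the site scratch area to clear.
SCRATCH_ENV_VAR = "JOB_SCRATCH_DIR"


def run_job(argv, cleanup):
    """Run argv as a child, forward SIGTERM to it, then call cleanup.

    Returns the child's return code (negative if it died from a signal).
    """
    child = subprocess.Popen(argv)

    def forward(signum, _frame):
        # Pass the signal on to the job rather than dying ourselves.
        try:
            child.send_signal(signum)
        except ProcessLookupError:
            pass  # the job has already exited

    previous = signal.signal(signal.SIGTERM, forward)
    try:
        # Do not return before the job does.
        status = child.wait()
    finally:
        signal.signal(signal.SIGTERM, previous)
        cleanup()
    return status


def exit_like(status):
    """Exit so that our parent sees the same status the job produced."""
    if status < 0:
        sig = -status
        try:
            signal.signal(sig, signal.SIG_DFL)
        except (OSError, ValueError):
            pass  # e.g. SIGKILL, whose disposition cannot be changed
        os.kill(os.getpid(), sig)
        sys.exit(128 + sig)  # fallback if the signal did not terminate us
    sys.exit(status)


def cleanup_scratch():
    """Remove the (hypothetical) scratch directory, ignoring errors."""
    path = os.environ.get(SCRATCH_ENV_VAR)
    if path:
        shutil.rmtree(path, ignore_errors=True)


if __name__ == "__main__" and len(sys.argv) > 1:
    # The wrapper receives the job's command line as its own arguments.
    exit_like(run_job(sys.argv[1:], cleanup_scratch))
```

The cleanup runs in a finally block so it still happens if the wait is interrupted. When the job dies from a signal, the wrapper re-raises that signal on itself so its parent sees the same kind of termination rather than an ordinary exit code.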

+----------------------------------+---------------------------------+
|            Jaime Frey            |  Public Split on Whether        |
|        jfrey@xxxxxxxxxxx         |  Bush Is a Divider              |
|  http://www.cs.wisc.edu/~jfrey/  |         -- CNN Scrolling Banner |
+----------------------------------+---------------------------------+
