
Re: [Condor-users] Spawned/Forked process--Runaway



Since you’re on Windows, you could just have your initial process create and become part of a job object:

http://msdn.microsoft.com/en-us/library/ms684161%28v=vs.85%29.aspx

http://msdn.microsoft.com/en-us/library/ms684147%28v=vs.85%29.aspx

 

 

Then set JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE on the job, so that every process in it is killed when the last handle to the job is closed.

 

Here’s a little guide: http://stackoverflow.com/questions/53208/how-do-i-automatically-destroy-child-processes-in-windows
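
Since the original script is Python, here is a rough sketch of the same idea using ctypes. It is untested against Condor and makes a few assumptions: the helper name create_kill_on_close_job is my own invention, the structure layouts are transcribed from the MSDN pages above, and on anything other than Windows the helper simply returns None. Also note that, as far as I know, on Windows versions before Windows 8 a process that is already in a job cannot join a second one, so AssignProcessToJobObject may fail if something else has already placed your process in a job.

```python
import ctypes
import sys

JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE = 0x2000
JOB_OBJECT_EXTENDED_LIMIT_INFORMATION_CLASS = 9  # JobObjectExtendedLimitInformation


class IO_COUNTERS(ctypes.Structure):
    _fields_ = [(name, ctypes.c_uint64) for name in (
        "ReadOperationCount", "WriteOperationCount", "OtherOperationCount",
        "ReadTransferCount", "WriteTransferCount", "OtherTransferCount")]


class JOBOBJECT_BASIC_LIMIT_INFORMATION(ctypes.Structure):
    _fields_ = [
        ("PerProcessUserTimeLimit", ctypes.c_int64),
        ("PerJobUserTimeLimit", ctypes.c_int64),
        ("LimitFlags", ctypes.c_uint32),
        ("MinimumWorkingSetSize", ctypes.c_size_t),
        ("MaximumWorkingSetSize", ctypes.c_size_t),
        ("ActiveProcessLimit", ctypes.c_uint32),
        ("Affinity", ctypes.c_size_t),
        ("PriorityClass", ctypes.c_uint32),
        ("SchedulingClass", ctypes.c_uint32),
    ]


class JOBOBJECT_EXTENDED_LIMIT_INFORMATION(ctypes.Structure):
    _fields_ = [
        ("BasicLimitInformation", JOBOBJECT_BASIC_LIMIT_INFORMATION),
        ("IoInfo", IO_COUNTERS),
        ("ProcessMemoryLimit", ctypes.c_size_t),
        ("JobMemoryLimit", ctypes.c_size_t),
        ("PeakProcessMemoryUsed", ctypes.c_size_t),
        ("PeakJobMemoryUsed", ctypes.c_size_t),
    ]


def create_kill_on_close_job():
    """Hypothetical helper: put the current process into a new job object
    that kills all of its processes when the last job handle is closed.
    Returns the job handle, or None when not running on Windows."""
    if sys.platform != "win32":
        return None

    k32 = ctypes.WinDLL("kernel32", use_last_error=True)
    k32.CreateJobObjectW.restype = ctypes.c_void_p
    k32.CreateJobObjectW.argtypes = [ctypes.c_void_p, ctypes.c_wchar_p]
    k32.SetInformationJobObject.restype = ctypes.c_int
    k32.SetInformationJobObject.argtypes = [
        ctypes.c_void_p, ctypes.c_int, ctypes.c_void_p, ctypes.c_uint32]
    k32.GetCurrentProcess.restype = ctypes.c_void_p
    k32.GetCurrentProcess.argtypes = []
    k32.AssignProcessToJobObject.restype = ctypes.c_int
    k32.AssignProcessToJobObject.argtypes = [ctypes.c_void_p, ctypes.c_void_p]

    # Anonymous job object with default security attributes.
    job = k32.CreateJobObjectW(None, None)
    if not job:
        raise ctypes.WinError(ctypes.get_last_error())

    info = JOBOBJECT_EXTENDED_LIMIT_INFORMATION()
    info.BasicLimitInformation.LimitFlags = JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE
    if not k32.SetInformationJobObject(
            job, JOB_OBJECT_EXTENDED_LIMIT_INFORMATION_CLASS,
            ctypes.byref(info), ctypes.sizeof(info)):
        raise ctypes.WinError(ctypes.get_last_error())

    # Children started after this call join the same job by default.
    if not k32.AssignProcessToJobObject(job, k32.GetCurrentProcess()):
        raise ctypes.WinError(ctypes.get_last_error())
    return job
```

You would call this once at startup, before the first subprocess.Popen(), and keep the returned handle around without ever closing it. When the initial process is killed, Windows closes its handles, the job closes with them, and the spawned processes should be terminated too.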

 

This requires something newer than Win2k (but that’s losing Condor support anyway, so it shouldn’t be a problem :)

 

From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Michael O'Donnell
Sent: 28 January 2011 02:00
To: Condor-Users Mail List
Subject: [Condor-users] Spawned/Forked process--Runaway

 

I believe the Condor team has worked on eliminating the issues related to Condor not being able to track forked processes on execute machines, but I seem to be having a similar issue when spawning processes with Python.

If I submit a job that runs a compiled Python script and then remove this job via condor_rm <user>, a spawned process does not get removed from the execute machine. Because this can be done on any platform, it could occur on either Windows or Linux (my pool is solely Windows).

Has anyone else seen this? Obviously there are security ramifications, but it also causes other issues, such as file locks (if reading data from an NFS share). I write my programs so that the Condor job does not complete until the spawned process completes: the main program waits for the spawned process to finish before moving on to the next section of code. Once all the processing is done, the jobs complete successfully. As long as the main program that executes on Condor (which is what Condor is tracking) finishes correctly, the spawned process is handled correctly. However, if I use condor_rm, the main program is removed but the spawned process is not.

I am using subprocess.Popen(), which can be referenced here:
http://docs.python.org/library/subprocess.html
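
For reference, the spawn-and-wait pattern described above looks roughly like the sketch below. The function name run_and_wait and the command in the usage lines are made up for illustration. The finally block terminates the child if the parent unwinds early, but it cannot run when the parent itself is killed abruptly, which appears to be what happens on condor_rm.

```python
import subprocess
import sys


def run_and_wait(cmd):
    """Hypothetical helper: spawn cmd, block until it exits, and return
    its exit code. If the wait is interrupted, terminate the child."""
    child = subprocess.Popen(cmd)
    try:
        return child.wait()
    finally:
        # Runs only if this process unwinds normally (for example, on an
        # exception); a hard kill of the parent skips this cleanup.
        if child.poll() is None:
            child.terminate()
            child.wait()


if __name__ == "__main__":
    # Illustrative command: a child that exits with code 0.
    code = run_and_wait([sys.executable, "-c", "print('child done')"])
    print("child exit code:", code)
```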


So, first, has anyone else run into this using Python? Second, is there some way I can tell Condor to do specific things when a job is removed? I recall reading about a script that can be used on exit, but I could not find enough information to implement it, and I would need to look back over my notes for details.

Thank you for your help and suggestions,
Michael
