
Re: [Condor-users] Transferring files in a Vanilla universe on the job being killed.



Some points to consider:

1) Do you die in time?
You must respond* to the WM_CLOSE within the time allowed by the KILL
expression on the client.
If KILL evaluates to true immediately then you may never exit in time...

2) Are you a console app?
Condor seems to get the WM_CLOSE to you in a horribly hacky way: it
enumerates all the windows on its 'screen' and sends a WM_CLOSE to each
of them. Therefore, if you want your app to receive this message, it
must create a form (a window) somewhere.
Messy, unpleasant... but it works.
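
For what it's worth, here is a rough sketch in C of what "create a form
and poll for WM_CLOSE" could look like for a console job. It is only an
illustration under assumptions: the names init_stop_source, stop_requested,
run_job and the "JobCloseSink" class are made up for this example, and I
have not confirmed that Condor's enumeration reaches a window that is never
shown. The window is an ordinary top-level one rather than a message-only
window, because message-only windows are not returned by EnumWindows. The
non-Windows branch is just a stand-in so the polling loop can be tried
anywhere.

```c
#include <assert.h>
#include <stdio.h>

#ifdef _WIN32
#include <windows.h>

volatile LONG g_close_seen = 0;

/* Window procedure: record WM_CLOSE instead of destroying the window. */
static LRESULT CALLBACK sink_proc(HWND h, UINT msg, WPARAM w, LPARAM l)
{
    if (msg == WM_CLOSE) {
        InterlockedExchange(&g_close_seen, 1);
        return 0;
    }
    return DefWindowProcA(h, msg, w, l);
}

/* Create a top-level window that is never shown. Returns 1 on success. */
int init_stop_source(void)
{
    WNDCLASSA wc;
    ZeroMemory(&wc, sizeof wc);
    wc.lpfnWndProc = sink_proc;
    wc.hInstance = GetModuleHandleA(NULL);
    wc.lpszClassName = "JobCloseSink";   /* hypothetical class name */
    if (!RegisterClassA(&wc))
        return 0;
    return CreateWindowA("JobCloseSink", "job", WS_OVERLAPPEDWINDOW,
                         0, 0, 0, 0, NULL, NULL, wc.hInstance, NULL) != NULL;
}

/* Drain pending messages without blocking; report whether WM_CLOSE came. */
int stop_requested(void)
{
    MSG m;
    while (PeekMessageA(&m, NULL, 0, 0, PM_REMOVE)) {
        TranslateMessage(&m);
        DispatchMessageA(&m);
    }
    return g_close_seen != 0;
}
#else
/* Non-Windows stand-in: the flag is set by hand instead of by WM_CLOSE. */
int g_close_seen = 0;
int init_stop_source(void) { return 1; }
int stop_requested(void) { return g_close_seen != 0; }
#endif

/* Do up to total_steps units of work, polling for a stop request between
   units. Always writes the step count to state_path before returning, so
   there is a file for Condor to transfer. Returns the steps completed. */
int run_job(int total_steps, const char *state_path)
{
    int done = 0;
    FILE *f;

    assert(state_path != NULL);
    while (done < total_steps && !stop_requested()) {
        /* ... one unit of real work would go here ... */
        done++;
    }
    f = fopen(state_path, "w");
    if (f != NULL) {
        fprintf(f, "%d\n", done);
        fclose(f);
    }
    return done;
}
```

A job's main() would call init_stop_source() once at start-up, then
run_job(), and then return promptly so the process really exits (see point
1). The poll interval is set by how long one unit of work takes, so that
needs to be short compared with whatever KILL allows.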

I would very much like this behaviour to be changed, since it is
possible to get from a process ID to its window handle (which
everything has, even if it doesn't have a window) and from there send
it the message directly.

This may only be a problem if you use a script to start your
application; perhaps there is explicit logic in the startd to send
the message to its initial child process but not to any descendants...

It would appear that few people use the Windows signals on vacate (for
example, the bug in DAGMan's Windows signalling went unnoticed for
ages), so the more feedback this list gets about it working or not
working the better, since it appears to need some more documentation at
the least.

Matt

* this means your process exits.

On Sun, 24 Oct 2004 20:26:17 +0100, Alan Christy Arokiam
<alanca@xxxxxxxxxxxxxxx> wrote:
> 
> 
> 
> Dear All,
> 
> I am using a condor system running on Windows XP, vanilla universe. The
> condor system terminates all jobs at 8.30 am, every working day, I have to
> have the job terminate before then in order to transfer intermediate job
> states saved by my job (my job saves auto recovery information at intervals
> determined by me, it is independent of condor checkpoints).
> 
> I had read through the mailing list and came across this:
> 
>  
> 
> http://lists.cs.wisc.edu/archive/condor-users/2004-July/msg00173.shtml
> 
>  
> 
> So I wrote code with a Windows message queue to trap the WM_CLOSE Win32
> message, and polled this queue at suitable intervals to set a pointer to
> gracefully kill my application. I tested this application and it does
> gracefully kill itself (an easy way is the X on the window in Windows).
> 
>  
> 
> When I send the job to the condor queue it works fine, but at 8.30am the job
> gets evicted and no files are transferred, and the job does remain in the
> queue and is again submitted, yet no files are transferred back?
> 
>  
> 
> The submission script is:
> 
>  
> 
> universe = vanilla
> Requirements = (CSD_CONDOR_POOL == "MEBC") && (OpSys == "WINNT51")
> executable = hellotest.exe
> output = mdi.out
> errror = mdi.err
> transfer_input_files = input.dat,iapn_c.dat,iapn_i.dat,iapn_m.dat,iapp_c.dat,iapp_i.dat,iapp_m.dat,rrelx.dat,rrely.dat,rrelz.dat
> should_transfer_files = YES
> when_to_transfer_output = ON_EXIT_OR_EVICT
> log = mdi.log
> notification = Error
> queue
> 
> and a typical log is:
> 
<snip>
> 
>  
> 
> I am not the admin of the pool, so I can't change any settings as well, also
> the admin is not available at the moment. Any help will be appreciated.
> 
>  
> 
> PS basically I need intermediate files from my job to be transferred
> everyday at 8.30am to my machine.
> 
>  
> 
> Thank you,
> 
> Alan
> 
>  
> 
>  
> 
> Alan Arokiam,
> 
> The Materials Modelling Group,
> 
> Materials Science and Engineering,
> 
> Department of Engineering,
> 
> The University of Liverpool,
> 
> Brownlow Hill,
> 
> Liverpool,
> 
> UK.
> 
> L69 3GH
> 
> Tel: 44-(0)151-794-4671
> 
>  