Dear All,

I am using a Condor system running on Windows XP, vanilla universe. The
Condor system terminates all jobs at 8.30 am every working day, so I need
my job to terminate cleanly before then in order to transfer the
intermediate job states it saves (my job saves auto-recovery information
at intervals determined by me, independent of Condor checkpoints).

I read through the mailing list and came across
this: http://lists.cs.wisc.edu/archive/condor-users/2004-July/msg00173.shtml

So I wrote code with a Windows message queue to trap the WM_CLOSE Win32
message, and polled this queue at suitable intervals to set a pointer that
signals my application to shut down gracefully. I tested the application
and it does shut itself down gracefully (an easy way to check is the X
button on the window in Windows).

When I send the job to the Condor queue it works fine, but at 8.30 am the
job gets evicted and no files are transferred. The job remains in the
queue and is run again, yet no files are
transferred back.

The submission script is:

universe = vanilla
Requirements = (CSD_CONDOR_POOL == "MEBC") && (OpSys == "WINNT51")
executable = hellotest.exe
output = mdi.out
error = mdi.err
transfer_input_files = input.dat,iapn_c.dat,iapn_i.dat,iapn_m.dat,iapp_c.dat,iapp_i.dat,iapp_m.dat,rrelx.dat,rrely.dat,rrelz.dat
should_transfer_files = YES
when_to_transfer_output = ON_EXIT_OR_EVICT
log = mdi.log
notification = Error
queue

and a typical log is:

000 (074.000.000) 10/21 03:34:03 Job submitted from host: <xxx.xxx.xx.xxx:1027>
...
001 (074.000.000) 10/21 03:34:13 Job executing on host: <xxx.xxx.xxx.xx:1029>
...
006 (074.000.000) 10/21 03:34:21 Image size of job updated: 10476
...
006 (074.000.000) 10/21 03:54:21 Image size of job updated: 11168
...
006 (074.000.000) 10/21 04:14:21 Image size of job updated: 11176
...
004 (074.000.000) 10/21 08:31:00 Job was evicted.
    (0) Job was not checkpointed.
        Usr 0 00:00:00, Sys 0 00:00:00  -  Run Remote Usage
        Usr 0 00:00:00, Sys 0 00:00:00  -  Run Local Usage
    0  -  Run Bytes Sent By Job
    1444002  -  Run Bytes Received By Job
...
001 (074.000.000) 10/21 17:30:22 Job executing on host: <xxx.xxx.xxx.xx:1029>
...
006 (074.000.000) 10/21 17:50:31 Image size of job updated: 11168
...
006 (074.000.000) 10/21 18:10:31 Image size of job updated: 11176
...
006 (074.000.000) 10/21 23:30:32 Image size of job updated: 11180
...
006 (074.000.000) 10/21 23:50:32 Image size of job updated: 11188
...
004 (074.000.000) 10/22 08:30:06 Job was evicted.
    (0) Job was not checkpointed.
        Usr 0 00:00:00, Sys 0 00:00:00  -  Run Remote Usage
        Usr 0 00:00:00, Sys 0 00:00:00  -  Run Local Usage
    0  -  Run Bytes Sent By Job
    1444002  -  Run Bytes Received By Job

I am not the admin of the pool, so I can’t
change any settings either, and the admin is not available at the moment.
Any help will be appreciated.

PS: Basically, I need the intermediate files from my job to be transferred
to my machine every day at 8.30 am.

Thank you,
Alan

Alan Arokiam,
The Materials Modelling Group,
Materials Science and Engineering,
Department of Engineering,
The Brownlow Hill, L69 3GH
Tel: 44-(0)151-794-4671
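
For reference, the polling approach described above (check for a shutdown request at intervals, then save recovery state and exit) can be sketched roughly as follows. This is a portable Python illustration, not the Win32 code from the application: a threading.Event stands in for the trapped WM_CLOSE message, and run_job, save_state and recovery.json are hypothetical names chosen for the example, not part of Condor or of the real job.

```python
import json
import os
import tempfile
import threading


def save_state(path, state):
    """Write recovery state to a temp file, then rename it into place,
    so an interrupted write never leaves a half-written recovery file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        json.dump(state, handle)
    os.replace(tmp_path, path)


def run_job(total_steps, poll_every, state_path, stop_requested):
    """Run up to total_steps units of work.

    Every poll_every steps, check stop_requested (the stand-in for
    polling the message queue for WM_CLOSE). On a stop request, leave
    the loop early. The recovery state is saved before returning in
    either case. Returns the number of steps completed.
    """
    step = 0
    while step < total_steps:
        step += 1  # placeholder for one unit of real computation
        if step % poll_every == 0 and stop_requested.is_set():
            break
    save_state(state_path, {"step": step, "finished": step == total_steps})
    return step


if __name__ == "__main__":
    stop = threading.Event()
    stop.set()  # simulate a shutdown request arriving before the first poll
    done = run_job(100, 10, "recovery.json", stop)
    print("stopped after", done, "steps")
```

In the real application the flag would presumably be set by whatever handles WM_CLOSE, and the polling interval bounds how long the job takes to react to the shutdown request.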