Disclaimer: idiot here ;-)

I've got a serious problem.

I was running my jobs for the last few days, until I accumulated 2 days of
run time (the "normal" time for such a task to finish) and today I decided
to check the size of the file being generated.
This morning, after running overnight, the file was 44 MB... After 2 days of
running it should have been close to the final size of 610 MB, so that was
my first shock.
Just checked again (the machine is currently in use by the Owner, so Condor
is not active) and the file is not there anymore.

I suspected that this morning when I checked the file size...
Instead of being suspended to resume later, my jobs are being killed for
some reason. Being new to Condor, I have probably missed something.

A bit of background: the machines are all Windows (2K and XP), with the
central server on 2K. After a little struggling I got the jobs running using
this .sub:

	# Submit 4 jobs of rtgen.exe to Condor
	Universe = vanilla
	Executable = rtgen.exe
	Arguments = ntlm alpha 1 7 $(Process) 9000 40000000 ncc
	Initialdir = E:/
	Transfer_input_files = libeay32.dll, charset.txt
	Should_transfer_files = YES
	When_to_transfer_output = ON_EXIT
	Nice_user = True
	Notification = Never
	Getenv = False
	Requirements = ( (OpSys == "WINNT50") || (OpSys == "WINNT51") )
	# later I have to try
	#Requirements = ( (OpSys == "WINNT50") || (OpSys == "WINNT51") ) && (VirtualMachineID == 1)
	# and
	#hold = True
	Queue 4

I'm pretty sure that my problem is not there, but in the condor_config file
on each node, most likely under Part 3, that I left exactly as installed by
the Windows GUI installer (I only modified bits in Parts 1 and 2, to make it

During installation using the GUI, I chose to suspend and continue later,
with no migration.
What do I have to modify in condor_config (in the clients only? Or also the
central server?) to ensure that a job that has to run for 2 days of CPU
time, generating a file of 610 MB, is not killed when the owner is using the



