
Re: [Condor-users] Automatically detecting job completion and file transfer

I've answered my own question (having RTFM ;-) ...)

If I add:

should_transfer_files = YES
when_to_transfer_output = ON_EXIT
transfer_input_files = <list of input files>

to the submit file, this forces Condor to use its own file transfer
mechanism (bypassing NFS) and hence I can be sure that the files have been
transferred back to the submit node before the "005" event appears in the
log.  It seems that my problem was indeed caused by NFS caching as Erik
suggested.
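The original question describes a Java program that generates the submit files, so the fix comes down to emitting the three extra lines above. Here is a minimal sketch, assuming a hypothetical `SubmitFileBuilder` helper (not part of Condor or of Jon's code). It also assumes that `transfer_input_files` takes a comma-separated list.

```java
import java.util.List;

/**
 * Hypothetical helper: builds a vanilla-universe submit description that
 * asks Condor to use its own file transfer instead of relying on NFS.
 */
final class SubmitFileBuilder {
    private SubmitFileBuilder() {}

    static String build(String executable, String initialDir, List<String> inputFiles) {
        StringBuilder sb = new StringBuilder();
        sb.append("executable = ").append(executable).append('\n');
        sb.append("universe = vanilla\n");
        sb.append("input = stdin\n");
        sb.append("output = stdout\n");
        sb.append("error = stderr\n");
        sb.append("log = condor.log\n");
        sb.append("initialdir = ").append(initialDir).append('\n');
        // Force Condor's own file-transfer mechanism (bypassing NFS).
        sb.append("should_transfer_files = YES\n");
        // Send output back when the job exits.
        sb.append("when_to_transfer_output = ON_EXIT\n");
        if (!inputFiles.isEmpty()) {
            // Assumed: a comma-separated list of input files.
            sb.append("transfer_input_files = ")
              .append(String.join(", ", inputFiles)).append('\n');
        }
        sb.append("queue\n");
        return sb.toString();
    }
}
```

The returned string can be written to disk and passed to condor_submit just like the original hand-built files.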

Thanks to everyone who helped,
Jon

> -----Original Message-----
> From: condor-users-bounces@xxxxxxxxxxx 
> [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Jon Blower
> Sent: 12 April 2006 15:07
> To: 'Condor-Users Mail List'
> Subject: Re: [Condor-users] Automatically detecting job completion and file transfer
> 
> 
> > Your submit file above isn't using file transfer - are you using NFS?
> > NFS caching can cause the weird ordering you describe.
> 
> I think I am (I haven't set up the Condor pool myself; I'm using one at
> my institution), so it sounds likely that this is the issue, thanks.
> Can I force Condor to bypass this and transfer all the files (including
> stdout and stderr) to the submit host before marking the job complete?
> 
> I understand that I can use "transfer_output_files = file1 file2", but
> does this also work for stdout and stderr?
> 
> Thanks, Jon
> 
> > -----Original Message-----
> > From: condor-users-bounces@xxxxxxxxxxx 
> > [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Erik Paulson
> > Sent: 12 April 2006 14:55
> > To: Condor-Users Mail List
> > Subject: Re: [Condor-users] Automatically detecting job completion and file transfer
> > 
> > On Wed, Apr 12, 2006 at 11:03:49AM +0100, Jon Blower wrote:
> > > Dear all,
> > > 
> > > This question has probably been asked before, but I haven't been able
> > > to find an answer on Google or in the mailing list archives. I'm
> > > writing a Java program that submits jobs to a Condor pool. The Java
> > > program runs on a submit host and generates job description files
> > > that look like this:
> > > 
> > > executable = /home/jon/bin/helloworld
> > > universe = vanilla
> > > input = stdin
> > > output = stdout
> > > error = stderr
> > > log = condor.log
> > > initialdir = /some/directory
> > > Queue
> > > 
> > > I submit the job by calling condor_submit from Java's Runtime.exec()
> > > method. This bit works fine.
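Here is a minimal sketch of this submission step. It swaps in `java.lang.ProcessBuilder` for `Runtime.exec()` so that condor_submit's stdout and stderr can be merged and drained by one reader. The class name `CondorSubmitter` is hypothetical. The cluster-id regular expression is also an assumption: it expects condor_submit to print a line like "1 job(s) submitted to cluster 42.", so check that against the output of your own Condor version.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper that runs condor_submit and reports the cluster id. */
final class CondorSubmitter {
    // Assumed output format: "1 job(s) submitted to cluster 42."
    private static final Pattern CLUSTER =
            Pattern.compile("submitted to cluster (\\d+)");

    private CondorSubmitter() {}

    /** Returns the cluster id found in condor_submit's output, or -1 if none. */
    static long parseClusterId(String submitOutput) {
        Matcher m = CLUSTER.matcher(submitOutput);
        return m.find() ? Long.parseLong(m.group(1)) : -1L;
    }

    /** Runs condor_submit on submitFile; throws if it exits non-zero. */
    static long submit(Path submitFile) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder("condor_submit", submitFile.toString());
        pb.redirectErrorStream(true); // merge stderr so one reader drains both streams
        Process p = pb.start();
        StringBuilder out = new StringBuilder();
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = r.readLine()) != null) {
                out.append(line).append('\n');
            }
        }
        int rc = p.waitFor();
        if (rc != 0) {
            throw new IOException("condor_submit exited with " + rc + ":\n" + out);
        }
        return parseClusterId(out.toString());
    }
}
```

Keeping the cluster id makes it possible to pick this job's events out of a log shared by several submissions.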
> > > 
> > > My problem is detecting categorically when the job has completed *and*
> > > the output files (stdout and stderr) have been transferred back to the
> > > submit host. My first stab at the Java program detects the status of
> > > the job ("submitted", "running", "complete") by parsing the log file
> > > that is produced. It also gets the exit code of the executable from
> > > this log file.
> > > 
> > > To detect job completion, my program looks for the "005" event ("Job
> > > terminated") in the log file. However, it seems that this event is
> > > written to the log file *before* the contents of the stdout and stderr
> > > files are transferred to the submit host. If I check the length of the
> > > stdout and stderr files on the submit host (using the length() method
> > > of java.io.File), they both report zero immediately after the "005"
> > > event is detected in the log file. If I wait a few seconds, length()
> > > reports the correct length, indicating that these files (or at least
> > > their contents) are transferred a few seconds after the "005" event.
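The delay described above suggests a stopgap: poll `length()` until it stops changing. The sketch below is only a heuristic; the fix adopted in the reply at the top of this thread is to enable Condor's own file transfer. `StableLengthWaiter` and its parameters are hypothetical. A job can legitimately produce an empty stdout, so in that case the wait ends only when the timeout expires.

```java
import java.io.File;

/** Hypothetical polling helper; a heuristic, not a guarantee of completeness. */
final class StableLengthWaiter {
    private StableLengthWaiter() {}

    /**
     * Polls file.length() every pollMillis until it is non-zero and has stayed
     * the same for stableChecks consecutive polls, or until timeoutMillis has
     * elapsed. Returns the last length observed.
     */
    static long waitForStableLength(File file, long pollMillis, int stableChecks,
                                    long timeoutMillis) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        long previous = -1;
        int unchanged = 0;
        while (true) {
            long len = file.length(); // 0 if the file does not exist (yet)
            if (len > 0 && len == previous) {
                unchanged++;
                if (unchanged >= stableChecks) {
                    return len;
                }
            } else {
                unchanged = 0;
            }
            previous = len;
            if (System.currentTimeMillis() >= deadline) {
                return len; // give up; the caller decides what an empty file means
            }
            Thread.sleep(pollMillis);
        }
    }
}
```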
> > > 
> > 
> > Your submit file above isn't using file transfer - are you using NFS?
> > NFS caching can cause the weird ordering you describe.
> > 
> > -Erik
> > _______________________________________________
> > Condor-users mailing list
> > Condor-users@xxxxxxxxxxx
> > https://lists.cs.wisc.edu/mailman/listinfo/condor-users
> > 
> 