
Re: [Condor-users] Automatically detecting job completion and filetransfer



> Your submit file above isn't using file transfer - are you using NFS?
> NFS caching can cause the weird ordering you describe.

I think I am (I didn't set up the Condor pool myself; I'm using one at my
institution), so it sounds likely that this is the issue. Thanks.  Can I
force Condor to bypass this and transfer all the files (including stdout and
stderr) to the submit host before marking the job complete?

I understand that I can use "transfer_output_files = file1 file2" but does
this also work for stdout and stderr?
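
As a stopgap on the Java side, one option would be to treat the "005" event
only as a hint and then poll stdout and stderr until their sizes stop
changing. The sketch below is a heuristic under stated assumptions, not a
guarantee: the class and method names (CondorJobWatcher, hasTerminatedEvent,
waitForStableSize) are hypothetical, it assumes event headers in the user log
start with the three-digit event number, and a stale NFS cache could still
report a stable but wrong size if the polling window is too short.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper; none of these names come from Condor itself.
final class CondorJobWatcher {

    /** Returns true if any log line is a "005" (job terminated) event header. */
    static boolean hasTerminatedEvent(List<String> logLines) {
        for (String line : logLines) {
            // Assumption: event headers begin with the event number and a space.
            if (line.startsWith("005 ")) {
                return true;
            }
        }
        return false;
    }

    /**
     * Polls the file until its size is unchanged for stablePolls consecutive
     * polls. Returns false if the timeout expires first or the file never
     * appears. This only observes what the local file system reports.
     */
    static boolean waitForStableSize(Path file, int stablePolls,
                                     long pollMillis, long timeoutMillis)
            throws IOException, InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        long lastSize = -1;   // -1 means "not seen yet"
        int unchanged = 0;
        while (System.currentTimeMillis() < deadline) {
            long size = Files.exists(file) ? Files.size(file) : -1;
            if (size >= 0 && size == lastSize) {
                unchanged++;
                if (unchanged >= stablePolls) {
                    return true;
                }
            } else {
                unchanged = 0;
            }
            lastSize = size;
            Thread.sleep(pollMillis);
        }
        return false;
    }
}
```

The idea would be to call hasTerminatedEvent on the lines of condor.log and,
once it returns true, call waitForStableSize on each output file with a
polling window longer than the few seconds of delay I am seeing.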

Thanks, Jon

> -----Original Message-----
> From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Erik Paulson
> Sent: 12 April 2006 14:55
> To: Condor-Users Mail List
> Subject: Re: [Condor-users] Automatically detecting job completion and filetransfer
> 
> On Wed, Apr 12, 2006 at 11:03:49AM +0100, Jon Blower wrote:
> > Dear all,
> > 
> > This question has probably been asked before but I haven't been able
> > to find an answer on Google or the mailing list archives.  I'm writing
> > a Java program that submits jobs to a Condor pool.  The Java program
> > runs on a submit host and generates job description files that look
> > like this:
> > 
> > executable = /home/jon/bin/helloworld
> > universe = vanilla
> > input = stdin
> > output = stdout
> > error = stderr
> > log = condor.log
> > initialdir = /some/directory
> > Queue
> > 
> > I submit the job by calling condor_submit from Java's Runtime.exec()
> > method.  This bit works fine.
> > 
> > My problem is detecting categorically when the job has completed *and*
> > the output files (stdout and stderr) have been transferred back to the
> > submit host.  My first stab at the Java program detects the status of
> > the job ("submitted", "running", "complete") by parsing the log file
> > that is produced.  It also gets the exit code of the executable from
> > this log file.
> > 
> > To detect job completion, my program looks for the "005" event ("Job
> > terminated") in the log file.  However, it seems that this event is
> > sent to the log file *before* the contents of the stdout and stderr
> > files are transferred to the submit host.  If I check the length of
> > the stdout and stderr files on the submit host (using the length()
> > method of java.io.File) they both report zero immediately after the
> > "005" event is detected in the log file.  If I wait a few seconds, the
> > length() method reports the correct length, indicating that these
> > files (or at least their contents) are transferred a few seconds after
> > the "005" event.
> > 
> 
> Your submit file above isn't using file transfer - are you using NFS?
> NFS caching can cause the weird ordering you describe.
> 
> -Erik