
Re: [Condor-users] Automatically detecting job completion and file transfer

On Wed, Apr 12, 2006 at 11:03:49AM +0100, Jon Blower wrote:
> Dear all,
> This question has probably been asked before but I haven't been able to find
> an answer on Google or the mailing list archives.  I'm writing a Java
> program that submits jobs to a Condor pool.  The Java program runs on a
> submit host and generates job description files that look like this:
> executable = /home/jon/bin/helloworld
> universe = vanilla
> input = stdin
> output = stdout
> error = stderr
> log = condor.log
> initialdir = /some/directory
> Queue
> I submit the job by calling condor_submit from Java's Runtime.exec() method.
> This bit works fine.  
> My problem is detecting categorically when the job has completed *and* the
> output files (stdout and stderr) have been transferred back to the submit
> host.  My first stab at the Java program detects the status of the job
> ("submitted", "running", "complete") by parsing the log file that is
> produced.  It also gets the exit code of the executable from this log file.
> To detect job completion, my program looks for the "005" event ("Job
> terminated") in the log file.  However, it seems that this event is sent to
> the log file *before* the contents of the stdout and stderr files are
> transferred to the submit host.  If I check the length of the stdout and
> stderr files on the submit host (using the length() method of java.io.File)
> they both report zero immediately after the "005" event is detected in the
> log file.  If I wait a few seconds, the length() method reports the correct
> length, indicating that these files (or at least their contents) are
> transferred a few seconds after the "005" event.

Your submit file above isn't using Condor's file transfer mechanism; are you relying on a shared filesystem such as NFS? NFS attribute caching can cause the weird ordering you describe: the client's cached file size lags behind the actual write, so length() reports zero until the cache refreshes.
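If you want to take NFS out of the picture, you can ask Condor to transfer the files itself by adding the transfer directives to the submit description. A sketch (adapting the submit file quoted above; paths are the poster's):

```text
executable = /home/jon/bin/helloworld
universe = vanilla
input = stdin
output = stdout
error = stderr
log = condor.log
initialdir = /some/directory
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
Queue
```

With ON_EXIT, the output files are shipped back by Condor when the job exits rather than being written over the shared filesystem.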
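As a workaround in the current setup, you can treat the "005" event as a first signal only, and then poll the output files until they become non-empty (or a timeout expires) before reading them. A minimal sketch in Java; the class and method names here are illustrative, not part of any Condor API:

```java
import java.io.File;

public class CondorWatch {

    // Condor user-log records begin with a three-digit event code;
    // "005" marks "Job terminated".
    static boolean containsTerminationEvent(String logText) {
        for (String line : logText.split("\n")) {
            if (line.startsWith("005")) {
                return true;
            }
        }
        return false;
    }

    // Poll until the file reports a non-zero length or the timeout
    // expires. Returns true if the file became non-empty in time.
    // This papers over the lag between the 005 event and the output
    // contents appearing on the submit host.
    static boolean waitForNonEmpty(File f, long timeoutMs)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (f.length() > 0) {
                return true;
            }
            Thread.sleep(500);  // re-check every half second
        }
        return f.length() > 0;
    }
}
```

After detecting the 005 event in condor.log, the program would call waitForNonEmpty on the stdout and stderr files before parsing them. Note this cannot distinguish "not yet transferred" from "job legitimately produced empty output", so the timeout still matters.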