
[HTCondor-users] Appending file output for a vanilla job

Hi all,

I'm trying to do something that feels like it should be HTCondor 101, but I am failing to figure it out:

We have a Python program running in the vanilla universe whose main loop looks like

while True:
   s = random_number_from( /dev/urandom )
   result = calculation_that_takes_about_ten_minutes( s )

The jobs are running on our OrangeGrid, which consists of transient execute machines with an average lifetime of 4 hours. Our submit file contains

output = result.$(cluster).$(process)
stream_output = true

We then accumulate a bunch of results by cat-ing result.$(cluster).$(process) together. This works great while the jobs are running.

The problem is that if a job is evicted from its execute machine and restarted, the stdout file is clobbered when the job starts up again. We would just like to accumulate results from a bunch of jobs. The result files are simple enough that if a job was evicted while it was writing an ASCII line to stdout, we can filter out the partial line.
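To make the filtering concrete, here is a minimal sketch of how we could accumulate the results. It assumes each complete result is a single newline-terminated line, so a line cut off by an eviction is one without a trailing newline. The helper names complete_lines and accumulate, and the output file name in the usage comment, are made up for illustration.

```python
import glob


def complete_lines(path):
    """Yield only the newline-terminated lines of a result file.

    A line that was cut off by an eviction has no trailing newline,
    so it is skipped.
    """
    with open(path, "rb") as f:
        for line in f:
            if line.endswith(b"\n"):
                yield line


def accumulate(pattern, out_path):
    """Concatenate the complete lines of every file matching pattern.

    Returns the number of lines written to out_path.
    """
    count = 0
    with open(out_path, "wb") as out:
        for path in sorted(glob.glob(pattern)):
            for line in complete_lines(path):
                out.write(line)
                count += 1
    return count


# Hypothetical usage, run on the submit machine:
#   accumulate("result.*.*", "all_results.txt")
```

This only replaces the cat step; it does nothing about the clobbering itself.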

I cannot figure out how to prevent condor from clobbering stdout when the job is restarted. I also can't figure out how to stream to files other than stdout and stderr. Writing to a specific file and using append_files won't work, as the code is Python and does not run in the standard universe. The only solution I can come up with is to:

1. Add transfer_input_files = result.$(cluster).$(process) to my submit file.

2. Submit the job in the held state to get the $(cluster) number.

3. Touch the result.$(cluster).$(process) files so that they exist and are zero bytes.

4. Have my program cat result.$(cluster).$(process) to stdout at startup.

5. Have my program print(result) to stdout and have condor stream stdout.
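For steps 4 and 5, the program side of this workaround might look roughly like the sketch below. It assumes the earlier result file has been transferred into the job's working directory under a name the program knows; the function name replay_previous_results and its path argument are hypothetical. It drops a trailing partial line so that it does not get fused with the first new result. I have not verified that this plays well with stream_output when the job restarts.

```python
import sys


def replay_previous_results(path, out=None):
    """Copy the complete lines of an earlier result file to out.

    Returns the number of lines copied. A missing file (first start
    of the job) is treated as "nothing to replay".
    """
    if out is None:
        out = sys.stdout
    copied = 0
    try:
        with open(path, "r") as previous:
            for line in previous:
                # Skip a final line that was cut off by an eviction.
                if line.endswith("\n"):
                    out.write(line)
                    copied += 1
    except FileNotFoundError:
        pass
    # Flush so the replayed lines are streamed before new results.
    out.flush()
    return copied


# Hypothetical use at the top of the program, before the main loop:
#   replay_previous_results(sys.argv[1])
# and inside the loop:
#   print(result, flush=True)
```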

It feels like there has to be an easier way of doing this. What's the obvious thing that I'm missing?



Duncan Brown                              Room 263-1, Physics Department
Charles Brightman Professor of Physics     Syracuse University, NY 13244
Physics Graduate Program Director     http://dabrown.expressions.syr.edu