Mailing List Archives Public Access	UW Madison Computer Sciences Department Computer Systems Lab


Re: [HTCondor-users] Appending file output for a vanilla job

Date: Thu, 04 Mar 2021 10:28:26 +0100
From: Thomas Hartmann <thomas.hartmann@xxxxxxx>
Subject: Re: [HTCondor-users] Appending file output for a vanilla job

Hi Duncan,

if the results are small enough, maybe you can use `condor_chirp` from within the job to store/update the results as class ads? [1] Alternatively, with condor_chirp the job could probably send a status/result file back or write its results into the job log (with a "grep'able" tag in the log, the results could maybe be harvested from the collected job logs).
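
A minimal sketch of the job-log variant, in Python since the job is a Python program. It assumes that `condor_chirp` is on the job's PATH (it often lives in HTCondor's libexec directory instead) and that the submit file enables chirp for a vanilla job, e.g. with `+WantIOProxy = true`. The tag `RESULT` and the helper names are made up for illustration, and the harvesting side only looks for the tag rather than relying on the exact job-log line layout.

```python
import shutil
import subprocess

TAG = "RESULT"  # hypothetical grep-able marker, not an HTCondor convention


def ulog_command(result, tag=TAG):
    """Build the condor_chirp call that writes one tagged line to the job log."""
    return ["condor_chirp", "ulog", f"{tag} {result}"]


def report(result):
    """Send one result to the job log; fall back to stdout without chirp."""
    cmd = ulog_command(result)
    if shutil.which(cmd[0]) is None:  # assumption: condor_chirp is on PATH
        print(cmd[2], flush=True)
        return False
    return subprocess.run(cmd, check=False).returncode == 0


def harvest(log_lines, tag=TAG):
    """Collect the text after the tag from every log line that carries it."""
    marker = tag + " "
    results = []
    for line in log_lines:
        _, found, rest = line.partition(marker)
        if found:
            results.append(rest.strip())
    return results
```

In the job, `report(result)` would take the place of `print(result)`; afterwards something like `harvest(open(job_log))` could pull the values back out of each collected log. For the class-ad variant, the command list would instead start with `condor_chirp set_job_attr` followed by an attribute name and value.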

If your jobs' workflows are somewhat complex, maybe they can be realized as a DAG [2] - but that might be overkill for just a few simple jobs.


Cheers,
  Thomas


[1]
https://htcondor.readthedocs.io/en/latest/man-pages/condor_chirp.html?highlight=condor_chirp


[2]
https://htcondor.readthedocs.io/en/latest/users-manual/dagman-workflows.html#capturing-the-status-of-nodes-in-a-file


On 03/03/2021 22.50, Duncan Brown via HTCondor-users wrote:

Hi all,

I'm trying to do something that feels like it should be HTCondor 101, but I am failing to figure it out:

We have a python program running in the vanilla universe that looks like

while True:
    s = random_number_from("/dev/urandom")
    result = calculation_that_takes_about_ten_minutes(s)
    print(result)

The jobs are running on our OrangeGrid which consists of transient execute machines that have an average lifetime of 4 hours. We have

output = result.$(cluster).$(process)
stream_output = true

We then accumulate a bunch of results by cat-ing result.$(cluster).$(process) together. This works great while the jobs are running.

The problem is that if a job gets evicted by the execute machine and restarted, then the stdout file gets clobbered when the job starts back up again. We would just like to accumulate results from a bunch of jobs. The result files are simple enough that if the job got evicted while it was writing an ascii line to stdout, we can filter that out.

I cannot figure out how to prevent condor from clobbering stdout when the job is restarted. I also can't figure out how to stream to files that are not stdout or stderr. Writing to a specific file and using append_files won't work, as the code is python and not standard universe. The only solution I can come up with is to:

1. Add transfer_input_files = result.$(cluster).$(process) to my submit file,

2. Submit the job into the held state to get the $(cluster) number,

3. Touch a bunch of result.$(cluster).$(process) files so they exist and are zero bytes.

4. Have my program cat result.$(cluster).$(process) to stdout at startup

5. Write print(result) to stdout and have condor stream stdout.
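
Steps 4 and 5 could be sketched roughly as below. This is only an illustration under assumptions: the previous result file is taken to have been transferred into the job's working directory, the function name `replay_previous` is invented here, and a line without a trailing newline is treated as one that was cut short by an eviction and is dropped.

```python
import sys


def replay_previous(path, out=None):
    """Copy the complete lines of an earlier run's result file to `out`."""
    out = sys.stdout if out is None else out
    kept = 0
    try:
        with open(path) as previous:
            for line in previous:
                if line.endswith("\n"):  # skip a line cut short by eviction
                    out.write(line)
                    kept += 1
    except FileNotFoundError:
        pass  # first start: nothing to replay yet
    out.flush()
    return kept
```

The program would call this once at startup, before entering its loop, and then keep printing new results to stdout as before.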

It feels like there has to be an easier way of doing this. What's the obvious thing that I'm missing?

Cheers,
Duncan.
