[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Condor-users] cluster number in shell script (sh)

Hello David
On 12/01/2010 04:09 AM, David J. Herzfeld wrote:
> I don't think any of the Condor folk would claim that $(cluster) is
> guaranteed to be unique (it will definitely NOT be unique across several
> submit hosts, which seems to be what you are asking above). The cluster
> id can also "reset" itself if the history file is modified or the condor
> spool directory is moved/deleted/modified.
This is a good point but for the moment I have only one submit host.
> I am not clear on what you mean here - are you talking about $(Cluster)
> or $(Process)?
> Correct me if I am wrong, but what I think you are asking for is:
>  1) The output/error files of each job (process) should be unique.
Yes - and for the moment this is the case, using the PID of the shell
that starts the cluster job: I use a temporary directory named $0.$$,
which expands to the script name and shell PID. These are unique on the
machine launching the command as long as the shell is running (the OS
will not spawn another process with that PID).
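A minimal sketch of that scheme (the directory name pattern is the one
described above; the "job." prefix is not, it is just for illustration):

```shell
# Build a per-run scratch directory name from the script name and PID.
# $$ is unique among live processes on this host, but may be reused
# after the script exits, so the name is only safe for the run's lifetime.
tmpdir="$(basename "$0").$$"
mkdir -p "$tmpdir"
echo "scratch dir: $tmpdir"
```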
>  2) The Condor log file for each group of jobs (clusters) should be
> unique (so that you can use condor_wait to wait until all process in the
> cluster are complete).
Yes - see above
>  3) Output/error files for each job (process) should go in sequential
> order (so that you can tell which came first) when combining into a
> single file.
No. The jobs in one cluster can execute in parallel, so the order is
not relevant.
>  4) You should be capable of combine the stderr, stdout files (in order)
> into a single "master log" file for each group of similar jobs (cluster).
I want a method to generate a single, uniquely named master log file
containing all the output and error of the jobs in one cluster.
I know how to combine these files in one.
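For completeness, one way to do that combination - assuming per-process
output files named job.&lt;cluster&gt;.&lt;process&gt;.out, which is a hypothetical
layout, not something Condor mandates:

```shell
# Demo: create two sample per-process output files, then merge them
# into one master log for the cluster, with a header per source file.
cluster=123
printf 'out of proc 0\n' > job.$cluster.0.out
printf 'out of proc 1\n' > job.$cluster.1.out

for f in job.$cluster.*.out; do
    echo "==== $f ====" >> master.$cluster.log
    cat "$f" >> master.$cluster.log
done
```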
> Is this what you had in mind? Are there more requirements here?
The problem is to find a file name that is unique (at least over a long
period of time - several months) and that also preserves the order in
which the clusters were submitted.
The best option I can see is the cluster number, which satisfies both
conditions and is already generated by the condor_submit command.
> I have attached a simple shell script which creates a unique (and dated)
> Condor output, error and log file. I then combine all of the
> output/error files in order for a given cluster. Since the script will
> exit if it is unable to create the lockfile, it should work across
> multiple submit hosts provided you are using a network file system with
> atomic locking.

Thanks for the shell script. I have written a similar one, without the
lock file.

Your idea to use the date as a unique identifier seems OK, even if it
is not truly unique when submissions come from different machines
working on the same project.

For the moment I will use the cluster number by parsing the output of
the condor_submit command, but I will add the date also :)
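A sketch of that parsing step, combined with a date stamp. The canned
$submit_output line stands in for a real condor_submit run; its format
("N job(s) submitted to cluster M.") is the usual summary line:

```shell
# In real use this would be: submit_output=$(condor_submit job.sub)
submit_output='1 job(s) submitted to cluster 42.'

# Extract the cluster number from the summary line.
cluster=$(printf '%s\n' "$submit_output" |
    sed -n 's/.*submitted to cluster \([0-9]*\).*/\1/p')

# Add the submission date for extra uniqueness across history resets.
stamp=$(date +%Y%m%d-%H%M%S)
logfile="master.${cluster}.${stamp}.log"
echo "$logfile"
```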