
Re: [Condor-users] job's id: cluster.process

waka jawaka wrote:
> Hi, I was wondering what the cluster part of the id means,

It is simply a handle to a group of similar jobs.  Because all jobs
within a cluster must (a) share the same executable and (b) be submitted
within the same transaction, Condor is able to store the job information
more efficiently.  Thus 20,000 jobs submitted in one cluster will use
fewer system resources than 20,000 jobs submitted in 20,000 separate
clusters.

> and how is it selected (randomly, cyclically, or by some other computation)?

The cluster id is selected by the condor_schedd; it is just a
monotonically increasing counter that starts at 1 and goes up by 1
with each new cluster.  Unfortunately, I think it will wrap back to 0
once it goes above 4 billion -- hopefully that is not a concern.  :)

> Also, if you submit a job A and then submit another one B, can a
> situation occur where A and B have the same cluster in their id and
> simply succeeding process numbers? (example A: 124.0, B: 124.1)

No, because the submission of jobs into a cluster must happen within the
same transaction.  When using the command-line tools, this means that
each invocation of condor_submit will result in a new cluster number.
Submission of multiple jobs into one cluster can be done with
condor_submit via multiple "queue" statements or "queue n" (where n > 1)
statements within one submit file.
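
As a sketch, a submit description file like the following (all names here
are made up for illustration, not taken from your setup) queues 20 jobs
into a single cluster with one "queue n" statement:

```
# Hypothetical submit description file -- filenames are illustrative.
# All 20 jobs share the same executable, so they form one cluster;
# $(Process) expands to 0..19, giving ids like 124.0 through 124.19.
executable = analyze
arguments  = input.$(Process)
output     = out.$(Process)
error      = err.$(Process)
log        = jobs.log
queue 20
```

Submitting this file once with condor_submit consumes a single cluster
number; invoking condor_submit twenty times with "queue 1" would consume
twenty.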

> If you know of a link to a detailed explanation about this subject,
> I'd also appreciate it.

Hmmm, found this on the condor_submit man page:

condor_submit requires a submit description file which contains
commands to direct the queuing of jobs. One submit description file
may contain specifications for the queuing of many Condor jobs at
once. A single invocation of condor_submit may cause one or more
clusters. A cluster is a set of jobs specified in the submit
description file between queue commands for which the executable is
not changed. It is advantageous to submit multiple jobs as a single
cluster because:

* Only one copy of the checkpoint file is needed to represent all
jobs in a cluster until they begin execution.

* There is much less overhead involved for Condor to start the next
job in a cluster than for Condor to start a new cluster. This can
make a big difference when submitting lots of short jobs.

Multiple clusters may be specified within a single submit description
file. Each cluster must specify a single executable.
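
To illustrate that last point, a single submit file can produce two
clusters by changing the executable between queue commands (the names
below are hypothetical):

```
# Hypothetical submit file producing two clusters in one
# condor_submit invocation.  The jobs queued before the executable
# changes get one cluster number; the jobs queued after it get the
# next cluster number.
executable = step_one
queue 5

executable = step_two
queue 5
```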

hope this helps,