
Re: [Condor-users] condor_submit feature request



Steffen Grunewald wrote:
I also said that I wanted to fill "gaps". Process number math introduces
another source of typos, and would have to be adjusted each time.
[snip]

While that adds another opportunity to make things more readable, it also
adds another opportunity to get the math wrong in a single place and overwrite the output of another, already finished job.


I guess I am not really following...

You asked for
What about an extension of the "Queue" syntax to
	Queue <number_of_jobs> <starting_process_number> <step_width>
which would be close to the "seq" syntax?

In the above, all you would have to enter is three numbers, the <number_of_jobs>, <starting_process_number>, and <step_width>, right?

So a condor_submit file to do exactly this could look like the one below. No math to adjust each time, no increased chance of overwriting the output of another already finished job, etc. Just plug-and-chug your three numbers and go; no need to do anything else.

Sample submit file:

............
#
# ENTER YOUR THREE NUMBERS HERE
#
number_of_jobs = 100
starting_process_number = 5000
step_width = 10
#
# DETAILS ABOUT THE JOB.
# Note: We always use $(Seq) instead of $(Process)
#
universe = vanilla
arguments = -seed $(Seq)
log = whatup.$(Seq).log
output = foo.$(Seq).out
error = foo.$(Seq).err
#
# FINALLY, NEVER A NEED TO CHANGE THE "SEQ" MACRO BELOW
#
Seq = $$([ $(starting_process_number) + ($(Process) * $(step_width)) ])
queue $(number_of_jobs)
...............
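To see what the Seq macro works out to for each job, here is a small sketch (in Python, just for illustration) of the same arithmetic, using the three hypothetical numbers from the sample submit file above:

```python
# Sketch of the arithmetic behind the Seq macro:
#   Seq = starting_process_number + Process * step_width
# where Condor's $(Process) runs from 0 to number_of_jobs - 1.
# The three values below mirror the sample submit file; they are
# placeholders, not anything Condor requires.
number_of_jobs = 100
starting_process_number = 5000
step_width = 10

seq_values = [starting_process_number + p * step_width
              for p in range(number_of_jobs)]

# The jobs would use Seq values 5000, 5010, 5020, ..., 5990 in their
# arguments and output/error/log pathnames.
print(seq_values[0], seq_values[1], seq_values[-1])
```

So each job gets a distinct, evenly spaced Seq value, and changing any of the three numbers shifts the whole sequence without touching the macro itself.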

You could even put the definition of a "Seq" expression into your condor_config file and ask Condor to automatically insert it into every submit file (via SUBMIT_EXPRS).
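A site-wide definition along those lines might look something like the sketch below. This is only a guess at the shape of such a config fragment; the exact SUBMIT_EXPRS syntax and semantics vary by Condor version, so check the Administrator's Manual before relying on it:

```
# Hypothetical condor_config fragment (verify against your Condor version).
# Defines Seq once, centrally, and asks Condor to make it available to
# every submit file via SUBMIT_EXPRS.
Seq = $$([ $(starting_process_number) + ($(Process) * $(step_width)) ])
SUBMIT_EXPRS = $(SUBMIT_EXPRS) Seq
```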

Perhaps I wasn't clear that you can completely divorce the Id used in your job's arguments / pathnames from the job id assigned by Condor. Or perhaps I am still missing something....

What I was initially looking for was a way to (re-)run job # n from a job cluster, preserving its job ID. To what extent does Condor rely on consecutive job ids within a cluster?


Condor doesn't rely on consecutive job ids per se. However, there are advantages in the schedd daemon (the scheduler itself) assigning the job ids, and not condor_submit (the client). Thus it would be difficult for an end user (who only has access/control over the client) to specify how job ids are selected. Keeping job id selection at the server and consistent enables techniques like efficiently spreading a large job load out over multiple schedds/machines, for example.

What I was initially looking for was a way to (re-)run job # n from a job cluster, preserving its job ID. To what extent does Condor rely on consecutive job ids within a cluster?

Do you still feel you need this capability, given the above example?

The only way I can think of off the top of my head to do this is by using the on_exit_remove expression (see the documentation for condor_submit)....
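A sketch of that approach: on_exit_remove tells the schedd when a completed job may leave the queue, so a job that fails stays queued, and re-runs, under its original job ID. The exact expression below is an assumption about what "success" means for your job; adjust it to your needs:

```
# Submit-file sketch: only remove the job from the queue if it exited
# normally (not on a signal) with exit code 0; otherwise it is re-run
# under the same cluster.proc job ID.
on_exit_remove = (ExitBySignal == False) && (ExitCode == 0)
```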

best
Todd