
[Condor-users] Monte Carlo simulation




Dear All,

a newbie question:

One of my users needs to run a quantum mechanics simulation program.
He needs to run it 10,000 times, which is a standard requirement in this area. (Each simulation takes around 10 minutes of computer time.)

I converted the program for him and submitted it to a pool of Windows XP
machines in our student lab. Both the setup and the program work fine.

We planned to let each sub-job run 100 simulations (i.e. 1000 minutes of running, which is around 16 hours). After each run, the execution server would return the average values of the results over these 100 simulations.

However, it turns out that the students keep logging in and out and
switching the lab PCs on and off, so we can never complete a
continuous run of 16 hours.

I am now going to modify the program so that after EACH successful simulation, the result file will be overwritten, with the updated simulation count and the average values stored.
i.e.
   (updated simulation count)
   xxxxxxxxxx
   yyyyyyyyyy
   zzzzzzzzzz
   ...
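
Purely as an illustration of the update step described above, here is a minimal sketch in Python. The language, the function names and the temporary-file approach are all assumptions made for the example, not the actual program. It folds one simulation's values into the running averages and then rewrites the result file in the layout shown above (count on the first line, one average per line).

```python
import os
import tempfile


def update_running_average(count, averages, new_values):
    """Fold one finished simulation into the running averages.

    Hypothetical helper: `count` is the number of simulations already
    averaged, `averages` their mean values, `new_values` the latest result.
    """
    if count == 0:
        return 1, list(new_values)
    new_count = count + 1
    updated = [avg + (val - avg) / new_count
               for avg, val in zip(averages, new_values)]
    return new_count, updated


def write_result_file(path, count, averages):
    """Overwrite the result file: simulation count first, then the averages."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory, then swap it in,
    # so an interruption should not leave a half-written result file.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        handle.write(f"{count}\n")
        for value in averages:
            handle.write(f"{value!r}\n")
        handle.flush()
        os.fsync(handle.fileno())
    os.replace(tmp_path, path)
```

Calling `update_running_average` followed by `write_result_file` after every simulation keeps the file on disk consistent with the last completed simulation, whenever the machine is interrupted.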

My question is therefore:

Is there any way to specify in the Condor script file that, once the
execution server is powered off, the result file will be sent back and this particular sub-job will be terminated (i.e. not re-queued)?

I have already prepared another program to consolidate all these partial
results from the different files. The user just needs 10,000 simulations
in total. It is far more economical to start a new run than to keep
waiting for the sub-job to be re-queued and re-executed.

Thanks in advance.

W.K. Kwan
Computer Centre
University of Hong Kong