
Re: [Condor-users] Scilab in standart universe





Jean-Christophe BACCON wrote:
Hi all,

I have some users who want to use scilab to run some long simulations
(about 40 days). As our compute nodes are students' computers, they want
to use the standard universe to checkpoint those simulations. But there
are some forbidden instructions in the scilab code (sleep, fork,
threads, ...). So I want to know if somebody has succeeded in porting
scilab to the standard universe and if it is possible to get this
version, or if you know another solution.

I am not familiar with scilab, but there are several possible options.

First, you can run your scilab jobs in the vanilla universe, and have your users write the checkpointing code themselves. The user's code should periodically write the state of the simulation to a file (maybe every 30 minutes?). If you are transferring the working files, setting when_to_transfer_output to ON_EXIT_OR_EVICT in the submit file makes Condor save this checkpoint file when the job is evicted and send it back with the job when it is restarted. On startup, the job should check whether this file exists and, if it does, resume from the saved state. This is the best option, if you can do it.
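
To make the pattern concrete, here is a minimal sketch of checkpoint and restart. It is written in Python only for illustration; the same logic would have to be written in scilab. The file name checkpoint.json, the state layout, and the placeholder simulation step are all assumptions, not anything taken from scilab or Condor.

```python
import json
import os
import time

CHECKPOINT = "checkpoint.json"   # hypothetical file name, kept in the job's working directory
CHECKPOINT_INTERVAL = 30 * 60    # seconds between checkpoints (30 minutes)


def load_state(path=CHECKPOINT):
    """Resume from the checkpoint file if it exists, otherwise start fresh."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0, "value": 0.0}


def save_state(state, path=CHECKPOINT):
    """Write to a temporary file, then rename it over the old checkpoint,
    so an eviction in the middle of a write cannot leave a truncated file."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)


def run(total_steps, path=CHECKPOINT, interval=CHECKPOINT_INTERVAL):
    state = load_state(path)
    last_save = time.monotonic()
    while state["step"] < total_steps:
        # Placeholder for one real simulation step (an assumption).
        state["value"] += state["step"]
        state["step"] += 1
        if time.monotonic() - last_save >= interval:
            save_state(state, path)
            last_save = time.monotonic()
    save_state(state, path)  # final state, written when the job completes
    return state
```

With this structure the job needs no special arguments on restart: it is simply started again in a directory that contains the transferred checkpoint file, and load_state picks up where the last save left off. At most one checkpoint interval of work is lost per eviction.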

A second option works only if your users' scilab code never actually executes any of the forbidden system calls. That is, if scilab forks only when the user's code does some specific thing, and the user's code never triggers that fork, you can still run a condor_compile'd scilab. For example, you can condor_compile code that calls fork; the call will fail with the error ENOSYS if it is actually executed.

Finally, I believe that scilab is a matlab clone? There is another open source matlab clone called octave. I have successfully condor_compile'd octave and run it in the Condor standard universe. It may be possible to translate your scilab code to octave and run it that way. Octave lets the user's code call fork and other system calls that the standard universe prohibits, but as long as the octave code never calls them, it is fine to run in the standard universe. Long-running batch jobs often don't need these calls.

-greg