
[Condor-users] Limiting jobs to certain time



Hi folks,

We have a recurring issue: users submit jobs that run for over a month and can't be checkpointed in the standard universe, although they could be checkpointed with custom code or with the "workspace" saving features of packages such as R or Matlab.

The problem with these long-running jobs is that their chance of completing decreases nearly exponentially with run time, both because of Condor's fair-share scheduling and because of ordinary system instabilities. As a result, these jobs often take 10 times more CPU time than necessary and tend to clog up the queues. Users often abandon them because they never complete.
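
To put rough numbers on the exponential effect, here is a minimal sketch. It assumes interruptions (preemption, crashes, reboots) arrive as a Poisson process at a constant rate and that an interrupted job restarts from scratch. The function names and the rate used below are hypothetical, chosen only for illustration, not measured on our pool.

```python
import math

def completion_probability(run_hours, interruptions_per_hour):
    """Chance that a job runs run_hours without a single interruption,
    assuming Poisson interruptions at a constant rate (an assumption)."""
    return math.exp(-interruptions_per_hour * run_hours)

def expected_cpu_hours(run_hours, interruptions_per_hour):
    """Expected total CPU hours spent until one uninterrupted run succeeds,
    when every interruption forces a restart from the beginning."""
    if interruptions_per_hour == 0:
        return float(run_hours)
    # Standard restart-on-failure result: (e^(rate * T) - 1) / rate
    return math.expm1(interruptions_per_hour * run_hours) / interruptions_per_hour

def overhead_factor(run_hours, interruptions_per_hour):
    """Ratio of expected CPU time consumed to CPU time actually needed."""
    return expected_cpu_hours(run_hours, interruptions_per_hour) / run_hours

if __name__ == "__main__":
    rate = 0.005  # hypothetical: one interruption per 200 hours on average
    for hours in (24, 720):  # one day versus roughly one month
        print(hours,
              completion_probability(hours, rate),
              overhead_factor(hours, rate))
```

With that hypothetical rate, a 30-day job consumes roughly ten times the CPU time it needs, while a 24-hour job carries an overhead of only about 6%.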

To encourage users to reconsider this approach, we'd like to limit the total run time of any given vanilla job to something like 24 hours.

How would we code this in Condor's configuration language?

thanks,
rob