
[Condor-users] problem with vacate/suspend



Sorry for flooding the list over the last few days, but I'm entering the "production" phase of Condor in my lab, and of course, with the users come the problems...

So, the problem now is this: we had a program running (through Condor) for 3-4 days on one computer, and when we checked it today it was no longer running on that machine but on another one, and it had started again from scratch!!
--> Why? (It's a vanilla job.) What is the correct setting to avoid that (let it keep running and have the other one niced, or something like that)?

Second, we have a type of calculation that we could run indefinitely: the longer it runs, the finer the results are. How would it be possible to say something like "stop the calculation, then copy the results back to the original location"?
--> Right now, if we do condor_rm, it just erases everything that was computed so far. We could first retrieve the execute/dir_XXXX files, but for that we first need to find out which node the job is running on (not very practical). Is there an easier way?

Thanks in advance
Nicolas

-----------------------------------------------
CNRS - UPR 9080 : Laboratoire de Biochimie Theorique
Institut de Biologie Physico-Chimique
13 rue Pierre et Marie Curie
75005 PARIS - FRANCE

Tel : +33 158 41 51 70
Fax : +33 158 41 50 26
------------------------------------------------