
Re: [Condor-users] Problem with checkpointing



On Oct 17, 2005, at 9:56 AM, Nicolas GUIOT wrote:

I have a problem with my checkpoint server: jobs run fine until they try to checkpoint, and then I get this error message on the checkpoint server:


10/17 14:44:01 ******************************************************
10/17 14:44:01 ** condor_ckpt_server (CONDOR_CKPT_SERVER) STARTING UP
10/17 14:44:01 ** $CondorVersion: 6.7.10 Aug 3 2005 $
10/17 14:44:01 ** $CondorPlatform: I386-LINUX_RH9 $
10/17 14:44:01 ** PID = 1561
10/17 14:44:01 ******************************************************
10/17 14:44:01 CKPT_SERVER running in directory /checkpoint
10/17 14:44:08 Receiving store request from XXX.XXX.XXX.24
Using descriptor 7 to handle request
10/17 14:44:08 ERROR: cannot make directory './XXX.XXX.XXX.24'

I started Condor as the "root" user and submitted the jobs from XXX.XXX.XXX.24.

What did I miss?

Try making /checkpoint owned and writable by user condor.
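
If it helps to confirm the state of the directory before and after changing it, something like the following could be run on the checkpoint server. It is only a rough sketch: the function names are made up for illustration, and it assumes the account is literally named "condor" and the directory is the /checkpoint shown in the log. It only reports problems. The change itself would normally be made as root with chown and chmod.

```python
import os
import pwd
import stat


def check_dir_for_uid(path, uid):
    """Return a list of reasons why `uid` could not create subdirectories in `path`.

    Hypothetical helper, not part of Condor. An empty list means the
    directory is owned by `uid` and has owner write and search permission.
    """
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return [path + " does not exist"]
    if not stat.S_ISDIR(st.st_mode):
        return [path + " is not a directory"]

    problems = []
    if st.st_uid != uid:
        problems.append("%s is owned by uid %d, not uid %d" % (path, st.st_uid, uid))
    # Creating an entry inside a directory needs both write and search (x) bits.
    needed = stat.S_IWUSR | stat.S_IXUSR
    if st.st_mode & needed != needed:
        problems.append("%s lacks owner write/search permission" % path)
    return problems


def check_dir_for_user(path, username):
    """Same check, looking the user up by name (e.g. "condor")."""
    try:
        uid = pwd.getpwnam(username).pw_uid
    except KeyError:
        return ["no such user: " + username]
    return check_dir_for_uid(path, uid)


# Example call, assuming the names from this thread:
#   for problem in check_dir_for_user("/checkpoint", "condor"):
#       print(problem)
```

An empty result after the change would suggest the server should be able to create the per-host directory ('./XXX.XXX.XXX.24' in the log above) on the next store request.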

+----------------------------------+---------------------------------+
|            Jaime Frey            |  Public Split on Whether        |
|        jfrey@xxxxxxxxxxx         |  Bush Is a Divider              |
|  http://www.cs.wisc.edu/~jfrey/  |         -- CNN Scrolling Banner |
+----------------------------------+---------------------------------+