
Re: [Condor-users] Checkpointing failed on X86_64



You are right.

As suggested by others, I can checkpoint and stop a job with
    condor_vacate_job jobid; condor_hold jobid
and resume it with
    condor_release jobid
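
For anyone who wants to script this, here is a minimal Python sketch of the same sequence. It only wraps the three commands named above; the helper names (checkpoint_and_hold_cmds, release_cmds, run_all) and the job id "8.0" are made up for illustration, and nothing is actually executed unless run_all is called with dry_run=False.

```python
import subprocess


def checkpoint_and_hold_cmds(job_id):
    """Commands to checkpoint a running job and then hold it."""
    return [
        # Ask the job to checkpoint and leave the machine.
        ["condor_vacate_job", job_id],
        # Put the job on hold so it is not started again.
        ["condor_hold", job_id],
    ]


def release_cmds(job_id):
    """Command to release a held job so it can resume."""
    return [["condor_release", job_id]]


def run_all(cmds, dry_run=True):
    """Print each command, or run it when dry_run is False."""
    for cmd in cmds:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True raises CalledProcessError if a command fails.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # "8.0" is a hypothetical cluster.proc id used only as an example.
    run_all(checkpoint_and_hold_cmds("8.0"))
    run_all(release_cmds("8.0"))
```

The dry-run default is there so the sequence can be inspected before it touches a real queue.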

Junjun

On Wednesday 22 November 2006 14:42, Todd Tannenbaum wrote:
> Previously Junjun Mao  wrote:
> > I compiled this simple program with condor_compile gcc -o count count.c
>
> <snip>
>
> > When I used condor_hold while the program was running I got this error
> > in the log file:
> >
> > 001 (008.000.000) 11/17 19:13:25 Job executing on host: <10.10.20.90:42208>
> > ...
> > 004 (008.000.000) 11/17 19:15:20 Job was evicted.
> >         (0) Job was not checkpointed.
> >                 Usr 0 00:00:00, Sys 0 00:00:00  -  Run Remote Usage
> >                 Usr 0 00:00:00, Sys 0 00:00:00  -  Run Local Usage
> >         570  -  Run Bytes Sent By Job
> >         4754958  -  Run Bytes Received By Job
> >
> > I looked for the manual
> > http://www.cs.wisc.edu/condor/manual/v6.8/1_5Availability.html#se:Availability
> > It appears condor_compile is not supported on my platform Fedora Core
> > 4/Opteron. Is this the real reason?
>
> I doubt it if you are running Condor v6.8.2, since that version
> added 64-bit Linux checkpoint support.
>
> I don't recall whether condor_hold forces a checkpoint.
> So I would retry your test using "condor_vacate" (or
> condor_vacate_job) to checkpoint and leave the machine, or
> "condor_checkpoint" (or condor_checkpoint_job) to checkpoint and
> keep running.
>
> Another thought: maybe the above happened because the job
> ran for less than 2 minutes. Condor will (purposefully) not
> bother to checkpoint upon pre-emption unless more than X seconds
> of forward progress was made. I don't recall off the top of my
> head what X is, sorry, but it was short. 3 minutes perhaps?
>
> Regards,
> Todd
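
To check from the job log whether an eviction actually produced a checkpoint, something like the following Python sketch may help. It looks for eviction events (code 004, as in the log quoted above) and reads the line that follows. The function name eviction_checkpoint_status is hypothetical, and treating a detail line starting with "(1)" as "checkpointed" is an assumption on my part; the log above only shows the "(0) Job was not checkpointed." case.

```python
import re

# Matches the start of an eviction event line, e.g. "004 (008.000.000) ...".
EVICT_RE = re.compile(r"^004 \((\d+)\.(\d+)\.(\d+)\)")


def eviction_checkpoint_status(log_text):
    """Return a list of (job_id, checkpointed) pairs, one per eviction event."""
    results = []
    lines = log_text.splitlines()
    for i, line in enumerate(lines):
        m = EVICT_RE.match(line.strip())
        if not m:
            continue
        # Convert "008.000" to the usual cluster.proc form "8.0".
        job_id = "%d.%d" % (int(m.group(1)), int(m.group(2)))
        # The line after the event header carries the checkpoint flag.
        detail = lines[i + 1].strip() if i + 1 < len(lines) else ""
        # Assumption: "(1)" means checkpointed, "(0)" means not checkpointed.
        results.append((job_id, detail.startswith("(1)")))
    return results


# Excerpt of the log from this thread, used as sample input.
SAMPLE = """\
001 (008.000.000) 11/17 19:13:25 Job executing on host: <10.10.20.90:42208>
...
004 (008.000.000) 11/17 19:15:20 Job was evicted.
        (0) Job was not checkpointed.
"""

if __name__ == "__main__":
    for job_id, checkpointed in eviction_checkpoint_status(SAMPLE):
        print(job_id, "checkpointed" if checkpointed else "not checkpointed")
```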

-- 
To unsubscribe the mailing list, please send me an email

--
Dr. Junjun Mao, Research Associate
Steinman Hall, #1M-11
Levich Institute at City College of CUNY
140th Street & Convent Avenue
New York, NY 10031
(212) 650-6845 (Phone) 
(212) 650-6835 (fax)