
Re: [Condor-users] User log checkpoint messages.

On Feb 11, 2008, at 11:09 AM, P. A. Cheeseman wrote:

    I'm concerned with how to determine whether or not a checkpoint
message, whether it be the one prefixed with the 003 code or one which
is embedded in eviction information, indicates conclusively whether or
not a job remains executing.

    My first, I believe erroneous, impression was that a job ceased
execution upon checkpoint, but I later realized that periodic checkpoints
or application-initiated checkpoints would leave the job in execution.

    It's occurred to me that the 003 checkpoint message may always leave
a job executing while the checkpoint message embedded in an eviction may
indicate that a job has left execution.

    Is it that simple?

The job evicted log event itself means that the job has stopped executing, but didn't complete. The checkpoint message in the evicted event states whether the job was checkpointed immediately before it was killed. The job checkpointed event means that a periodic checkpoint occurred, and the job continued to run.
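
For anyone scripting against the user log, the rule above could be sketched roughly as follows. This is only an illustration, not code from Condor itself: the function name interpret_event is made up, the 004 code for the evicted event is taken to be the usual user log numbering, and the exact wording of the embedded checkpoint line ("Job was checkpointed." versus "Job was not checkpointed.") is an assumption that should be checked against your own log files.

```python
def interpret_event(code, body):
    """Classify one user log event (hypothetical helper, not a Condor API).

    code: the three-digit event number at the start of the event header.
    body: the remaining text of the event, up to its terminator line.
    """
    if code == "003":
        # Job checkpointed event: a periodic checkpoint occurred
        # and the job continued to run.
        return {"running": True, "checkpointed": True}
    if code == "004":
        # Job evicted event (assumed code): the job stopped executing
        # without completing. The embedded message only says whether a
        # checkpoint was taken just before the job was killed.
        # Assumed wording; "Job was not checkpointed" does not match.
        return {"running": False, "checkpointed": "Job was checkpointed" in body}
    # Any other event is outside the scope of this sketch.
    return {"running": None, "checkpointed": None}


if __name__ == "__main__":
    print(interpret_event("003", ""))
    print(interpret_event("004", "\t(1) Job was checkpointed."))
    print(interpret_event("004", "\t(0) Job was not checkpointed."))
```

The point of the sketch is that the event type alone decides whether the job is still executing; the checkpoint text inside an eviction only tells you whether work was saved first.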

|           Jaime Frey           | I used to be a heavy gambler.     |
|       jfrey@xxxxxxxxxxx        | But now I just make mental bets.  |
| http://www.cs.wisc.edu/~jfrey/ | That's how I lost my mind.        |