
Re: [Condor-users] Job was not checkpointed



Hi,

I think we're having the same problem: our jobs are not checkpointed
when they are evicted. We are working on a cluster of Solaris 9 stations.

Also, the jobs consistently get evicted every three hours.

Here is a sample from a log file. The error and output files are empty.

***************************************************************************
004 (106.000.000) 10/17 09:53:17 Job was evicted.
        (0) Job was not checkpointed.
                Usr 0 00:00:00, Sys 0 00:00:00  -  Run Remote Usage
                Usr 0 00:00:00, Sys 0 00:00:00  -  Run Local Usage
        1322  -  Run Bytes Sent By Job
        133228  -  Run Bytes Received By Job
...
001 (106.000.000) 10/17 09:55:09 Job executing on host: <130.127.206.32:32788>
...
004 (106.000.000) 10/17 12:55:15 Job was evicted.
        (0) Job was not checkpointed.
                Usr 0 00:00:00, Sys 0 00:00:00  -  Run Remote Usage
                Usr 0 00:00:00, Sys 0 00:00:00  -  Run Local Usage
        1322  -  Run Bytes Sent By Job
        133228  -  Run Bytes Received By Job
...
001 (106.000.000) 10/17 12:57:40 Job executing on host: <130.127.206.42:32781>
...

***************************************************************************
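The three-hour pattern can be checked directly by pairing each "Job executing" (001) event with the next "Job was evicted" (004) event for the same job. Below is a minimal Python sketch. It assumes the event header layout shown in the excerpt above (event code, job ID in parentheses, MM/DD date, time). Because the log lines carry no year, it fills in a placeholder year. The function names and the embedded sample are hypothetical. To use it on a real log, read the file's contents into a string and pass that to parse_events.

```python
import re
from datetime import datetime

# Matches an event header such as "004 (106.000.000) 10/17 09:53:17 Job was evicted."
EVENT_RE = re.compile(
    r"^(\d{3}) \((\d+)\.(\d+)\.(\d+)\) (\d{2}/\d{2}) (\d{2}:\d{2}:\d{2}) (.*)$"
)

EXECUTE = "001"  # "Job executing on host"
EVICTED = "004"  # "Job was evicted"

# Excerpt copied from the log above; a real log would be read from a file.
SAMPLE = """004 (106.000.000) 10/17 09:53:17 Job was evicted.
        (0) Job was not checkpointed.
...
001 (106.000.000) 10/17 09:55:09 Job executing on host: <130.127.206.32:32788>
...
004 (106.000.000) 10/17 12:55:15 Job was evicted.
        (0) Job was not checkpointed.
...
"""


def parse_events(log_text, year=2000):
    """Return (event_code, job_id, timestamp) for every event header line.

    The log omits the year, so `year` is an assumed placeholder.
    """
    events = []
    for line in log_text.splitlines():
        m = EVENT_RE.match(line.strip())
        if not m:
            continue  # skip detail lines, "..." separators, byte counts
        code, cluster, proc, _subproc, date, clock, _text = m.groups()
        stamp = datetime.strptime(f"{year}/{date} {clock}", "%Y/%m/%d %H:%M:%S")
        events.append((code, f"{int(cluster)}.{int(proc)}", stamp))
    return events


def run_durations(events):
    """Pair each execute event with the next eviction of the same job."""
    started = {}
    durations = []
    for code, job, stamp in events:
        if code == EXECUTE:
            started[job] = stamp
        elif code == EVICTED and job in started:
            durations.append((job, stamp - started.pop(job)))
    return durations


if __name__ == "__main__":
    for job, elapsed in run_durations(parse_events(SAMPLE)):
        print(f"job {job} ran {elapsed} before eviction")
```

On the excerpt, the first eviction has no matching execute event, so it is skipped. The remaining pair yields a single run of 3:00:06, which agrees with the roughly three-hour interval described above.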
Thank you for any help you can provide.

Brian Dandurand


> Hi,
>
> My jobs do not get migrated to the next available node; it gives me
> the error below:
>
> ___________________________________________________________________________________
> 011 (191.000.000) 07/03 11:06:07 Job was unsuspended.
> ...
> 004 (191.000.000) 07/03 11:06:07 Job was evicted.
>         (0) Job was not checkpointed.
>                 Usr 0 00:00:11, Sys 0 00:00:00  -  Run Remote Usage
>                 Usr 0 00:00:00, Sys 0 00:00:00  -  Run Local Usage
>         0  -  Run Bytes Sent By Job
>         0  -  Run Bytes Received By Job
>
> ___________________________________________________________________________________
>
> Plz help :(
>


----------------------------------------
Brian C. Dandurand
Clemson University
Department of Mathematical Sciences
Ph.D. Student
Office: Martin Hall E-6
Office Phone: (864)656-4749
----------------------------------------