
Re: [Condor-users] Problems with checkpointing.



Hi Alan,

> I would think that these would be identical RPMs, since we don't distribute 
> different binaries for RedHat 9, Fedora Core 1, or Fedora Core 3: We build 
> it on RedHat 9 and it just works on the Fedora Core 1-3. I know that the 
> download web page lists them separately--this is to make it clear what to 
> download. But they are identical.

OK, I was feeling "superstitious" ;-)

> I'm also a bit confused--you're installing the checkpoint server on all the 
> execution computers?

Yes, I inherited the spec file and process, so...  (P.S. we're installing 
the same RPM on all nodes, with the same condor_config but different 
condor_config.local files.)

> Can you be more specific about the errors you are getting?

OK, I was waiting for more details from users... I'll attach a bunch of 
stuff below, trying to show the lifecycle of jobs, but here's a typical log 
entry when a job dies.  I know this job was condor_compiled on an RH9 
box; I don't know where it initially ran, but here it dies on an RH9 box:

001 (12450.852.000) 04/27 17:08:09 Job executing on host: <129.89.200.78:51017>
...
005 (12450.852.000) 04/27 17:08:14 Job terminated.
        (0) Abnormal termination (signal 11)
        (0) No core file
                Usr 0 00:00:00, Sys 0 00:00:00  -  Run Remote Usage
                Usr 0 00:00:00, Sys 0 00:00:00  -  Run Local Usage
                Usr 0 01:30:00, Sys 0 00:00:32  -  Total Remote Usage
                Usr 0 00:00:00, Sys 0 00:00:00  -  Total Local Usage
        304  -  Run Bytes Sent By Job
        58917520  -  Run Bytes Received By Job
        0  -  Total Bytes Sent By Job
        0  -  Total Bytes Received By Job
...
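
For sifting through many entries like the one above, a minimal Python 
sketch along these lines could pull out the abnormal terminations.  It 
assumes the user log layout shown in the excerpt (a three-digit event code, 
the job ID in parentheses, a date and time, and "..." lines between 
events).  The names abnormal_terminations, HEADER and SIGNAL are 
illustrative only and are not part of Condor.

```python
import re

# Event header as it appears in the excerpt, e.g.
# "005 (12450.852.000) 04/27 17:08:14 Job terminated."
HEADER = re.compile(
    r"^(?P<event>\d{3}) "
    r"\((?P<cluster>\d+)\.(?P<proc>\d+)\.(?P<sub>\d+)\) "
    r"(?P<date>\d\d/\d\d) (?P<time>\d\d:\d\d:\d\d) "
)
# Detail line as it appears in the excerpt.
SIGNAL = re.compile(r"Abnormal termination \(signal (\d+)\)")


def abnormal_terminations(log_text):
    """Return (job_id, date, time, signal) for each 005 event killed by a signal."""
    found = []
    current = None  # header match of the event currently being read
    for line in log_text.splitlines():
        stripped = line.strip()
        if stripped == "...":  # assumed event separator, as in the excerpt
            current = None
            continue
        header = HEADER.match(stripped)
        if header:
            current = header
            continue
        signal = SIGNAL.search(stripped)
        if signal and current is not None and current.group("event") == "005":
            job_id = "{}.{}".format(
                int(current.group("cluster")), int(current.group("proc"))
            )
            found.append(
                (job_id, current.group("date"), current.group("time"),
                 int(signal.group(1)))
            )
    return found


# Sample taken from the log entry quoted in this message.
SAMPLE = """\
001 (12450.852.000) 04/27 17:08:09 Job executing on host: <129.89.200.78:51017>
...
005 (12450.852.000) 04/27 17:08:14 Job terminated.
        (0) Abnormal termination (signal 11)
        (0) No core file
...
"""

if __name__ == "__main__":
    for job_id, date, time, signal in abnormal_terminations(SAMPLE):
        print(job_id, date, time, "signal", signal)
```

Pairing each hit with the preceding 001 event for the same job ID would 
also show which host the job died on; the sketch leaves that out to stay 
short.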

> Yeah--these are the same binaries. Sorry for the confusion. :(

No worries, I still would have probably become superstitious ;-)

> I think we need to see some log files to better help you.

Actually, what's the preferred method of overwhelming you with logs?  
Shall I put them up somewhere you can fetch them over HTTP?  Or would you 
prefer email?

Cheers,
Paul