
Re: [Condor-users] job running on two hosts?



On Wed, Nov 17, 2004 at 09:49:10AM -0500, Dan Christensen wrote:
> Dan Christensen <jdc@xxxxxx> writes:
> 
> >
> > And when it ran the second time, it seemed to start at the beginning,
> > because it tried to open its output file, and it noticed that it
> > already existed and quit right away.
> 

Files are opened on the submit machine as the job runs, and their
contents are not stored in the checkpoint. (File pointers are, so we
know where we left off.) If there is no checkpoint, the job will start
from the beginning. If your job looks at what it might have previously
written, it is in violation of one of the restrictions on standard
universe jobs:
http://www.cs.wisc.edu/condor/manual/v6.6.7/2_4Road_map_Running.html#SECTION00341100000000000000

> Here's another clue I just found:  I got an e-mail from Condor saying
> that condor_schedd died on 129.100.75.77 due to a SEGV.  I guess that
> would explain the missing information in the user log file.
> 
> > Date: Tue, 16 Nov 2004 02:11:34 -0500
> > 
> > "/usr/sbin/condor_schedd" on "jdc.math.uwo.ca" died due to signal 11.
> > Condor will automatically restart this process in 10 seconds.
> 
> But now the question is, why did it die?
> 

We'd need to see the complete logfile of the schedd. Please send it to
condor-admin@xxxxxxxxxxx, and we'll try to debug it offline.

-Erik