
Re: [Condor-users] Problems starting jobs: errno = 104, assuming failure.



On Jun 20, 2006, at 9:59 PM, Andrew Mellanby wrote:

I've been setting up Condor on Windows XP machines with a Red Hat server.

The matching seems to work OK, but the jobs are never started (see below).

Has anyone got any ideas about what is going on here?

thanks

Andrew.
-------

Matchlog
6/21 11:20:42       Matched 3.1 mel@xxxxxxxxx <130.195.85.70:32821> preempting none <130.195.109.37:1051>
6/21 11:20:42       Matched 3.2 mel@xxxxxxxxx <130.195.85.70:32821> preempting none <130.195.7.232:1167>

SchedLog
6/21 11:15:41 Sent RELEASE_CLAIM to startd on <130.195.109.37:1051>
6/21 11:15:41 Match record (<130.195.109.37:1051>, 1, 1) deleted
6/21 11:15:41 condor_read(): recv() returned -1, errno = 104, assuming failure.
6/21 11:15:41 IO: Failed to read packet header
6/21 11:15:41 Response problem from startd.
6/21 11:15:41 Sent RELEASE_CLAIM to startd on <130.195.7.232:1167>
6/21 11:15:41 Match record (<130.195.7.232:1167>, 1, 2) deleted
6/21 11:17:56 IO: Failed to read packet header
6/21 11:18:49 IO: Failed to read packet header
6/21 11:19:04 IO: Failed to read packet header
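A note for readers decoding the SchedLog above: the "errno = 104" in the condor_read() line is a raw system error number. On Linux (the Red Hat server here), 104 is ECONNRESET, "Connection reset by peer". That suggests the remote end, presumably the startd, reset the connection, but only the startd's own log can say why. The Python sketch below is a hypothetical helper and not part of Condor. Its name, decode_errno, and its regex are assumptions. It looks the number up in the errno table of whatever machine runs it, so run it on the same OS that wrote the log.

```python
import errno
import os
import re

# Hypothetical pattern for the "errno = N" fragment seen in the SchedLog line
# above; this is an assumption, not a documented Condor log format.
ERRNO_RE = re.compile(r"errno = (\d+)")


def decode_errno(log_line):
    """Return (number, symbolic name, message) for the errno in log_line, or None."""
    match = ERRNO_RE.search(log_line)
    if match is None:
        return None
    code = int(match.group(1))
    # errno.errorcode maps numbers to names for the *local* platform only:
    # 104 is ECONNRESET on Linux, but other systems number errors differently.
    name = errno.errorcode.get(code, "UNKNOWN")
    return code, name, os.strerror(code)


if __name__ == "__main__":
    line = ("6/21 11:15:41 condor_read(): recv() returned -1, errno = 104, "
            "assuming failure.")
    print(decode_errno(line))
```

On a Linux host this prints the number, the name ECONNRESET, and the "Connection reset by peer" message. Lines without an errno, such as "IO: Failed to read packet header", return None.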

Have you tried looking in the startd log on the execute machine the job was matched with?

+--------------------------------+-----------------------------------+
|           Jaime Frey           | I used to be a heavy gambler.     |
|       jfrey@xxxxxxxxxxx        | But now I just make mental bets.  |
| http://www.cs.wisc.edu/~jfrey/ | That's how I lost my mind.        |
+--------------------------------+-----------------------------------+