Mailing List Archives Public Access	UW Madison Computer Sciences Department Computer Systems Lab


[Condor-users] sched daemon keeps dying

Date: Wed, 21 Jul 2004 16:49:19 -0400
From: Daniel Loughlin <loughlin@xxxxxxxxxxxx>
Subject: [Condor-users] sched daemon keeps dying

Below is the output I keep seeing in SchedLog on my submit node. The schedd daemon dies repeatedly. I'm wondering whether MPI job 11934.0 (shadow pid 582) is responsible for crashing it. Any ideas?

7/21 16:41:30 ******************************************************
7/21 16:41:30 ** condor_schedd (CONDOR_SCHEDD) STARTING UP
7/21 16:41:30 ** $CondorVersion: 6.7.0 Apr 27 2004 $
7/21 16:41:30 ** $CondorPlatform: I386-LINUX-RH9 $
7/21 16:41:30 ** PID = 575
7/21 16:41:30 ******************************************************
7/21 16:41:30 Using config file: /usr/local/condor/condor_config
7/21 16:41:30 Using local config files: /cluster/condor/etc/condor_config.BEERLI /cluster/condor/etc/petal017.local
7/21 16:41:30 DaemonCore: Command Socket at <144.174.160.147:49580>
7/21 16:41:31 Sent ad to central manager for loughlin@xxxxxxxxxxxx
7/21 16:41:31 Sent ad to central manager for lakner@xxxxxxxxxxxx
7/21 16:41:31 Sent ad to central manager for buckley@xxxxxxxxxxxx
7/21 16:41:42 DaemonCore: Command received via TCP from host <144.174.132.213:38017>
7/21 16:41:42 DaemonCore: received command 416 (NEGOTIATE), calling handler (negotiate)
7/21 16:41:42 Negotiating for owner: buckley@xxxxxxxxxxxx
7/21 16:41:42 Checking consistency running and runnable jobs
7/21 16:41:42 Tables are consistent
7/21 16:41:44 Out of servers - 1 jobs matched, 1 jobs idle, 1 jobs rejected
7/21 16:41:44 Increasing flock level for buckley@xxxxxxxxxxxx to 1.
7/21 16:41:44 Activity on stashed negotiator socket
7/21 16:41:44 Negotiating for owner: DedicatedScheduler@xxxxxxxxxxxxxxxxxxxxx
7/21 16:41:47 Out of servers - 0 reqs matched, 4 reqs idle, 4 reqs rejected
7/21 16:41:47 Activity on stashed negotiator socket
7/21 16:41:47 Negotiating for owner: lakner@xxxxxxxxxxxx
7/21 16:41:47 Checking consistency running and runnable jobs
7/21 16:41:47 Tables are consistent
7/21 16:42:22 Out of servers - 0 jobs matched, 34 jobs idle, 34 jobs rejected
7/21 16:42:22 Increasing flock level for lakner@xxxxxxxxxxxx to 1.
7/21 16:42:22 Sent ad to central manager for loughlin@xxxxxxxxxxxx
7/21 16:42:22 Sent ad to central manager for lakner@xxxxxxxxxxxx
7/21 16:42:22 Sent ad to central manager for buckley@xxxxxxxxxxxx
7/21 16:42:23 Activity on stashed negotiator socket
7/21 16:42:23 Negotiating for owner: buckley@xxxxxxxxxxxx
7/21 16:42:23 Checking consistency running and runnable jobs
7/21 16:42:23 Tables are consistent
7/21 16:42:24 Out of jobs - 1 jobs matched, 0 jobs idle, flock level = 0
7/21 16:42:24 Started shadow for job 9243.0 on "<144.174.160.22:32771>", (shadow pid = 580)
7/21 16:42:24 Activity on stashed negotiator socket
7/21 16:42:24 Negotiating for owner: DedicatedScheduler@xxxxxxxxxxxxxxxxxxxxx
7/21 16:42:28 Out of requests - 4 reqs matched, 0 reqs idle
7/21 16:42:29 Sent ad to central manager for loughlin@xxxxxxxxxxxx
7/21 16:42:29 Sent ad to central manager for lakner@xxxxxxxxxxxx
7/21 16:42:29 Sent ad to central manager for buckley@xxxxxxxxxxxx
7/21 16:42:29 Activity on stashed negotiator socket
7/21 16:42:29 Negotiating for owner: lakner@xxxxxxxxxxxx
7/21 16:42:29 Checking consistency running and runnable jobs
7/21 16:42:29 Tables are consistent
7/21 16:43:08 Out of servers - 3 jobs matched, 31 jobs idle, 31 jobs rejected
7/21 16:43:08 Started shadow for MPI job 11934.0 (shadow pid = 582)
7/21 16:43:08 Started shadow for job 10285.0 on "<144.174.160.35:32771>", (shadow pid = 583)
7/21 16:43:08 Sent ad to central manager for loughlin@xxxxxxxxxxxx
7/21 16:43:08 Sent ad to central manager for lakner@xxxxxxxxxxxx
7/21 16:43:08 Sent ad to central manager for buckley@xxxxxxxxxxxx
7/21 16:43:08 Shadow pid 582 exited with status 106
7/21 16:43:08 ERROR "shadow exited with incorrect usage! " at line 1344 in file dedicated_scheduler.C
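
To check which job a dying shadow belongs to, the "Started shadow" entries can be paired with the "Shadow pid ... exited" entries. Here is a minimal Python sketch, assuming the SchedLog wording shown in this post. The function name `correlate_shadows`, the regular expressions, and the `SAMPLE` excerpt are illustrative only and not part of Condor.

```python
import re

# Patterns are assumptions based on the SchedLog wording in this post.
STARTED = re.compile(
    r"Started shadow for (MPI )?job (\d+\.\d+)[^()]*\(shadow pid = (\d+)\)"
)
EXITED = re.compile(r"Shadow pid (\d+) exited with status (\d+)")


def correlate_shadows(log_text):
    """Return (pid, job_id, is_mpi, status) for every shadow exit in the log.

    Works on the raw text, so it does not matter whether the log has one
    entry per line or has been flattened into a single line.
    Caveat: pids are collected in one pass, so a pid reused across schedd
    restarts would be attributed to the most recent job.
    """
    started = {}
    for m in STARTED.finditer(log_text):
        is_mpi = m.group(1) is not None
        started[int(m.group(3))] = (m.group(2), is_mpi)

    exits = []
    for m in EXITED.finditer(log_text):
        pid, status = int(m.group(1)), int(m.group(2))
        job_id, is_mpi = started.get(pid, (None, False))
        exits.append((pid, job_id, is_mpi, status))
    return exits


# Excerpt copied from the log above.
SAMPLE = """
7/21 16:42:24 Started shadow for job 9243.0 on "<144.174.160.22:32771>", (shadow pid = 580)
7/21 16:43:08 Started shadow for MPI job 11934.0 (shadow pid = 582)
7/21 16:43:08 Started shadow for job 10285.0 on "<144.174.160.35:32771>", (shadow pid = 583)
7/21 16:43:08 Shadow pid 582 exited with status 106
"""

if __name__ == "__main__":
    for pid, job_id, is_mpi, status in correlate_shadows(SAMPLE):
        kind = "MPI job" if is_mpi else "job"
        print(f"shadow pid {pid} ({kind} {job_id}) exited with status {status}")
```

Run against the excerpt, this ties shadow pid 582 to MPI job 11934.0, the only shadow that exits before the ERROR line.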

Follow-Ups:
- Re: [Condor-users] sched daemon keeps dying
  - From: Erik Paulson
