
[Condor-users] sched daemon keeps dying



The following is the output I keep seeing in SchedLog on my submit node. The schedd daemon dies repeatedly. I'm wondering whether the MPI job (job 11934.0, shadow pid 582) is responsible for crashing it, since the ERROR appears right after that shadow exits with status 106.
Any ideas?


7/21 16:41:30 ******************************************************
7/21 16:41:30 ** condor_schedd (CONDOR_SCHEDD) STARTING UP
7/21 16:41:30 ** $CondorVersion: 6.7.0 Apr 27 2004 $
7/21 16:41:30 ** $CondorPlatform: I386-LINUX-RH9 $
7/21 16:41:30 ** PID = 575
7/21 16:41:30 ******************************************************
7/21 16:41:30 Using config file: /usr/local/condor/condor_config
7/21 16:41:30 Using local config files: /cluster/condor/etc/condor_config.BEERLI /cluster/condor/etc/petal017.local
7/21 16:41:30 DaemonCore: Command Socket at <144.174.160.147:49580>
7/21 16:41:31 Sent ad to central manager for loughlin@xxxxxxxxxxxx
7/21 16:41:31 Sent ad to central manager for lakner@xxxxxxxxxxxx
7/21 16:41:31 Sent ad to central manager for buckley@xxxxxxxxxxxx
7/21 16:41:42 DaemonCore: Command received via TCP from host <144.174.132.213:38017>
7/21 16:41:42 DaemonCore: received command 416 (NEGOTIATE), calling handler (negotiate)
7/21 16:41:42 Negotiating for owner: buckley@xxxxxxxxxxxx
7/21 16:41:42 Checking consistency running and runnable jobs
7/21 16:41:42 Tables are consistent
7/21 16:41:44 Out of servers - 1 jobs matched, 1 jobs idle, 1 jobs rejected
7/21 16:41:44 Increasing flock level for buckley@xxxxxxxxxxxx to 1.
7/21 16:41:44 Activity on stashed negotiator socket
7/21 16:41:44 Negotiating for owner: DedicatedScheduler@xxxxxxxxxxxxxxxxxxxxx
7/21 16:41:47 Out of servers - 0 reqs matched, 4 reqs idle, 4 reqs rejected
7/21 16:41:47 Activity on stashed negotiator socket
7/21 16:41:47 Negotiating for owner: lakner@xxxxxxxxxxxx
7/21 16:41:47 Checking consistency running and runnable jobs
7/21 16:41:47 Tables are consistent
7/21 16:42:22 Out of servers - 0 jobs matched, 34 jobs idle, 34 jobs rejected
7/21 16:42:22 Increasing flock level for lakner@xxxxxxxxxxxx to 1.
7/21 16:42:22 Sent ad to central manager for loughlin@xxxxxxxxxxxx
7/21 16:42:22 Sent ad to central manager for lakner@xxxxxxxxxxxx
7/21 16:42:22 Sent ad to central manager for buckley@xxxxxxxxxxxx
7/21 16:42:23 Activity on stashed negotiator socket
7/21 16:42:23 Negotiating for owner: buckley@xxxxxxxxxxxx
7/21 16:42:23 Checking consistency running and runnable jobs
7/21 16:42:23 Tables are consistent
7/21 16:42:24 Out of jobs - 1 jobs matched, 0 jobs idle, flock level = 0
7/21 16:42:24 Started shadow for job 9243.0 on "<144.174.160.22:32771>", (shadow pid = 580)
7/21 16:42:24 Activity on stashed negotiator socket
7/21 16:42:24 Negotiating for owner: DedicatedScheduler@xxxxxxxxxxxxxxxxxxxxx
7/21 16:42:28 Out of requests - 4 reqs matched, 0 reqs idle
7/21 16:42:29 Sent ad to central manager for loughlin@xxxxxxxxxxxx
7/21 16:42:29 Sent ad to central manager for lakner@xxxxxxxxxxxx
7/21 16:42:29 Sent ad to central manager for buckley@xxxxxxxxxxxx
7/21 16:42:29 Activity on stashed negotiator socket
7/21 16:42:29 Negotiating for owner: lakner@xxxxxxxxxxxx
7/21 16:42:29 Checking consistency running and runnable jobs
7/21 16:42:29 Tables are consistent
7/21 16:43:08 Out of servers - 3 jobs matched, 31 jobs idle, 31 jobs rejected
7/21 16:43:08 Started shadow for MPI job 11934.0 (shadow pid = 582)
7/21 16:43:08 Started shadow for job 10285.0 on "<144.174.160.35:32771>", (shadow pid = 583)
7/21 16:43:08 Sent ad to central manager for loughlin@xxxxxxxxxxxx
7/21 16:43:08 Sent ad to central manager for lakner@xxxxxxxxxxxx
7/21 16:43:08 Sent ad to central manager for buckley@xxxxxxxxxxxx
7/21 16:43:08 Shadow pid 582 exited with status 106
7/21 16:43:08 ERROR "shadow exited with incorrect usage!
" at line 1344 in file dedicated_scheduler.C
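
To check whether the same job precedes every crash, it can help to map each "Shadow pid ... exited" line back to the job that shadow was started for. The following is only a rough sketch: the two regular expressions are derived from the log lines quoted above and may not match other Condor versions, and the function name `shadow_exits` is made up for illustration. It reports which job each exiting shadow belonged to; it does not by itself explain why the schedd aborts.

```python
import re

# Patterns inferred from the SchedLog excerpt above (assumption: other
# Condor versions may word these lines differently).
STARTED = re.compile(
    r"Started shadow for (MPI job|job) (\d+\.\d+).*\(shadow pid = (\d+)\)"
)
EXITED = re.compile(r"Shadow pid (\d+) exited with status (\d+)")


def shadow_exits(lines):
    """Return (job_id, is_mpi, pid, status) for every shadow exit in the log."""
    jobs = {}   # shadow pid -> (job id, whether it was started as an MPI job)
    exits = []
    for line in lines:
        m = STARTED.search(line)
        if m:
            jobs[int(m.group(3))] = (m.group(2), m.group(1) == "MPI job")
            continue
        m = EXITED.search(line)
        if m:
            pid, status = int(m.group(1)), int(m.group(2))
            # A pid with no matching "Started shadow" line is reported as None.
            job_id, is_mpi = jobs.get(pid, (None, False))
            exits.append((job_id, is_mpi, pid, status))
    return exits


# Three lines copied from the log above.
sample = [
    '7/21 16:43:08 Started shadow for MPI job 11934.0 (shadow pid = 582)',
    '7/21 16:43:08 Started shadow for job 10285.0 on "<144.174.160.35:32771>", (shadow pid = 583)',
    '7/21 16:43:08 Shadow pid 582 exited with status 106',
]

for job_id, is_mpi, pid, status in shadow_exits(sample):
    kind = "MPI" if is_mpi else "non-MPI"
    print(f"job {job_id} ({kind}) shadow pid {pid} exited with status {status}")
```

Running it over the whole SchedLog (for example `shadow_exits(open(path))`, with `path` pointing at wherever the log lives on the submit node) would show whether job 11934.0 is the one involved before each restart.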