
Re: [Condor-users] Resending: Solaris 10 - All jobs idling for ever...





On 9/19/05, Michael Yoder <yoderm@xxxxxxxxxx> wrote:

> I'm trying to set up a pool in Solaris 10 (using the Solaris 9
> distribution since there doesn't seem to be a version 10 distro yet), but
> I'm running into a few problems... All the jobs I submit remain idle for
> ever... I tried with quick and dirty unix commands like "sleep 10" and
> "date" just to try it out but with no luck. What I'm seeing right now is
> this:

What do the ShadowLog (on the local machine)

I'm getting this:

bgoncal@lab1a> condor_config_val SHADOW_LOG
/home/condor/hosts/lab1a/log/ShadowLog
bgoncal@lab1a> more /home/condor/hosts/lab1a/log/ShadowLog
/home/condor/hosts/lab1a/log/ShadowLog: No such file or directory
bgoncal@lab1a>

and StarterLog (on the remote machine) say?  Please have a look at

The StarterLog doesn't exist either... and the StartLog has a whole lot of:

9/17 06:55:46 DaemonCore: received command 440 (MATCH_INFO), calling handler (command_match_info)
9/17 06:55:46 vm1: match_info called
9/17 06:55:46 vm1: Received match <170.140.151.126:38590>#1126713130#988
9/17 06:55:46 vm1: State change: match notification protocol successful
9/17 06:55:46 vm1: Changing state: Unclaimed -> Matched
9/17 06:55:47 DaemonCore: Command received via UDP from host <170.140.151.110:35503>
9/17 06:55:47 DaemonCore: received command 440 (MATCH_INFO), calling handler (command_match_info)
9/17 06:55:47 vm2: match_info called
9/17 06:55:47 vm2: Received match <170.140.151.126:38590>#1126713130#989
9/17 06:55:47 vm2: State change: match notification protocol successful
9/17 06:55:47 vm2: Changing state: Unclaimed -> Matched
9/17 06:55:52 DaemonCore: Command received via UDP from host <170.140.151.110:35618>
9/17 06:55:52 DaemonCore: received command 443 (RELEASE_CLAIM), calling handler (command_release_claim)
9/17 06:55:52 vm1: State change: received RELEASE_CLAIM command
9/17 06:55:52 vm1: Changing state: Matched -> Owner
9/17 06:55:52 vm1: State change: IS_OWNER is false
9/17 06:55:52 vm1: Changing state: Owner -> Unclaimed
9/17 06:55:52 DaemonCore: Command received via UDP from host <170.140.151.110:35627>
9/17 06:55:52 DaemonCore: received command 443 (RELEASE_CLAIM), calling handler (command_release_claim)
9/17 06:55:52 vm2: State change: received RELEASE_CLAIM command
9/17 06:55:52 vm2: Changing state: Matched -> Owner
9/17 06:55:52 vm2: State change: IS_OWNER is false
9/17 06:55:52 vm2: Changing state: Owner -> Unclaimed
9/17 06:59:46 Failed to open /proc/interrupts
9/17 06:59:46 get_mouse_info(): Failed to open /proc/interrupts
9/17 06:59:46 Failed to obtain keyboard or mouse idle information.
9/17 06:59:46 Assuming the keyboard and mouse to be infinitely idle.
9/17 07:00:54 DaemonCore: Command received via UDP from host <170.140.151.110:35761>
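
To make the pattern in that StartLog easier to see, here is a rough Python sketch that pulls out only the "Changing state" lines and groups them per VM. It assumes the line format is exactly what appears in the excerpt above (timestamp, VM name, "Changing state: A -> B") and that each entry sits on a single line. The function names `state_transitions` and `never_claimed` are made up for this example and are not part of Condor. The check for "Claimed" rests on my understanding that a startd normally moves from Matched to Claimed once the schedd actually claims it, so treat that as an assumption.

```python
import re
from collections import defaultdict

# Matches StartLog lines of the form seen in the excerpt, e.g.
# "9/17 06:55:46 vm1: Changing state: Unclaimed -> Matched"
STATE_CHANGE = re.compile(
    r"^(?P<stamp>\d+/\d+ \d+:\d+:\d+) (?P<slot>\w+): "
    r"Changing state: (?P<old>\w+) -> (?P<new>\w+)$"
)


def state_transitions(lines):
    """Return {vm_name: [(timestamp, old_state, new_state), ...]} in log order."""
    history = defaultdict(list)
    for line in lines:
        match = STATE_CHANGE.match(line.strip())
        if match:
            history[match["slot"]].append(
                (match["stamp"], match["old"], match["new"])
            )
    return dict(history)


def never_claimed(history):
    """Return the VMs that reached Matched at least once but never Claimed.

    Assumption: a match that leads to a running job passes through a
    state named "Claimed"; a VM listed here was matched and then dropped.
    """
    stuck = []
    for slot, steps in history.items():
        reached = {new for _, _, new in steps}
        if "Matched" in reached and "Claimed" not in reached:
            stuck.append(slot)
    return sorted(stuck)


if __name__ == "__main__":
    # A few lines copied from the StartLog excerpt above.
    sample = [
        "9/17 06:55:46 vm1: Changing state: Unclaimed -> Matched",
        "9/17 06:55:52 vm1: State change: received RELEASE_CLAIM command",
        "9/17 06:55:52 vm1: Changing state: Matched -> Owner",
        "9/17 06:55:52 vm1: Changing state: Owner -> Unclaimed",
    ]
    history = state_transitions(sample)
    for slot, steps in history.items():
        for stamp, old, new in steps:
            print(slot, stamp, old, "->", new)
    print("matched but never claimed:", never_claimed(history))
```

On the excerpt above this would list both vm1 and vm2, since each goes Unclaimed -> Matched -> Owner -> Unclaimed after a RELEASE_CLAIM and never shows a Claimed state.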

All the daemons mentioned in http://www.cs.wisc.edu/condor/manual/v6.7/3_2Installation.html#SECTION00427000000000000000 are running:

For the submit machine:

bgoncal@lab1a> ps -ef | grep condor
 bgoncal  9696     1   0   Sep 14 ?           4:31 condor_master
 bgoncal 12608  9696   0   Sep 15 ?          10:34 condor_startd -f
 bgoncal 18264  9696   1   Sep 18 ?           8:25 condor_schedd -f
 bgoncal 18265  9696  10   Sep 18 ?          41:53 condor_negotiator -f
 bgoncal 22105 21868   0 17:52:17 pts/3       0:00 grep condor
 bgoncal 18263  9696   1   Sep 18 ?          21:00 condor_collector -f
bgoncal@lab1a>

and for one of the compute machines:

bgoncal@lab3c> ps -ef | grep condor
 bgoncal 11208     1   0   Sep 14 ?           3:18 condor_master
 bgoncal 21941 21930   0 17:51:14 pts/2       0:00 grep condor
 bgoncal 11210 11208   0   Sep 14 ?           0:11 condor_schedd -f
 bgoncal 18147 11208   0   Sep 18 ?           4:32 condor_startd -f
bgoncal@lab3c> 

I'm probably missing something obvious, but I have no idea what it might be... :(
Any suggestions?
Thanks,

Bruno



http://docs.optena.com/display/CONDOR/Troubleshooting and
http://docs.optena.com/display/CONDOR/Shadow+Exception

Mike Yoder
Principal Member of Technical Staff
Ask Mike: http://docs.optena.com
Direct  : +1.408.321.9000
Fax     : +1.408.321.9030
Mobile  : +1.408.497.7597
yoderm@xxxxxxxxxx

Optena Corporation
2860 Zanker Road, Suite 201
San Jose, CA 95134
http://www.optena.com






--
*******************************************
Bruno Miguel Tavares Goncalves, MS
PhD Candidate
Emory University
Department of Physics
Office No. N117-C
400 Dowman Drive
Atlanta, Georgia 30322
Homepage: www.bgoncalves.com
Email: bgoncalves@xxxxxxxxx
Phone: (404) 712-2441
Fax:   (404) 727-0873
*******************************************