
Re: [Condor-users] Resending: Solaris 10 - All jobs idling for ever...





On 9/20/05, Prashant Lal <lalp@xxxxxxxxxxx> wrote:
Hi


Step 1) Append the lines below to the end of your global config file (condor_config), run condor_reconfig (or, if possible, condor_restart -all), and then check again.

START : Owner == "whateverusername" || Owner == "condor" || Owner == "whateverusername" || Owner == "whateverusername" || Owner == "whateverusername"
START : True
SUSPEND : False
CONTINUE : True
PREEMPT : False
KILL : False

Done!

Step 2) What does your LOCAL_DIR point to? (It is set per host and holds the log and execute directories.)

LOCAL_DIR               = $(RELEASE_DIR)/hosts/$(HOSTNAME)
 
Does it have all the required directories?

A typical example:

bgoncal@lab1a> ls
condor_config.local  log
execute              spool
bgoncal@lab1a>   

Everything seems to be there...
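If there are many execute hosts, the same check can be scripted. This is a minimal Python sketch, assuming the required set is exactly the log, execute and spool subdirectories shown in the lab1a listing (condor_config.local is not checked). missing_local_dirs is a hypothetical helper name, and the directory to pass is whatever `condor_config_val LOCAL_DIR` prints on that host.

```python
import os
import sys

# Subdirectories seen in the lab1a listing; assumed here to be the required set.
REQUIRED_SUBDIRS = ("log", "execute", "spool")


def missing_local_dirs(local_dir):
    """Return the required subdirectories that do not exist under local_dir."""
    return [d for d in REQUIRED_SUBDIRS
            if not os.path.isdir(os.path.join(local_dir, d))]


if __name__ == "__main__":
    # Pass the value printed by `condor_config_val LOCAL_DIR`; defaults to ".".
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    missing = missing_local_dirs(target)
    if missing:
        print("missing under %s: %s" % (target, ", ".join(missing)))
    else:
        print("all required directories present in %s" % target)
```

An empty result only says the directories exist; it does not check their ownership or permissions.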

Step 3) Please let us know what other changes you have made to the global config file.

None other than the ones you requested. I'm using all the default values generated by condor_install.

On Mon, 2005-09-19 at 15:34 -0700, Michael Yoder wrote:
> 	What do the ShadowLog (on the local machine)
> 
> I'm getting this:
> 
> bgoncal@lab1a> condor_config_val SHADOW_LOG
> /home/condor/hosts/lab1a/log/ShadowLog
> bgoncal@lab1a> more /home/condor/hosts/lab1a/log/ShadowLog
> /home/condor/hosts/lab1a/log/ShadowLog: No such file or directory
> bgoncal@lab1a>

Oh!  That changes things.  (More below)

> 9/17 06:55:46 DaemonCore: received command 440 (MATCH_INFO), calling handler (command_match_info)
> 9/17 06:55:46 vm1: match_info called
> 9/17 06:55:46 vm1: Received match <170.140.151.126:38590>#1126713130#988
> 9/17 06:55:46 vm1: State change: match notification protocol successful
> 9/17 06:55:46 vm1: Changing state: Unclaimed -> Matched
> 9/17 06:55:47 DaemonCore: Command received via UDP from host <170.140.151.110:35503>
> 9/17 06:55:47 DaemonCore: received command 440 (MATCH_INFO), calling handler (command_match_info)
> 9/17 06:55:47 vm2: match_info called
> 9/17 06:55:47 vm2: Received match <170.140.151.126:38590>#1126713130#989
> 9/17 06:55:47 vm2: State change: match notification protocol successful
> 9/17 06:55:47 vm2: Changing state: Unclaimed -> Matched
> 9/17 06:55:52 DaemonCore: Command received via UDP from host <170.140.151.110:35618>
> 9/17 06:55:52 DaemonCore: received command 443 (RELEASE_CLAIM), calling handler (command_release_claim)
> 9/17 06:55:52 vm1: State change: received RELEASE_CLAIM command
> 9/17 06:55:52 vm1: Changing state: Matched -> Owner
> 9/17 06:55:52 vm1: State change: IS_OWNER is false
> 9/17 06:55:52 vm1: Changing state: Owner -> Unclaimed

This means that the schedd is being matched with the startd, but for
some reason the startd is getting a command (RELEASE_CLAIM) and letting
go of that match.  This is happening before the job even thinks of
starting.
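The StartLog excerpt shows each slot going Unclaimed -> Matched and then, for vm1, Matched -> Owner -> Unclaimed after the RELEASE_CLAIM. To pull just those transitions out of a longer StartLog, here is a minimal Python sketch, assuming the raw (unquoted) line format shown in the excerpt, i.e. date, time, slot name, then `Changing state: X -> Y`. state_transitions is a hypothetical helper, and the pattern will not match other log formats.

```python
import re
from collections import defaultdict

# Matches lines like "9/17 06:55:46 vm1: Changing state: Unclaimed -> Matched".
STATE_RE = re.compile(r"^(\S+ \S+) (vm\d+): Changing state: (\w+) -> (\w+)")


def state_transitions(lines):
    """Group (time, old_state, new_state) tuples by slot (vm) name."""
    result = defaultdict(list)
    for line in lines:
        m = STATE_RE.match(line.strip())
        if m:
            when, vm, old, new = m.groups()
            result[vm].append((when, old, new))
    return dict(result)


sample = """\
9/17 06:55:46 vm1: Changing state: Unclaimed -> Matched
9/17 06:55:47 vm2: Changing state: Unclaimed -> Matched
9/17 06:55:52 vm1: State change: received RELEASE_CLAIM command
9/17 06:55:52 vm1: Changing state: Matched -> Owner
9/17 06:55:52 vm1: Changing state: Owner -> Unclaimed
""".splitlines()

for vm, steps in state_transitions(sample).items():
    # Start from the first old state, then follow each new state in order.
    path = [steps[0][1]] + [new for _, _, new in steps]
    print(vm, " -> ".join(path))
```

For the sample lines it prints `vm1 Unclaimed -> Matched -> Owner -> Unclaimed` and `vm2 Unclaimed -> Matched`: the claim is dropped without the slot ever reaching a Claimed state.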

Can you give us more from your SchedLog?  (condor_config_val SCHEDD_LOG)
(On the submit machine.)

Can you provide us with an IP -> machine mapping for the above?  What
roles do 170.140.151.126 and 170.140.151.110 play?

> All the daemons mentioned in ... are running:
> 
> I'm probably missing something obvious, but I have no idea what it
> might be... :(

It's not obvious yet. :-)

What interesting values have you modified in your condor_config file(s),
if any?

Mike Yoder
Principal Member of Technical Staff
Ask Mike: http://docs.optena.com
Direct  : +1.408.321.9000
Fax     : +1.408.321.9030
Mobile  : +1.408.497.7597
yoderm@xxxxxxxxxx

Optena Corporation
2860 Zanker Road, Suite 201
San Jose, CA 95134
http://www.optena.com

_______________________________________________
Condor-users mailing list
Condor-users@xxxxxxxxxxx
https://lists.cs.wisc.edu/mailman/listinfo/condor-users

--
Prashant Lal <lalp@xxxxxxxxxxx>
Cadence Design Systems





--
*******************************************
Bruno Miguel Tavares Goncalves, MS
PhD Candidate
Emory University
Department of Physics
Office No. N117-C
400 Dowman Drive
Atlanta, Georgia 30322
Homepage: www.bgoncalves.com
Email: bgoncalves@xxxxxxxxx
Phone: (404) 712-2441
Fax:   (404) 727-0873
*******************************************