
Re: [Condor-users] Cannot start startd on execute machine



Thanks for the help once again.

I did exactly as you instructed.

I added SCHEDD to the DAEMON_LIST on the execute machines, and I added STARTD to the DAEMON_LIST on the manager machine.

This is the only change I made.

Now when I try to submit jobs from the execute machines, I get this error:

greenwich-nh:~/condor/brainVentricles> condor_submit sp01.cmd
Submitting job(s)
ERROR: Failed to connect to local queue manager
AUTHENTICATE:1003:Failed to authenticate with any method
AUTHENTICATE:1004:Failed to authenticate using GSI
GSI:5003:Failed to authenticate.  Globus is reporting error (851968:45).  There is probably a problem with your credentials.  (Did you run grid-proxy-init?)
AUTHENTICATE:1004:Failed to authenticate using KERBEROS
AUTHENTICATE:1004:Failed to authenticate using FS


I am getting this error on all machines except the manager itself.

Also, do I need the startd on the manager or not?

When I run the startd on the manager, the manager is also included in the list of machines available for job execution, which is something I don't want.

So is the startd supposed to run on the manager or not? The manager also happens to be the submit machine.



Peter Troeger <peter.troeger@xxxxxxxxxxxxxxxxxx> wrote:
Your config file parameter "DAEMON_LIST" is missing the STARTD parameter:

--- snip

######################################################################
## Daemon-specific settings:
######################################################################

##--------------------------------------------------------------------
## condor_master
##--------------------------------------------------------------------
## Daemons you want the master to keep running for you:
DAEMON_LIST = MASTER, SCHEDD, COLLECTOR, NEGOTIATOR

--- snip

Should look like this:

DAEMON_LIST = MASTER, SCHEDD, COLLECTOR, NEGOTIATOR, STARTD


If you still rely on the DEB package, this is really strange. Usually,
when you check the "job execution" option in the DEBCONF procedure, the
STARTD entry should be added to the daemon list. Maybe you had an old
config file in place at install time ?!?

For the moment, please fix your config files accordingly, and restart
Condor on the machines.
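
A minimal sketch of how the daemon list could differ by role, assuming
each machine reads its own local config file. The role split below is an
illustration, not taken from your files, and only one of these lines
belongs in any given machine's config:

--- snip

## Manager that also submits jobs but should not execute them.
## Without STARTD it does not offer itself for job execution:
DAEMON_LIST = MASTER, COLLECTOR, NEGOTIATOR, SCHEDD

## Execute machine that should also accept job submissions:
DAEMON_LIST = MASTER, STARTD, SCHEDD

## Execute-only machine:
DAEMON_LIST = MASTER, STARTD

--- snip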

Regards,
Peter.



Junaid N. Sahibzada wrote:
> Hi Peter,
>
> I have a few questions to ask.
>
> the condor manual says that the following daemons should be running on
> the manager
>
> condor_master
> condor_collector
> condor_negotiator
> condor_startd
> condor_schedd
>
> On my manager, which is also the submit machine for the pool, the
> startd is missing:
>
> root@caudate-nh:~# ps -ef |grep condor
> condor 7649 1 0 Dec01 ? 00:00:22 condor_master
> condor 7650 7649 0 Dec01 ? 00:00:03 condor_collector -f
> condor 7651 7649 0 Dec01 ? 00:00:00 condor_schedd -f
> condor 7652 7649 0 Dec01 ? 00:00:01 condor_negotiator -f
> root 19852 7535 0 11:18 pts/3 00:00:00 grep condor
>
>
> as you can see the startd is missing.
>
> now the manual also says that on all other machines the following should
> be running
>
>
> condor_master
> condor_startd
> condor_schedd
>
> All other machines in my pool are execute machines.
>
> I have the following on all of them:
>
> root@greenwich-nh:/u/sah006# ps -ef |grep condor
> condor 31177 1 0 Dec01 ? 00:00:05 condor_master
> condor 31178 31177 0 Dec01 ? 00:00:23 condor_startd -f
> root 9793 30336 0 11:13 pts/3 00:00:00 grep condor
>
> root@pineal-nh:/u/sah006 # ps -ef |grep condor
> condor 25965 1 0 Dec01 ? 00:00:09 condor_master
> condor 25966 25965 0 Dec01 ? 00:00:24 condor_startd -f
> root 5099 24980 0 11:14 pts/19 00:00:00 grep condor
>
> root@medshare-nh:/etc/condor# ps -ef |grep condor
> condor 26381 1 0 Dec01 ? 00:00:09 condor_master
> condor 26382 26381 0 Dec01 ? 00:00:10 condor_startd -f
> root 20524 22388 0 11:14 pts/4 00:00:00 grep condor
>
>
> cerebellum-nh:~# ps -ef |grep condor
> condor 6952 1 0 Dec01 ? 00:00:13 condor_master
> condor 6953 6952 0 Dec01 ? 00:00:26 condor_startd -f
> root 18575 5860 0 11:21 pts/25 00:00:00 grep condor
>
>
> As you can see, none of them is running the schedd. Can you tell me what's
> wrong?
>
> i am attaching the config file for the manager machine .
>
> the config files for the rest of the execute machines are also exactly
> the same
>
>
>
>
>
>
>
>
> Peter Troeger wrote:
>
> Google is your friend ;-) ...
>
> https://lists.cs.wisc.edu/archive/condor-users/pre-2004-June/msg01307.shtml
>
> Please check your config files, both
>
> '/usr/local/condor/etc/condor_config' and
> '/usr/local/condor/condor_config.local',
>
> for a misconfigured "START = " parameter. You can use "START = TRUE" for
> testing purposes. Don't forget to restart Condor after the configuration
> change.
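>
> A minimal sketch of such a test setting, assuming it goes into
> '/usr/local/condor/condor_config.local' (the comment wording is only
> illustrative):
>
> --- snip
>
> ## Testing only: this machine is always willing to start jobs.
> START = TRUE
>
> --- snip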
>
> Regards,
> Peter.
>
> Junaid N. Sahibzada wrote:
> > ok this time i have this error in the StartLog of the second execute
> > machine i am trying to setup
> >
> > 11/18 16:07:22 ** $CondorVersion: 6.6.10 Jun 13 2005 $
> > 11/18 16:07:22 ** $CondorPlatform: I386-LINUX_RH9 $
> > 11/18 16:07:22 ** PID = 5985
> > 11/18 16:07:22 ******************************************************
> > 11/18 16:07:22 Using config file: /usr/local/condor/etc/condor_config
> > 11/18 16:07:22 Using local config files:
> > /usr/local/condor//condor_config.local
> > 11/18 16:07:22 DaemonCore: Command Socket at <130.155.26.225:45277>
> > 11/18 16:07:29 ERROR "Required attribute "START" is not defined" at line 255 in file util.C
> > 11/18 16:07:46 ******************************************************
> > 11/18 16:07:46 ** condor_startd (CONDOR_STARTD) STARTING UP
> > 11/18 16:07:46 ** /usr/local/condor/sbin/condor_startd
> > 11/18 16:07:46 ** $CondorVersion: 6.6.10 Jun 13 2005 $
> > 11/18 16:07:46 ** $CondorPlatform: I386-LINUX_RH9 $
> > 11/18 16:07:46 ** PID = 6020
> > 11/18 16:07:46 ******************************************************
> > 11/18 16:07:46 Using config file: /usr/local/condor/etc/condor_config
> > 11/18 16:07:46 Using local config files:
> > /usr/local/condor//condor_config.local
> > 11/18 16:07:46 DaemonCore: Command Socket at <130.155.26.225:45278>
> > 11/18 16:07:52 ERROR "Required attribute "START" is not defined" at line 255 in file util.C
> > 11/18 16:08:17 ******************************************************
> > 11/18 16:08:17 ** condor_startd (CONDOR_STARTD) STARTING UP
> > 11/18 16:08:17 ** /usr/local/condor/sbin/condor_startd
> > 11/18 16:08:17 ** $CondorVersion: 6.6.10 Jun 13 2005 $
> > 11/18 16:08:17 ** $CondorPlatform: I386-LINUX_RH9 $
> > 11/18 16:08:17 ** PID = 6039
> > 11/18 16:08:17 ******************************************************
> > 11/18 16:08:17 Using config file: /usr/local/condor/etc/condor_config
> > 11/18 16:08:17 Using local config files:
> > /usr/local/condor//condor_config.local
> > 11/18 16:08:17 DaemonCore: Command Socket at <130.155.26.225:45279>
> > 11/18 16:08:23 ERROR "Required attribute "START" is not defined" at line 255 in file util.C
> >
> >
> >
> > *Junaid N. Sahibzada*
> > *Cell # (+61) 404 998 494 *
> > *284/9 Crystal St. Waterloo, 2017, NSW, Australia*
> > *International Student MSc Internetworking, UTS, Australia*
> > *Bachelor of Information Technology, NUST, Pakistan*
> >
> ------------------------------------------------------------------------
> >
> > _______________________________________________
> > Condor-users mailing list
> > Condor-users@xxxxxxxxxxx
> > https://lists.cs.wisc.edu/mailman/listinfo/condor-users
>




Junaid N. Sahibzada
Cell # (+61) 404 998 494 
284/9 Crystal St. Waterloo, 2017, NSW, Australia
International Student MSc Internetworking, UTS, Australia
Bachelor of Information Technology, NUST, Pakistan

