
RE: [Condor-users] idle jobs



Thanks for your kind reply.
 
I have set START=TRUE in the local configuration file
and run the condor_reschedule command, but the problem still exists.
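
The steps described above can be sketched roughly as follows. This is only a sketch under assumptions: the function name `set_start_true` is hypothetical, the example path is the local config file reported in the logs in this message, and the Condor commands are run only if they are found on the PATH.

```shell
#!/bin/sh
# Hypothetical helper: append START = TRUE to a local Condor config file,
# then ask the running daemons to re-read their configuration.
set_start_true() {
    config_file="$1"
    # Append rather than overwrite, so existing settings are kept.
    printf 'START = TRUE\n' >> "$config_file" || return 1
    # Only call the Condor tools when they are installed on this machine.
    if command -v condor_reconfig >/dev/null 2>&1; then
        condor_reconfig
        condor_reschedule
    fi
}

# Example (path taken from the log output in this message):
# set_start_true /home/lwang/condor/install/hosts/HEAVEN/condor_config.local
```

Afterwards, `condor_config_val START` can be used to check which value the daemons would actually pick up.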
 
To provide more information, I have included the log file contents below.
 
Thanks,  
 
Lizhe  
 
 
StartLog: 
----------------- 
9/7 13:56:51 Connect failed for 10 seconds; returning FALSE 
9/7 13:56:51 ERROR: 
SECMAN:2004:Failed to start a session with TCP 
SECMAN:2003:TCP connection to <194.199.22.87:34884> failed 
 
9/7 13:56:51 condor_write(): Socket closed when trying to write buffer 
9/7 13:56:51 Buf::write(): condor_write() failed 
9/7 13:56:51 SECMAN: Error sending response classad! 
9/7 13:56:51 Our parent process (pid 17196) went away; shutting down 
9/7 13:56:51 Can't connect to <194.199.22.87:9618>:0, errno = 111 
9/7 13:56:51 Will keep trying for 10 seconds... 
9/7 13:57:01 Connect failed for 10 seconds; returning FALSE 
9/7 13:57:01 ERROR: 
SECMAN:2003:TCP connection to <194.199.22.87:9618> failed 
 
9/7 13:57:01 Error sending update to the collector HEAVEN.inrialpes.fr 
<194.199.22.87:9618>: Failed to send UDP update command to collector 
9/7 13:57:01 Error sending update to collector(s) 
9/7 13:57:01 Got SIGTERM. Performing graceful shutdown. 
9/7 13:57:01 shutdown graceful 
9/7 13:57:01 Deleting Cronmgr 
9/7 13:57:01 Can't connect to <194.199.22.87:9618>:0, errno = 111 
9/7 13:57:01 Will keep trying for 10 seconds... 
9/7 13:57:14 passwd_cache::cache_uid(): getpwnam("condor") failed: Success 
9/7 13:57:14 passwd_cache::cache_uid(): getpwnam("condor") failed: Success 
 
9/7 13:57:14 ****************************************************** 
9/7 13:57:14 ** condor_startd (CONDOR_STARTD) STARTING UP 
9/7 13:57:14 ** /home/lwang/condor/install/sbin/condor_startd 
9/7 13:57:14 ** $CondorVersion: 6.6.10 Jun 13 2005 $ 
9/7 13:57:14 ** $CondorPlatform: I386-LINUX_RH9 $ 
9/7 13:57:14 ** PID = 17728 
9/7 13:57:14 ******************************************************                                                                   
9/7 13:57:14 Using config file: /home/lwang/condor/install/etc/condor_config 
9/7 13:57:14 Using local config 
files: /home/lwang/condor/install/hosts/HEAVEN/condor_config.local 
9/7 13:57:14 DaemonCore: Command Socket at <194.199.22.87:35036> 
9/7 13:57:20 New machine resource allocated 
9/7 13:57:20 About to run initial benchmarks. 
9/7 13:57:26 Completed initial benchmarks. 
9/7 13:57:26 State change: IS_OWNER is false 
9/7 13:57:26 Changing state: Owner -> Unclaimed 
--------------------------------------------------------- 
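
For picking the connection failures out of a StartLog like the one above, a small sketch follows. The function name `list_connect_failures` and the output layout are my own assumptions, not part of Condor. On Linux, errno 111 is ECONNREFUSED, meaning nothing was accepting connections on that port at that moment; in the log above these entries all precede the daemon restart at 13:57:14.

```shell
#!/bin/sh
# Hypothetical helper: print "date time host:port errno=N" for every
# "Can't connect to <host:port>" line in the given log file.
list_connect_failures() {
    sed -n "s/^\([^ ]* [^ ]*\) Can't connect to <\([^>]*\)>.*errno = \([0-9]*\).*/\1 \2 errno=\3/p" "$1"
}

# Example (log file location is an assumption; adjust to your LOG directory):
# list_connect_failures StartLog
```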
 
SchedLog: 
 
------------------- 
9/7 13:57:14 passwd_cache::cache_uid(): getpwnam("condor") failed: Success 
 
9/7 13:57:14 passwd_cache::cache_uid(): getpwnam("condor") failed: Success 
 
9/7 13:57:14 ****************************************************** 
9/7 13:57:14 ** condor_schedd (CONDOR_SCHEDD) STARTING UP 
9/7 13:57:14 ** /home/lwang/condor/install/sbin/condor_schedd 
9/7 13:57:14 ** $CondorVersion: 6.6.10 Jun 13 2005 $ 
9/7 13:57:14 ** $CondorPlatform: I386-LINUX_RH9 $ 
9/7 13:57:14 ** PID = 17729 
9/7 13:57:14 ****************************************************** 
9/7 13:57:14 Using config file: /home/lwang/condor/install/etc/condor_config 
9/7 13:57:14 Using local config 
files: /home/lwang/condor/install/hosts/HEAVEN/condor_config.local 
9/7 13:57:14 DaemonCore: Command Socket at <194.199.22.87:35037> 
------------------------- 
NegotiatorLog:
---------------------- 
 
 
9/7 13:57:14 passwd_cache::cache_uid(): getpwnam("condor") failed: Success 
 
9/7 13:57:14 passwd_cache::cache_uid(): getpwnam("condor") failed: Success 
 
9/7 13:57:14 ****************************************************** 
9/7 13:57:14 ** condor_negotiator (CONDOR_NEGOTIATOR) STARTING UP 
9/7 13:57:14 ** /home/lwang/condor/install/sbin/condor_negotiator 
9/7 13:57:14 ** $CondorVersion: 6.6.10 Jun 13 2005 $ 
9/7 13:57:14 ** $CondorPlatform: I386-LINUX_RH9 $ 
9/7 13:57:14 ** PID = 17727 
9/7 13:57:14 ****************************************************** 
9/7 13:57:14 Using config file: /home/lwang/condor/install/etc/condor_config 
9/7 13:57:14 Using local config 
files: /home/lwang/condor/install/hosts/HEAVEN/condor_config.local 
9/7 13:57:14 DaemonCore: Command Socket at <194.199.22.87:9614> 
9/7 13:57:14 ACCOUNTANT_HOST = None (local) 
9/7 13:57:14 NEGOTIATOR_INTERVAL = 300 sec 
9/7 13:57:14 NEGOTIATOR_TIMEOUT = 30 sec 
9/7 13:57:14 PREEMPTION_REQUIREMENTS = (CurrentTime - EnteredCurrentState) > 
(1 * (60 * 60)) && RemoteUserPrio > SubmittorPrio * 1.2 
9/7 13:57:14 PREEMPTION_RANK = (RemoteUserPrio * 1000000) - TARGET.ImageSize 
9/7 13:57:14 ---------- Started Negotiation Cycle ---------- 
9/7 13:57:14 Phase 1:  Obtaining ads from collector ... 
9/7 13:57:14   Getting all public ads ... 
9/7 13:57:14   Sorting 0 ads ... 
9/7 13:57:14   Getting startd private ads ... 
9/7 13:57:14 Got ads: 0 public and 0 private 
9/7 13:57:14 Public ads include 0 submitter, 0 startd 
9/7 13:57:14 Phase 2:  Performing accounting ... 
9/7 13:57:14 Phase 3:  Sorting submitter ads by priority ... 
9/7 13:57:14 Phase 4.1:  Negotiating with schedds ... 
9/7 13:57:14 ---------- Finished Negotiation Cycle ---------- 
9/7 14:02:14 ---------- Started Negotiation Cycle ---------- 
9/7 14:02:14 Phase 1:  Obtaining ads from collector ... 
9/7 14:02:14   Getting all public ads ... 
9/7 14:02:14   Sorting 3 ads ... 
9/7 14:02:14   Getting startd private ads ... 
9/7 14:02:14 Got ads: 3 public and 1 private 
9/7 14:02:14 Public ads include 0 submitter, 1 startd 
9/7 14:02:14 Phase 2:  Performing accounting ... 
9/7 14:02:14 Phase 3:  Sorting submitter ads by priority ... 
9/7 14:02:14 Phase 4.1:  Negotiating with schedds ... 
9/7 14:02:14 ---------- Finished Negotiation Cycle ---------- 
------------------------------------ 
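
To read the negotiation cycles above at a glance, here is a small sketch that prints the submitter and startd ad counts for each cycle. The function name `summarize_cycles` and the output layout are assumptions for illustration only; the field positions are taken from the "Public ads include" lines as they appear in this log. In the log above, both cycles report 0 submitter ads.

```shell
#!/bin/sh
# Hypothetical helper: one line per negotiation cycle with the ad counts.
summarize_cycles() {
    awk '/Public ads include/ {
        # Fields: date time Public ads include N "submitter," M startd
        print $1, $2, "submitters=" $6, "startds=" $8
    }' "$1"
}

# Example (log file location is an assumption; adjust to your LOG directory):
# summarize_cycles NegotiatorLog
```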
 
MasterLog 
------------------- 
9/7 13:57:14 passwd_cache::cache_uid(): getpwnam("condor") failed: Success 
 
9/7 13:57:14 passwd_cache::cache_uid(): getpwnam("condor") failed: Success 
 
9/7 13:57:14 ****************************************************** 
9/7 13:57:14 ** condor_master (CONDOR_MASTER) STARTING UP 
9/7 13:57:14 ** /home/lwang/condor/install/sbin/condor_master 
9/7 13:57:14 ** $CondorVersion: 6.6.10 Jun 13 2005 $ 
9/7 13:57:14 ** $CondorPlatform: I386-LINUX_RH9 $ 
9/7 13:57:14 ** PID = 17725 
9/7 13:57:14 ****************************************************** 
9/7 13:57:14 Using config file: /home/lwang/condor/install/etc/condor_config 
9/7 13:57:14 Using local config 
files: /home/lwang/condor/install/hosts/HEAVEN/condor_config.local 
9/7 13:57:14 DaemonCore: Command Socket at <194.199.22.87:35035> 
9/7 13:57:14 Started DaemonCore process 
"/home/lwang/condor/install/sbin/condor_collector", pid and pgroup = 17726 
9/7 13:57:14 Started DaemonCore process 
"/home/lwang/condor/install/sbin/condor_negotiator", pid and pgroup = 17727 
9/7 13:57:14 Started DaemonCore process 
"/home/lwang/condor/install/sbin/condor_startd", pid and pgroup = 17728 
9/7 13:57:14 Started DaemonCore process 
"/home/lwang/condor/install/sbin/condor_schedd", pid and pgroup = 17729 
------------------------------------------------------- 
 
CollectorLog: 
------------ 
9/7 13:57:14 passwd_cache::cache_uid(): getpwnam("condor") failed: Success 
 
9/7 13:57:14 passwd_cache::cache_uid(): getpwnam("condor") failed: Success 
 
9/7 13:57:14 ****************************************************** 
9/7 13:57:14 ** condor_collector (CONDOR_COLLECTOR) STARTING UP 
9/7 13:57:14 ** /home/lwang/condor/install/sbin/condor_collector 
9/7 13:57:14 ** $CondorVersion: 6.6.10 Jun 13 2005 $ 
9/7 13:57:14 ** $CondorPlatform: I386-LINUX_RH9 $ 
9/7 13:57:14 ** PID = 17726 
9/7 13:57:14 ****************************************************** 
9/7 13:57:14 Using config file: /home/lwang/condor/install/etc/condor_config 
9/7 13:57:14 Using local config 
files: /home/lwang/condor/install/hosts/HEAVEN/condor_config.local 
9/7 13:57:14 DaemonCore: Command Socket at <194.199.22.87:9618> 
9/7 13:57:14 In ViewServer::Init() 
9/7 13:57:14 In CollectorDaemon::Init() 
9/7 13:57:14 In ViewServer::Config() 
9/7 13:57:14 In CollectorDaemon::Config() 
9/7 13:57:14 enable: Creating stats hash table 
9/7 13:57:14 (Sent 0 ads in response to query) 
9/7 13:57:14 Got QUERY_STARTD_PVT_ADS 
9/7 13:57:14 (Sent 0 ads in response to query) 
9/7 13:57:14 WARNING:  No master ad for < HEAVEN.inrialpes.fr > 
9/7 13:57:14 ScheddAd     : Inserting ** "< HEAVEN.inrialpes.fr , 
194.199.22.87 >" 
9/7 13:57:14 stats: Inserting new hashent for 
'Schedd':'HEAVEN.inrialpes.fr':'194.199.22.87' 
9/7 13:57:19 ** Master < HEAVEN.inrialpes.fr > rejuvenated from recently down 
9/7 13:57:19 stats: Inserting new hashent for 
'Master':'HEAVEN.inrialpes.fr':'194.199.22.87' 
9/7 13:57:30 StartdAd     : Inserting ** "< HEAVEN.inrialpes.fr , 
194.199.22.87 >" 
9/7 13:57:30 stats: Inserting new hashent for 
'Start':'HEAVEN.inrialpes.fr':'194.199.22.87' 
9/7 13:57:30 StartdPvtAd  : Inserting ** "< HEAVEN.inrialpes.fr , 
194.199.22.87 >" 
9/7 13:57:30 stats: Inserting new hashent for 
'StartdPvt':'HEAVEN.inrialpes.fr':'194.199.22.87' 
9/7 13:58:59 Got QUERY_STARTD_ADS 
9/7 13:58:59 (Sent 1 ads in response to query) 
9/7 14:02:14 (Sent 3 ads in response to query) 
9/7 14:02:14 Got QUERY_STARTD_PVT_ADS 
9/7 14:02:14 (Sent 1 ads in response to query) 
9/7 14:07:14 (Sent 3 ads in response to query) 
9/7 14:07:14 Got QUERY_STARTD_PVT_ADS 
9/7 14:07:14 (Sent 1 ads in response to query) 
 
------------------------------------ 
 
 
 
 
 
Quoting Prashant Lal <lalp@xxxxxxxxxxx>: 
 
> Run condor_reschedule on that machine and see.
>  
>  
> LAL 
>  
>  
> -----Original Message----- 
> From: condor-users-bounces@xxxxxxxxxxx on behalf of abhi 
> Sent: Wed 9/7/2005 4:19 PM 
> To: Condor-Users Mail List 
> Subject: RE: [Condor-users] idle jobs 
>   
> Execute the following commands:
>  
>    echo "START=TRUE" >> /path of machine's local file/condor_config.local 
>    condor_reconfig 
>  
>  
> > -----Original Message----- 
> > From: lizhe.wang@xxxxxxxxxxxx 
> > Sent: Wed,  7 Sep 2005 11:30:39 +0200 
> > To: condor-users@xxxxxxxxxxx 
> > Subject: [Condor-users] idle jobs 
> > 
> > 
> > 
> > Dear all: 
> > I'm new to Condor and am testing it on a single machine.
> > This machine takes the roles of submit, execute and manager.
> > 
> > 
> > I installed Condor and submitted an example submission file like:
> > 
> > Executable=/bin/date 
> > Log =/tmp/logr 
> > output=/tmp/logr.out 
> > Queue = 
> > 
> > when I run condor_q -analyze 
> > the output is : 
> >       0 are rejected by your job's requirements 
> >       0 reject your job because of their own requirements 
> >       0 match, but are serving users with a better priority in the pool 
> >       1 match, but reject the job for unknown reasons
> >       0 match, but will not currently preempt their existing job 
> >       0 are available to run your job 
> > 
> > 
> > I want any job to be able to run on this machine regardless of the
> > machine's status.
> > I have set START = True in the configuration file, but it does not seem
> > to work.
> > 
> > How can I configure the file? 
> > any hints? 
> > 
> > thanks, 
> > Lizhe 
> > 
> > 
> > 
> > 
> > 
> > 
> > _______________________________________________ 
> > Condor-users mailing list 
> > Condor-users@xxxxxxxxxxx 
> > https://lists.cs.wisc.edu/mailman/listinfo/condor-users 