
Re: [Condor-users] Cannot add execute node to pool



You might want to verify that the _HOST settings are correct on your
exec node, and that the firewalls between the two machines have the
necessary ports open.

condor_config_val -dump | grep HOST 

Verify that CONDOR_HOST and COLLECTOR_HOST both point at your central manager.
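
If the full dump is noisy, the check can be narrowed to just those two settings. This is a minimal sketch that assumes the dump prints one "NAME = value" pair per line; host_settings is a made-up helper name, not a Condor tool.

```shell
#!/bin/sh
# Print only the CONDOR_HOST and COLLECTOR_HOST lines from
# "condor_config_val -dump"-style text ("NAME = value") read on stdin.
# host_settings is a hypothetical helper, not part of Condor.
host_settings() {
    grep -E '^(CONDOR_HOST|COLLECTOR_HOST)[[:space:]]*='
}

# Query the live configuration only where Condor is installed.
if command -v condor_config_val >/dev/null 2>&1; then
    condor_config_val -dump | host_settings
fi
```

Run it on both machines and compare the output; the values on the exec node should name the same central manager as on the CONDOR_HOST machine.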

If all else fails, set:
STARTD_DEBUG = D_FULLDEBUG
on the exec node and repost the resulting StartLog.
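
One way to apply that setting is sketched below, assuming the local config file is the /etc/condor/condor_config.local shown in the StartLog that follows. enable_startd_debug is a made-up helper name, and it leaves any existing STARTD_DEBUG line untouched, so edit the file by hand if one is already there.

```shell
#!/bin/sh
# Append "STARTD_DEBUG = D_FULLDEBUG" to the given config file unless
# a STARTD_DEBUG line is already present (an existing line is not changed).
# enable_startd_debug is a hypothetical helper, not part of Condor.
enable_startd_debug() {
    config_file=$1
    if ! grep -q '^STARTD_DEBUG[[:space:]]*=' "$config_file" 2>/dev/null; then
        echo 'STARTD_DEBUG = D_FULLDEBUG' >> "$config_file"
    fi
}

# Typical use on the exec node (needs write access to the file):
#   enable_startd_debug /etc/condor/condor_config.local
#   condor_reconfig
#   tail -n 100 "$(condor_config_val LOG)/StartLog"
```

A condor_reconfig should normally be enough for the startd to pick up the new debug level; if the log does not get more verbose, restart the daemons instead.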

Cheers,
Tim

On Wed, 2011-06-08 at 16:47 -0400, Michael Grauer wrote:
> I'm running Condor 7.6.1 on two different CentOS 5.6 machines. One (2
> cpus, call it Twoproc) is the CONDOR_HOST, and for now is also a
> submit and execute node. The other (16 cpus, call it Sixteenproc) I
> would like to add to this grid as an execute node, but I can't get its
> slots added.
> 
> Sixteenproc has MASTER and STARTD daemons running, and when I call
> condor_status on it, it returns the 2 slots from Twoproc, so it seems
> that Sixteenproc can correctly connect to Twoproc.
> 
> I can't seem to find any evidence in the logs on either machine that
> Sixteenproc is trying to get its slots added to Twoproc's grid.
> 
> 
> 
> Any advice on how to debug this?  It would be much appreciated.
> 
> I'm appending the StartLog output from Sixteenproc in case that helps.
> 
> 
> 
> Thanks,
> Mike
> 
> 
> 
> 
> 06/08/11 15:31:55 Setting maximum accepts per cycle 4.
> 06/08/11 15:31:55 ******************************************************
> 06/08/11 15:31:55 ** condor_startd (CONDOR_STARTD) STARTING UP
> 06/08/11 15:31:55 ** /usr/sbin/condor_startd
> 06/08/11 15:31:55 ** SubsystemInfo: name=STARTD type=STARTD(7) class=DAEMON(1)
> 06/08/11 15:31:55 ** Configuration: subsystem:STARTD local:<NONE> class:DAEMON
> 06/08/11 15:31:55 ** $CondorVersion: 7.6.1 May 31 2011 BuildID: 339001 $
> 06/08/11 15:31:55 ** $CondorPlatform: x86_64_rhap_5 $
> 06/08/11 15:31:55 ** PID = 2293
> 06/08/11 15:31:55 ** Log last touched time unavailable (No such file or directory)
> 06/08/11 15:31:55 ******************************************************
> 06/08/11 15:31:55 Using config source: /etc/condor/condor_config
> 06/08/11 15:31:55 Using local config sources:
> 06/08/11 15:31:55    /etc/condor/condor_config.local
> 06/08/11 15:31:55 DaemonCore: command socket at <SixteenProc'sIP:9608>
> 06/08/11 15:31:55 DaemonCore: private command socket at <SixteenProc'sIP:9608>
> 06/08/11 15:31:55 Setting maximum accepts per cycle 4.
> 06/08/11 15:32:01 VM-gahp server reported an internal error
> 06/08/11 15:32:01 VM universe will be tested to check if it is available
> 06/08/11 15:32:01 History file rotation is enabled.
> 06/08/11 15:32:01   Maximum history file size is: 20971520 bytes
> 06/08/11 15:32:01   Number of rotated history files is: 2
> 06/08/11 15:32:01 slot1: New machine resource allocated
> 06/08/11 15:32:01 slot2: New machine resource allocated
> 06/08/11 15:32:01 slot3: New machine resource allocated
> 06/08/11 15:32:01 slot4: New machine resource allocated
> 06/08/11 15:32:01 slot5: New machine resource allocated
> 06/08/11 15:32:01 slot6: New machine resource allocated
> 06/08/11 15:32:01 slot7: New machine resource allocated
> 06/08/11 15:32:01 slot8: New machine resource allocated
> 06/08/11 15:32:01 slot9: New machine resource allocated
> 06/08/11 15:32:01 slot10: New machine resource allocated
> 06/08/11 15:32:01 slot11: New machine resource allocated
> 06/08/11 15:32:01 slot12: New machine resource allocated
> 06/08/11 15:32:01 slot13: New machine resource allocated
> 06/08/11 15:32:01 slot14: New machine resource allocated
> 06/08/11 15:32:01 slot15: New machine resource allocated
> 06/08/11 15:32:01 slot16: New machine resource allocated
> 06/08/11 15:32:01 CronJobList: Adding job 'mips'
> 06/08/11 15:32:01 CronJobList: Adding job 'kflops'
> 06/08/11 15:32:01 CronJob: Initializing job 'mips' (/usr/libexec/condor/condor_mips)
> 06/08/11 15:32:01 CronJob: Initializing job 'kflops' (/usr/libexec/condor/condor_kflops)
> 06/08/11 15:32:01 slot1: State change: IS_OWNER is false
> 06/08/11 15:32:01 slot1: Changing state: Owner -> Unclaimed
> 06/08/11 15:32:01 State change: RunBenchmarks is TRUE
> 06/08/11 15:32:01 slot1: Changing activity: Idle -> Benchmarking
> 06/08/11 15:32:01 BenchMgr:StartBenchmarks()
> 06/08/11 15:32:01 slot2: State change: IS_OWNER is false
> 06/08/11 15:32:01 slot2: Changing state: Owner -> Unclaimed
> 06/08/11 15:32:01 State change: RunBenchmarks is TRUE
> 06/08/11 15:32:01 slot2: Changing activity: Idle -> Benchmarking
> 06/08/11 15:32:01 slot2: Changing activity: Benchmarking -> Idle
> 06/08/11 15:32:01 slot3: State change: IS_OWNER is false
> 06/08/11 15:32:01 slot3: Changing state: Owner -> Unclaimed
> 06/08/11 15:32:01 State change: RunBenchmarks is TRUE
> 06/08/11 15:32:01 slot3: Changing activity: Idle -> Benchmarking
> 06/08/11 15:32:01 slot3: Changing activity: Benchmarking -> Idle
> 06/08/11 15:32:01 slot4: State change: IS_OWNER is false
> 06/08/11 15:32:01 slot4: Changing state: Owner -> Unclaimed
> 06/08/11 15:32:01 State change: RunBenchmarks is TRUE
> 06/08/11 15:32:01 slot4: Changing activity: Idle -> Benchmarking
> 06/08/11 15:32:01 slot4: Changing activity: Benchmarking -> Idle
> 06/08/11 15:32:01 slot5: State change: IS_OWNER is false
> 06/08/11 15:32:01 slot5: Changing state: Owner -> Unclaimed
> 06/08/11 15:32:01 State change: RunBenchmarks is TRUE
> 06/08/11 15:32:01 slot5: Changing activity: Idle -> Benchmarking
> 06/08/11 15:32:01 slot5: Changing activity: Benchmarking -> Idle
> 06/08/11 15:32:01 slot6: State change: IS_OWNER is false
> 06/08/11 15:32:01 slot6: Changing state: Owner -> Unclaimed
> 06/08/11 15:32:01 State change: RunBenchmarks is TRUE
> 06/08/11 15:32:01 slot6: Changing activity: Idle -> Benchmarking
> 06/08/11 15:32:01 slot6: Changing activity: Benchmarking -> Idle
> 06/08/11 15:32:01 slot7: State change: IS_OWNER is false
> 06/08/11 15:32:01 slot7: Changing state: Owner -> Unclaimed
> 06/08/11 15:32:01 State change: RunBenchmarks is TRUE
> 06/08/11 15:32:01 slot7: Changing activity: Idle -> Benchmarking
> 06/08/11 15:32:01 slot7: Changing activity: Benchmarking -> Idle
> 06/08/11 15:32:01 slot8: State change: IS_OWNER is false
> 06/08/11 15:32:01 slot8: Changing state: Owner -> Unclaimed
> 06/08/11 15:32:01 State change: RunBenchmarks is TRUE
> 06/08/11 15:32:01 slot8: Changing activity: Idle -> Benchmarking
> 06/08/11 15:32:01 slot8: Changing activity: Benchmarking -> Idle
> 06/08/11 15:32:01 slot9: State change: IS_OWNER is false
> 06/08/11 15:32:01 slot9: Changing state: Owner -> Unclaimed
> 06/08/11 15:32:01 State change: RunBenchmarks is TRUE
> 06/08/11 15:32:01 slot9: Changing activity: Idle -> Benchmarking
> 06/08/11 15:32:01 slot9: Changing activity: Benchmarking -> Idle
> 06/08/11 15:32:01 slot10: State change: IS_OWNER is false
> 06/08/11 15:32:01 slot10: Changing state: Owner -> Unclaimed
> 06/08/11 15:32:01 State change: RunBenchmarks is TRUE
> 06/08/11 15:32:01 slot10: Changing activity: Idle -> Benchmarking
> 06/08/11 15:32:01 slot10: Changing activity: Benchmarking -> Idle
> 06/08/11 15:32:01 slot11: State change: IS_OWNER is false
> 06/08/11 15:32:01 slot11: Changing state: Owner -> Unclaimed
> 06/08/11 15:32:01 State change: RunBenchmarks is TRUE
> 06/08/11 15:32:01 slot11: Changing activity: Idle -> Benchmarking
> 06/08/11 15:32:01 slot11: Changing activity: Benchmarking -> Idle
> 06/08/11 15:32:01 slot12: State change: IS_OWNER is false
> 06/08/11 15:32:01 slot12: Changing state: Owner -> Unclaimed
> 06/08/11 15:32:01 State change: RunBenchmarks is TRUE
> 06/08/11 15:32:01 slot12: Changing activity: Idle -> Benchmarking
> 06/08/11 15:32:01 slot12: Changing activity: Benchmarking -> Idle
> 06/08/11 15:32:01 slot13: State change: IS_OWNER is false
> 06/08/11 15:32:01 slot13: Changing state: Owner -> Unclaimed
> 06/08/11 15:32:01 State change: RunBenchmarks is TRUE
> 06/08/11 15:32:01 slot13: Changing activity: Idle -> Benchmarking
> 06/08/11 15:32:01 slot13: Changing activity: Benchmarking -> Idle
> 06/08/11 15:32:01 slot14: State change: IS_OWNER is false
> 06/08/11 15:32:01 slot14: Changing state: Owner -> Unclaimed
> 06/08/11 15:32:01 State change: RunBenchmarks is TRUE
> 06/08/11 15:32:01 slot14: Changing activity: Idle -> Benchmarking
> 06/08/11 15:32:01 slot14: Changing activity: Benchmarking -> Idle
> 06/08/11 15:32:01 slot15: State change: IS_OWNER is false
> 06/08/11 15:32:01 slot15: Changing state: Owner -> Unclaimed
> 06/08/11 15:32:01 State change: RunBenchmarks is TRUE
> 06/08/11 15:32:01 slot15: Changing activity: Idle -> Benchmarking
> 06/08/11 15:32:01 slot15: Changing activity: Benchmarking -> Idle
> 06/08/11 15:32:01 slot16: State change: IS_OWNER is false
> 06/08/11 15:32:01 slot16: Changing state: Owner -> Unclaimed
> 06/08/11 15:32:01 State change: RunBenchmarks is TRUE
> 06/08/11 15:32:01 slot16: Changing activity: Idle -> Benchmarking
> 06/08/11 15:32:01 slot16: Changing activity: Benchmarking -> Idle
> 06/08/11 15:32:23 State change: benchmarks completed
> 06/08/11 15:32:23 slot1: Changing activity: Benchmarking -> Idle