
Re: [Condor-users] CEDAR:6001:Failed to fetch ads



Stack dump for process 2638 at timestamp 1290067960 (19 frames)
condor_schedd(dprintf_dump_stack+0xb3)[0x5c0e7a]
condor_schedd(linux_sig_coredump(int)+0x28)[0x5b1b68]
/lib64/tls/libpthread.so.0[0x3339b0c5b0]
/lib64/tls/libc.so.6(__nss_hostname_digits_dots+0x47)[0x33394d7a87]
/lib64/tls/libc.so.6(gethostbyname+0xac)[0x33394dab6c]
condor_schedd(condor_gethostbyname+0xa9)[0x5c5e8a]
condor_schedd(Scheduler::negotiate(int, Stream*)+0x3b8)[0x518d72]
...

and

11/18 08:12:19 (pid:2638) IPVERIFY: unable to resolve IP address of FALSE

Something is probably strange with your network setup.
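The two symptoms point the same way: the schedd dies inside gethostbyname(), and IPVERIFY is trying to resolve the literal string "FALSE" as if it were a hostname, which suggests a boolean value (e.g. from an ALLOW/DENY-style setting) is leaking into a host list. As a minimal sketch of the lookup that is failing (standard Python resolver calls, not Condor's internal condor_gethostbyname):

```python
import socket

def can_resolve(name):
    """Return True if a gethostbyname-style lookup succeeds for `name`."""
    try:
        socket.gethostbyname(name)
        return True
    except socket.gaierror:
        return False

# The schedd is effectively handing the resolver the literal token
# "FALSE", which is not a valid hostname and will never resolve:
print(can_resolve("FALSE"))
```

If a lookup like this fails for your actual hostname too (not just "FALSE"), check /etc/hosts, /etc/resolv.conf, and /etc/nsswitch.conf on the schedd machine.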

Best,


matt

On 11/18/2010 03:17 AM, Santanu Das wrote:
Thanks, Matt, for the suggestion. I already tried that, but couldn't make much sense of it.
If I run "condor_schedd -f" by hand, I see this in the log:



11/18 08:12:13 (pid:2638) passwd_cache::cache_uid(): getpwnam("condor") failed: user not found
11/18 08:12:13 (pid:2638) passwd_cache::cache_uid(): getpwnam("condor") failed: user not found
11/18 08:12:13 (pid:2638) ******************************************************
11/18 08:12:13 (pid:2638) ** condor_schedd (CONDOR_SCHEDD) STARTING UP
11/18 08:12:13 (pid:2638) ** /opt/condor-7.4.4/sbin/condor_schedd
11/18 08:12:13 (pid:2638) ** SubsystemInfo: name=SCHEDD type=SCHEDD(5) class=DAEMON(1)
11/18 08:12:13 (pid:2638) ** Configuration: subsystem:SCHEDD local:<NONE>  class:DAEMON
11/18 08:12:13 (pid:2638) ** $CondorVersion: 7.4.4 Oct 13 2010 BuildID: 279383 $
11/18 08:12:13 (pid:2638) ** $CondorPlatform: X86_64-LINUX_RHEL3 $
11/18 08:12:13 (pid:2638) ** PID = 2638
11/18 08:12:13 (pid:2638) ** Log last touched 11/18 08:11:16
11/18 08:12:13 (pid:2638) ******************************************************
11/18 08:12:13 (pid:2638) Using config source: /opt/condor/etc/condor_config
11/18 08:12:13 (pid:2638) Using local config sources:
11/18 08:12:13 (pid:2638)    /home/condorr/config/group_config
11/18 08:12:13 (pid:2638)    /home/condorr/condor_config.local
11/18 08:12:13 (pid:2638) DaemonCore: Command Socket at <172.24.116.185:9683>
11/18 08:12:13 (pid:2638) History file rotation is enabled.
11/18 08:12:13 (pid:2638)   Maximum history file size is: 20971520 bytes
11/18 08:12:13 (pid:2638)   Number of rotated history files is: 2
11/18 08:12:13 (pid:2638) About to rotate ClassAd log /home/condorr/spool/job_queue.log
11/18 08:12:19 (pid:2638) IPVERIFY: unable to resolve IP address of FALSE
11/18 08:12:19 (pid:2638) Sent ad to central manager for group_atlas.prdatl13@xxxxxxxxxxxxxxxxxxxxxxxx
11/18 08:12:19 (pid:2638) Sent ad to 1 collectors for group_atlas.prdatl13@xxxxxxxxxxxxxxxxxxxxxxxx
11/18 08:12:19 (pid:2638) Sent ad to central manager for group_atlas.prdatl07@xxxxxxxxxxxxxxxxxxxxxxxx
11/18 08:12:19 (pid:2638) Sent ad to 1 collectors for group_atlas.prdatl07@xxxxxxxxxxxxxxxxxxxxxxxx
11/18 08:12:19 (pid:2638) Sent ad to central manager for group_monitor.ops006@xxxxxxxxxxxxxxxxxxxxxxxx
11/18 08:12:19 (pid:2638) Sent ad to 1 collectors for group_monitor.ops006@xxxxxxxxxxxxxxxxxxxxxxxx
11/18 08:12:19 (pid:2638) Sent ad to central manager for group_monitor.ops002@xxxxxxxxxxxxxxxxxxxxxxxx
11/18 08:12:19 (pid:2638) Sent ad to 1 collectors for group_monitor.ops002@xxxxxxxxxxxxxxxxxxxxxxxx
11/18 08:12:19 (pid:2638) Sent ad to central manager for group_monitor.sgmops@xxxxxxxxxxxxxxxxxxxxxxxx
11/18 08:12:19 (pid:2638) Sent ad to 1 collectors for group_monitor.sgmops@xxxxxxxxxxxxxxxxxxxxxxxx
11/18 08:12:26 (pid:2638) Sent ad to central manager for group_atlas.prdatl13@xxxxxxxxxxxxxxxxxxxxxxxx
11/18 08:12:26 (pid:2638) Sent ad to 1 collectors for group_atlas.prdatl13@xxxxxxxxxxxxxxxxxxxxxxxx
11/18 08:12:26 (pid:2638) Sent ad to central manager for group_atlas.prdatl07@xxxxxxxxxxxxxxxxxxxxxxxx
11/18 08:12:26 (pid:2638) Sent ad to 1 collectors for group_atlas.prdatl07@xxxxxxxxxxxxxxxxxxxxxxxx
11/18 08:12:26 (pid:2638) Sent ad to central manager for group_monitor.ops006@xxxxxxxxxxxxxxxxxxxxxxxx
11/18 08:12:26 (pid:2638) Sent ad to 1 collectors for group_monitor.ops006@xxxxxxxxxxxxxxxxxxxxxxxx
11/18 08:12:26 (pid:2638) Sent ad to central manager for group_monitor.ops002@xxxxxxxxxxxxxxxxxxxxxxxx
11/18 08:12:26 (pid:2638) Sent ad to 1 collectors for group_monitor.ops002@xxxxxxxxxxxxxxxxxxxxxxxx
11/18 08:12:26 (pid:2638) Sent ad to central manager for group_monitor.sgmops@xxxxxxxxxxxxxxxxxxxxxxxx
11/18 08:12:26 (pid:2638) Sent ad to 1 collectors for group_monitor.sgmops@xxxxxxxxxxxxxxxxxxxxxxxx
11/18 08:12:31 (pid:2638) Sent ad to central manager for group_atlas.prdatl13@xxxxxxxxxxxxxxxxxxxxxxxx
11/18 08:12:31 (pid:2638) Sent ad to 1 collectors for group_atlas.prdatl13@xxxxxxxxxxxxxxxxxxxxxxxx
11/18 08:12:31 (pid:2638) Sent ad to central manager for group_atlas.prdatl07@xxxxxxxxxxxxxxxxxxxxxxxx
11/18 08:12:31 (pid:2638) Sent ad to 1 collectors for group_atlas.prdatl07@xxxxxxxxxxxxxxxxxxxxxxxx
11/18 08:12:31 (pid:2638) Sent ad to central manager for group_monitor.ops006@xxxxxxxxxxxxxxxxxxxxxxxx
11/18 08:12:31 (pid:2638) Sent ad to 1 collectors for group_monitor.ops006@xxxxxxxxxxxxxxxxxxxxxxxx
11/18 08:12:31 (pid:2638) Sent ad to central manager for group_monitor.ops002@xxxxxxxxxxxxxxxxxxxxxxxx
11/18 08:12:31 (pid:2638) Sent ad to 1 collectors for group_monitor.ops002@xxxxxxxxxxxxxxxxxxxxxxxx
11/18 08:12:31 (pid:2638) Sent ad to central manager for group_monitor.sgmops@xxxxxxxxxxxxxxxxxxxxxxxx
11/18 08:12:31 (pid:2638) Sent ad to 1 collectors for group_monitor.sgmops@xxxxxxxxxxxxxxxxxxxxxxxx
Stack dump for process 2638 at timestamp 1290067960 (19 frames)
condor_schedd(dprintf_dump_stack+0xb3)[0x5c0e7a]
condor_schedd(_Z18linux_sig_coredumpi+0x28)[0x5b1b68]
/lib64/tls/libpthread.so.0[0x3339b0c5b0]
/lib64/tls/libc.so.6(__nss_hostname_digits_dots+0x47)[0x33394d7a87]
/lib64/tls/libc.so.6(gethostbyname+0xac)[0x33394dab6c]
condor_schedd(condor_gethostbyname+0xa9)[0x5c5e8a]
condor_schedd(_ZN9Scheduler9negotiateEiP6Stream+0x3b8)[0x518d72]
condor_schedd(_ZN9Scheduler11doNegotiateEiP6Stream+0x23)[0x51881d]
condor_schedd(_ZN10DaemonCore18CallCommandHandlerEiP6Streamb+0x216)[0x5a0998]
condor_schedd(_ZN10DaemonCore9HandleReqEP6StreamS1_+0x343e)[0x5a44ac]
condor_schedd(_ZN10DaemonCore22HandleReqSocketHandlerEP6Stream+0x99)[0x5a0ecd]
condor_schedd(_ZN10DaemonCore24CallSocketHandler_workerEibP6Stream+0x253)[0x5a04e7]
condor_schedd(_ZN10DaemonCore35CallSocketHandler_worker_demarshallEPv+0x36)[0x5a0288]
condor_schedd(_ZN13CondorThreads8pool_addEPFvPvES0_PiPKc+0x3d)[0x665983]
condor_schedd(_ZN10DaemonCore17CallSocketHandlerERib+0x1ae)[0x5a024a]
condor_schedd(_ZN10DaemonCore6DriverEv+0x1585)[0x59ff87]
condor_schedd(main+0x1867)[0x5b4a57]
/lib64/tls/libc.so.6(__libc_start_main+0xdb)[0x333941c3fb]
condor_schedd(_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_c+0x3a)[0x50b4aa]



Can anyone help me with this log, please?

cheers,
Santanu


On 18 Nov 2010, at 03:56, Matthew Farrellee wrote:

On 11/17/2010 06:07 PM, Santanu Das wrote:
Hi all,

I'm using the v7.4.4 dynamically linked RHEL3 RPM on Scientific Linux 4, and every time I run condor_q I get a "Failed to fetch ads..." error:

	[root@serv07 log]# condor_q

	-- Failed to fetch ads from: <172.24.116.185:9570> : serv07.hep.phy.cam.ac.uk
	CEDAR:6001:Failed to connect to <172.24.116.185:9570>

Looks like condor_schedd is crashing. How can I fix this? Thanks in advance for any help.

Cheers,
Santanu

You should have a look in the Schedd's log to get an idea of what's happening.

$ less $(condor_config_val SCHEDD_LOG)

Best,


matt