
[Condor-users] trying to get HDFS working



I'm running Condor 7.4.2 on Linux, and I'm trying unsuccessfully to get HDFS running. The HDFS daemon seems to load, but then immediately exits.

I've set up one machine as the namenode and a number of our cluster nodes as data nodes. On these machines I created the HDFS_NAMENODE_DIR and HDFS_DATANODE_DIR directories and chowned them to the Condor user. I left HDFS_DATANODE_ADDRESS = 0.0.0.0:0 because the docs seem to indicate that it's okay to do so.

Hadoop is version 0.20.2.

When I start the HDFS daemon, it loads and then exits with status 0 right away, according to the MasterLog:

04/30 12:08:42 Started process "/lusr/condor/sbin/condor_hdfs", pid and pgroup = 28706
04/30 12:08:42 The HDFS (pid 28706) exited with status 0
04/30 12:08:42 restarting /lusr/condor/sbin/condor_hdfs in 3600 seconds
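
For anyone who wants to check their own MasterLog for the same symptom, here is a minimal sketch in Python that flags a daemon whose exit is logged at the same timestamp as its start. The regular expressions are assumptions based only on the three lines quoted above, and the function name find_immediate_exits is made up for illustration; other Condor versions may word these messages differently.

```python
import re

# Patterns derived only from the MasterLog lines quoted above (an assumption;
# other Condor versions may format these messages differently).
STARTED = re.compile(
    r'^(\d\d/\d\d \d\d:\d\d:\d\d) Started process "([^"]+)", pid and pgroup = (\d+)'
)
EXITED = re.compile(
    r'^(\d\d/\d\d \d\d:\d\d:\d\d) The (\S+) \(pid (\d+)\) exited with status (\d+)'
)


def find_immediate_exits(lines):
    """Return (path, pid, status) for daemons whose exit is logged
    with the same timestamp as their start."""
    started = {}  # pid (as string) -> (timestamp, executable path)
    hits = []
    for line in lines:
        m = STARTED.match(line)
        if m:
            started[m.group(3)] = (m.group(1), m.group(2))
            continue
        m = EXITED.match(line)
        if m and m.group(3) in started:
            start_ts, path = started[m.group(3)]
            if start_ts == m.group(1):
                hits.append((path, int(m.group(3)), int(m.group(4))))
    return hits


# The three lines from the excerpt above, used as sample input.
sample = [
    '04/30 12:08:42 Started process "/lusr/condor/sbin/condor_hdfs", pid and pgroup = 28706',
    '04/30 12:08:42 The HDFS (pid 28706) exited with status 0',
    '04/30 12:08:42 restarting /lusr/condor/sbin/condor_hdfs in 3600 seconds',
]
print(find_immediate_exits(sample))
```

On the excerpt above this reports condor_hdfs with pid 28706 and exit status 0, which is the behavior I'm asking about.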


I have set HDFS_LOG4J=DEBUG and HDFS_DEBUG=D_ALL.


Namenode log shows:

04/30 12:08:42 (fd:3) (pid:28706) NET_REMAP_ENABLE is undefined, using default value of False
04/30 12:08:42 (fd:3) (pid:28706) LOGS_USE_TIMESTAMP is undefined, using default value of False
04/30 12:08:42 (fd:3) (pid:28706) config: using subsystem 'HDFS', local ''
04/30 12:08:42 (fd:3) (pid:28706) Reading from /proc/cpuinfo
04/30 12:08:42 (fd:3) (pid:28706) Found: Physical-IDs:True; Core-IDs:True
04/30 12:08:42 (fd:3) (pid:28706) Analyzing 2 processors using IDs...
04/30 12:08:42 (fd:3) (pid:28706) Looking at processor #0 (PID:0, CID:0):
04/30 12:08:42 (fd:3) (pid:28706) Comparing P#0 and P#1 : pid:0!=0 or cid:0!=1 (match=No)
04/30 12:08:42 (fd:3) (pid:28706) ncpus = 1
04/30 12:08:42 (fd:3) (pid:28706) P0: match->1
04/30 12:08:42 (fd:3) (pid:28706) Looking at processor #1 (PID:0, CID:1):
04/30 12:08:42 (fd:3) (pid:28706) ncpus = 2
04/30 12:08:42 (fd:3) (pid:28706) P1: match->1
04/30 12:08:42 (fd:3) (pid:28706) Using IDs: 2 processors, 2 CPUs, 0 HTs
04/30 12:08:42 (fd:3) (pid:28706) Reading condor configuration from '/lusr/condor/etc/condor_config'
04/30 12:08:42 (fd:3) (pid:28706) Finding local host information, calling gethostname()
04/30 12:08:42 (fd:3) (pid:28706) gethostname() returned fully qualified name "carrion.cs.utexas.edu"
04/30 12:08:42 (fd:3) (pid:28706) NET_REMAP_ENABLE is undefined, using default value of False
04/30 12:08:42 (fd:3) (pid:28706) PASSWD_CACHE_REFRESH is undefined, using default value of 319
04/30 12:08:42 (fd:3) (pid:28706) Trying to initialize local IP address (config file not read)
04/30 12:08:42 (fd:3) (pid:28706) Have not found an IP yet, calling gethostbyname()
04/30 12:08:42 (fd:3) (pid:28706) Trying to find IP addr for "carrion.cs.utexas.edu"
04/30 12:08:42 (fd:3) (pid:28706) Calling gethostbyname(carrion.cs.utexas.edu)
04/30 12:08:42 (fd:3) (pid:28706) Found IP addr in hostent: 128.83.120.7
04/30 12:08:42 (fd:3) (pid:28706) ENABLE_RUNTIME_CONFIG is undefined, using default value of False
04/30 12:08:42 (fd:3) (pid:28706) ENABLE_PERSISTENT_CONFIG is undefined, using default value of False
04/30 12:08:42 (fd:3) (pid:28706) Trying to initialize local IP address (after reading config)
04/30 12:08:42 (fd:3) (pid:28706) NETWORK_INTERFACE not in config file, using existing value
04/30 12:08:42 (fd:3) (pid:28706) ABORT_ON_EXCEPTION is undefined, using default value of False
04/30 12:08:42 (fd:3) (pid:28706) Config 'HDFS_LOG': no prefix ==> '$(LOG)/HDFSLog'
04/30 12:08:42 (fd:3) (pid:28706) Config 'MAX_HDFS_LOG': no prefix ==> '1000000'
04/30 12:08:42 (fd:3) (pid:28706) PRIV_UNKNOWN --> PRIV_CONDOR at daemon_core_main.cpp:1835
04/30 12:08:42 (fd:3) (pid:28707) KEYCACHE: created: 0x84bf5b0
04/30 12:08:42 (fd:3) (pid:28707) WANT_UDP_COMMAND_SOCKET is undefined, using default value of True
04/30 12:08:42 (fd:3) (pid:28707) HDFS_MAX_FILE_DESCRIPTORS is undefined, using default value of 0
04/30 12:08:42 (fd:3) (pid:28707) MAX_FILE_DESCRIPTORS is undefined, using default value of 0
04/30 12:08:42 (fd:3) (pid:28707) ******************************************************
04/30 12:08:42 (fd:3) (pid:28707) ** condor_hdfs (CONDOR_HDFS) STARTING UP
04/30 12:08:42 (fd:3) (pid:28707) ** /lusr/opt/condor-7.4.2/sbin/condor_hdfs
04/30 12:08:42 (fd:3) (pid:28707) ** SubsystemInfo: name=HDFS type=DAEMON(11) class=DAEMON(1)
04/30 12:08:42 (fd:3) (pid:28707) ** Configuration: subsystem:HDFS local:<NONE> class:DAEMON
04/30 12:08:42 (fd:3) (pid:28707) ** $CondorVersion: 7.4.2 Mar 29 2010 BuildID: 227044 $
04/30 12:08:42 (fd:3) (pid:28707) ** $CondorPlatform: I386-LINUX_RHEL5 $
04/30 12:08:42 (fd:3) (pid:28707) ** PID = 28707
04/30 12:08:42 (fd:3) (pid:28707) ** Log last touched 4/30 11:53:36
04/30 12:08:42 (fd:3) (pid:28707) ** Running as root: Privilege switching in effect
04/30 12:08:42 (fd:3) (pid:28707) ******************************************************
04/30 12:08:42 (fd:3) (pid:28707) Using config source: /lusr/condor/etc/condor_config
04/30 12:08:42 (fd:3) (pid:28707) Using local config sources:
04/30 12:08:42 (fd:3) (pid:28707)    /lusr/condor/etc/local/carrion
04/30 12:08:42 (fd:3) (pid:28707) Config 'LOG': no prefix ==> '$(RELEASE_DIR)/log/$(HOSTNAME)'
04/30 12:08:42 (fd:3) (pid:28707) Running as root. Enabling specialized core dump routines
04/30 12:08:42 (fd:5) (pid:28707) Setting up command socket
04/30 12:08:42 (fd:5) (pid:28707) CONDOR_INHERIT: "31595 <128.83.120.7:57510> 0 0"
04/30 12:08:42 (fd:5) (pid:28707) Parent PID = 31595
04/30 12:08:42 (fd:5) (pid:28707) Parent Command Sock = <128.83.120.7:57510>
04/30 12:08:42 (fd:7) (pid:28707) LISTEN <128.83.120.7:45693> fd=5
04/30 12:08:42 (fd:7) (pid:28707)