Mailing List Archives Public Access	UW Madison Computer Sciences Department Computer Systems Lab
[condor-users] jobs not working in standard universe

Date: Thu, 26 Feb 2004 15:05:31 +1100 (EST)
From: bob@xxxxxxxxxxx
Subject: [condor-users] jobs not working in standard universe
Hey Guys,

I have a basic program that does nothing but print "Hello World!". It works
fine when I submit it to the vanilla universe, but when I compile it with
condor_compile and submit it to the standard universe, it doesn't work.

After compiling it with condor_compile, if I try to run it standalone
(without submitting it to Condor), it gives me an "Illegal Instruction"
error message and nothing else.

If I submit it to Condor, it gets farmed out to a node in my pool but
dies straight away.

Below are the various logs:

----------hello.log from submit machine-------------------
...
001 (072.000.000) 02/26 14:10:41 Job executing on host: <192.168.2.2:1026>
...
005 (072.000.000) 02/26 14:10:41 Job terminated.
	(0) Abnormal termination (signal 4)
	(0) No core file
		Usr 0 00:00:00, Sys 0 00:00:00  -  Run Remote Usage
		Usr 0 00:00:00, Sys 0 00:00:00  -  Run Local Usage
		Usr 0 00:00:00, Sys 0 00:00:00  -  Total Remote Usage
		Usr 0 00:00:00, Sys 0 00:00:00  -  Total Local Usage
	224  -  Run Bytes Sent By Job
	3584389  -  Run Bytes Received By Job
	0  -  Total Bytes Sent By Job
	0  -  Total Bytes Received By Job
...
009 (072.000.000) 02/26 14:10:41 Job was aborted by the user.
...
----------------------------------------------------------
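For reference, the three-digit code at the start of each user-log entry (`001`, `005`, `009`) identifies the event type, and `signal 4` in the termination record is SIGILL on Linux, the same "Illegal instruction" the standalone run reports. Below is a minimal Python sketch, assuming the excerpt is saved as plain text with one event header per line, that lists the events and names the terminating signal. The regexes and the `summarize` helper are hypothetical, written for this excerpt, and are not part of any Condor tool:

```python
import re
import signal

# Header line of a user-log event, e.g.
# "005 (072.000.000) 02/26 14:10:41 Job terminated."
EVENT_RE = re.compile(
    r"^(?P<code>\d{3}) \((?P<cluster>\d+)\.(?P<proc>\d+)\.(?P<subproc>\d+)\) "
    r"(?P<date>\d\d/\d\d) (?P<time>\d\d:\d\d:\d\d) (?P<text>.*)$"
)
# Only the event codes that occur in the excerpt above.
EVENT_NAMES = {1: "execute", 5: "terminated", 9: "aborted"}
TERM_SIG_RE = re.compile(r"Abnormal termination \(signal (\d+)\)")


def summarize(log_text):
    """Return ([(event name, "cluster.proc"), ...], terminating signal name or None)."""
    events, sig_name = [], None
    for line in log_text.splitlines():
        m = EVENT_RE.match(line)
        if m:
            code = int(m.group("code"))
            job = f"{int(m.group('cluster'))}.{int(m.group('proc'))}"
            events.append((EVENT_NAMES.get(code, f"event {code}"), job))
            continue
        s = TERM_SIG_RE.search(line)
        if s:
            # signal.Signals maps the number to the platform's name (4 -> SIGILL)
            sig_name = signal.Signals(int(s.group(1))).name
    return events, sig_name


log = """\
001 (072.000.000) 02/26 14:10:41 Job executing on host: <192.168.2.2:1026>
005 (072.000.000) 02/26 14:10:41 Job terminated.
\t(0) Abnormal termination (signal 4)
009 (072.000.000) 02/26 14:10:41 Job was aborted by the user.
"""
print(summarize(log))
```

Looking the number up through Python's `signal` module, rather than hard-coding the name, keeps it correct for whatever platform the script runs on.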


-------------------ShadowLog from submit machine---------------

2/26 14:10:28 (?.?) (1046):******* Standard Shadow starting up *******
2/26 14:10:28 (?.?) (1046):** $CondorVersion: 6.6.1 Feb  5 2004 $
2/26 14:10:28 (?.?) (1046):** $CondorPlatform: I386-LINUX-RH72 $
2/26 14:10:28 (?.?) (1046):*******************************************
2/26 14:10:28 (?.?) (1046):uid=0, euid=501, gid=0, egid=501
2/26 14:10:28 (?.?) (1046):Hostname = "<192.168.2.2:1026>", Job = 71.0
2/26 14:10:28 (71.0) (1046):Requesting Primary Starter
2/26 14:10:28 (71.0) (1046):Shadow: Request to run a job was ACCEPTED
2/26 14:10:28 (71.0) (1046):Shadow: RSC_SOCK connected, fd = 17
2/26 14:10:28 (71.0) (1046):Shadow: CLIENT_LOG connected, fd = 18
2/26 14:10:28 (71.0) (1046):My_Filesystem_Domain = "condor"
2/26 14:10:28 (71.0) (1046):My_UID_Domain = "condor"
2/26 14:10:28 (71.0) (1046):	Entering pseudo_get_file_stream
2/26 14:10:28 (71.0) (1046):	file = "/home/condor/spool/cluster71.ickpt.subproc0"
2/26 14:10:28 (71.0) (1046):	 Weird 0xc0a80201
2/26 14:10:28 (71.0) (1046):	 Weird 0xc0a80201
2/26 14:10:32 (71.0) (1046):Reaped child status - pid 1047 exited with status 0
2/26 14:10:33 (71.0) (1046):Shadow: Job 71.0 exited, termsig = 4, coredump = 0, retcode = 0
2/26 14:10:33 (71.0) (1046):Shadow: was killed by signal 4.
2/26 14:10:33 (71.0) (1046):user_time = 13 ticks
2/26 14:10:33 (71.0) (1046):sys_time = 55 ticks
2/26 14:10:33 (71.0) (1046):Static Policy: removing job because OnExitRemove has become true
2/26 14:10:33 (71.0) (1046):********** Shadow Exiting(102) **********
2/26 14:10:35 (?.?) (1050):******* Standard Shadow starting up *******
2/26 14:10:35 (?.?) (1050):** $CondorVersion: 6.6.1 Feb  5 2004 $
2/26 14:10:35 (?.?) (1050):** $CondorPlatform: I386-LINUX-RH72 $
2/26 14:10:35 (?.?) (1050):*******************************************
2/26 14:10:35 (?.?) (1050):uid=0, euid=501, gid=0, egid=501
2/26 14:10:35 (?.?) (1050):Hostname = "<192.168.2.2:1026>", Job = 72.0
2/26 14:10:36 (72.0) (1050):Requesting Primary Starter
2/26 14:10:36 (72.0) (1050):Shadow: Request to run a job was ACCEPTED
2/26 14:10:36 (72.0) (1050):Shadow: RSC_SOCK connected, fd = 17
2/26 14:10:36 (72.0) (1050):Shadow: CLIENT_LOG connected, fd = 18
2/26 14:10:36 (72.0) (1050):My_Filesystem_Domain = "condor"
2/26 14:10:36 (72.0) (1050):My_UID_Domain = "condor"
2/26 14:10:36 (72.0) (1050):	Entering pseudo_get_file_stream
2/26 14:10:36 (72.0) (1050):	file = "/home/condor/spool/cluster72.ickpt.subproc0"
2/26 14:10:36 (72.0) (1050):	 Weird 0xc0a80201
2/26 14:10:36 (72.0) (1050):	 Weird 0xc0a80201
2/26 14:10:40 (72.0) (1050):Reaped child status - pid 1052 exited with status 0
2/26 14:10:41 (72.0) (1050):Shadow: Job 72.0 exited, termsig = 4, coredump = 0, retcode = 0
2/26 14:10:41 (72.0) (1050):Shadow: was killed by signal 4.
2/26 14:10:41 (72.0) (1050):user_time = 12 ticks
2/26 14:10:41 (72.0) (1050):sys_time = 34 ticks
2/26 14:10:41 (72.0) (1050):Static Policy: removing job because OnExitRemove has become true
2/26 14:10:41 (72.0) (1050):********** Shadow Exiting(102) **********

---------------------------------------------------------
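Two details in the ShadowLog can be decoded mechanically. The `termsig = 4, coredump = 0` line describes a child killed by signal 4 without a core dump, and the repeated `Weird 0xc0a80201` value, read as a big-endian 32-bit IPv4 address, is 192.168.2.1, which matches the submit machine's address in the StartLog (that reading is an inference; the log does not say so). The helpers below are hypothetical. When rebuilding the status word, `wait_status` assumes the traditional Linux wait-status layout (signal in the low 7 bits, 0x80 for the core flag):

```python
import ipaddress
import os
import re

EXIT_RE = re.compile(
    r"Job (?P<job>\d+\.\d+) exited, termsig = (?P<sig>\d+), "
    r"coredump = (?P<core>\d+), retcode = (?P<ret>\d+)"
)


def parse_exit(line):
    """Pull job id, signal, core flag and return code out of a shadow exit line."""
    m = EXIT_RE.search(line)
    if m is None:
        return None
    fields = m.groupdict()
    return {"job": fields["job"], "sig": int(fields["sig"]),
            "core": int(fields["core"]), "ret": int(fields["ret"])}


def wait_status(termsig, coredump):
    """Rebuild a wait() status word (assumed Linux layout: signal in low 7 bits, 0x80 = core)."""
    return (termsig & 0x7F) | (0x80 if coredump else 0)


def hex_to_ipv4(text):
    """Interpret a hex string such as '0xc0a80201' as a big-endian IPv4 address."""
    return str(ipaddress.IPv4Address(int(text, 16)))


info = parse_exit("Shadow: Job 72.0 exited, termsig = 4, coredump = 0, retcode = 0")
status = wait_status(info["sig"], info["core"])
print(os.WIFSIGNALED(status), os.WTERMSIG(status), os.WCOREDUMP(status))  # True 4 False
print(hex_to_ipv4("0xc0a80201"))  # 192.168.2.1
```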

-------------------StartLog from execute machine-----------------

2/26 14:10:41 DaemonCore: Command received via UDP from host <192.168.2.1:1025>
2/26 14:10:41 DaemonCore: received command 440 (MATCH_INFO), calling handler (command_match_info)
2/26 14:10:41 match_info called
2/26 14:10:41 Received match <192.168.2.2:1026>#1430078504
2/26 14:10:41 State change: match notification protocol successful
2/26 14:10:41 Changing state: Unclaimed -> Matched
2/26 14:10:41 DaemonCore: Command received via TCP from host <192.168.2.1:1060>
2/26 14:10:41 DaemonCore: received command 442 (REQUEST_CLAIM), calling handler (command_request_claim)
2/26 14:10:41 Request accepted.
2/26 14:10:41 Remote owner is bob@condor
2/26 14:10:41 State change: claiming protocol successful
2/26 14:10:41 Changing state: Matched -> Claimed
2/26 14:10:44 DaemonCore: Command received via TCP from host <192.168.2.1:1062>
2/26 14:10:44 DaemonCore: received command 444 (ACTIVATE_CLAIM), calling handler (command_activate_claim)
2/26 14:10:44 Got activate_claim request from shadow (<192.168.2.1:1062>)
2/26 14:10:44 Remote job ID is 71.0
2/26 14:10:44 exec_starter( one.condor, 10, 11 ) : pid 1022
2/26 14:10:44 execl(/usr/local/condor/sbin/condor_starter.std, "condor_starter", one.condor, 0)
2/26 14:10:44 Got universe "STANDARD" (1) from request classad
2/26 14:10:44 State change: claim-activation protocol successful
2/26 14:10:44 Changing activity: Idle -> Busy
2/26 14:10:49 DaemonCore: Command received via TCP from host <192.168.2.1:1066>
2/26 14:10:49 DaemonCore: received command 404 (DEACTIVATE_CLAIM_FORCIBLY), calling handler (command_handler)
2/26 14:10:49 Called deactivate_claim_forcibly()
2/26 14:10:49 Starter pid 1022 exited with status 0
2/26 14:10:49 State change: starter exited
2/26 14:10:49 Changing activity: Busy -> Idle
2/26 14:10:51 DaemonCore: Command received via TCP from host <192.168.2.1:1069>
2/26 14:10:51 DaemonCore: received command 444 (ACTIVATE_CLAIM), calling handler (command_activate_claim)
2/26 14:10:51 Got activate_claim request from shadow (<192.168.2.1:1069>)
2/26 14:10:52 Remote job ID is 72.0
2/26 14:10:52 exec_starter( one.condor, 10, 11 ) : pid 1025
2/26 14:10:52 execl(/usr/local/condor/sbin/condor_starter.std, "condor_starter", one.condor, 0)
2/26 14:10:52 Got universe "STANDARD" (1) from request classad
2/26 14:10:52 State change: claim-activation protocol successful
2/26 14:10:52 Changing activity: Idle -> Busy
2/26 14:10:56 DaemonCore: Command received via TCP from host <192.168.2.1:1073>
2/26 14:10:56 DaemonCore: received command 404 (DEACTIVATE_CLAIM_FORCIBLY), calling handler (command_handler)
2/26 14:10:56 Called deactivate_claim_forcibly()
2/26 14:10:56 Starter pid 1025 exited with status 0
2/26 14:10:56 State change: starter exited
2/26 14:10:56 Changing activity: Busy -> Idle
2/26 14:10:57 DaemonCore: Command received via UDP from host <192.168.2.1:1025>
2/26 14:10:57 DaemonCore: received command 443 (RELEASE_CLAIM), calling handler (command_handler)
2/26 14:10:57 State change: received RELEASE_CLAIM command
2/26 14:10:57 Changing state and activity: Claimed/Idle -> Preempting/Vacating
2/26 14:10:57 State change: No preempting claim, returning to owner
2/26 14:10:57 Changing state and activity: Preempting/Vacating -> Owner/Idle
2/26 14:10:57 State change: IS_OWNER is false
2/26 14:10:57 Changing state: Owner -> Unclaimed
2/26 14:10:57 DaemonCore: Command received via UDP from host <192.168.2.1:1025>
2/26 14:10:57 DaemonCore: received command 443 (RELEASE_CLAIM), calling handler (command_handler)
2/26 14:10:57 Error: can't find resource with capability (<192.168.2.2:1026>#1430078504)
-----------------------------------------------------
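The StartLog records the execute machine's claim lifecycle: Unclaimed to Matched to Claimed, one Idle/Busy cycle per job (two ACTIVATE_CLAIM commands on the same claim), then RELEASE_CLAIM back to Unclaimed. Below is a hypothetical Python helper that extracts just those transitions, assuming each record sits on a single line as it does in the daemon's own log file:

```python
import re

# Matches "Changing state: A -> B", "Changing activity: A -> B" and
# "Changing state and activity: A/x -> B/y" (longest alternative first).
CHANGE_RE = re.compile(r"Changing (state and activity|state|activity): (\S+) -> (\S+)")


def transitions(log_text):
    """Return a list of (kind, old, new) tuples in log order."""
    out = []
    for line in log_text.splitlines():
        m = CHANGE_RE.search(line)
        if m:
            out.append((m.group(1), m.group(2), m.group(3)))
    return out


sample = """\
2/26 14:10:41 Changing state: Unclaimed -> Matched
2/26 14:10:41 Changing state: Matched -> Claimed
2/26 14:10:44 Changing activity: Idle -> Busy
2/26 14:10:49 Changing activity: Busy -> Idle
2/26 14:10:57 Changing state and activity: Claimed/Idle -> Preempting/Vacating
2/26 14:10:57 Changing state and activity: Preempting/Vacating -> Owner/Idle
2/26 14:10:57 Changing state: Owner -> Unclaimed
"""
for kind, old, new in transitions(sample):
    print(f"{kind:>20}: {old} -> {new}")
```

Seen this way, the startd side looks healthy: matching, claiming and activation all succeed, and the failure is confined to the job process itself.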

---------------------StarterLog from execute machine------------------

2/26 14:10:44 ********** STARTER starting up ***********
2/26 14:10:44 ** $CondorVersion: 6.6.1 Feb  5 2004 $
2/26 14:10:44 ** $CondorPlatform: I386-LINUX-RH72 $
2/26 14:10:44 ******************************************
2/26 14:10:44 Submitting machine is "one.condor"
2/26 14:10:44 EventHandler {
2/26 14:10:44 	func = 0x8071460
2/26 14:10:44 	mask = SIGALRM SIGHUP SIGINT SIGUSR1 SIGUSR2 SIGCHLD SIGTSTP
2/26 14:10:44 }
2/26 14:10:44 Done setting resource limits
2/26 14:10:44 	*FSM* Transitioning to state "GET_PROC"
2/26 14:10:44 	*FSM* Executing state func "get_proc()" [  ]
2/26 14:10:44 Entering get_proc()
2/26 14:10:44 Entering get_job_info()
2/26 14:10:44 Startup Info:
2/26 14:10:44 	Version Number: 1
2/26 14:10:44 	Id: 71.0
2/26 14:10:44 	JobClass: STANDARD
2/26 14:10:44 	Uid: 500
2/26 14:10:44 	Gid: 500
2/26 14:10:44 	VirtPid: -1
2/26 14:10:44 	SoftKillSignal: 20
2/26 14:10:44 	Cmd: "/home/bob/condor_examples/hello.remote"
2/26 14:10:44 	Args: ""
2/26 14:10:44 	Env: ""
2/26 14:10:44 	Iwd: "/home/bob/condor_examples"
2/26 14:10:44 	Ckpt Wanted: TRUE
2/26 14:10:44 	Is Restart: FALSE
2/26 14:10:44 	Core Limit Valid: TRUE
2/26 14:10:44 	Coredump Limit 0
2/26 14:10:44 User uid set to 99
2/26 14:10:44 User uid set to 99
2/26 14:10:44 User Process 71.0 {
2/26 14:10:44   cmd = /home/bob/condor_examples/hello.remote
2/26 14:10:44   args =
2/26 14:10:44   env =
2/26 14:10:44   local_dir = dir_1022
2/26 14:10:44   cur_ckpt = dir_1022/condor_exec.71.0
2/26 14:10:44   core_name = dir_1022/core
2/26 14:10:44   uid = 99, gid = 99
2/26 14:10:44   v_pid = -1
2/26 14:10:44   pid = (NOT CURRENTLY EXECUTING)
2/26 14:10:44   exit_status_valid = FALSE
2/26 14:10:44   exit_status = (NEVER BEEN EXECUTED)
2/26 14:10:44   ckpt_wanted = TRUE
2/26 14:10:44   coredump_limit_exists = TRUE
2/26 14:10:44   coredump_limit = 0
2/26 14:10:44   soft_kill_sig = 20
2/26 14:10:44   job_class = STANDARD
2/26 14:10:44   state = NEW
2/26 14:10:44   new_ckpt_created = FALSE
2/26 14:10:44   ckpt_transferred = FALSE
2/26 14:10:44   core_created = FALSE
2/26 14:10:44   core_transferred = FALSE
2/26 14:10:44   exit_requested = FALSE
2/26 14:10:44   image_size = -1 blocks
2/26 14:10:44   user_time = 0
2/26 14:10:44   sys_time = 0
2/26 14:10:44   guaranteed_user_time = 0
2/26 14:10:44   guaranteed_sys_time = 0
2/26 14:10:44 }
2/26 14:10:44 	*FSM* Transitioning to state "GET_EXEC"
2/26 14:10:44 	*FSM* Executing state func "get_exec()" [ SUSPEND VACATE DIE  ]
2/26 14:10:44 Entering get_exec()
2/26 14:10:44 Executable is located on submitting host
2/26 14:10:44 Expanded executable name is "/home/condor/spool/cluster71.ickpt.subproc0"
2/26 14:10:44 Going to try 3 attempts at getting the inital executable
2/26 14:10:44 Entering get_file( /home/condor/spool/cluster71.ickpt.subproc0, dir_1022/condor_exec.71.0, 0755 )
2/26 14:10:44 Opened "/home/condor/spool/cluster71.ickpt.subproc0" via file stream
2/26 14:10:48 Get_file() transferred 3584034 bytes, 946058 bytes/second
2/26 14:10:48 Fetched orig ckpt file "/home/condor/spool/cluster71.ickpt.subproc0" into "dir_1022/condor_exec.71.0" with 1 attempt
2/26 14:10:48 Executable 'dir_1022/condor_exec.71.0' is linked with "$CondorVersion: 6.6.1 Feb  5 2004 $" on a "$CondorPlatform: I386-LINUX-RH72 $"
2/26 14:10:48 	*FSM* Executing transition function "spawn_all"
2/26 14:10:48 Pipe built
2/26 14:10:48 New pipe_fds[14,1]
2/26 14:10:48 cmd_fd = 14
2/26 14:10:48 Calling execve( "/home/condor/execute/dir_1022/condor_exec.71.0", "condor_exec.71.0", "-_condor_cmd_fd", "14", 0, "CONDOR_VM=vm1", "CONDOR_SCRATCH_DIR=/home/condor/execute/dir_1022", 0 )
2/26 14:10:48 Started user job - PID = 1023
2/26 14:10:48 cmd_fp = 0x83060a0
2/26 14:10:48 end
2/26 14:10:48 	*FSM* Transitioning to state "SUPERVISE"
2/26 14:10:48 	*FSM* Executing state func "supervise_all()" [ GET_NEW_PROC SUSPEND VACATE ALARM DIE CHILD_EXIT PERIODIC_CKPT  ]
2/26 14:10:49 	*FSM* Got asynchronous event "CHILD_EXIT"
2/26 14:10:49 	*FSM* Executing transition function "reaper"
2/26 14:10:49 Process 1023 killed by signal 4
2/26 14:10:49 Process exited abnormally
2/26 14:10:49 	*FSM* Transitioning to state "PROC_EXIT"
2/26 14:10:49 	*FSM* Executing state func "proc_exit()" [ DIE  ]
2/26 14:10:49 	*FSM* Transitioning to state "SEND_CORE"
2/26 14:10:49 	*FSM* Executing state func "send_core()" [ SUSPEND VACATE DIE  ]
2/26 14:10:49 No core file to send - probably ran out of disk
2/26 14:10:49 	*FSM* Executing transition function "dispose_one"
2/26 14:10:49 Sending final status for process 71.0
2/26 14:10:49 STATUS encoded as ABNORMAL, NO CORE
2/26 14:10:49 User time = 0.000000 seconds
2/26 14:10:49 System time = 0.020000 seconds
2/26 14:10:49 Unlinked "dir_1022/condor_exec.71.0"
2/26 14:10:49 Can't unlink "dir_1022/core" - errno = 2
2/26 14:10:49 Can't remove directory "dir_1022" - errno = 39
2/26 14:10:49 	*FSM* Transitioning to state "SUPERVISE"
2/26 14:10:49 	*FSM* Got asynchronous event "DIE"
2/26 14:10:49 	*FSM* Executing transition function "req_die"
2/26 14:10:49 	*FSM* Transitioning to state "TERMINATE"
2/26 14:10:49 	*FSM* Executing state func "terminate_all()" [  ]
2/26 14:10:49 	*FSM* Transitioning to state "SEND_STATUS_ALL"
2/26 14:10:49 	*FSM* Executing state func "dispose_all()" [  ]
2/26 14:10:49 	*FSM* Reached state "END"
2/26 14:10:49 ********* STARTER terminating normally **********
2/26 14:10:52 ********** STARTER starting up ***********
2/26 14:10:52 ** $CondorVersion: 6.6.1 Feb  5 2004 $
2/26 14:10:52 ** $CondorPlatform: I386-LINUX-RH72 $
2/26 14:10:52 ******************************************
2/26 14:10:52 Submitting machine is "one.condor"
2/26 14:10:52 EventHandler {
2/26 14:10:52 	func = 0x8071460
2/26 14:10:52 	mask = SIGALRM SIGHUP SIGINT SIGUSR1 SIGUSR2 SIGCHLD SIGTSTP
2/26 14:10:52 }
2/26 14:10:52 Done setting resource limits
2/26 14:10:52 	*FSM* Transitioning to state "GET_PROC"
2/26 14:10:52 	*FSM* Executing state func "get_proc()" [  ]
2/26 14:10:52 Entering get_proc()
2/26 14:10:52 Entering get_job_info()
2/26 14:10:52 Startup Info:
2/26 14:10:52 	Version Number: 1
2/26 14:10:52 	Id: 72.0
2/26 14:10:52 	JobClass: STANDARD
2/26 14:10:52 	Uid: 500
2/26 14:10:52 	Gid: 500
2/26 14:10:52 	VirtPid: -1
2/26 14:10:52 	SoftKillSignal: 20
2/26 14:10:52 	Cmd: "/home/bob/condor_examples/hello.remote"
2/26 14:10:52 	Args: ""
2/26 14:10:52 	Env: ""
2/26 14:10:52 	Iwd: "/home/bob/condor_examples"
2/26 14:10:52 	Ckpt Wanted: TRUE
2/26 14:10:52 	Is Restart: FALSE
2/26 14:10:52 	Core Limit Valid: TRUE
2/26 14:10:52 	Coredump Limit 0
2/26 14:10:52 User uid set to 99
2/26 14:10:52 User uid set to 99
2/26 14:10:52 User Process 72.0 {
2/26 14:10:52   cmd = /home/bob/condor_examples/hello.remote
2/26 14:10:52   args =
2/26 14:10:52   env =
2/26 14:10:52   local_dir = dir_1025
2/26 14:10:52   cur_ckpt = dir_1025/condor_exec.72.0
2/26 14:10:52   core_name = dir_1025/core
2/26 14:10:52   uid = 99, gid = 99
2/26 14:10:52   v_pid = -1
2/26 14:10:52   pid = (NOT CURRENTLY EXECUTING)
2/26 14:10:52   exit_status_valid = FALSE
2/26 14:10:52   exit_status = (NEVER BEEN EXECUTED)
2/26 14:10:52   ckpt_wanted = TRUE
2/26 14:10:52   coredump_limit_exists = TRUE
2/26 14:10:52   coredump_limit = 0
2/26 14:10:52   soft_kill_sig = 20
2/26 14:10:52   job_class = STANDARD
2/26 14:10:52   state = NEW
2/26 14:10:52   new_ckpt_created = FALSE
2/26 14:10:52   ckpt_transferred = FALSE
2/26 14:10:52   core_created = FALSE
2/26 14:10:52   core_transferred = FALSE
2/26 14:10:52   exit_requested = FALSE
2/26 14:10:52   image_size = -1 blocks
2/26 14:10:52   user_time = 0
2/26 14:10:52   sys_time = 0
2/26 14:10:52   guaranteed_user_time = 0
2/26 14:10:52   guaranteed_sys_time = 0
2/26 14:10:52 }
2/26 14:10:52 	*FSM* Transitioning to state "GET_EXEC"
2/26 14:10:52 	*FSM* Executing state func "get_exec()" [ SUSPEND VACATE DIE  ]
2/26 14:10:52 Entering get_exec()
2/26 14:10:52 Executable is located on submitting host
2/26 14:10:52 Expanded executable name is "/home/condor/spool/cluster72.ickpt.subproc0"
2/26 14:10:52 Going to try 3 attempts at getting the inital executable
2/26 14:10:52 Entering get_file( /home/condor/spool/cluster72.ickpt.subproc0, dir_1025/condor_exec.72.0, 0755 )
2/26 14:10:52 Opened "/home/condor/spool/cluster72.ickpt.subproc0" via file stream
2/26 14:10:55 Get_file() transferred 3584034 bytes, 1091511 bytes/second
2/26 14:10:55 Fetched orig ckpt file "/home/condor/spool/cluster72.ickpt.subproc0" into "dir_1025/condor_exec.72.0" with 1 attempt
2/26 14:10:56 Executable 'dir_1025/condor_exec.72.0' is linked with "$CondorVersion: 6.6.1 Feb  5 2004 $" on a "$CondorPlatform: I386-LINUX-RH72 $"
2/26 14:10:56 	*FSM* Executing transition function "spawn_all"
2/26 14:10:56 Pipe built
2/26 14:10:56 New pipe_fds[14,1]
2/26 14:10:56 cmd_fd = 14
2/26 14:10:56 Calling execve( "/home/condor/execute/dir_1025/condor_exec.72.0", "condor_exec.72.0", "-_condor_cmd_fd", "14", 0, "CONDOR_VM=vm1", "CONDOR_SCRATCH_DIR=/home/condor/execute/dir_1025", 0 )
2/26 14:10:56 Started user job - PID = 1026
2/26 14:10:56 cmd_fp = 0x83060a0
2/26 14:10:56 end
2/26 14:10:56 	*FSM* Transitioning to state "SUPERVISE"
2/26 14:10:56 	*FSM* Executing state func "supervise_all()" [ GET_NEW_PROC SUSPEND VACATE ALARM DIE CHILD_EXIT PERIODIC_CKPT  ]
2/26 14:10:56 	*FSM* Got asynchronous event "CHILD_EXIT"
2/26 14:10:56 	*FSM* Executing transition function "reaper"
2/26 14:10:56 Process 1026 killed by signal 4
2/26 14:10:56 Process exited abnormally
2/26 14:10:56 	*FSM* Transitioning to state "PROC_EXIT"
2/26 14:10:56 	*FSM* Executing state func "proc_exit()" [ DIE  ]
2/26 14:10:56 	*FSM* Transitioning to state "SEND_CORE"
2/26 14:10:56 	*FSM* Executing state func "send_core()" [ SUSPEND VACATE DIE  ]
2/26 14:10:56 No core file to send - probably ran out of disk
2/26 14:10:56 	*FSM* Executing transition function "dispose_one"
2/26 14:10:56 Sending final status for process 72.0
2/26 14:10:56 STATUS encoded as ABNORMAL, NO CORE
2/26 14:10:56 User time = 0.010000 seconds
2/26 14:10:56 System time = 0.010000 seconds
2/26 14:10:56 Unlinked "dir_1025/condor_exec.72.0"
2/26 14:10:56 Can't unlink "dir_1025/core" - errno = 2
2/26 14:10:56 Can't remove directory "dir_1025" - errno = 39
2/26 14:10:56 	*FSM* Transitioning to state "SUPERVISE"
2/26 14:10:56 	*FSM* Executing state func "supervise_all()" [ GET_NEW_PROC SUSPEND VACATE ALARM DIE CHILD_EXIT PERIODIC_CKPT  ]
2/26 14:10:56 	*FSM* Got asynchronous event "DIE"
2/26 14:10:56 	*FSM* Executing transition function "req_die"
2/26 14:10:56 	*FSM* Transitioning to state "TERMINATE"
2/26 14:10:56 	*FSM* Executing state func "terminate_all()" [  ]
2/26 14:10:56 	*FSM* Transitioning to state "SEND_STATUS_ALL"
2/26 14:10:56 	*FSM* Executing state func "dispose_all()" [  ]
2/26 14:10:56 	*FSM* Reached state "END"
2/26 14:10:56 ********* STARTER terminating normally **********
------------------------------------------------------------------
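Read as a state machine, each StarterLog run goes GET_PROC, GET_EXEC, SUPERVISE, then on to PROC_EXIT within about a second of execve, with essentially zero user time. That suggests the job died almost immediately after starting rather than during a checkpoint. The two errno values are ENOENT (2) and ENOTEMPTY (39) on Linux. ENOENT for `core` is consistent with the `Coredump Limit 0` in the startup info, so a full disk is not needed to explain the missing core file. The sketch below hard-codes those two Linux values rather than relying on the host's `errno` table, since the numbers differ on other systems. The helper names are hypothetical:

```python
import re

FSM_RE = re.compile(r'\*FSM\* (?:Transitioning to|Reached) state "([A-Z_]+)"')
ERRNO_RE = re.compile(r"errno = (\d+)")
# Linux values from the kernel's errno headers (other systems number these differently).
LINUX_ERRNO = {2: "ENOENT (no such file or directory)", 39: "ENOTEMPTY (directory not empty)"}


def fsm_trace(log_text):
    """Return the starter FSM states in the order they were entered."""
    return FSM_RE.findall(log_text)


def errno_notes(log_text):
    """Translate every 'errno = N' in the log, assuming Linux numbering."""
    return [LINUX_ERRNO.get(int(n), f"errno {n}") for n in ERRNO_RE.findall(log_text)]


sample = """\
2/26 14:10:52 \t*FSM* Transitioning to state "GET_PROC"
2/26 14:10:52 \t*FSM* Transitioning to state "GET_EXEC"
2/26 14:10:56 \t*FSM* Transitioning to state "SUPERVISE"
2/26 14:10:56 \t*FSM* Transitioning to state "PROC_EXIT"
2/26 14:10:56 \t*FSM* Transitioning to state "SEND_CORE"
2/26 14:10:56 Can't unlink "dir_1025/core" - errno = 2
2/26 14:10:56 Can't remove directory "dir_1025" - errno = 39
2/26 14:10:56 \t*FSM* Reached state "END"
"""
print(" -> ".join(fsm_trace(sample)))
print(errno_notes(sample))
```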



I am running Condor 6.6.1 on Red Hat 7.3 on an Intel P133 (a 133 MHz Pentium) with 32 MB of RAM.

Could it be that there are not enough resources for it to checkpoint?

Doesn't the fact that the hello executable also dies when I run it
standalone suggest that the problem lies with the condor_compile process
rather than with the Condor system itself?
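One way to test that suspicion, offered as an assumption rather than a diagnosis: the P133 is an original Pentium (P5), which lacks the `cmov` instructions introduced with the Pentium Pro (i686). If code linked in by condor_compile was built for i686, it would raise SIGILL on this CPU even though the plain build used for the vanilla universe runs fine. On Linux, the features a processor advertises appear on the `flags` line of `/proc/cpuinfo`. This hypothetical sketch reports which features from a short, assumed list are missing:

```python
from pathlib import Path

# Hypothetical list of features an i686 build may rely on.
# CMOV (conditional move) first appeared with the Pentium Pro (P6 family).
I686_FLAGS = ("cmov",)


def cpu_flags(cpuinfo_text):
    """Return the set of feature flags from the first 'flags' line."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_flags(cpuinfo_text, wanted=I686_FLAGS):
    """Return the wanted flags that the CPU does not advertise."""
    flags = cpu_flags(cpuinfo_text)
    return [f for f in wanted if f not in flags]


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")
    if path.exists():
        print("missing:", missing_flags(path.read_text()) or "none")
```

If `cmov` shows up as missing, running `objdump -d hello.remote | grep cmov` against the condor_compile'd binary would show whether any such instructions were actually linked in.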

Any suggestions?

Cheers,
Leighton.
Condor Support Information:
http://www.cs.wisc.edu/condor/condor-support/
To Unsubscribe, send mail to majordomo@xxxxxxxxxxx with
unsubscribe condor-users <your_email_address>
Follow-Ups:
- Re: [condor-users] jobs not working in standard universe
  - From: Peter Keller