
[Condor-users] my_popen and condor_shadow.std.exe don't exist



Hi,
 
I have a cluster consisting of two Windows XP machines: a central manager and one execute machine.
 
When Condor starts on the execute machine, there is an error message stating that my_popen failed, and subsequently that condor_shadow.pvm.exe and condor_shadow.std.exe could not be executed. I can't find my_popen, condor_shadow.pvm.exe, or condor_shadow.std.exe on either the central manager or the execute machine. There is, however, a condor_shadow.exe under condor/bin. I am using Condor 6.8.4.
 
ScheddLog on the execute machine contains:
 
3/20 11:09:55 (pid:3648) ******************************************************
3/20 11:09:55 (pid:3648) ** condor_schedd.exe (CONDOR_SCHEDD) STARTING UP
3/20 11:09:55 (pid:3648) ** D:\condor-6.8.4\bin\condor_schedd.exe
3/20 11:09:55 (pid:3648) ** $CondorVersion: 6.8.4 Feb 1 2007 $
3/20 11:09:55 (pid:3648) ** $CondorPlatform: INTEL-WINNT50 $
3/20 11:09:55 (pid:3648) ** PID = 3648
3/20 11:09:55 (pid:3648) ** Log last touched 3/20 11:09:49
3/20 11:09:55 (pid:3648) ******************************************************
3/20 11:09:55 (pid:3648) Using config source: D:\condor-6.8.4\condor_config
3/20 11:09:55 (pid:3648) Using local config sources:
3/20 11:09:55 (pid:3648) D:\condor-6.8.4/condor_config.local
3/20 11:09:55 (pid:3648) DaemonCore: Command Socket at <131.242.63.162:3788>
3/20 11:09:55 (pid:3648) History file rotation is enabled.
3/20 11:09:55 (pid:3648) Maximum history file size is: 20971520 bytes
3/20 11:09:55 (pid:3648) Number of rotated history files is: 2
3/20 11:09:56 (pid:3648) my_popen: CreateProcess failed
3/20 11:09:56 (pid:3648) Failed to execute D:\condor-6.8.4/bin/condor_shadow.pvm.exe, ignoring
3/20 11:09:56 (pid:3648) my_popen: CreateProcess failed
3/20 11:09:56 (pid:3648) Failed to execute D:\condor-6.8.4/bin/condor_shadow.std.exe, ignoring
 
When I try to run an MPI job, the job stays idle in the queue. The ScheddLog on the central manager shows that the shadow exited with status 4. Does anyone know what "status 4" means? Is it related to the my_popen/condor_shadow.pvm.exe/condor_shadow.std.exe failures reported when Condor starts on the execute machine?
 
ScheddLog on the central manager contains:
3/20 11:16:36 (pid:1064) Sent ad to central manager for jeffreysj@xxxxxxxxxxxxxxx
3/20 11:16:36 (pid:1064) Sent ad to 1 collectors for jeffreysj@xxxxxxxxxxxxxxx
3/20 11:16:38 (pid:1064) Inserting new attribute Scheduler into non-active cluster cid=12 acid=-1
3/20 11:16:40 (pid:1064) Starting add_shadow_birthdate(12.0)
3/20 11:16:40 (pid:1064) Started shadow for job 12.0 on "<131.242.63.162:3789>", (shadow pid = 3084)
3/20 11:16:41 (pid:1064) Sent ad to central manager for jeffreysj@xxxxxxxxxxxxxxx
3/20 11:16:41 (pid:1064) Sent ad to 1 collectors for jeffreysj@xxxxxxxxxxxxxxx
3/20 11:16:41 (pid:1064) DaemonCore: Command received via TCP from host <131.242.63.124:1313>
3/20 11:16:41 (pid:1064) DaemonCore: received command 71003 (GIVE_MATCHES), calling handler (DedicatedScheduler::giveMatches)
3/20 11:16:42 (pid:1064) DaemonCore: Command received via UDP from host <131.242.63.124:1315>
3/20 11:16:42 (pid:1064) DaemonCore: received command 60011 (DC_NOP), calling handler (handle_nop())
3/20 11:16:42 (pid:1064) In DedicatedScheduler::reaper pid 3084 has status 4
3/20 11:16:42 (pid:1064) Shadow pid 3084 exited with status 4
3/20 11:16:42 (pid:1064) ERROR: Shadow exited with job exception code!
3/20 11:16:42 (pid:1064) DedicatedScheduler::deallocMatchRec
3/20 11:16:42 (pid:1064) DedicatedScheduler::deallocMatchRec
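 
In case it helps anyone compare against their own logs, here is a minimal Python sketch for pulling the shadow exit lines out of a ScheddLog. It is not part of Condor; the function name shadow_exits and the regular expression are my own, and the pattern only assumes the line layout shown in the excerpt above. It does not try to interpret what the status means.

```python
import re

# Matches lines of the form seen in the excerpt above, e.g.
#   3/20 11:16:42 (pid:1064) Shadow pid 3084 exited with status 4
# The layout is assumed from this one log; other versions may differ.
SHADOW_EXIT = re.compile(
    r"^(?P<date>\d+/\d+) (?P<time>\d+:\d+:\d+) \(pid:\d+\) "
    r"Shadow pid (?P<shadow_pid>\d+) exited with status (?P<status>\d+)"
)


def shadow_exits(log_lines):
    """Return (date, time, shadow_pid, status) for each shadow exit line."""
    results = []
    for line in log_lines:
        m = SHADOW_EXIT.match(line.strip())
        if m:
            results.append(
                (m["date"], m["time"], int(m["shadow_pid"]), int(m["status"]))
            )
    return results


# Three lines copied from the central manager's ScheddLog above.
sample = [
    "3/20 11:16:42 (pid:1064) In DedicatedScheduler::reaper pid 3084 has status 4",
    "3/20 11:16:42 (pid:1064) Shadow pid 3084 exited with status 4",
    "3/20 11:16:42 (pid:1064) ERROR: Shadow exited with job exception code!",
]
print(shadow_exits(sample))
```

To run it against a real log, pass the open file instead of the sample list, for example shadow_exits(open(path)) with path pointing at the ScheddLog.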
 
cheers
steve
