Re: [Condor-users] BUG in condor with Windows 2003?



Mark,

This is not really a bug, but the enhanced security on Windows 2003. Check
the permissions on cmd.exe. By default, members of the Users group cannot
execute this file.

Hope this helps,
Roman.

2007/1/15, Mark Ellul <mark@xxxxxxxxxxx>:
Hi Everyone,

I think there might be a bug in Condor v6.8.3 when it runs on Windows
2003. I have two Windows 2003 servers and a Windows XP box connected to
one pool. The pool manager is on a Windows 2003 box, which does not run
any jobs.

My job consists of a batch file that runs a PHP script: PHP is copied
onto the execute machine and the batch file then runs the script. In the
same pool, when the job is assigned to the Windows XP machine, it runs
without a problem. However, when it is assigned to the Windows 2003 box,
I get the error below (more info follows).

-------------------------------------------------------------------------------------------------------
001 (030.000.000) 01/15 16:51:18 Job executing on host: <192.168.2.202:4544>
...
007 (030.000.000) 01/15 16:51:18 Shadow exception!
    Error from starter on vm1@STAGING:
Create_Process(C:\WINDOWS\system32\cmd.exe,/Q /C condor_exec.bat
translate_desc_en_pt.php, VIDEOID, ...) failed
    0  -  Run Bytes Sent By Job
    8139560  -  Run Bytes Received By Job
...

---------------------------------------------------------------------------------------------
Submit file
--------------------------------------------------------------------------------------------

# file name:  my_program.condor
# Condor submit description file for my_program
Executable      = p.bat
Universe        = vanilla
Error           = logs/$(cluster).err.log
Output          = logs/$(cluster).out.log
Log             = logs/$(cluster).log

initialdir      = files

should_transfer_files = YES
when_to_transfer_output = ON_EXIT
transfer_input_files = translate_desc_en_pt.php,php.exe,gtkextra.dll,iconv.dll,intl.dll,libgdk-0.dll,libglade.dll,libglib-2.0-0.dll,libgmodule-2.0-0.dll,libgobject-2.0-0.dll,libgthread-2.0-0.dll,libgtk-0.dll,libxml2.dll,php4ts.dll,php.ini,php.ini-gtk,php_gtk.dll,php_gtk_combobutton.dll,php_gtk_extra.dll,php_gtk_libglade.dll,php_gtk_scintilla.dll,php_gtk_scrollpane.dll,php_gtk_spaned.dll,php_gtk_sqpane.dll,php_win.exe,php-cgi.exe,zlib.dll

Arguments       = translate_desc_en_pt.php, VIDEOID
#Arguments       = -?
Requirements =  OpSys != "Dummy" && Arch != "Dummy"


Queue
--------------------------------------------------------------------------------------------

1/15 16:51:15 ******************************************************
1/15 16:51:15 ** condor_shadow (CONDOR_SHADOW) STARTING UP
1/15 16:51:15 ** C:\condor\bin\condor_shadow.exe
1/15 16:51:15 ** $CondorVersion: 6.8.3 Jan  5 2007 $
1/15 16:51:15 ** $CondorPlatform: INTEL-WINNT50 $
1/15 16:51:15 ** PID = 3948
1/15 16:51:15 ** Log last touched 1/15 16:51:13
1/15 16:51:15 ******************************************************
1/15 16:51:15 Using config source: C:\condor\condor_config
1/15 16:51:15 Using local config sources:
1/15 16:51:15    C:\condor/condor_config.local
1/15 16:51:15 DaemonCore: Command Socket at <192.168.2.124:4788>
1/15 16:51:15 Initializing a VANILLA shadow for job 30.0
1/15 16:51:15 (30.0) (3948): Request to run on <192.168.2.202:4544> was
ACCEPTED
1/15 16:51:18 (30.0) (3948): ERROR "Error from starter on vm1@STAGING:
Create_Process(C:\WINDOWS\system32\cmd.exe,/Q /C condor_exec.bat
translate_desc_en_pt.php, VIDEOID, ...) failed" at line 643 in file
..\src\condor_shadow.V6.1\pseudo_ops.C
1/15 16:53:59 ******************************************************


I then looked up a previous user's post describing a similar problem.
From the
http://condor.optena.com/display/CONDOR/Common+Windows+Problems page I
can see that a VM1_USER setting is needed in the configuration, which I
have now added.

With that setting in place, the error I get is below:

1/15 17:18:48 ******************************************************
1/15 17:18:48 ** condor_shadow (CONDOR_SHADOW) STARTING UP
1/15 17:18:48 ** C:\condor\bin\condor_shadow.exe
1/15 17:18:48 ** $CondorVersion: 6.8.3 Jan  5 2007 $
1/15 17:18:48 ** $CondorPlatform: INTEL-WINNT50 $
1/15 17:18:48 ** PID = 2080
1/15 17:18:48 ** Log last touched 1/15 17:11:04
1/15 17:18:48 ******************************************************
1/15 17:18:48 Using config source: C:\condor\condor_config
1/15 17:18:48 Using local config sources:
1/15 17:18:48    C:\condor/condor_config.local
1/15 17:18:48 DaemonCore: Command Socket at <192.168.2.124:1098>
1/15 17:18:48 Initializing a VANILLA shadow for job 32.0
1/15 17:18:48 (32.0) (2080): Request to run on <192.168.2.202:3310> was
ACCEPTED
1/15 17:18:49 (32.0) (2080): condor_read(): recv() returned -1, errno =
10054, assuming failure reading 5 bytes from <192.168.2.202:3310>.
1/15 17:18:49 (32.0) (2080): Can no longer talk to condor_starter
<192.168.2.202:3310>
1/15 17:18:49 (32.0) (2080): Trying to reconnect to disconnected job
1/15 17:18:49 (32.0) (2080): LastJobLeaseRenewal: 1168881529 Mon Jan 15
17:18:49 2007
1/15 17:18:49 (32.0) (2080): JobLeaseDuration: 1200 seconds
1/15 17:18:49 (32.0) (2080): JobLeaseDuration remaining: 1200
1/15 17:18:49 (32.0) (2080): Attempting to locate disconnected starter
1/15 17:18:49 (32.0) (2080): Found starter: <192.168.2.202:3362>
1/15 17:18:49 (32.0) (2080): Attempting to reconnect to starter
<192.168.2.202:3362>
1/15 17:18:50 (32.0) (2080): attempt to connect to <192.168.2.202:3362>
failed: connect errno = 10061 connection refused.
1/15 17:18:50 (32.0) (2080): Attempt to reconnect failed: Failed to
connect to starter <192.168.2.202:3362>
1/15 17:18:50 (32.0) (2080): JobLeaseDuration remaining: 1199
1/15 17:18:50 (32.0) (2080): Scheduling another attempt to reconnect in
8 seconds
1/15 17:18:58 (32.0) (2080): Attempting to locate disconnected starter
1/15 17:18:58 (32.0) (2080): locateStarter(): ClaimId
(<192.168.2.202:3310>#1168881462#1) and GlobalJobId (
cellast-cxo5mw2#1168881032#32.0 ) not found
1/15 17:18:58 (32.0) (2080): Reconnect FAILED: Job not found at
execution machine
1/15 17:18:58 (32.0) (2080): **** condor_shadow (condor_SHADOW) EXITING
WITH STATUS 107
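
For anyone reading the log above, here is a minimal Python sketch for decoding the two numeric details in it: the Winsock error codes after "errno =" and the epoch value printed for LastJobLeaseRenewal. The helper names (explain_errnos, lease_renewal_utc) are hypothetical and not part of Condor; the two code meanings are the standard Windows Sockets definitions. It only interprets the log and does not diagnose the underlying failure.

```python
import re
from datetime import datetime, timezone

# Standard Windows Sockets error codes seen in the shadow log above.
WINSOCK_ERRORS = {
    10054: "WSAECONNRESET (connection reset by peer)",
    10061: "WSAECONNREFUSED (connection refused)",
}

# The log is line-wrapped in the email, so the number may sit on the
# line after "errno ="; \s* also matches that newline.
ERRNO_RE = re.compile(r"errno\s*=\s*(\d+)")


def explain_errnos(log_text):
    """Return (code, meaning) pairs for every 'errno = N' in log_text."""
    return [
        (int(code), WINSOCK_ERRORS.get(int(code), "unknown"))
        for code in ERRNO_RE.findall(log_text)
    ]


def lease_renewal_utc(epoch_seconds):
    """Convert an epoch value such as LastJobLeaseRenewal to a UTC datetime."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)


if __name__ == "__main__":
    sample = (
        "condor_read(): recv() returned -1, errno =\n"
        "10054, assuming failure reading 5 bytes\n"
        "failed: connect errno = 10061 connection refused.\n"
    )
    for code, meaning in explain_errnos(sample):
        print(code, meaning)
    print(lease_renewal_utc(1168881529).isoformat())
```

Read this way, the shadow's connection to the starter was reset (10054), and the reconnect attempt was then refused (10061), which is consistent with the starter process having gone away on the execute machine.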

My gut feeling is that it's a bug in the transfer of multiple files on
Windows 2003. The reason I suspect multiple files is that a simple hello
world job, which transfers only hello.exe, works with no problems; the
failure appears only when multiple files are transferred.

The exact same job description works fine in the same pool on Windows XP.

Any thoughts would be much appreciated.

Regards

Mark

--
Mark Ellul
Research and Development Manager

