
Re: [Condor-users] broken condor_exec.exe path on condor-g submit to windows pool



Rob de Graaf wrote:
Hi Ben,

[snip]


It was my understanding that, in the absence of a shared filesystem, condor simply copies whatever "Executable =" points to, transfers it to \execute\dir_pid\condor_exec.exe whatever the extension is, and then executes that, oblivious to the original path and filename. Is this not correct?


Not correct.

See
http://www.cs.wisc.edu/condor/manual/v7.5/2_5Submitting_Job.html#sec:file-transfer
for the info you desire.

Basically, you need to tell Condor in your submit file if you want Condor to move files between the submit and execute nodes. You can put
  should_transfer_files = IF_NEEDED
in your submit file to get the pseudo-automatic behavior you describe above.

Normally, Globus assumes a shared file system between the gatekeeper node and the execute nodes. The big goal you want, having Globus submit jobs into Condor in a manner that does not rely on a shared file system, has been robustly coded and used hard in production on the Open Science Grid. They use a set of scripts called the "Condor NFSLite Jobmanager (for Globus)" (I know, the name is a bit misleading), documented by the VDT team (the folks who create the software used by Open Science Grid) here:
http://vdt.cs.wisc.edu/releases/1.10.1/notes/Globus-CondorNFSLite-Setup.html

You could probably get scripts/wisdom directly from them on this; in addition, I will ping them and ask them to write up quick admin how-to instructions for this to post here:
  https://condor-wiki.cs.wisc.edu/index.cgi/wiki?p=HowToAdminRecipes

regards,
Todd



I guess I am looking for the code that generates the odd file name, so I can work around it. One way I've found is to pre-stage the executable to the gt5 resource, then condor-g submit using "transfer_executable = false" so gass is never used, while still including globusrsl to enable file transfer on the remote pool. This works, but kind of defeats the point of having a file-transfer-capable meta-scheduler. Ultimately, I would like to be able to use condor-g to transparently schedule jobs on multiple heterogeneous remote condor pools without having to worry about file staging.

If I'm going about this the wrong way, please enlighten me!

TIA,

Rob

On 06/19/2010 09:57 AM, Burnett, Ben wrote:
Hi Rob:

This is due to some embarrassingly naive file extension detection code. It only hunts backwards from the end of the filename until it finds a '.' character, but does not error out if it detects a character that is invalid in a filename extension (like '<', '>', ':', and in your case '/', etc.).
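
To make the failure mode concrete, here is a minimal Python sketch of the behavior described above. It is not the actual Condor source: the function names and the set of invalid characters are assumptions chosen for illustration. The path is the one from the "About to exec" line in Rob's starter log.

```python
# Illustrative sketch only; not taken from the Condor code base.
# Assumed set of characters that cannot appear in a Windows file extension.
INVALID_EXT_CHARS = set('<>:"/\\|?*')


def naive_extension(path):
    """Hunt backwards for the last '.' and return everything after it,
    without checking whether the result could be a real extension."""
    idx = path.rfind(".")
    return path[idx:] if idx != -1 else ""


def checked_extension(path):
    """Hunt backwards for a '.', but give up as soon as a character that
    is invalid in an extension (such as a path separator) is seen first."""
    for i in range(len(path) - 1, -1, -1):
        ch = path[i]
        if ch == ".":
            return path[i:]
        if ch in INVALID_EXT_CHARS:
            return ""  # no extension in the final path component
    return ""


# Executable path as reported by the starter on the WinXP machine.
path = (
    r"C:\Progra~1\Condor\execute\dir_428\condor_exec."
    "gass_cache/local/md5/32/4a5afa0e96a19fa2244e8dd70116ce"
    "/md5/f4/91c2ba79ec192a3e127340f8999f71/data"
)

if __name__ == "__main__":
    print(repr(naive_extension(path)))
    print(repr(checked_extension(path)))
```

The naive scan treats the whole ".gass_cache/.../data" tail as the extension, which is what then gets looked up in the registry, while the checked variant reports no extension because it hits a '/' before any '.'.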

As a temporary solution until both pieces of code (the one that checks for file extensions, and the one that generates that odd file name) get patched, you can set ALLOW_SCRIPTS_TO_RUN_AS_EXECUTABLES to FALSE on the Windows machines that are experiencing this problem. This simply tells Condor to ignore its ability to detect a file type, and just treat the file as a plain old Windows binary.

Regards,
-B

On 2010-06-18, at 3:40 PM, Rob de Graaf wrote:

I'm having some problems running jobs on a remote windows pool..

The setup: Condor-G ->  GT5 resource w/ Condor LRMS ->  WinXP startd

The remote pool doesn't have a shared filesystem, so I use globusrsl in the condor-g submit file to tell the remote pool it has to use its own file transfer mechanism. It has separate queues (globus jobmanagers) for linux and windows jobs. The linux queue works well, but in the windows queue, jobs fail to execute at the startd:

===============================================================
06/18 22:22:08 Starting a VANILLA universe job with ID: 37.0
06/18 22:22:09 Tracking process family by login "condor-reuse-slot1"
06/18 22:22:09 IWD: C:\Progra~1\Condor\execute\dir_428
06/18 22:22:09 Output file: C:\Progra~1\Condor\execute\dir_428\_condor_stdout
06/18 22:22:10 Error file: C:\Progra~1\Condor\execute\dir_428\_condor_stderr
06/18 22:22:10 Renice expr "10" evaluated to 10
06/18 22:22:10 About to exec C:\Progra~1\Condor\execute\dir_428\condor_exec.gass_cache/local/md5/32/4a5afa0e96a19fa2244e8dd70116ce/md5/f4/91c2ba79ec192a3e127340f8999f71/data
06/18 22:22:10 GetExecutableAndArgumentsByExtention: failed to find extension *.gass_cache/local/md5/32/4a5afa0e96a19fa2244e8dd70116ce/md5/f4/91c2ba79ec192a3e127340f8999f71/data in the registry (last-error = 2).
06/18 22:22:10 Create_Process(): Failed to find an executable for extension *.gass_cache/local/md5/32/4a5afa0e96a19fa2244e8dd70116ce/md5/f4/91c2ba79ec192a3e127340f8999f71/data
06/18 22:22:10 ERROR: C:\Progra~1\Condor\execute\dir_428\condor_exec.gass_cache\local\md5\32\4a5afa0e96a19fa2244e8dd70116ce\md5\f4\91c2ba79ec192a3e127340f8999f71\data.exe is not a valid Windows executable
06/18 22:22:10 ERROR "Create_Process(C:\Progra~1\Condor\execute\dir_428\condor_exec.gass_cache/local/md5/32/4a5afa0e96a19fa2244e8dd70116ce/md5/f4/91c2ba79ec192a3e127340f8999f71/data,, ...) failed: " at line 530 in file ..\src\condor_starter.V6.1\os_proc.cpp
06/18 22:22:10 ShutdownFast all jobs.
===============================================================

The gass_cache part should not be there. If I understand correctly, condor-g uses gass internally to transfer files to the gt5 resource, but once they are there and a condor_submit is generated, condor's own file transfer mechanism kicks in and the startd should never see the cache url. I've checked to make sure the executable is transferred to Condor\execute\dir_pid\condor_exec.exe on the WinXP startd, and it is.

How can I make this work?

TIA, Rob
_______________________________________________
Condor-users mailing list
To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/condor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/condor-users/



--
Todd Tannenbaum                       University of Wisconsin-Madison
Center for High Throughput Computing  Department of Computer Sciences
tannenba@xxxxxxxxxxx                  1210 W. Dayton St. Rm #4257
Phone: (608) 263-7132                 Madison, WI 53706-1685