
Re: [Condor-users] broken condor_exec.exe path on condor-g submit to windows pool



> Thanks for the pointer, I didn't know about that option. Unfortunately, it doesn't help in my case, as the startd now simply tries to execute the non-existent path as binary:
> 
> ==============================================================
> 06/19 14:24:04 File transfer completed successfully.
> 06/19 14:24:05 Job 43.0 set to execute immediately
> 06/19 14:24:05 Starting a VANILLA universe job with ID: 43.0
> 06/19 14:24:05 Tracking process family by login "condor-reuse-slot1"
> 06/19 14:24:05 IWD: C:\Progra~1\Condor\execute\dir_716
> 06/19 14:24:05 Output file: C:\Progra~1\Condor\execute\dir_716\_condor_stdout
> 06/19 14:24:05 Error file: C:\Progra~1\Condor\execute\dir_716\_condor_stderr
> 06/19 14:24:05 Renice expr "10" evaluated to 10
> 06/19 14:24:05 About to exec C:\Progra~1\Condor\execute\dir_716\condor_exec.gass_cache/local/md5/6d/2254cb0171674e4280321100556c9a/md5/3f/ecb634a0691dc658e01762a1a0d474/data 
> 06/19 14:24:05 ERROR: failed to produce Win32 argument string from CreateProcess:
> 06/19 14:24:05 ERROR "Create_Process(C:\Progra~1\Condor\execute\dir_716\condor_exec.gass_cache/local/md5/6d/2254cb0171674e4280321100556c9a/md5/3f/ecb634a0691dc658e01762a1a0d474/data,, ...) failed: " at line 530 in file ..\src\condor_starter.V6.1\os_proc.cpp
> 06/19 14:24:05 ShutdownFast all jobs.
> ==============================================================

So much for the simple approach.

> It was my understanding that, in the absence of a shared filesystem, condor simply copies whatever "Executable =" points to, transfers it to \execute\dir_pid\condor_exec.exe whatever the extension is, then executes that, oblivious to the original path and filename. Is this not correct?

Correct, except that it now preserves the file extension (rather than always forcing .exe), which may be why the above problem has crept in.

> I guess I am looking for the code that generates the odd file name, and a way to work around it. One way I've found is to pre-stage the executable to the gt5 resource, then condor-g submit using "transfer_executable = false" so gass is never used, but still including globusrsl to enable file transfer on the remote pool. This works, but kind of defeats the point of having a file transfer capable meta-scheduler. Ultimately, I would like to be able to use condor-g to transparently schedule jobs on multiple heterogeneous remote condor pools without having to worry about file staging.

The code for building the odd filename is here:

condor_starter.V6.1/vanilla_proc.cpp (line 116):

116: filename.sprintf ( "condor_exec%s", extension );

But I believe this behaves correctly, except that it accepts invalid characters in the file extension.  I think the problem is further up the processing chain, where the file name is originally conceived and the other junk is tacked on with a '.' character.
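
To make the "invalid characters in the file extension" point concrete, here is a minimal sketch of a guard that could sit in front of that sprintf. The helper names is_plain_extension and staged_name are hypothetical and not part of Condor; falling back to ".exe" is an assumption that simply mirrors the older forced-.exe behaviour.

```cpp
#include <string>

// Hypothetical helper (not Condor code): true only if `ext` looks like a
// plain extension such as ".exe" or ".bat".
bool is_plain_extension(const std::string& ext) {
    if (ext.size() < 2 || ext[0] != '.') {
        return false;
    }
    // After the leading dot, reject path separators, further dots, spaces,
    // and the characters Windows forbids in file names.
    return ext.find_first_of("/\\:*?\"<>|. ", 1) == std::string::npos;
}

// Hypothetical helper: build the staged executable name, falling back to
// ".exe" (assumption: same as the old behaviour) when the extension is junk.
std::string staged_name(const std::string& ext) {
    return "condor_exec" + (is_plain_extension(ext) ? ext : std::string(".exe"));
}
```

With a check like this, an "extension" such as the .gass_cache/... string from the log above would be rejected and the job would be staged as condor_exec.exe, while an ordinary ".bat" would be kept.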

Someone who is more familiar with the condor-g part might be able to shed some light on why the other junk ends up in the filename.  It may simply be that we rename the binary naively in all the other places too.
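
One guess at the mechanism, unconfirmed against the actual condor-g code: if the extension is taken as everything after the last '.' in the full path, a dotted directory such as .gass_cache will swallow the rest of the path. The sketch below contrasts that with looking only at the final path component. Both function names are hypothetical, and so is any path prefix in front of .gass_cache, since the log does not show it.

```cpp
#include <string>

// Naive (hypothetical): everything from the last '.' anywhere in the path.
std::string naive_extension(const std::string& path) {
    const std::string::size_type dot = path.rfind('.');
    return dot == std::string::npos ? std::string() : path.substr(dot);
}

// Safer (hypothetical): only look for a '.' inside the final path component.
std::string basename_extension(const std::string& path) {
    const std::string::size_type sep = path.find_last_of("/\\");
    const std::string base =
        (sep == std::string::npos) ? path : path.substr(sep + 1);
    const std::string::size_type dot = base.rfind('.');
    // No dot, or only a leading dot (a hidden-file marker): no extension.
    if (dot == std::string::npos || dot == 0) {
        return std::string();
    }
    return base.substr(dot);
}
```

For a source path ending in .gass_cache/local/md5/6d/2254cb0171674e4280321100556c9a/md5/3f/ecb634a0691dc658e01762a1a0d474/data, the naive version returns that whole tail, and prefixing it with "condor_exec" gives the name seen in the log. The basename version returns an empty string for the same path, because the final component is just "data".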

-B