
[HTCondor-users] Wrong paths when submitting a remote job from Windows to Linux



Hi @all

Sorry for cross-posting, but I have an urgent problem queueing a simple batch job from a Windows node to a Linux node. I posted this problem on the developer mailing list before but have not received an answer yet.

 

I would like to submit a batch job, i.e. a Linux shell script, from the Windows worker machine to a Linux machine. All job files are on a shared file system and do not need to be sent over the network. The submitted jobs must be placed in the Linux node's queue and executed there.

 

My setup is:

- HTCondor 8.2.2 is installed on both machines (Windows & Linux)

- The Windows machine is configured as a submit and execute node with access to a common NFS share

- The Linux head node is configured as a submit node, execute node, and central manager, also with access to the same NFS share as the Windows machine

- Both are in the same pool and have read and write access to the job and executable files on the NFS share
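
To make the setup concrete, the relevant configuration is roughly the following (simplified from my actual config files; the host and domain names are from my setup, so treat the exact values as placeholders):

    # Linux head node (central manager, submit and execute)
    CONDOR_HOST       = server.world.loc
    DAEMON_LIST       = MASTER, COLLECTOR, NEGOTIATOR, SCHEDD, STARTD
    FILESYSTEM_DOMAIN = server.world.loc

    # Windows worker (submit and execute only)
    CONDOR_HOST       = server.world.loc
    DAEMON_LIST       = MASTER, SCHEDD, STARTD
    FILESYSTEM_DOMAIN = server.world.loc

FILESYSTEM_DOMAIN is set to the same value on both machines so that the shared-filesystem check in my job Requirements can match.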

 

My approach is to use the condor_submit command with the -name option from the Windows machine. So I ran "condor_submit -name server condor.job -debug" on the Windows machine to queue the job on the Linux head node. The job is queued on the Linux machine but goes into the hold state there. The condor_submit debug output on the Windows machine shows the NFS path to the files and looks OK, but the SchedLog and "condor_q 109 -better -debug" on the Linux head node say that the path to the user log does not exist. It looks like this: "//server/test-job/\//server/test-job/test-jobxxx.log".
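
For reference, these are the exact steps (the full debug output and hold reason are further below):

    On the Windows worker (submitting to the Linux schedd):
        condor_submit -name server condor.job -debug

    On the Linux head node (checking the job afterwards):
        condor_q -hold 109
        condor_q 109 -better -debug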

 

In addition I set +PreserveRelativeExecutable = True, but it makes no difference here.

 

I can submit and run the job directly from the Linux machine, but there must also be a way to submit and query it from a Windows machine, or did I miss something?

Is it possible to give Condor relative paths to the executable and log files on the remote machine in the job file and submit it from another machine? Or is it possible to change the IWD path in the submit file to a path on the remote machine? (See the sketch after my job file below for what I have in mind.)

 

My Job file: condor.job

Universe = Vanilla

Requirements = ( OpSys == "LINUX" ) && ( TARGET.Arch == "X86_64" ) && \
               ( TARGET.Disk >= 1 ) && \
               ( TARGET.Memory >= ifthenelse(MemoryUsage isnt undefined,MemoryUsage,1) ) && \
               ( ( TARGET.HasFileTransfer ) || ( TARGET.FileSystemDomain == "server.world.loc" ) )

Log         = Test-job.$(Cluster).$(Process).log

Output = Test-job.$(Cluster).$(Process).out

Error      = Test-job.$(Cluster).$(Process).error

Executable         = batch_linux.sh

should_transfer_files = IF_NEEDED

when_to_transfer_output = ON_EXIT

+PreserveRelativeExecutable = True

Queue
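
What I would hope to be able to write instead is something along these lines (only a sketch of what I have in mind; /mnt/share/test-job is a placeholder for the Linux-side mount point of the NFS share, and I have not verified that initialdir can be used this way when submitting from Windows):

    Universe            = Vanilla
    initialdir          = /mnt/share/test-job
    Executable          = batch_linux.sh
    transfer_executable = False
    should_transfer_files   = IF_NEEDED
    when_to_transfer_output = ON_EXIT
    Log    = Test-job.$(Cluster).$(Process).log
    Output = Test-job.$(Cluster).$(Process).out
    Error  = Test-job.$(Cluster).$(Process).error
    Requirements = ( OpSys == "LINUX" ) && ( TARGET.Arch == "X86_64" )
    Queue

The idea would be that all paths in the job ClassAd end up as Linux-style paths relative to the remote initialdir, even though the file is submitted from Windows.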

 

condor_submit output with the -debug option on the Windows worker:

 

** Proc 109.0:

WindowsMajorVersion = 6

NTDomain = "WORLD"

ExitStatus = 0

NiceUser = false

LocalSysCpu = 0.0

CurrentTime = time()

CompletionDate = 0

BufferBlockSize = 32768

WindowsBuildNumber = 7601

NumRestarts = 0

MyType = "Job"

CumulativeSuspensionTime = 0

TargetType = "Machine"

RemoteSysCpu = 0.0

QDate = 1412758624

Owner = "condor"

RemoteUserCpu = 0.0

LastSuspensionTime = 0

WindowsMinorVersion = 1

LocalUserCpu = 0.0

WindowsServicePackMajorVersion = 1

WantCheckpoint = false

WindowsServicePackMinorVersion = 0

CondorPlatform = "$CondorPlatform: x86_64_Windows8 $"

WindowsProductType = 1

WhenToTransferOutput = "ON_EXIT"

NumSystemHolds = 0

RemoteWallClockTime = 0.0

NumCkpts = 0

NumJobStarts = 0

CommittedTime = 0

MaxHosts = 1

CommittedSlotTime = 0

CumulativeSlotTime = 0

CoreSize = 0

TotalSuspensions = 0

WantRemoteSyscalls = false

DiskUsage = 1

Iwd = "\\SERVER\java-test-job "

CommittedSuspensionTime = 0

ExitBySignal = false

CondorVersion = "$CondorVersion: 8.2.2 Aug 07 2014 BuildID: 265643 $"

CurrentHosts = 0

JobUniverse = 5

RequestCpus = 1

Cmd = "\\SERVER\java-test-job \batch_linux.sh"

BufferSize = 524288

MinHosts = 1

JobStatus = 1

ImageSize = 1

EnteredCurrentStatus = 1412758624

JobPrio = 0

Err = "Test-job.109.0.error"

UserLog = "\\SERVER\java-test-job\Test-job.109.0.log"

Environment = ""

JobNotification = 0

WantRemoteIO = true

Rank = 0.0

In = "/dev/null"

TransferIn = false

Out = "Test-job.109.0.out"

StreamOut = false

StreamErr = false

ShouldTransferFiles = "IF_NEEDED"

ExecutableSize = 1

TransferInputSizeMB = 0

RequestMemory = ifthenelse(MemoryUsage =!= undefined,MemoryUsage,( ImageSize + 1023 ) / 1024)

RequestDisk = DiskUsage

Requirements = ( ( OpSys == "LINUX" ) && ( TARGET.Arch == "X86_64" ) && ( TARGET.Disk >= 1 ) && ( TARGET.Memory >= ifthenelse(MemoryUsage =!= undefined,MemoryUsage,1) ) && ( ( TARGET.HasFileTransfer ) || ( TARGET.FileSystemDomain == "SERVER.world.loc" ) ) ) && ( TARGET.Disk >= RequestDisk ) && ( TARGET.Memory >= RequestMemory )

FileSystemDomain = "SERVER.world.loc"

JobLeaseDuration = 1200

PeriodicHold = false

PeriodicRelease = false

PeriodicRemove = false


LeaveJobInQueue = false

Arguments = ""

PreserveRelativeExecutable = true

 

 

Output from condor_q -hold 109

 

-- Submitter: server.world.loc : <10.149.51.58:51640> : server.world.loc

ID      OWNER          HELD_SINCE  HOLD_REASON                                                                                                                                          

 109.0   condor       10/8  10:57 Failed to initialize user log to \\SERVER\java-test-job/\\SERVER\java-test-job\Test-job.109.0.log

 

Thomas