
Re: [HTCondor-users] Default value for "Iwd" classad? (Python-Condor)



Ah, that makes more sense. I'm actually using a custom file-transfer plugin to upload the output files to a server other than the submit machine, so I don't need the files transferred back to the submit machine after execution.

How would I prevent Condor from trying to send the files back to the submit node?
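
A minimal sketch of one way this is usually expressed in the job ad, assuming the standard TransferOutput attribute (the ClassAd counterpart of transfer_output_files in a submit file) still applies when a plugin handles the upload; the empty-string value is an assumption here, not something confirmed in this thread:

    import classad

    # Hypothetical fragment: these attributes would be added to the job ad being submitted.
    ad = classad.ClassAd()
    ad["ShouldTransferFiles"] = "YES"
    ad["WhenToTransferOutput"] = "ON_EXIT"
    ad["TransferOutput"] = ""  # assumption: an explicit empty list asks the shadow to copy nothing back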

On Tue, Mar 26, 2013 at 3:39 PM, Brian Bockelman <bbockelm@xxxxxxxxxxx> wrote:
Hi Jordan,

Iwd refers to a directory on the submit machine.  If HTCondor is transferring your files between the submit and execute nodes, what directory would you like it to use on the submit side?

The file transfer is performed as the submitting user.  So, if you submit as user "ubuntu", "/home/ubuntu/" is a fine place for HTCondor to return the output files to.

Typically, if you run "condor_submit", Iwd is set to the directory from which you invoked it.
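
A minimal sketch of an ad built along those lines through the Python bindings, assuming the ClassAd-based Schedd.submit() call from the 7.9-era API; the paths and arguments below are placeholders, not the poster's actual job:

    import classad
    import htcondor

    ad = classad.ClassAd()
    ad["Cmd"] = "/usr/bin/blender"
    ad["Arguments"] = "-b dolphin.blend -o //render_# -F PNG -x 1 -f 0"
    ad["Iwd"] = "/home/ubuntu"             # writable by the submitting user; output files are returned here
    ad["ShouldTransferFiles"] = "YES"
    ad["WhenToTransferOutput"] = "ON_EXIT"

    schedd = htcondor.Schedd()
    cluster_id = schedd.submit(ad)         # the 7.9-era bindings accept a raw job ClassAd here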

Brian

On Mar 26, 2013, at 2:03 PM, Jordan Williamson <jordan.williamson@xxxxxxxxxxx> wrote:

Oh shoot, those are the ClassAds for a job that ran fine (I temporarily set Iwd to "/home/ubuntu", since I knew that directory existed).

ClassAds for the failing job:

ImageSize = 1
LeaveJobInQueue = true
JobNotification = 2
TransferExecutable = false
StreamIn = false
AutoClusterId = 1
StreamErr = false
ShouldTransferFiles = "YES"
JobStatus = 1
LastJobStatus = 0
Owner = "ubuntu"
MyType = "Job"
Cmd = "/usr/bin/blender"
WhenToTransferOutput = "ON_EXIT"
GlobalJobId = "
<machine-ip>#670.22#1364323301"
PeriodicRemove = false
ImageSize_RAW = 1
User = "ubuntu@<machine-ip>"
CurrentTime = time()
PeriodicHold = false
RootDir = "/"
Iwd = "/"
AutoClusterAttrs = "JobUniverse,LastCheckpointPlatform,NumCkpts,jordan,Requirements,NiceUser,ConcurrencyLimits"
QDate = 1364323304
ClusterId = 670
PeriodicRelease = false
Requirements = OpSys == "LINUX" && Arch == "INTEL"
StreamOut = false
Arguments = "-b dolphin.blend -o //render_# -F PNG -x 1 -f $(Process)"
TargetType = "Machine"
TransferInput = "<url>"
RemoteUserCpu = 0
JobPrio = 0
JobUniverse = 5
ProcId = 22
ServerTime = 1364324445

Hold error:

Error from <execute-node>: STARTER at <execute-node> failed to send file(s) to <execute-node>; SHADOW at <execute-node> failed to write to file //_condor_stdout: (errno 13) Permission denied

Here I tried using "/" as the Iwd. If I used something like "/etc" instead, the error would say "failed to write to file /etc/_condor_stdout", and so on.

On Tue, Mar 26, 2013 at 2:34 PM, Brian Bockelman <bbockelm@xxxxxxxxxxx> wrote:
Hi Jordan,

Looks like things are running right now.  What is the hold message you eventually receive?

FWIW - it would also be interesting to see the ClassAd you give to the Schedd object for submission.

Brian

On Mar 26, 2013, at 1:29 PM, Jordan Williamson <jordan.williamson@xxxxxxxxxxx> wrote:

ClassAds:

DiskUsage_RAW = 319
Requirements = OpSys == "LINUX" && Arch == "INTEL"
RemoteUserCpu = 0.0
JobFinishedHookDone = 1364322130
GlobalJobId = "<machine-ip>#669.23#1364321911"
NumJobStarts = 1
ExitCode = 0
StreamIn = false
ImageSize = 15000
CurrentTime = time()
JobStartDate = 1364322127
CurrentHosts = 0
JobCurrentStartDate = 1364322127
TargetType = "Machine"
ServerTime = 1364322453
LastPublicClaimId = "<machine-ip>#1364246102#73#..."
Cmd = "/usr/bin/blender"
TransferExecutable = false
JobUniverse = 5
BytesRecvd = 74.000000
RemoteWallClockTime = 3.000000
JobNotification = 2
Iwd = "/home/ubuntu"
RemoteSysCpu = 0.0
MachineAttrCpus0 = 1
Owner = "ubuntu"
LastJobStatus = 2
MemoryUsage = ( ( ResidentSetSize + 1023 ) / 1024 )
WhenToTransferOutput = "ON_EXIT"
EnteredCurrentStatus = 1364322130
LastJobLeaseRenewal = 1364322130
PeriodicHold = false
AutoClusterId = 1
JobCurrentStartExecutingDate = 1364322129
BytesSent = 24849.000000
JobPrio = 0
RootDir = "/"
PeriodicRelease = false
NumJobMatches = 1
LastMatchTime = 1364322127
PeriodicRemove = false
LeaveJobInQueue = true
StreamOut = false
CommittedSlotTime = 3.000000
DiskUsage = 325
AutoClusterAttrs = "JobUniverse,LastCheckpointPlatform,NumCkpts,jordan,Requirements,NiceUser,ConcurrencyLimits"
ClusterId = 669
CommittedTime = 3
CompletionDate = 1364322130
SpooledOutputFiles = "render_0.png"
StartdPrincipal = "unauthenticated@unmapped/10.194.169.234"
JobCurrentStartTransferOutputDate = 1364322130
TransferInput = "<url>"
CumulativeSlotTime = 3.000000
MyType = "Job"
JobRunCount = 1
LastRemoteHost = "<machine-ip>"
StreamErr = false
ResidentSetSize = 0
ProcId = 23
User = "ubuntu@<machine-ip>"
ExitBySignal = false
Arguments = "-b dolphin.blend -o //render_# -F PNG -x 1 -f $(Process)"
ResidentSetSize_RAW = 0
LastSuspensionTime = 0
JobStatus = 4
NumShadowStarts = 1
OrigMaxHosts = 1
MachineAttrSlotWeight0 = 1
ImageSize_RAW = 14260
ShouldTransferFiles = "YES"
QDate = 1364321914
TerminationPending = true

On Tue, Mar 26, 2013 at 2:19 PM, Brian Bockelman <bbockelm@xxxxxxxxxxx> wrote:
Hi Jordan,

What do the ClassAds you are submitting look like?

Iwd should refer to a directory on the submit machine (or the spool directory, if you are using spooling).  By default, Iwd is set to the $PWD of the submitting process.
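
A small sketch of reproducing that default by hand when building the ad with the Python bindings; os.getcwd() stands in for the working directory of the submitting process:

    import os
    import classad

    ad = classad.ClassAd()
    ad["Iwd"] = os.getcwd()  # mimic condor_submit: use the submit-side current directory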

Brian

On Mar 26, 2013, at 1:09 PM, Jordan Williamson <jordan.williamson@xxxxxxxxxxx> wrote:

I'm trying to run some jobs using the Python bindings for Condor 7.9.4. They keep being held because the "Iwd" ClassAd attribute seems to be required, but I can't find a general "default" value for it that would work on any execute machine (that is, if I set it to a hard-coded directory, the job errors out on a machine that doesn't have that exact directory structure).

Is it possible to leave this attribute out and let the execute nodes take care of it? (If so, I can't find any ClassAd attributes that would enable this, and leaving Iwd out altogether produces errors.) Is there a default value for Iwd that would make this work? I've tried "/", ".", and the directory the job is submitted from on the submit machine, but none of those worked.
_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/