
Re: [HTCondor-users] Default value for "Iwd" classad? (Python-Condor)



Hi Jordan,

This may do the trick:

ShouldTransferFiles = "NO"

I'm not terribly familiar with turning off file transfer, so the HTCondor manual may be of more help.
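
For example, with the Python bindings the submit ad might look something like this (an untested sketch on my end -- the attribute values are lifted from the ads you posted, and the URL is a placeholder):

  import classad
  import htcondor

  # Build the job ad by hand, roughly as condor_submit would.
  ad = classad.ClassAd()
  ad["Cmd"] = "/usr/bin/blender"
  # Note: $(Process) is a condor_submit macro; a raw ad needs a literal value.
  ad["Arguments"] = "-b dolphin.blend -o //render_0 -F PNG -x 1 -f 0"
  ad["JobUniverse"] = 5                     # vanilla universe
  ad["TransferInput"] = "<url>"             # fetched by your transfer plugin
  ad["ShouldTransferFiles"] = "NO"          # per the suggestion above; check the
                                            # manual for plugin interactions
  ad["Iwd"] = "/home/ubuntu"                # must still exist on the submit machine

  schedd = htcondor.Schedd()
  cluster = schedd.submit(ad, 1)            # returns the id of the new cluster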

Brian

On Mar 26, 2013, at 2:51 PM, Jordan Williamson <jordan.williamson@xxxxxxxxxxx> wrote:

> Ah, that makes more sense. I'm actually using a custom file-transfer plugin to upload the output files to a server other than the submit machine, so I don't need the files transferred back to the submit machine after execution.
> 
> How would I prevent Condor from trying to send the files back to the submit node?
> 
> On Tue, Mar 26, 2013 at 3:39 PM, Brian Bockelman <bbockelm@xxxxxxxxxxx> wrote:
> Hi Jordan,
> 
> Iwd refers to a directory on the submit machine.  If HTCondor is transferring your files between the submit and execute nodes, what directory would you like it to use on the submit side?
> 
> The file transfer is performed as the submitting user.  So, if you submit as user "ubuntu", "/home/ubuntu/" is a fine place for HTCondor to return the output files to.
> 
> Typically, if you run "condor_submit", Iwd is set to the directory from which you invoked it.
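>
> For instance, to mirror that behavior from the bindings (untested sketch):
>
>   import os
>   import classad
>
>   ad = classad.ClassAd()
>   # Same default condor_submit uses: the submitting process's working directory.
>   ad["Iwd"] = os.getcwd()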
> 
> Brian
> 
> On Mar 26, 2013, at 2:03 PM, Jordan Williamson <jordan.williamson@xxxxxxxxxxx> wrote:
> 
>> Oh shoot, those are the classads for a job that ran fine (I temporarily set the Iwd to "/home/ubuntu", as I knew that existed).
>> 
>> Classads for failing job:
>> 
>> ImageSize = 1
>> LeaveJobInQueue = true
>> JobNotification = 2
>> TransferExecutable = false
>> StreamIn = false
>> AutoClusterId = 1
>> StreamErr = false
>> ShouldTransferFiles = "YES"
>> OnExitRemove = true
>> JobStatus = 1
>> LastJobStatus = 0
>> Owner = "ubuntu"
>> MyType = "Job"
>> Cmd = "/usr/bin/blender"
>> WhenToTransferOutput = "ON_EXIT"
>> GlobalJobId = "<machine-ip>#670.22#1364323301"
>> PeriodicRemove = false
>> ImageSize_RAW = 1
>> User = "ubuntu@<machine-ip>"
>> CurrentTime = time()
>> PeriodicHold = false
>> RootDir = "/"
>> Iwd = "/"
>> OnExitHold = false
>> AutoClusterAttrs = "JobUniverse,LastCheckpointPlatform,NumCkpts,jordan,Requirements,NiceUser,ConcurrencyLimits"
>> QDate = 1364323304
>> ClusterId = 670
>> PeriodicRelease = false
>> Requirements = OpSys == "LINUX" && Arch == "INTEL"
>> StreamOut = false
>> Arguments = "-b dolphin.blend -o //render_# -F PNG -x 1 -f $(Process)"
>> TargetType = "Machine"
>> TransferInput = "<url>"
>> RemoteUserCpu = 0
>> JobPrio = 0
>> JobUniverse = 5
>> ProcId = 22
>> ServerTime = 1364324445
>> 
>> Hold error:
>> 
>> Error from <execute-node>: STARTER at <execute-node> failed to send file(s) to <execute-node>; SHADOW at <execute-node> failed to write to file //_condor_stdout: (errno 13) Permission denied
>> 
>> Here, I tried using "/" as the Iwd. If I used something like "/etc", the error would say "failed to write to file /etc/_condor_stdout", etc.
>> 
>> On Tue, Mar 26, 2013 at 2:34 PM, Brian Bockelman <bbockelm@xxxxxxxxxxx> wrote:
>> Hi Jordan,
>> 
>> Looks like things are running right now.  What is the hold message you eventually receive?
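>>
>> If it helps, the bindings can pull the hold reason straight out of the queue -- something like this (untested sketch):
>>
>>   import htcondor
>>
>>   schedd = htcondor.Schedd()
>>   # JobStatus == 5 means "held"; HoldReason carries the error text.
>>   for job in schedd.query('JobStatus == 5', ["ClusterId", "ProcId", "HoldReason"]):
>>       print("%s.%s: %s" % (job["ClusterId"], job["ProcId"], job["HoldReason"]))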
>> 
>> FWIW - it would also be interesting to see the ClassAd you give to the Schedd object for submission.
>> 
>> Brian
>> 
>> On Mar 26, 2013, at 1:29 PM, Jordan Williamson <jordan.williamson@xxxxxxxxxxx> wrote:
>> 
>>> Classads:
>>> 
>>> DiskUsage_RAW = 319
>>> Requirements = OpSys == "LINUX" && Arch == "INTEL"
>>> RemoteUserCpu = 0.0
>>> JobFinishedHookDone = 1364322130
>>> OnExitHold = false
>>> GlobalJobId = "<machine-ip>#669.23#1364321911"
>>> NumJobStarts = 1
>>> ExitCode = 0
>>> StreamIn = false
>>> ImageSize = 15000
>>> CurrentTime = time()
>>> JobStartDate = 1364322127
>>> CurrentHosts = 0
>>> JobCurrentStartDate = 1364322127
>>> TargetType = "Machine"
>>> ServerTime = 1364322453
>>> LastPublicClaimId = "<machine-ip>#1364246102#73#..."
>>> Cmd = "/usr/bin/blender"
>>> OnExitRemove = true
>>> TransferExecutable = false
>>> JobUniverse = 5
>>> BytesRecvd = 74.000000
>>> RemoteWallClockTime = 3.000000
>>> JobNotification = 2
>>> Iwd = "/home/ubuntu"
>>> RemoteSysCpu = 0.0
>>> MachineAttrCpus0 = 1
>>> Owner = "ubuntu"
>>> LastJobStatus = 2
>>> MemoryUsage = ( ( ResidentSetSize + 1023 ) / 1024 )
>>> WhenToTransferOutput = "ON_EXIT"
>>> EnteredCurrentStatus = 1364322130
>>> LastJobLeaseRenewal = 1364322130
>>> PeriodicHold = false
>>> AutoClusterId = 1
>>> JobCurrentStartExecutingDate = 1364322129
>>> BytesSent = 24849.000000
>>> JobPrio = 0
>>> RootDir = "/"
>>> PeriodicRelease = false
>>> NumJobMatches = 1
>>> LastMatchTime = 1364322127
>>> PeriodicRemove = false
>>> LeaveJobInQueue = true
>>> StreamOut = false
>>> CommittedSlotTime = 3.000000
>>> DiskUsage = 325
>>> AutoClusterAttrs = "JobUniverse,LastCheckpointPlatform,NumCkpts,jordan,Requirements,NiceUser,ConcurrencyLimits"
>>> ClusterId = 669
>>> CommittedTime = 3
>>> CompletionDate = 1364322130
>>> SpooledOutputFiles = "render_0.png"
>>> StartdPrincipal = "unauthenticated@unmapped/10.194.169.234"
>>> JobCurrentStartTransferOutputDate = 1364322130
>>> TransferInput = "<url>"
>>> CumulativeSlotTime = 3.000000
>>> MyType = "Job"
>>> JobRunCount = 1
>>> LastRemoteHost = "<machine-ip>"
>>> StreamErr = false
>>> ResidentSetSize = 0
>>> ProcId = 23
>>> User = "ubuntu@<machine-ip>"
>>> ExitBySignal = false
>>> Arguments = "-b dolphin.blend -o //render_# -F PNG -x 1 -f $(Process)"
>>> ResidentSetSize_RAW = 0
>>> LastSuspensionTime = 0
>>> JobStatus = 4
>>> NumShadowStarts = 1
>>> OrigMaxHosts = 1
>>> MachineAttrSlotWeight0 = 1
>>> ImageSize_RAW = 14260
>>> ShouldTransferFiles = "YES"
>>> QDate = 1364321914
>>> TerminationPending = true
>>> 
>>> On Tue, Mar 26, 2013 at 2:19 PM, Brian Bockelman <bbockelm@xxxxxxxxxxx> wrote:
>>> Hi Jordan,
>>> 
>>> What do the ClassAds you are submitting look like?
>>> 
>>> Iwd should refer to a directory on the submit machine (or the spool directory, if you are using spooling).  By default, Iwd is set to the $PWD of the submitting process.
>>> 
>>> Brian
>>> 
>>> On Mar 26, 2013, at 1:09 PM, Jordan Williamson <jordan.williamson@xxxxxxxxxxx> wrote:
>>> 
>>>> I'm trying to run some jobs using the Python bindings for Condor 7.9.4. They keep being held because the "Iwd" attribute seems to be required, but I can't find a general default value for it that would work on any execute machine (that is, if I set it to some hard-coded directory, the job would error out on a machine that didn't have that exact directory structure).
>>>> 
>>>> Is it possible to leave this attribute out and let the execute nodes take care of it? (If so, I can't seem to find any attributes that would enable this, and just leaving it out altogether produces errors.) Is there a default value for Iwd that would enable this? I've tried "/", ".", and the directory I'm submitting from on the submit machine, but none of those worked.
>>>> _______________________________________________
>>>> HTCondor-users mailing list
>>>> To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
>>>> subject: Unsubscribe
>>>> You can also unsubscribe by visiting
>>>> https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
>>>> 
>>>> The archives can be found at:
>>>> https://lists.cs.wisc.edu/archive/htcondor-users/
