
Re: [HTCondor-users] Default value for "Iwd" classad? (Python-Condor)



That seems to have fixed the problem! The jobs are now running to completion. I'm not sure yet whether they are completing successfully, but that's another story.

Thanks, Brian!

On Tue, Mar 26, 2013 at 3:56 PM, Brian Bockelman <bbockelm@xxxxxxxxxxx> wrote:
Hi Jordan,

This may do the trick:

ShouldTransferFiles = "NO"

I'm not horribly familiar with turning off file transfer, so the HTCondor manual may be of more help.
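
A minimal sketch of where that attribute sits in a job ad assembled from Python. It uses a plain dict and two hypothetical helpers, `quote` and `render_ad`, to print ClassAd-style text; it does not call the htcondor bindings, so nothing here should be read as their API. The attribute names and values are copied from the ads quoted further down, and string values are assumed to contain no embedded double quotes.

```python
def quote(value):
    """Render a Python value as ClassAd-style text (hypothetical helper)."""
    # bool must be tested before anything else, because bool is a subclass of int.
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, str):
        # Assumption: no embedded double quotes, so no escaping is attempted.
        return '"' + value + '"'
    return str(value)


def render_ad(attrs):
    """Join attributes into 'Name = value' lines, one per attribute."""
    return "\n".join(f"{name} = {quote(value)}" for name, value in attrs.items())


job = {
    "Cmd": "/usr/bin/blender",
    "JobUniverse": 5,
    "TransferExecutable": False,
    # Turn off HTCondor's own file transfer, as suggested above.
    "ShouldTransferFiles": "NO",
}

print(render_ad(job))
```

Whether the remaining transfer-related attributes (WhenToTransferOutput, TransferInput) still make sense with transfer turned off is something to check against the manual.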

Brian

On Mar 26, 2013, at 2:51 PM, Jordan Williamson <jordan.williamson@xxxxxxxxxxx> wrote:

> Ah, that makes more sense. I actually am using a custom file-transfer plugin to upload the output files to a different server than the submit machine, and thus don't need the files to be transferred to the submit machine after execution.
>
> How would I prevent Condor from trying to send the files back to the submit node?
>
> On Tue, Mar 26, 2013 at 3:39 PM, Brian Bockelman <bbockelm@xxxxxxxxxxx> wrote:
> Hi Jordan,
>
> Iwd refers to a directory on the submit machine.  If HTCondor is transferring your files between submit and execute nodes, what directory would you like it to use on the submit side?
>
> The file transfer is performed as the submitting user.  So, if you submit as user "ubuntu", "/home/ubuntu/" is a fine place for HTCondor to return the output files to.
>
> Typically, if you run "condor_submit", Iwd is set to the directory from which you invoked condor_submit.
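
When the ad is built by hand in Python rather than by condor_submit, the same default can be imitated by filling in Iwd from the current working directory of the submitting process. The sketch below assumes the ad is held in a plain dict; `with_default_iwd` is a hypothetical helper, not part of the htcondor bindings.

```python
import os


def with_default_iwd(ad, iwd=None):
    """Return a copy of `ad` with Iwd filled in if it is missing.

    The fallback is the current working directory of the submitting
    process, imitating what condor_submit is described as doing.
    """
    result = dict(ad)  # copy, so the caller's dict is left untouched
    if "Iwd" not in result:
        result["Iwd"] = os.path.abspath(iwd) if iwd else os.getcwd()
    return result


ad = with_default_iwd({"Cmd": "/usr/bin/blender", "Owner": "ubuntu"})
print(ad["Iwd"])
```

An Iwd that is already present is left alone, so an explicit value such as "/home/ubuntu" still takes precedence.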
>
> Brian
>
> On Mar 26, 2013, at 2:03 PM, Jordan Williamson <jordan.williamson@xxxxxxxxxxx> wrote:
>
>> Oh shoot, those are the classads for a job that ran fine (I temporarily set the Iwd to "/home/ubuntu", as I knew that existed).
>>
>> Classads for failing job:
>>
>> ImageSize = 1
>> LeaveJobInQueue = true
>> JobNotification = 2
>> TransferExecutable = false
>> StreamIn = false
>> AutoClusterId = 1
>> StreamErr = false
>> ShouldTransferFiles = "YES"
>> JobStatus = 1
>> LastJobStatus = 0
>> Owner = "ubuntu"
>> MyType = "Job"
>> Cmd = "/usr/bin/blender"
>> WhenToTransferOutput = "ON_EXIT"
>> GlobalJobId = "<machine-ip>#670.22#1364323301"
>> PeriodicRemove = false
>> ImageSize_RAW = 1
>> User = "ubuntu@<machine-ip>"
>> CurrentTime = time()
>> PeriodicHold = false
>> RootDir = "/"
>> Iwd = "/"
>> AutoClusterAttrs = "JobUniverse,LastCheckpointPlatform,NumCkpts,jordan,Requirements,NiceUser,ConcurrencyLimits"
>> QDate = 1364323304
>> ClusterId = 670
>> PeriodicRelease = false
>> Requirements = OpSys == "LINUX" && Arch == "INTEL"
>> StreamOut = false
>> Arguments = "-b dolphin.blend -o //render_# -F PNG -x 1 -f $(Process)"
>> TargetType = "Machine"
>> TransferInput = "<url>"
>> RemoteUserCpu = 0
>> JobPrio = 0
>> JobUniverse = 5
>> ProcId = 22
>> ServerTime = 1364324445
>>
>> Hold error:
>>
>> Error from <execute-node>: STARTER at <execute-node> failed to send file(s) to <execute-node>; SHADOW at <execute-node> failed to write to file //_condor_stdout: (errno 13) Permission denied
>>
>> Here, I tried using "/" as the Iwd. If I used something like "/etc", the error would say "failed to write to file /etc/_condor_stdout", etc.
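
A small pre-submission check along these lines can catch the problem before the job is held. It is only a sketch: `stdout_target` and `iwd_is_usable` are hypothetical helpers, and the plain string concatenation is an assumption chosen because it reproduces the paths shown in the hold messages. The check is only meaningful when run on the submit machine as the submitting user.

```python
import os


def stdout_target(iwd):
    """Path the hold message reports for a given Iwd.

    Assumption: plain concatenation, which reproduces the doubled
    slash seen for Iwd = "/".
    """
    return iwd + "/_condor_stdout"


def iwd_is_usable(iwd):
    """True if `iwd` is an existing directory the current user can write into."""
    return os.path.isdir(iwd) and os.access(iwd, os.W_OK | os.X_OK)


for candidate in ("/", "/etc", os.getcwd()):
    status = "ok" if iwd_is_usable(candidate) else "not writable"
    print(f"{stdout_target(candidate)}: {status}")
```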
>>
>> On Tue, Mar 26, 2013 at 2:34 PM, Brian Bockelman <bbockelm@xxxxxxxxxxx> wrote:
>> Hi Jordan,
>>
>> Looks like things are running right now.  What is the hold message you eventually receive?
>>
>> FWIW - it would also be interesting to see the ClassAd you give to the Schedd object for submission.
>>
>> Brian
>>
>> On Mar 26, 2013, at 1:29 PM, Jordan Williamson <jordan.williamson@xxxxxxxxxxx> wrote:
>>
>>> Classads:
>>>
>>> DiskUsage_RAW = 319
>>> Requirements = OpSys == "LINUX" && Arch == "INTEL"
>>> RemoteUserCpu = 0.0
>>> JobFinishedHookDone = 1364322130
>>> GlobalJobId = "<machine-ip>#669.23#1364321911"
>>> NumJobStarts = 1
>>> ExitCode = 0
>>> StreamIn = false
>>> ImageSize = 15000
>>> CurrentTime = time()
>>> JobStartDate = 1364322127
>>> CurrentHosts = 0
>>> JobCurrentStartDate = 1364322127
>>> TargetType = "Machine"
>>> ServerTime = 1364322453
>>> LastPublicClaimId = "<machine-ip>#1364246102#73#..."
>>> Cmd = "/usr/bin/blender"
>>> TransferExecutable = false
>>> JobUniverse = 5
>>> BytesRecvd = 74.000000
>>> RemoteWallClockTime = 3.000000
>>> JobNotification = 2
>>> Iwd = "/home/ubuntu"
>>> RemoteSysCpu = 0.0
>>> MachineAttrCpus0 = 1
>>> Owner = "ubuntu"
>>> LastJobStatus = 2
>>> MemoryUsage = ( ( ResidentSetSize + 1023 ) / 1024 )
>>> WhenToTransferOutput = "ON_EXIT"
>>> EnteredCurrentStatus = 1364322130
>>> LastJobLeaseRenewal = 1364322130
>>> PeriodicHold = false
>>> AutoClusterId = 1
>>> JobCurrentStartExecutingDate = 1364322129
>>> BytesSent = 24849.000000
>>> JobPrio = 0
>>> RootDir = "/"
>>> PeriodicRelease = false
>>> NumJobMatches = 1
>>> LastMatchTime = 1364322127
>>> PeriodicRemove = false
>>> LeaveJobInQueue = true
>>> StreamOut = false
>>> CommittedSlotTime = 3.000000
>>> DiskUsage = 325
>>> AutoClusterAttrs = "JobUniverse,LastCheckpointPlatform,NumCkpts,jordan,Requirements,NiceUser,ConcurrencyLimits"
>>> ClusterId = 669
>>> CommittedTime = 3
>>> CompletionDate = 1364322130
>>> SpooledOutputFiles = "render_0.png"
>>> StartdPrincipal = "unauthenticated@unmapped/10.194.169.234"
>>> JobCurrentStartTransferOutputDate = 1364322130
>>> TransferInput = "<url>"
>>> CumulativeSlotTime = 3.000000
>>> MyType = "Job"
>>> JobRunCount = 1
>>> LastRemoteHost = "<machine-ip>"
>>> StreamErr = false
>>> ResidentSetSize = 0
>>> ProcId = 23
>>> User = "ubuntu@<machine-ip>"
>>> ExitBySignal = false
>>> Arguments = "-b dolphin.blend -o //render_# -F PNG -x 1 -f $(Process)"
>>> ResidentSetSize_RAW = 0
>>> LastSuspensionTime = 0
>>> JobStatus = 4
>>> NumShadowStarts = 1
>>> OrigMaxHosts = 1
>>> MachineAttrSlotWeight0 = 1
>>> ImageSize_RAW = 14260
>>> ShouldTransferFiles = "YES"
>>> QDate = 1364321914
>>> TerminationPending = true
>>>
>>> On Tue, Mar 26, 2013 at 2:19 PM, Brian Bockelman <bbockelm@xxxxxxxxxxx> wrote:
>>> Hi Jordan,
>>>
>>> What do the ClassAds you are submitting look like?
>>>
>>> Iwd should refer to a directory on the submit machine (or the spool directory, if you are using spooling).  By default, Iwd is set to the $PWD of the submitting process.
>>>
>>> Brian
>>>
>>> On Mar 26, 2013, at 1:09 PM, Jordan Williamson <jordan.williamson@xxxxxxxxxxx> wrote:
>>>
>>>> I'm trying to run some jobs using the python bindings for Condor 7.9.4. They keep being held because the "Iwd" classad seems to be required, but I can't find a general "default" value for it that would work on any execute machine (that is, if I set it to some hard-coded directory, it would error out on a machine that didn't have that exact directory structure).
>>>>
>>>> Is it possible to leave this classad out and let the execute nodes take care of it? (If so, I can't seem to find any classads that would enable this, and just leaving it out altogether produces errors) Is there a default value for Iwd that would enable this action? I've tried "/", "." and the directory it's being submitted from on the submit machine, but none of those worked.
>>>> _______________________________________________
>>>> HTCondor-users mailing list
>>>> To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
>>>> subject: Unsubscribe
>>>> You can also unsubscribe by visiting
>>>> https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
>>>>
>>>> The archives can be found at:
>>>> https://lists.cs.wisc.edu/archive/htcondor-users/

