
Re: [Condor-users] nfs and condor



I tried a test like this recently, and the jobs were killed and re-queued.

I hope someone has a better solution for handling the case where the submit
node goes down. It's crucial to have this feature, IMO.
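
One configuration that might be worth testing is sketched below. It assumes
the jobs can write stdout/stderr on the execute node and have the files copied
back at exit instead of streamed. should_transfer_files,
when_to_transfer_output and job_lease_duration are submit commands described
in the condor_submit man page; the executable name, the lease value of 3600
seconds, and the use of $(Process) in the file names are placeholders chosen
for illustration. Whether this keeps jobs running across a submit host reboot
has not been verified here.

```
# Sketch only: my_job.sh is a hypothetical executable name.
universe   = vanilla
executable = my_job.sh

# Use Condor's file transfer instead of a shared file system.
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT

# Leave streaming off (the default), so stdout/stderr are written on the
# execute node and copied back when the job exits.
stream_output = false
stream_error  = false

# $(Process) is added here (an assumption) so that 100 jobs in one cluster
# do not overwrite each other's files.
output = /localfs/$(Cluster).$(Process).out
error  = /localfs/$(Cluster).$(Process).err
log    = /localfs/$(Cluster).log

# Lease in seconds during which the submit side may reconnect to a running
# job; 3600 is an arbitrary example value.
job_lease_duration = 3600

queue 100
```

The trade-off is that stdout/stderr can no longer be watched live on the
submit host while the job runs.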



On Wed, Jun 15, 2011 at 6:46 PM, Rita <rmorgan466@xxxxxxxxx> wrote:
> Let's say I submit 100 jobs from this type of setup and I have
>  stream_output = true
>  stream_error = true
> output=/localfs/$(Cluster).out
> log=/localfs/$(Cluster).log
> error=/localfs/$(Cluster).err
> And the submit box reboots? Would all the jobs get held? If so, is there a
> way to avoid this? Perhaps have a retry (have condor buffer the
> stdout/stderr) and once the submit host comes online it will stream the data
> to it?
>
>
> On Mon, Jun 13, 2011 at 8:12 AM, Mag Gam <magawake@xxxxxxxxx> wrote:
>>
>> Thanks, everyone. I will give this a try.
>>
>>
>>
>>
>> On Mon, Jun 13, 2011 at 7:43 AM, Todd Tannenbaum <tannenba@xxxxxxxxxxx>
>> wrote:
>> > Mag Gam wrote:
>> >>
>> >> Steve,
>> >> I am very curious about the condor_transfer_files tricks. Ideally, we
>> >> would write to local drives and then transfer the files back to the
>> >> submit host or place them back into NFS.
>> >>
>> >> Most of us watch stdout and stderr when we submit jobs, to follow their
>> >> progress. Would we still be able to do that with condor_transfer_files?
>> >>
>> >
>> > See the section in the Condor Manual 'Submitting Jobs Without a Shared
>> > File System', online at
>> >
>> > http://www.cs.wisc.edu/condor/manual/v7.6/2_5Submitting_Job.html#SECTION00354000000000000000
>> >
>> > Along with the condor_submit man page section on file transfer options,
>> > this should answer all your questions in this area. Unless we missed
>> > something writing these Manual sections, in which case please let us
>> > know. :)
>> >
>> > Re your stdout and stderr question - if you put:
>> >  stream_output = true
>> >  stream_error = true
>> > in your job submit file, then stdout/err will be streamed back from the
>> > execute machine to the submit machine in real-time as the job runs.
>> >
>> > regards,
>> > Todd
>> >
>> > _______________________________________________
>> > Condor-users mailing list
>> > To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with
>> > a
>> > subject: Unsubscribe
>> > You can also unsubscribe by visiting
>> > https://lists.cs.wisc.edu/mailman/listinfo/condor-users
>> >
>> > The archives can be found at:
>> > https://lists.cs.wisc.edu/archive/condor-users/
>> >
>
>
>
> --
> --- Get your facts first, then you can distort them as you please.--
>