Re: [Condor-users] nfs and condor



Our NFS comes from a single file server. I believe we use a Network
Appliance filer (not sure about the model).

We launch Condor jobs through condor_submit and condor_submit_dag, and
only rarely through condor_run.
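
For the question quoted below about placing "log", "output", and "error"
in local directories, here is a minimal sketch of what such a submit
description might look like, generated from Python. The directory
/var/tmp/condor-logs, the job name, and the executable path are
hypothetical, and turning on file transfer is an assumption that nobody
in this thread has confirmed for this pool. It has not been tested here.

```python
from pathlib import PurePosixPath


def build_submit_description(executable: str, local_dir: str, job_name: str) -> str:
    """Build a submit description whose log/output/error paths sit under local_dir.

    local_dir is assumed to be a directory on the submit machine's local
    disk (not NFS). All paths passed in are illustrative only.
    """
    base = PurePosixPath(local_dir) / job_name
    lines = [
        "universe = vanilla",
        f"executable = {executable}",
        # Assumption: with file transfer enabled, stdout/stderr are written
        # on the execute node and copied back to these submit-side paths.
        "should_transfer_files = YES",
        "when_to_transfer_output = ON_EXIT",
        f"log = {base}.log",
        f"output = {base}.out",
        f"error = {base}.err",
        "queue",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical values; print the description so it can be saved and
    # passed to condor_submit.
    print(build_submit_description("/home/user/bin/myjob", "/var/tmp/condor-logs", "myjob"))
```

The output can be saved to a file and handed to condor_submit as usual.
Whether this helps with the requeue problem is a separate question; it
only moves the three files off the NFS home directory.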


On Sun, Jun 12, 2011 at 10:10 AM, Erik Aronesty <erik@xxxxxxx> wrote:
> 1. What kind of system is your NFS drive?  Do you know if it's a cluster of
> separately addressable computers, or a single storage array?  The waiting is
> not an issue unless it's a cluster.
>
> 2. How do you launch condor jobs?  Using condor_run or condor_submit?  Lots
> of jobs launched via bash scripts?  Do you have multiple submitters or are
> all jobs launched from a "head node"?
>
>
> On Sun, Jun 12, 2011 at 8:12 AM, Mag Gam <magawake@xxxxxxxxx> wrote:
>>
>> How are you adding the writes after each job?
>>
>> I was wondering if there are any tricks I can use, such as writing log
>> files to local filesystems instead of NFS. Can I place "log",
>> "output", and "error" in local directories?
>>
>>
>>
>> On Sun, Jun 12, 2011 at 6:27 AM, Erik Aronesty <erik@xxxxxxx> wrote:
>> > I've noticed one issue with NFS that may or may not be related.  Some NFS
>> > systems are "clusters" of storage nodes that are striped across the set
>> > of machines.  In these NFS systems, a client connects to one node, and the
>> > information may propagate to other nodes after some time lag.
>> >
>> > If you're using condor, this means that your NFS mounts on the submitter
>> > and execution nodes may be out of sync...since each mount may be to a
>> > different physical storage cluster node.
>> >
>> > Adding small delays after writing files that both the execution and
>> > submitter node need to see fixed the problem for us.
>> >
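The small-delay workaround described above could be sketched roughly as
follows. This is only an illustration: the helper name and the 2-second
default are hypothetical, the thread does not say how long the delays
were or where they were added, and the fsync call is an extra step that
was not mentioned in the thread.

```python
import os
import time


def write_then_settle(path: str, data: str, settle_seconds: float = 2.0) -> None:
    """Write data to path, commit it, then pause before returning.

    settle_seconds is a hypothetical value; the right delay depends on how
    long the storage cluster takes to propagate writes between its nodes.
    """
    with open(path, "w") as handle:
        handle.write(data)
        handle.flush()
        # Ask the OS to commit the data instead of leaving it in the
        # client-side cache (an addition, not part of the original advice).
        os.fsync(handle.fileno())
    # Fixed pause so other NFS clients have time to see the new file.
    time.sleep(settle_seconds)


if __name__ == "__main__":
    # Hypothetical file name, written in the current directory.
    write_then_settle("result.txt", "done\n", settle_seconds=2.0)
```

A fixed sleep is crude; it only shifts the odds, and the value would have
to be tuned per site.
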
>> > On Sun, Jun 12, 2011 at 6:15 AM, Mag Gam <magawake@xxxxxxxxx> wrote:
>> >>
>> >> At our university we are heavy NFS users. When we run long jobs with
>> >> condor and there is a performance problem with our home directories
>> >> (which are on NFS), the job seems to get requeued.
>> >>
>> >> I was wondering if anyone else out there has had a similar problem and
>> >> what they did to fix it :-)
>> >> _______________________________________________
>> >> Condor-users mailing list
>> >> To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx
>> >> with a subject: Unsubscribe
>> >> You can also unsubscribe by visiting
>> >> https://lists.cs.wisc.edu/mailman/listinfo/condor-users
>> >>
>> >> The archives can be found at:
>> >> https://lists.cs.wisc.edu/archive/condor-users/