
[HTCondor-users] log files on lustre



I've run across an odd issue.  If I submit a job with a large queue
count (> 10k), and I set the Log (single file), Output (one per
ProcId), and Error (one per ProcId) parameters in the submit file to
paths on a Lustre filesystem, the job will run, but it drives the load
on the submit machine up over 2000 and the jobs basically sit in
Claimed/Idle.
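
In case it's useful, a stripped-down sketch of the submit file looks
roughly like this (the executable name and paths are placeholders, not
the real ones):

    # one shared user log, but per-process stdout/stderr
    executable = my_job.sh
    log        = /lustre/scratch/jobs/run.log
    output     = /lustre/scratch/jobs/run.$(Cluster).$(Process).out
    error      = /lustre/scratch/jobs/run.$(Cluster).$(Process).err
    queue 20000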

If I change only the Log parameter to point at an NFS (NetApp)
filesystem, the job submits and runs normally: no high load, no
Claimed/Idle states.

The Lustre filesystem is more than large enough to handle the file I/O
load, and it's not currently under any load.

Has anyone seen this, or something like it, before?

Any thoughts on what Condor might be doing differently when writing
the log file on NFS as opposed to Lustre?

Any recommendations on tracing the system calls to see what Condor
might be doing?  strace on the schedd works, but it produces too much
data and I'm not sure how to whittle it down into anything useful.
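
For the record, what I've been running is roughly along these lines
(the output path and syscall list are just guesses at what matters):

    # attach to the running schedd, follow its children, timestamp
    # each call, and only record file/locking-related syscalls
    strace -f -tt -T -e trace=open,write,fsync,fcntl,flock \
           -o /tmp/schedd.strace -p $(pidof condor_schedd)

    # then pull out just the lines touching the lustre-hosted log
    grep /lustre /tmp/schedd.strace | less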

I'm running Condor 8.4.0 on RHEL 6.7 x86_64.