Re: [HTCondor-users] Held jobs: unable to establish standard (output|error) stream



On Fri, 2021-01-08 at 12:24:56 +0100, Steffen Grunewald wrote:
> Hi Greg,
> 
> On Thu, 2021-01-07 at 10:14:57 -0600, Greg Thain wrote:
> > 
> > On 1/7/21 7:03 AM, Thomas Hartmann wrote:
> > > 
> > > I guess there is not much more in the Starter log on your execute
> > > node/slot `slot1_3@xxxxxxxxxxxxxxxxxxx` compared to the shadow
> > > 
> > The immediate problem here is that this job is trying to stream standard
> > output back to the submit machine in real time by setting
> > 
> > stream_output = true
> > 
> > in the submit file.  Which is fine. 
> 
> I still suspect that this is the only user trying to stream output when it
> could simply be written to the shared file system (which is BeeGFS, by the way).

After bumping "ulimit -n" to 8192 from its default 1024, this problem has
now reappeared on a different node, but affecting the same user (and only
this one).
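
For anyone wanting to verify that a raised limit is really in effect, here
is a minimal sketch, assuming a Linux host with Python 3. The helper names
are made up for illustration and are not part of HTCondor. It reads the
limit of the current process and, via /proc/<pid>/limits, of any other
process (for example a daemon whose PID you looked up yourself):

```python
import resource


def nofile_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def parse_max_open_files(limits_text):
    """Extract (soft, hard) from the text of a /proc/<pid>/limits file.

    Values are returned as strings because the kernel may report
    'unlimited' instead of a number. Returns None if the row is absent.
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            fields = line.split()
            # Row layout: Max open files <soft> <hard> files
            return fields[3], fields[4]
    return None


def max_open_files_of(pid):
    """Read the open-file limits of a running process (Linux only)."""
    with open(f"/proc/{pid}/limits") as handle:
        return parse_max_open_files(handle.read())


if __name__ == "__main__":
    soft, hard = nofile_limits()
    print(f"this process: soft={soft} hard={hard}")
```

Note that "ulimit -n" in a login shell only affects processes started from
that shell, so checking the limits of the running process itself is the
more reliable test.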

I'm running out of submit nodes now.

- S


-- 
Steffen Grunewald, Cluster Administrator
Max Planck Institute for Gravitational Physics (Albert Einstein Institute)
Am Mühlenberg 1 * D-14476 Potsdam-Golm * Germany
~~~
Fon: +49-331-567 7274
Mail: steffen.grunewald(at)aei.mpg.de
~~~