Re: [Condor-users] condor_shadow "D" state in processes
- Date: Tue, 04 Dec 2007 10:34:58 -0600
- From: Dan Bradley <dan@xxxxxxxxxxxx>
- Subject: Re: [Condor-users] condor_shadow "D" state in processes
Does the ShadowLog contain any clues about what the shadows are doing
during the time of high load?
If not, it may be enlightening to run 'strace -p <pid of a shadow>' and
see what the shadow is trying to do.
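Something along these lines (a sketch; the user, PID, and log path are illustrative and depend on your install):

  # pick one of the user's shadows and watch its system calls
  pgrep -u someuser condor_shadow | head -1
  strace -f -p <pid> -e trace=file,network

  # correlate with the shadow log ($(LOG)/ShadowLog; path varies by config)
  tail -f /var/log/condor/ShadowLog

If the shadows are all stuck in the same filesystem call (stat, open, fsync) on the same path, that points at the shared resource they're contending for.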
Robert E. Parrott wrote:
I'm seeing an unfortunate behavior with condor_shadow jobs in the
vanilla universe. This is LINUX X86_64 and Condor v6.8.6.
A user submits a large number (500-1000) of jobs on a cluster with
150 processors, and has about 100 jobs running simultaneously. These
jobs all run for about 3 minutes, and then complete at nearly the
same time. At this time, the load on the submit machine, which is
also the head node, reaches a little over N, where N is the number of
this user's running jobs.
Closer inspection shows that all of the condor_shadow processes owned
by this user are in the "D" state, contending for what appears to be
the same resources.
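A quick way to confirm this (a sketch; wchan column support varies by ps version):

  # STAT shows D; WCHAN names the kernel function each shadow is sleeping in
  ps -o pid,stat,wchan:30,cmd -C condor_shadow

If every shadow reports the same wait channel, that's a strong hint they are all blocked on one resource.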
At first I thought that this contention arose when the output data was
returned from the compute nodes to the submit node. So I asked
the user to add
initialdir = [ the run dir ]
should_transfer_files = NO
to the submit file, but this doesn't help. Also, looking at the
actual output, each job produces less than 20 KB of output data.
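For context, the submit description looks roughly like this (a sketch; the executable, directory names, and queue count are illustrative):

  universe              = vanilla
  executable            = run_job.sh
  initialdir            = /shared/runs/run_$(Process)
  should_transfer_files = NO
  output                = job.out
  error                 = job.err
  log                   = job.log
  queue 500

My understanding is that with should_transfer_files = NO the jobs read and write directly on the shared filesystem, so the shadows shouldn't be doing any output transfer at job exit.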
What could be causing such contention in a vanilla universe
condor_shadow job, if not the final file transfer process? Has
anyone seen such behavior before in the vanilla universe? Any hints
or guesses for things to look at?