
Re: [Condor-users] condor_transfer_data makes schedd unresponsive



On 5/15/2012 1:17 AM, Felix Wolfheimer wrote:
I'm using a remote submit to a pool (condor_submit -remote) and, thus,
I need condor_transfer_data in order to get the results back. While
condor_transfer_data is transferring my files from the remote schedd, I
cannot communicate with the schedd; e.g., the commands condor_q and
condor_submit fail. So the data transfer seems to block
communication with the schedd. Is this a known issue? Is there any
workaround to make the schedd responsive again?

Hi Felix -

Is your schedd running on a Windows machine? If so, then it is a known issue.

By default, data transfer to/from the schedd should be non-blocking (it takes place in a forked child process) on all platforms except Windows; on Windows, transfers block by default.
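The fork-based approach can be illustrated with a minimal, Unix-only Python sketch. It is a toy model, not HTCondor's actual (C++) schedd code: `fake_transfer`, `handle_transfer_forked` and `handle_query` are hypothetical stand-ins for a file transfer and a condor_q-style request. The point is only that the parent can answer the query while the child is still busy.

```python
import os
import time


def fake_transfer(seconds: float) -> None:
    """Hypothetical stand-in for a slow file transfer: it just sleeps."""
    time.sleep(seconds)


def handle_transfer_forked(seconds: float) -> int:
    """Fork a child to do the transfer so the parent stays free (Unix only)."""
    pid = os.fork()
    if pid == 0:
        # Child: do the slow work, then exit immediately without
        # running the parent's cleanup handlers or flushing its buffers.
        fake_transfer(seconds)
        os._exit(0)
    # Parent: return at once; the caller reaps the child later.
    return pid


def handle_query() -> str:
    """Hypothetical stand-in for answering a condor_q-style request."""
    return "queue listing"


if __name__ == "__main__":
    start = time.monotonic()
    child = handle_transfer_forked(1.0)
    reply = handle_query()  # answered while the child is still transferring
    print(f"{reply!r} answered after {time.monotonic() - start:.2f}s")
    os.waitpid(child, 0)  # reap the child so it does not linger as a zombie
```

Because Win32 has no `fork()`, this pattern is not available there, which is why the Windows schedd has to choose between running the transfer inline and running it in a thread.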

The Win32 API has no fork() system call. So on Windows the schedd can either run single-threaded, and thus block all other pending communication with the schedd, or perform transfers in a non-blocking background thread. There is an undocumented (shh!) config knob, FAKE_CREATE_THREAD, that controls this behavior: if it is set to True (the default on Windows), everything runs single-threaded; if it is False (the default on Unix), things run multi-threaded. You could try putting the following into the condor_config on your schedd machine:
   SCHEDD.FAKE_CREATE_THREAD = False
Don't forget to run condor_reconfig afterwards. This will likely solve your problem, but it may also make your schedd unstable. :( Unfortunately, I think the default of FAKE_CREATE_THREAD=True on Windows is that way for a reason (problems were occasionally observed), so the default is slow but stable behavior. It is possible that your workflow/configuration does not step on any landmines, so if I were you I'd give it a quick try and see what happens (I'd be curious to know how it goes).
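The trade-off behind FAKE_CREATE_THREAD can be sketched as a toy Python model, assuming nothing about HTCondor's internals. The `fake_create_thread` parameter only mirrors the knob's name; `start_transfer` and `query_wait` are hypothetical helpers. With the flag True the transfer runs inline, so a query arriving right after it starts must wait for the whole transfer; with it False the transfer runs in a background thread and the query can be served at once.

```python
import threading
import time


def fake_transfer(seconds: float) -> None:
    """Hypothetical stand-in for a slow file transfer: it just sleeps."""
    time.sleep(seconds)


def start_transfer(seconds: float, fake_create_thread: bool):
    """Toy model of the knob: True runs the transfer inline, False in a thread."""
    if fake_create_thread:
        fake_transfer(seconds)  # caller is stuck here until the transfer ends
        return None
    worker = threading.Thread(target=fake_transfer, args=(seconds,))
    worker.start()  # returns at once; the transfer continues in the background
    return worker


def query_wait(fake_create_thread: bool, seconds: float = 1.0) -> float:
    """Seconds a query arriving right after a transfer starts must wait."""
    start = time.monotonic()
    worker = start_transfer(seconds, fake_create_thread)
    waited = time.monotonic() - start  # the query can be served from here on
    if worker is not None:
        worker.join()  # let the background transfer finish before returning
    return waited


if __name__ == "__main__":
    print(f"FAKE_CREATE_THREAD=True:  query waited {query_wait(True):.2f}s")
    print(f"FAKE_CREATE_THREAD=False: query waited {query_wait(False):.2f}s")
```

The toy has no shared state between the transfer and the query, so it cannot reproduce the instability that the Windows default guards against; it only shows why the single-threaded setting makes condor_q and condor_submit hang during a transfer.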

regards,
Todd