
[Condor-users] How to find a starter process that got stuck and is not responding to the Startd



Hi.

    In our pool we are facing this issue intermittently. We are running VM jobs, and it always happens when the Starter process is trying to get the status of a VM.

    The Starter process gets stuck (hangs) without writing anything further to its log, and no updates are sent to the STARTD, so the STARTD keeps the last reported status. After some time the corresponding job in the queue goes to the idle state, and Condor tries to match it to another machine.

For some time the VM job is in an inconsistent state: if the VM was actually powered off by the user from inside the guest, the job state still shows as running.

  1. Is there any way to find this kind of STARTER process, one that is no longer updating the STARTD?
  2. I am polling the VM status every 2 minutes. How can I configure the STARTD so that it kills the STARTER process if it has not responded with proper data within 5 minutes at most?
  3. In that case, when the job goes back to the idle state and tries to match a new machine, how can I force it to match the same machine again?
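A minimal sketch for question 1, under stated assumptions: each slot's starter writes to a file matching `StarterLog*` in the Condor log directory, and a healthy starter polling its VM every 2 minutes touches that log at least once within the 5-minute limit. The script only lists starter logs that have not been modified for longer than the threshold. The `find_stale_starter_logs` helper, the `CONDOR_LOG_DIR` variable and the `/var/log/condor` default are hypothetical, not part of Condor. Log age is just a proxy: an idle slot's log also goes quiet, so any hit should be cross-checked against slots that are actually running a VM job before a starter is killed.

```python
import os
import time
from pathlib import Path

# Assumption: the 5-minute limit from question 2. With a 2-minute VM poll,
# a healthy starter should write to its log well inside this window.
STALE_AFTER_SECONDS = 5 * 60


def find_stale_starter_logs(log_dir, stale_after=STALE_AFTER_SECONDS, now=None):
    """Return (file name, age in seconds) for StarterLog* files older than stale_after.

    Hypothetical helper: it assumes each slot's starter logs to
    StarterLog or StarterLog.<slot> inside log_dir.
    """
    if now is None:
        now = time.time()
    stale = []
    for path in sorted(Path(log_dir).glob("StarterLog*")):
        age = now - path.stat().st_mtime  # seconds since the last write
        if age > stale_after:
            stale.append((path.name, int(age)))
    return stale


if __name__ == "__main__":
    # Hypothetical location; check the LOG setting on the execute machine.
    log_dir = os.environ.get("CONDOR_LOG_DIR", "/var/log/condor")
    for name, age in find_stale_starter_logs(log_dir):
        print(f"{name}: no log update for {age} s")
```

This could be run from cron every few minutes on each execute node; the `now` parameter exists only so the check can be exercised with fixed timestamps.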

Thanks,
Johnson.   



www.wipro.com