
Re: [Condor-users] How to find starter process which got struck and not responding to Startd



Hi Tim,

    Running condor-7.2.3 and using vmware.

Thanks
Johnson

On Monday 26 July 2010 08:21 PM, Timothy St. Clair wrote:
I have a couple of questions regarding your questions ;-)

1.) What version of condor are you running?
2.) What type of vm job are you running?  (vmware, kvm, or xen?)

Cheers,
Tim

On Mon, 2010-07-26 at 19:54 +0530, Johnson koil Raj wrote:
Hi.

     We are seeing this issue intermittently in our pool. We are
running VM jobs, and it always happens while the Starter process is
trying to get the status of a VM.

     The Starter process gets stuck or hangs: its log is not updated
any further and no updates are sent to the STARTD, so the STARTD keeps
the last reported status. After some time the corresponding job in the
queue goes to the idle state and tries to match another machine.

The VM job is in an inconsistent state for some time if the VM was
actually powered off by the user from inside; the VM job state still
shows as running.
      1. Is there any way to find STARTER processes of this kind, which
         are not updating the STARTD?
      2. I am polling the VM status every 2 minutes. How can I
         configure the STARTD so that it kills the STARTER process if
         it does not respond with proper data within 5 minutes?
      3. How can I force the job to match the same machine when it
         goes into the idle state and tries to match a new machine?
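
Regarding question 1, since a hung Starter stops writing to its log,
one rough way to spot candidates is to look for starter log files that
have not changed for longer than the 5 minute limit. The sketch below
only illustrates that idea and is not a Condor feature: the function
name find_stale_logs, the "StarterLog*" file pattern, and the log
directory passed in are all assumptions that would need to match the
local configuration. A stale log is only meaningful while the slot is
actually running a job, because an idle slot's log also stops changing.

```python
import glob
import os
import time


def find_stale_logs(log_dir, pattern="StarterLog*", max_age_seconds=300, now=None):
    """Return (path, age_in_seconds) for each matching log file whose
    last modification is older than max_age_seconds.

    log_dir and pattern are assumptions about where the starter logs
    live and how they are named; adjust them to the local setup.
    """
    now = time.time() if now is None else now
    stale = []
    for path in sorted(glob.glob(os.path.join(log_dir, pattern))):
        age = now - os.path.getmtime(path)
        if age > max_age_seconds:
            stale.append((path, age))
    return stale
```

Calling find_stale_logs with the execute machine's log directory
returns the logs that have been silent for more than 5 minutes; those
slots can then be checked by hand before any Starter is killed.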

Thanks,
Johnson.




Please do not print this email unless it is absolutely necessary.

The information contained in this electronic message and any
attachments to this message are intended for the exclusive use of the
addressee(s) and may contain proprietary, confidential or privileged
information. If you are not the intended recipient, you should not
disseminate, distribute or copy this e-mail. Please notify the sender
immediately and destroy all copies of this message and any
attachments.

WARNING: Computer viruses can be transmitted via email. The recipient
should check this email and any attachments for the presence of
viruses. The company accepts no liability for any damage caused by any
virus transmitted by this email.

www.wipro.com

_______________________________________________
Condor-users mailing list
To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/condor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/condor-users/


