
Re: [Condor-users] How to find starter process which got struck and not responding to Startd



I have a couple of questions regarding your questions ;-) 

1.) What version of Condor are you running?
2.) What type of VM job are you running (vmware, kvm, or xen)?

Cheers,
Tim

On Mon, 2010-07-26 at 19:54 +0530, Johnson koil Raj wrote:
> Hi,
> 
>     In our pool we are facing this issue intermittently. We are
> running VM jobs, and it always happens when the Starter process is
> trying to get the status of a VM.
> 
>     The Starter process gets stuck or hangs without any further log
> updates, and no updates are sent to the STARTD, so it keeps the last
> reported status. After some time the corresponding job in the queue
> goes to the idle state and tries to match another machine to run on.
> 
> The VM job is in an inconsistent state for some time if it was actually
> powered off by the user from inside. The VM job state remains running.
>      1. Is there any way to find this kind of STARTER process that
>         is not updating the STARTD?
>      2. I am polling the VM status every 2 minutes. How can I
>         configure the STARTD so that it kills the STARTER process if
>         it has not responded with proper data after at most 5 minutes?
>      3. How can I force the job to match the same machine when the
>         job has gone into the idle state and tries to match a new one?
> 
> Thanks,
> Johnson.    
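
As a rough illustration of question 1, a hung Starter that has stopped writing to its log could be spotted from the outside by checking log modification times. The sketch below is not a Condor feature and is not taken from this thread. The function name `find_stale_logs`, the file pattern `StarterLog*`, and the 300-second threshold (the 5 minutes mentioned above) are all assumptions, and a Starter that is merely quiet would be flagged as well.

```python
import time
from pathlib import Path


def find_stale_logs(log_dir, pattern="StarterLog*", max_age_seconds=300, now=None):
    """Return (file name, age in seconds) for logs not modified recently.

    log_dir          directory holding the starter logs (site-specific)
    pattern          glob for starter log files (assumed naming, adjust as needed)
    max_age_seconds  age beyond which a log counts as stale (assumed: 5 minutes)
    now              reference time in epoch seconds; defaults to the current time
    """
    now = time.time() if now is None else now
    stale = []
    for path in sorted(Path(log_dir).glob(pattern)):
        if not path.is_file():
            continue
        # Age is measured from the file's last modification time.
        age = now - path.stat().st_mtime
        if age > max_age_seconds:
            stale.append((path.name, age))
    return stale
```

Calling `find_stale_logs("/path/to/condor/log")` with the pool's actual log directory returns the candidate logs. Whether the matching Starter should then be killed is a separate decision that this sketch does not make.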
> 