
Re: [Condor-users] VMGAHP_ERR_CRITICAL



Hi Johnson,

Let me clarify the problem in your environment. As you said, you have two machines (Zeus and Pluto) in your Condor pool.

1. First of all, because your job submit description file has 'Requirements   = (Machine == "zeus.pesgrid.wipro.com")', your VM job can only run on Zeus.

2. When you submit a VM job from Zeus, the job is assigned to Zeus, and I think you should have no problem.

3. When you submit a VM job from Pluto, the job is also assigned to Zeus because of your job requirements, and I think this is the case where you hit the VMGAHP problem.
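
To make point 1 concrete, here is a rough sketch of what a VM universe submit description file with that requirement might look like. Only the Requirements line is taken from your file; the executable label, log name, memory size, and vmware_dir path are placeholders I made up for illustration, so please don't read them as your actual settings.

```
# Sketch of a VM universe submit description file.
# Every value except the Requirements expression is a placeholder.
universe     = vm
# For VM universe jobs the executable is only a label for the job.
executable   = firstvm
log          = firstvm.log
vm_type      = vmware
# Memory for the virtual machine, in MB (placeholder value).
vm_memory    = 256
# Directory holding the VMware configuration and disk files (placeholder path).
vmware_dir   = /path/to/vmware/files
vmware_should_transfer_files = YES
# This expression pins the job to Zeus, whichever machine submits it.
Requirements = (Machine == "zeus.pesgrid.wipro.com")
queue
```

With this expression in place the job can never match Pluto, regardless of which machine it is submitted from.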

Here are my observations for your case.

In your environment, the Condor daemons on Zeus seem to run as root with "CONDOR_IDS=daemon,daemon". So ordinary Condor jobs (Vanilla, Standard, Java) submitted from another machine (Pluto) will run either as "Nobody" or under the same UID as on the submit machine.

VMware requires that the user starting a virtual machine have a writable working directory. Your problem occurred because UID=2 (daemon), like "Nobody", does not have one. Unlike ordinary Condor jobs, VM jobs don't use "Nobody" when the UID on the submit machine doesn't exist on the execute machine. Instead of "Nobody", VM jobs try to use the UID of the Condor daemons, which is generally "condor".

The VMGAHP log files you sent show what happened on Zeus.

When you submit a VM job from Zeus to Zeus, the VMGAHPLOG.Johnson log says that your VM job successfully ran as "UID=Johnson".
But when you submit a VM job from Pluto to Zeus, VMGAHPLOG.daemon says that your VM job tried to run as "UID=daemon" and failed.

So here is a solution for you.

If you run Condor as root and you have specified CONDOR_IDS=daemon,daemon, please add the following configuration parameter to the Condor configuration file on Zeus.
VM_UNIV_NOBODY_USER = "login name of a user who has home directory"

With the above parameter, VM jobs from Pluto will use the UID specified in "VM_UNIV_NOBODY_USER".
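
For example, a minimal sketch of the setting on Zeus might look like the following. The login name "vmuser" is only a placeholder (an assumption, not something from your setup); replace it with an existing account on Zeus that has a home directory it can write to. I have written the value without quotes, as is usual for Condor configuration macros.

```
# Condor configuration file on Zeus (the execute machine).
# "vmuser" is a hypothetical login name; use a real account on Zeus
# that has a writable home directory.
VM_UNIV_NOBODY_USER = vmuser
```

After editing the file, running condor_reconfig on Zeus should make the daemons reread the configuration, and "condor_config_val VM_UNIV_NOBODY_USER" should print the value that is in effect. If the change does not seem to take effect, restarting Condor on Zeus is the safe fallback.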

Section 3.3.26 of the Condor manual lists the configuration parameters for the VM universe.

If you have questions, please let me know.

Best,

-Jaeyoung


On Mon, Mar 17, 2008 at 8:01 AM, JohnsonKoilraj <johnson.raj@xxxxxxxxx> wrote:
Hi Yoon,

       How are you?
  Here is the scenario. I have 2 systems in my Condor pool.
  1. Zeus (central manager, submitter, executor)   Johnson - username

  2. Pluto (submitter, executor)   condor - username (who submits the job)

  Now, I can start a VM on Zeus from Zeus.
  When I try to start a VM on Pluto from Zeus, I get "no match found".

  When I try to start a VM on Zeus from Pluto, the error occurs.

  I am using:
  Condor         -  7.0.1
  VMware Server  -  1.0.4

 1. I have attached the job description file (firstvm.sh).

 2. I have attached VMGAHPLOG.daemon (I think Condor updated this file
   when I submitted the job from Pluto (condor) to Zeus).

 3. I have attached VMGAHPLOG.Johnson (this file was updated while the VM
   was started on Zeus from Zeus (Johnson)).

 4. I have attached the log file created by the job description file.

Thank you for your response





On Thu, 2008-03-13 at 08:28 -0500, Jaeyoung Yoon wrote:
> Hi Johnson,
>
> VMGAHP_ERR_CRITICAL may happen for several reasons.
> In order to know exactly what happened,
> could you send me the vmgahp log file? The location of the vmgahp log is
> defined in the Condor configuration file by the "VM_GAHP_LOG" parameter. If
> VM_GAHP_LOG is commented out, please uncomment it.
>
> Here is an example,
>
> VM_GAHP_LOG = /tmp/VMGAHPLog.$(USERNAME)
> VM_GAHP_DEBUG = D_FULLDEBUG
>
> Which version of Condor are you using?
> Are you using VMware Server or Xen?
> Could you send me your submit description file for this VM job?
>
> Thanks,
>
> -jaeyoung
>

>


