
Re: [Condor-users] Error from starter on host: Internal vmgahp server error (with change in Q)



Hi Johnson,

To look into your problem, I need the vmgahp log file. Its path should
be set in condor_config with the VM_GAHP_LOG macro.
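
As a rough sketch of that setting (the file name below is only an
illustrative assumption, not a value taken from your configuration;
$(LOG) is the usual Condor log directory macro):

```
# condor_config on the execute machine (path is an illustrative assumption)
VM_GAHP_LOG = $(LOG)/VMGahpLog
```

Running "condor_config_val VM_GAHP_LOG" on that machine should then show
the path that is actually in effect.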

Regards,
-jaeyoung

On Wed, Jul 23, 2008 at 4:00 AM, Johnson koil Raj <johnson.raj@xxxxxxxxx> wrote:
> Hi,
>
> We have installed CentOS 5.2 on the machines in our Condor pool; the
> Condor version is 7.0.3 (the central manager runs Fedora 5).
>
> Before that, the pool ran CentOS 5.1, and starting, suspending, and
> other VM operations all worked fine.
>
> Since the upgrade, the following errors appear in the log files when
> starting the VM. I am using the same VM image as before.
>
> In the StarterLog file of the executor:
> 7/23 10:33:14 About to start new VM
> 7/23 10:33:14 Will send the part of job ClassAd to vmgahp
> 7/23 10:33:14 About to exec /opt/condor-7.0.3/sbin/condor_vm-gahp -f -M 1
> 7/23 10:33:14 Env = VMGAHP_WORKING_DIR=/vm/local.grid7/execute/dir_22472
> VMGAHP_USER_GID=49527 _CONDOR_SLOT=1 CONDOR_IDS=49527.49527
> VMGAHP_VMTYPE=vmware VMGAHP_USER_UID=49527
> _CONDOR_SCRATCH_DIR=/vm/local.grid7/execute/dir_22472
> VMGAHP_CONFIG=/opt/condor-7.0.3/etc/condor_vmgahp_config.vmware
> 7/23 10:33:14 Create_Process: using fast clone() to create child process.
> 7/23 10:33:14 VMGAHP server pid=22475
> 7/23 10:33:14 Failed to read vmgahp server version
> 7/23 10:33:14 Inside VM_GAHP_SERVER::cleanup()
> 7/23 10:33:14 VMGAHP write line(QUIT) Error
> 7/23 10:33:14 End of VM_GAHP_SERVER::cleanup
> 7/23 10:33:15 Failed to start vm-gahp server
> 7/23 10:33:16 Inside VMProc::cleanup()
> 7/23 10:33:16 Failed to start job, exiting
> 7/23 10:33:16 ShutdownFast all jobs.
> 7/23 10:33:16 Got ShutdownFast when no jobs running.
> 7/23 10:33:16 Removing /vm/local.grid7/execute/dir_22472
> 7/23 10:33:16 Attempting to remove /vm/local.grid7/execute/dir_22472 as
> SuperUser (root)
>
> In the StartLog file of the executor:
> 7/23 14:34:48 get_file(): going to write to filename
> /vm/local.grid7/execute/dir_24214/centos.vmdk
> 7/23 14:34:48 get_file: Receiving 494731264 bytes
> 7/23 14:35:17 Got SIGTERM. Performing graceful shutdown.
>
> In the ShadowLog file of the submitter:
> 7/23 10:32:31 (15.0) (26031): ReliSock::put_file_with_permissions(): going
> to send permissions 100600
> 7/23 10:32:31 (15.0) (26031): put_file: going to send from filename
> /home/idealgrid/Emailcentos/centos.vmdk
> 7/23 10:32:31 (15.0) (26031): put_file: Found file size 494731264
> 7/23 10:32:31 (15.0) (26031): put_file: sending 494731264 bytes
> 7/23 10:32:58 (13.0) (25379): Getting monitoring info for pid 25379
> 7/23 10:33:13 (15.0) (26031): ReliSock: put_file: sent 494731264 bytes
> 7/23 10:33:13 (15.0) (26031): DoUpload: exiting at 2357
> 7/23 10:33:14 (15.0) (26031): Resource grid7.pesgrid.wipro.com changing
> state from STARTUP to EXECUTING
> 7/23 10:33:14 (15.0) (26031): scheddname = scorpio.pesgrid.wipro.com
> 7/23 10:33:14 (15.0) (26031): executeHost = <10.201.42.237:45684>
> 7/23 10:33:14 (15.0) (26031): start = <10.201.42.237:45684>
> 7/23 10:33:14 (15.0) (26031): end = :45684>
> 7/23 10:33:14 (15.0) (26031): tmpaddr = 10.201.42.237
> 7/23 10:33:14 (15.0) (26031): Executehost name = grid7.pesgrid.wipro.com
> (hp->h_name)
> 7/23 10:33:14 (15.0) (26031): Started timer to evaluate periodic user policy
> expressions every 60 seconds
> 7/23 10:33:14 (15.0) (26031): QmgrJobUpdater: started timer to update queue
> every 900 seconds (tid=10)
> 7/23 10:33:14 (15.0) (26031): Set NumJobStarts to 1
> 7/23 10:33:16 (15.0) (26031): ERROR "Error from starter on
> grid7.pesgrid.wipro.com: Internal vmgahp server error" at line 649 in file
> pseudo_ops.C
>
> In the user log file:
> 001 (015.000.000) 07/23 10:33:14 Job executing on host:
> <10.201.42.237:45684>
> ...
> 007 (015.000.000) 07/23 10:33:16 Shadow exception!
>         Error from starter on grid7.pesgrid.wipro.com: Internal vmgahp
> server error
>         0  -  Run Bytes Sent By Job
>         494734752  -  Run Bytes Received By Job
>
>
> Error from starter on grid7.pesgrid.wipro.com: Internal vmgahp server error
> 7/23 10:33:14 Failed to read vmgahp server version
> 7/23 10:33:14 Inside VM_GAHP_SERVER::cleanup()
> 7/23 10:33:14 VMGAHP write line(QUIT) Error
> 7/23 10:33:14 End of VM_GAHP_SERVER::cleanup
> 7/23 10:33:15 Failed to start vm-gahp server
> 7/23 10:33:16 Inside VMProc::cleanup()
> 7/23 10:33:16 Failed to start job, exiting
>
>
> by
> Johnson