
Re: [Condor-users] Cannot start A VM



First of all, thanks for the help.

I have changed my setup to use the correct libvirt: I am now on Fedora 11 with libvirt 0.6.2 and Condor 7.5.2, but VMs still do not start.
The error is the same as before, and the logs are still not very informative.

SHADOWLOG

06/29/10 10:53:30 Using config source: /opt/condor-7.5.2/etc/condor_config
06/29/10 10:53:30 Using local config sources:
06/29/10 10:53:30    /opt/condor-7.5.2/local.Black/condor_config.local
06/29/10 10:53:30 DaemonCore: command socket at <192.168.1.37:53307>
06/29/10 10:53:30 Initializing a VM shadow for job 1.0
06/29/10 10:53:30 (1.0) (5721): Request to run on slot1@Black <192.168.1.37:54193> was ACCEPTED
06/29/10 10:53:39 (1.0) (5721): ERROR "Error from slot1@Black: Failed to create a new VM" at line 655 in file pseudo_ops.cpp
06/29/10 10:53:39 ******************************************************
06/29/10 10:53:39 ** condor_shadow (CONDOR_SHADOW) STARTING UP
06/29/10 10:53:39 ** /opt/condor-7.5.2/sbin/condor_shadow
06/29/10 10:53:39 ** SubsystemInfo: name=SHADOW type=SHADOW(6) class=DAEMON(1)
06/29/10 10:53:39 ** Configuration: subsystem:SHADOW local:<NONE> class:DAEMON
06/29/10 10:53:39 ** $CondorVersion: 7.5.2 Apr 20 2010 BuildID: 232940 $
06/29/10 10:53:39 ** $CondorPlatform: X86_64-LINUX_RHEL5 $
06/29/10 10:53:39 ** PID = 5737
06/29/10 10:53:39 ** Log last touched 6/29 10:53:39
06/29/10 10:53:39 ******************************************************
06/29/10 10:53:39 Using config source: /opt/condor-7.5.2/etc/condor_config
06/29/10 10:53:39 Using local config sources:
06/29/10 10:53:39    /opt/condor-7.5.2/local.Black/condor_config.local
06/29/10 10:53:39 DaemonCore: command socket at <192.168.1.37:40322>
06/29/10 10:53:39 Initializing a VM shadow for job 1.0
06/29/10 10:53:39 (1.0) (5737): Request to run on slot1@Black <192.168.1.37:54193> was REFUSED
06/29/10 10:53:39 (1.0) (5737): Job 1.0 is being evicted from slot1@Black
06/29/10 10:53:39 (1.0) (5737): logEvictEvent with unknown reason (108), aborting
06/29/10 10:53:39 (1.0) (5737): **** condor_shadow (condor_SHADOW) pid 5737 EXITING WITH STATUS 108

STARTERLOG.SLOT1

06/29/10 10:54:30 ******************************************************
06/29/10 10:54:30 ** condor_starter (CONDOR_STARTER) STARTING UP
06/29/10 10:54:30 ** /opt/condor-7.5.2/sbin/condor_starter
06/29/10 10:54:30 ** SubsystemInfo: name=STARTER type=STARTER(8) class=DAEMON(1)
06/29/10 10:54:30 ** Configuration: subsystem:STARTER local:<NONE> class:DAEMON
06/29/10 10:54:30 ** $CondorVersion: 7.5.2 Apr 20 2010 BuildID: 232940 $
06/29/10 10:54:30 ** $CondorPlatform: X86_64-LINUX_RHEL5 $
06/29/10 10:54:30 ** PID = 5753
06/29/10 10:54:30 ** Log last touched 6/29 10:53:40
06/29/10 10:54:30 ******************************************************
06/29/10 10:54:30 Using config source: /opt/condor-7.5.2/etc/condor_config
06/29/10 10:54:30 Using local config sources:
06/29/10 10:54:30    /opt/condor-7.5.2/local.Black/condor_config.local
06/29/10 10:54:30 DaemonCore: command socket at <192.168.1.37:57678>
06/29/10 10:54:30 Done setting resource limits
06/29/10 10:54:30 Communicating with shadow <192.168.1.37:41827>
06/29/10 10:54:30 Submitting machine is "Black"
06/29/10 10:54:30 setting the orig job name in starter
06/29/10 10:54:30 setting the orig job iwd in starter
06/29/10 10:54:30 File transfer completed successfully.
06/29/10 10:54:31 Job 1.0 set to execute immediately
06/29/10 10:54:31 Starting a VM universe job with ID: 1.0
06/29/10 10:54:31 About to start new VM
06/29/10 10:54:32 About to exec /opt/condor-7.5.2/sbin/condor_vm-gahp -f -M 3
06/29/10 10:54:32 VMGAHP server pid=5756
06/29/10 10:54:39 VMGAHP write line(RESULTS) Error
06/29/10 10:54:39 Failed to create a new VM
06/29/10 10:54:40 VMGAHP write line(QUIT) Error
06/29/10 10:54:41 Failed to start job, exiting
06/29/10 10:54:41 ShutdownFast all jobs.
06/29/10 10:54:41 **** condor_starter (condor_STARTER) pid 5753 EXITING WITH STATUS 0


STARTLOG

06/29/10 10:53:30 slot1: Received match <192.168.1.37:54193>#1277800929#14#...
06/29/10 10:53:30 slot1: State change: match notification protocol successful
06/29/10 10:53:30 slot1: Changing state: Unclaimed -> Matched
06/29/10 10:53:30 slot1: Request accepted.
06/29/10 10:53:30 slot1: Remote owner is condor@Black
06/29/10 10:53:30 slot1: State change: claiming protocol successful
06/29/10 10:53:30 slot1: Changing state: Matched -> Claimed
06/29/10 10:53:30 slot1: Got activate_claim request from shadow (<192.168.1.37:58438>)
06/29/10 10:53:30 slot1: Remote job ID is 1.0
06/29/10 10:53:30 slot1: Got universe "VM" (13) from request classad
06/29/10 10:53:30 slot1: State change: claim-activation protocol successful
06/29/10 10:53:30 slot1: Changing activity: Idle -> Busy
06/29/10 10:53:38 VM-gahp server reported an internal error
06/29/10 10:53:38 VM universe will be tested to check if it is available


MY_VM.LOG

001 (001.000.000) 06/29 10:54:31 Job executing on host: <192.168.1.37:54193>
...
007 (001.000.000) 06/29 10:54:40 Shadow exception!
    Error from slot1@Black: Failed to create a new VM
    0  -  Run Bytes Sent By Job
    41943040  -  Run Bytes Received By Job


VMGAHPLOG

06/29/10 10:54:39 ** condor_vm-gahp (CONDOR_VM_GAHP) STARTING UP
06/29/10 10:54:39 ** /opt/condor-7.5.2/sbin/condor_vm-gahp
06/29/10 10:54:39 ** SubsystemInfo: name=VM_GAHP type=GAHP(9) class=DAEMON(1)
06/29/10 10:54:39 ** Configuration: subsystem:VM_GAHP local:<NONE> class:DAEMON
06/29/10 10:54:39 ** $CondorVersion: 7.5.2 Apr 20 2010 BuildID: 232940 $
06/29/10 10:54:39 ** $CondorPlatform: X86_64-LINUX_RHEL5 $
06/29/10 10:54:39 ** PID = 5757
06/29/10 10:54:39 ** Log last touched 6/29 10:54:37
06/29/10 10:54:39 ******************************************************
06/29/10 10:54:39 Using config source: /opt/condor-7.5.2/etc/condor_config
06/29/10 10:54:39 Using local config sources:
06/29/10 10:54:39    /opt/condor-7.5.2/local.Black/condor_config.local
06/29/10 10:54:39 Running as root.  Enabling specialized core dump routines
06/29/10 10:54:39 Not using shared port because USE_SHARED_PORT=false
06/29/10 10:54:39 DaemonCore: command socket at <192.168.1.37:50771>
06/29/10 10:54:39 Will use UDP to update collector Black <192.168.1.37:9618>
06/29/10 10:54:39 Not using shared port because USE_SHARED_PORT=false
06/29/10 10:54:39 VMGAHP[5757]: VM-GAHP initialized with run-mode 0
06/29/10 10:54:39 VMGAHP[5757]: Initial UID/GUID=0/0, EUID/EGUID=500/500, Condor UID/GID=500,500
06/29/10 10:54:39 VMGAHP[5757]: Initialize Uids: caller=root, job user=condor
06/29/10 10:54:39 **** condor_vm-gahp (condor_VM_GAHP) pid 5757 EXITING WITH STATUS 0
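
Since the relevant lines are spread over several log files, it can help to merge them into one timeline and keep only the lines that look like failures. What follows is a minimal sketch, not a Condor tool: the function names (parse_lines, merge_logs, failures) and the keyword list are made up for illustration, and it only assumes the "MM/DD/YY HH:MM:SS" prefix seen in the logs above.

```python
import re
from datetime import datetime

# Assumption: every log line of interest starts with "MM/DD/YY HH:MM:SS ",
# as in the ShadowLog/StarterLog/StartLog excerpts in this post.
STAMP = re.compile(r"^(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) (.*)$")


def parse_lines(source, text):
    """Yield (timestamp, source, message) for each timestamped line."""
    for line in text.splitlines():
        m = STAMP.match(line.strip())
        if m:
            ts = datetime.strptime(m.group(1), "%m/%d/%y %H:%M:%S")
            yield ts, source, m.group(2)


def merge_logs(logs):
    """Merge a {name: text} mapping into one list sorted by timestamp."""
    entries = []
    for name, text in logs.items():
        entries.extend(parse_lines(name, text))
    # sorted() is stable, so lines with equal timestamps keep their order.
    return sorted(entries, key=lambda e: e[0])


def failures(entries, keywords=("error", "failed", "refused")):
    """Keep only entries whose message contains one of the keywords."""
    return [e for e in entries if any(k in e[2].lower() for k in keywords)]


if __name__ == "__main__":
    # A few lines copied from the logs above, used as sample input.
    sample = {
        "StarterLog": "06/29/10 10:54:32 VMGAHP server pid=5756\n"
                      "06/29/10 10:54:39 Failed to create a new VM",
        "StartLog": "06/29/10 10:53:38 VM-gahp server reported an internal error",
    }
    for ts, source, message in failures(merge_logs(sample)):
        print(ts.strftime("%H:%M:%S"), source, message)
```

In practice the dictionary would be filled by reading the real log files; the sample strings here only stand in for them.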

I also tried running /opt/condor-7.5.2/sbin/condor_vm-gahp -f -M 3 by hand, but nothing happens.
Then I looked at line 655 of pseudo_ops.cpp in the source code, but it does not seem to be related to my problem.
Please give me a hint!

Daniele

> Date: Thu, 24 Jun 2010 08:28:31 -0400
> From: matt@xxxxxxxxxx
> To: condor-users@xxxxxxxxxxx
> CC: daniele.fetoni@xxxxxxxxxx
> Subject: Re: [Condor-users] Cannot start A VM
>
> Your hunch is right on. The version of Condor you have is statically linked with libvirt 0.6.2.
>
> https://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=1021
>
> Best,
>
>
> matt
>
> On 06/24/2010 05:00 AM, Daniele Fetoni wrote:
> > Just one more thing: which libvirt version does Condor 7.5.2 use?
> > I am using version 0.6.3, so maybe there could be some issues.
> >
> > ------------------------------------------------------------------------
> > From: daniele.fetoni@xxxxxxxxxx
> > To: condor-users@xxxxxxxxxxx
> > Date: Thu, 24 Jun 2010 10:09:23 +0200
> > Subject: [Condor-users] Cannot start A VM
> >
> > Hi,
> >
> > I am trying to start a VM with Condor 7.5.2 using KVM as the hypervisor.
> > After some trouble getting the VM universe to start, I managed to submit
> > a VM job successfully, but the VM doesn't start.
> > I found this error in StarterLog.slot1:
> >
> > ******************************************************
> > 06/24/10 09:40:04 ** condor_starter (CONDOR_STARTER) STARTING UP
> > 06/24/10 09:40:04 ** /opt/condor-7.5.2/sbin/condor_starter
> > 06/24/10 09:40:04 ** SubsystemInfo: name=STARTER type=STARTER(8)
> > class=DAEMON(1)
> > 06/24/10 09:40:04 ** Configuration: subsystem:STARTER local:<NONE>
> > class:DAEMON
> > 06/24/10 09:40:04 ** $CondorVersion: 7.5.2 Apr 20 2010 BuildID: 232940 $
> > 06/24/10 09:40:04 ** $CondorPlatform: X86_64-LINUX_RHEL5 $
> > 06/24/10 09:40:04 ** PID = 5451
> > 06/24/10 09:40:04 ** Log last touched 6/24 09:38:31
> > 06/24/10 09:40:04 ******************************************************
> > 06/24/10 09:40:04 Using config source: /opt/condor-7.5.2/etc/condor_config
> > 06/24/10 09:40:04 Using local config sources:
> > 06/24/10 09:40:04 /opt/condor-7.5.2/local.Black/condor_config.local
> > 06/24/10 09:40:04 DaemonCore: command socket at <192.168.1.37:34071>
> > 06/24/10 09:40:04 Done setting resource limits
> > 06/24/10 09:40:04 Communicating with shadow <192.168.1.37:35928>
> > 06/24/10 09:40:04 Submitting machine is "Black"
> > 06/24/10 09:40:04 setting the orig job name in starter
> > 06/24/10 09:40:04 setting the orig job iwd in starter
> > 06/24/10 09:40:04 File transfer completed successfully.
> > 06/24/10 09:40:05 Job 1.0 set to execute immediately
> > 06/24/10 09:40:05 Starting a VM universe job with ID: 1.0
> > 06/24/10 09:40:05 About to start new VM
> > 06/24/10 09:40:25 About to exec /opt/condor-7.5.2/sbin/condor_vm-gahp -f
> > -M 3
> > 06/24/10 09:40:25 VMGAHP server pid=5466
> > 06/24/10 09:40:32 VMGAHP write line(RESULTS) Error
> > 06/24/10 09:40:32 Failed to create a new VM
> > 06/24/10 09:40:33 VMGAHP write line(QUIT) Error
> > 06/24/10 09:40:34 Failed to start job, exiting
> > 06/24/10 09:40:34 ShutdownFast all jobs.
> > 06/24/10 09:40:34 **** condor_starter (condor_STARTER) pid 5451 EXITING
> > WITH STATUS 0
> >
> > And this is ShadowLog
> >
> > 06/24/10 10:04:29 ******************************************************
> > 06/24/10 10:04:29 ** condor_shadow (CONDOR_SHADOW) STARTING UP
> > 06/24/10 10:04:29 ** /opt/condor-7.5.2/sbin/condor_shadow
> > 06/24/10 10:04:29 ** SubsystemInfo: name=SHADOW type=SHADOW(6)
> > class=DAEMON(1)
> > 06/24/10 10:04:29 ** Configuration: subsystem:SHADOW local:<NONE>
> > class:DAEMON
> > 06/24/10 10:04:29 ** $CondorVersion: 7.5.2 Apr 20 2010 BuildID: 232940 $
> > 06/24/10 10:04:29 ** $CondorPlatform: X86_64-LINUX_RHEL5 $
> > 06/24/10 10:04:29 ** PID = 7144
> > 06/24/10 10:04:29 ** Log last touched 6/24 10:04:29
> > 06/24/10 10:04:29 ******************************************************
> > 06/24/10 10:04:29 Using config source: /opt/condor-7.5.2/etc/condor_config
> > 06/24/10 10:04:29 Using local config sources:
> > 06/24/10 10:04:29 /opt/condor-7.5.2/local.Black/condor_config.local
> > 06/24/10 10:04:29 DaemonCore: command socket at <192.168.1.37:44994>
> > 06/24/10 10:04:29 Initializing a VM shadow for job 2.0
> > 06/24/10 10:04:29 (2.0) (7144): Request to run on slot2@Black
> > <192.168.1.37:57569> was REFUSED
> > 06/24/10 10:04:29 (2.0) (7144): Job 2.0 is being evicted from slot2@Black
> > 06/24/10 10:04:29 (1.0) (7143): **** condor_shadow (condor_SHADOW) pid
> > 7143 EXITING WITH STATUS 108
> > 06/24/10 10:04:29 (2.0) (7144): logEvictEvent with unknown reason (108),
> > aborting
> > 06/24/10 10:04:29 (2.0) (7144): **** condor_shadow (condor_SHADOW) pid
> > 7144 EXITING WITH STATUS 108
> >
> >
> > I cannot understand where the error in VMGAHP is; moreover, if I run the
> > command /opt/condor-7.5.2/sbin/condor_vm-gahp -f -M 0 vmtype "kvm", it
> > works properly.
> > I can start a VM via qemu-kvm, so there should be no problem with the
> > hypervisor.
> >
> > What could be the problem? Any suggestions?
> >
> > Thanks in advance
> >
> > Daniele
> >
> >
> >
> >
> >
> >
> >
> > _______________________________________________
> > Condor-users mailing list
> > To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
> > subject: Unsubscribe
> > You can also unsubscribe by visiting
> > https://lists.cs.wisc.edu/mailman/listinfo/condor-users
> >
> > The archives can be found at:
> > https://lists.cs.wisc.edu/archive/condor-users/
>

