
Re: [Condor-users] Condor and KVM: cannot connect to qemu:///session



Seeing as user rjansen didn't have permission to read from his home directory in AFS, I changed the permissions to allow access. Now, I can run virsh and connect to qemu:///session as rjansen without AFS credentials. I'm able to list his machines.

Condor, however, gives me the same error. Does a vm universe job user need anything other than an accessible home directory to start a virtual machine under Condor? Any ideas as to what the problem is?

Also, does the condor_vm-gahp run as root, but then switch to the job user to create and start up a virtual machine? Is it possible to make Condor just connect to qemu:///system as root (or even as the user)?
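
On the question of which identity the vm-gahp uses: the "Initial UID/GUID" line in the vm-gahp log quoted further down in this thread reports the real and effective IDs separately. Below is a minimal Python sketch for pulling those numbers out of such a line. The function name parse_vmgahp_ids and the regular expression are illustrative assumptions written against this one log line, not anything provided by Condor.

```python
import re

# Line taken from the vm-gahp log quoted later in this thread.
LOG_LINE = ("12/13 13:41:37 VMGAHP[916]: Initial UID/GUID=0/0, "
            "EUID/EGUID=126019/1313, Condor UID/GID=108172,40")

# Pattern written for this one log format; other Condor versions may differ.
UID_PATTERN = re.compile(
    r"Initial UID/GUID=(?P<uid>\d+)/(?P<gid>\d+), "
    r"EUID/EGUID=(?P<euid>\d+)/(?P<egid>\d+), "
    r"Condor UID/GID=(?P<condor_uid>\d+),(?P<condor_gid>\d+)"
)


def parse_vmgahp_ids(line):
    """Return the numeric IDs from an 'Initial UID/GUID' log line, or None."""
    match = UID_PATTERN.search(line)
    if match is None:
        return None
    return {name: int(value) for name, value in match.groupdict().items()}


ids = parse_vmgahp_ids(LOG_LINE)
print(ids)
print("real UID is root:", ids["uid"] == 0)
print("effective UID differs from real UID:", ids["euid"] != ids["uid"])
```

For the line above, the real UID is 0 and the effective UID is 126019. That would be consistent with the gahp being started as root and doing its work under the job user's effective UID, although the log alone does not show which of the two identities libvirt looks at when it resolves qemu:///session.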

Any insight would be very much appreciated.

Thanks,
Ryan

On Mon, Dec 13, 2010 at 3:48 PM, Ryan Jansen <rjansen@xxxxxx> wrote:
Tim,

I currently have two VMs defined on the host machine: ryankvm01, which is defined under qemu:///session for user rjansen, and ryankvm02, which is defined under qemu:///system for root.

Running as root, I can connect to both URIs and run VMs under each:

# virsh -c qemu:///system list --all
 Id Name                 State
----------------------------------
  - ryankvm02            shut off

# virsh -c qemu:///session list --all
 Id Name                 State
----------------------------------
  - ryankvm02            shut off


Logging in as rjansen and running virsh, I can do the same thing (although it shows ryankvm01 under qemu:///session).

However (and I think this may be part of the problem), when I just su into user rjansen from root, I don't have the appropriate AFS tokens to access rjansen's home. As such, libvirt gives me an error:

# su rjansen
# virsh -c qemu:///session list --all
libvir: Network Config error : Failed to open dir '/afs/crc.nd.edu/user/r/rjansen/.libvirt/qemu/networks': Permission denied
Failed to open dir '/afs/crc.nd.edu/user/r/rjansen/.libvirt/storage': Permission denied
libvir: Domain Config error : Failed to open dir '/afs/crc.nd.edu/user/r/rjansen/.libvirt/qemu': Permission denied
libvir: error : could not connect to qemu:///session
error: could not connect to qemu:///session
error: failed to connect to the hypervisor
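
The three "Failed to open dir" messages above can be reproduced without virsh. Here is a minimal Python sketch, assuming the only directories that matter are the three named in those messages; the helper name check_libvirt_dirs is made up for illustration. It tries to list each directory as whichever user runs the script and reports what happened.

```python
import os

# Sub-directories named in the libvirt error messages above.
LIBVIRT_SUBDIRS = (
    ".libvirt/qemu/networks",
    ".libvirt/storage",
    ".libvirt/qemu",
)


def check_libvirt_dirs(home):
    """Map each per-user libvirt directory under `home` to a status string."""
    status = {}
    for sub in LIBVIRT_SUBDIRS:
        path = os.path.join(home, sub)
        try:
            os.listdir(path)  # actually attempt the read, as libvirt does
            status[path] = "ok"
        except FileNotFoundError:
            status[path] = "missing"
        except PermissionError:
            status[path] = "denied"
        except OSError as exc:
            status[path] = "error: " + str(exc.strerror)
    return status


if __name__ == "__main__":
    # Checks the home directory of whoever runs the script.
    for path, state in check_libvirt_dirs(os.path.expanduser("~")).items():
        print(f"{state:8s} {path}")
```

The sketch attempts the listing rather than inspecting mode bits because AFS decides access from ACLs and tokens, so the Unix permission bits on a directory are not a reliable indicator there.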


Following up on the error with rjansen, I also tried creating a local user with a home directory on the host machine, condor_vm. condor_vm can use virsh, but Condor still gives me the same error when I try to run the job with condor_vm set as the nobody user. condor_vm can connect just fine with a simple su.

# su condor_vm
# virsh -c qemu:///session list --all
 Id Name                 State
----------------------------------

The error with rjansen seems to be on the right track, but I don't understand why the condor_vm user didn't work. Any ideas?
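
One difference between the su test and the Condor run that may be worth ruling out: su starts a new shell whose real and effective UID are both the target user and, by default, resets HOME, whereas a daemon that starts as root and only changes its effective UID keeps root's real UID and its original environment. The vm-gahp log in this thread shows that second pattern for rjansen (UID 0, EUID 126019). Whether libvirt 0.6.3 depends on any of these values when resolving qemu:///session is not established here, so the following Python sketch is only a way to compare the two situations. The function name describe_identity is an illustrative assumption.

```python
import os
import pwd


def describe_identity():
    """Collect the identity details that can differ between su and seteuid."""
    uid, euid = os.getuid(), os.geteuid()
    try:
        passwd_home = pwd.getpwuid(euid).pw_dir
    except KeyError:
        passwd_home = None  # effective UID has no passwd entry
    return {
        "uid": uid,                           # real UID
        "euid": euid,                         # effective UID
        "env_home": os.environ.get("HOME"),   # HOME as inherited
        "passwd_home": passwd_home,           # home of the effective user
    }


print(describe_identity())
```

Running this once from a shell obtained with su condor_vm, and once from a process launched the way Condor launches the gahp, would show whether the two environments really match.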

Thanks,
Ryan



On Mon, Dec 13, 2010 at 3:08 PM, Timothy St. Clair <tstclair@xxxxxxxxxx> wrote:
What happens when you try to open via virsh?
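
Running virsh by hand is also a chance to get more detail out of libvirt itself. The libvirt client library reads the LIBVIRT_DEBUG environment variable (1 is the most verbose level) and, by default, writes its log messages to stderr. Below is a minimal Python sketch that runs the same virsh command with that variable set. The helper names debug_env and virsh_list are made up for illustration, and it assumes virsh is on the PATH and that the installed libvirt honours the variable.

```python
import os
import subprocess


def debug_env(level="1"):
    """Copy the current environment and turn on libvirt client logging."""
    env = dict(os.environ)
    env["LIBVIRT_DEBUG"] = level  # 1 = debug, the most verbose level
    return env


def virsh_list(uri="qemu:///session"):
    """Run `virsh -c URI list --all`; return (exit code, stdout, stderr)."""
    proc = subprocess.run(
        ["virsh", "-c", uri, "list", "--all"],
        env=debug_env(),
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stdout, proc.stderr
```

Calling virsh_list() as the job user and printing the third element of the result should show the steps libvirt takes before it reports "could not connect", which is usually more informative than the one-line error in the Condor log.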

On Mon, 2010-12-13 at 13:55 -0500, Ryan Jansen wrote:
> Tim,
>
> That's what I suspected at first, but it looks like the vm-gahp is
> running as root. Here's the vm-gahp log with D_FULLDEBUG on:
>
> 12/13 13:41:37 Running as root.  Enabling specialized core dump routines
> 12/13 13:41:37 DaemonCore: Command Socket at <10.32.72.74:9077>
> 12/13 13:41:37 Will use UDP to update collector cclweb00.cse.nd.edu <129.74.152.166:9618>
> 12/13 13:41:37 VMGAHP[916]: VM-GAHP initialized with run-mode 3
> 12/13 13:41:37 VMGAHP[916]: Initial UID/GUID=0/0, EUID/EGUID=126019/1313, Condor UID/GID=108172,40
> 12/13 13:41:37 VMGAHP[916]: Initialize Uids: caller=root, job user=rjansen
> 12/13 13:41:37 VMGAHP[916]: Constructed VMGahp
> 12/13 13:41:37 VMGAHP[916]: Command: COMMANDS
> 12/13 13:41:38 VMGAHP[916]: Command: SUPPORT_VMS
> 12/13 13:41:38 VMGAHP[916]: Execute commands: S xen kvm vmware
> 12/13 13:41:39 VMGAHP[916]: Command: ASYNC_MODE_ON
> 12/13 13:41:40 VMGAHP[916]: Command: CLASSAD
> 12/13 13:41:43 VMGAHP[916]: Command: CONDOR_VM_START
> 12/13 13:41:43 VMGAHP[916]: Constructed VM_Type.
> 12/13 13:41:43 ERROR "Failed to create libvirt connection: could not connect to qemu:///session" at line 989 in file xen_type.cpp
>
> Based on the log output, it appears to be running as root, and it
> knows that the job user is rjansen. Does that look normal, or do you
> still think it's most likely a permissions problem? Is there any way
> to get some more useful output from libvirt, maybe explaining why it
> couldn't connect?
>
> Thanks,
> Ryan
>
>
> On Mon, Dec 13, 2010 at 1:11 PM, Timothy St. Clair
> <tstclair@xxxxxxxxxx> wrote:
>         If you can verify that your libvirtd is running and that qemu+kvm
>         are installed properly (check via the virsh command prompt), then
>         it is likely a permissions issue. Condor's vm-gahp must be started
>         with elevated privileges (~root) in order to communicate with
>         libvirtd.
>
>         Cheers,
>         Tim
>
>
>         On Mon, 2010-12-13 at 12:01 -0500, Ryan Jansen wrote:
>         > Hi Tim,
>         >
>         > Thanks for the email and sorry for taking so long to get back to you.
>         >
>         > I'm using libvirt version 0.6.3.
>         >
>         > Ryan
>         >
>         > On Wed, Dec 8, 2010 at 11:13 AM, Timothy St. Clair
>         > <tstclair@xxxxxxxxxx> wrote:
>         >         what version of libvirt are you using?
>         >
>         >         Cheers,
>         >         Tim
>         >
>         >
>         >         On Tue, 2010-12-07 at 16:36 -0500, Ryan Jansen wrote:
>         >         > Hi everyone,
>         >         >
>         >         > I'm having a problem getting Condor to start up a KVM virtual
>         >         > machine. I posted an email before, and with advice from a few
>         >         > people, I was able to sort out my KVM problems. But now, whenever
>         >         > I run a vm universe job, the condor_vm-gahp fails with the
>         >         > following error:
>         >         >
>         >         > 12/07 16:18:12 ** condor_vm-gahp (CONDOR_VM_GAHP) STARTING UP
>         >         > 12/07 16:18:12 ** /afs/nd.edu/user37/condor/software/versions/amd64-redhat5/condor-7.4.2-dynamic/sbin/condor_vm-gahp
>         >         > 12/07 16:18:12 ** SubsystemInfo: name=VM_GAHP type=GAHP(9) class=DAEMON(1)
>         >         > 12/07 16:18:12 ** Configuration: subsystem:VM_GAHP local:<NONE> class:DAEMON
>         >         > 12/07 16:18:12 ** $CondorVersion: 7.4.2 Mar 29 2010 BuildID: 227044 $
>         >         > 12/07 16:18:12 ** $CondorPlatform: X86_64-LINUX_RHEL5 $
>         >         > 12/07 16:18:12 ** PID = 13583
>         >         > 12/07 16:18:12 ** Log last touched 12/7 16:18:10
>         >         > 12/07 16:18:12 ******************************************************
>         >         > 12/07 16:18:12 Using config source: /afs/nd.edu/user37/condor/condor_config
>         >         > 12/07 16:18:12 Using local config sources:
>         >         > 12/07 16:18:12    /afs/nd.edu/user37/condor/software/config/machines/dqcneh100.local
>         >         > 12/07 16:18:12 DaemonCore: Command Socket at <10.32.72.74:9118>
>         >         > 12/07 16:18:12 VMGAHP[13583]: VM-GAHP initialized with run-mode 3
>         >         > 12/07 16:18:12 VMGAHP[13583]: Initial UID/GUID=0/0, EUID/EGUID=126019/1313, Condor UID/GID=108172,40
>         >         > 12/07 16:18:12 VMGAHP[13583]: Initialize Uids: caller=root, job user=rjansen
>         >         > 12/07 16:18:18 ERROR "Failed to create libvirt connection: could not connect to qemu:///session" at line 989 in file xen_type.cpp
>         >         >
>         >         > Now, I have adjusted /etc/libvirt/libvirt.conf to allow the
>         >         > libvirt group to access the libvirt rw socket, and I added the
>         >         > users root, rjansen, and condor to that group.
>         >         >
>         >         > Additionally, I can connect just fine (as root and rjansen) to
>         >         > qemu:///session, through virsh, and through the libvirt C library
>         >         > using example code from the qemu website. In fact, the code I use
>         >         > to connect to the library in the example program is essentially
>         >         > the same as the code on line 989 in xen_type.cpp, which is failing.
>         >         >
>         >         > I'm not sure if I'm doing something wrong with Condor or something
>         >         > wrong with KVM/libvirt, but I'd like to get this working.
>         >         >
>         >         > Does anyone have any ideas on how to fix this problem?
>         >         >
>         >         > Thanks,
>         >         > Ryan
>         >
>         >         > _______________________________________________
>         >         > Condor-users mailing list
>         >         > To unsubscribe, send a message to
>         >         condor-users-request@xxxxxxxxxxx with a
>         >         > subject: Unsubscribe
>         >         > You can also unsubscribe by visiting
>         >         >
>         https://lists.cs.wisc.edu/mailman/listinfo/condor-users
>         >         >
>         >         > The archives can be found at:
>         >         > https://lists.cs.wisc.edu/archive/condor-users/
>         >