
Re: [Condor-users] Xen



When I ran 'condor_status -vm', I was only seeing one of my VM servers.  I
found some config issues, and this is now working.  Thanks for your
attention while I work through this stuff.

Rgs,
craig


On 1/27/09 6:02 AM, "Matthew Farrellee" <matt@xxxxxxxxxx> wrote:

> What do you mean by "condor servers seeing each other as vm universe hosts"?
> 
> Best,
> 
> 
> matt
> 
> Craig Holland wrote:
>> Jaeyoung,
>> 
>> Thanks, the problem was indeed permissions on my disk image.  I have been
>> able to submit my xen job and have it deployed into the vm universe.  Still
>> having problems with the condor servers seeing each other as vm universe
>> hosts...will look into that today.
>> 
>> 
>> Thanks,
>> craig
>> 
>> 
>> On 1/25/09 2:44 AM, "Jaeyoung Yoon" <jaeyoungyoon@xxxxxxxxx> wrote:
>> 
>>> Hello Craig,
>>> 
>>> I think you need to check whether /var/lib/xen/images/test2-disk0
>>> exists on an execute machine. Otherwise, you need to specify
>>> "xen_transfer_files = /var/lib/xen/images/test2-disk0" in your submit
>>> file to transfer the disk file.
>>> 
>>> Please refer to section "2.11.1.2 Xen-Specific Submit Commands" in the
>>> Condor manual:
>>> "If any files need to be transferred from the submit machine to the
>>> machine where the vm universe job will execute, Condor must be
>>> explicitly told to do so with the xen_transfer_files command: "
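As a concrete illustration of that manual excerpt, a minimal vm universe
submit file that transfers the disk image might look roughly like this
(the paths and values are taken from elsewhere in this thread and are
illustrative, not verified against a live pool):

```
universe           = vm
vm_type            = xen
vm_memory          = 512
vm_networking      = true
xen_kernel         = included
xen_disk           = /var/lib/xen/images/test2-disk0:xvda:w
xen_transfer_files = /var/lib/xen/images/test2-disk0
queue
```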
>>> 
>>> You can also see what the problem is in the Condor vm-gahp log file in
>>> the Condor log directory.
>>> 
>>> Regards,
>>> -jaeyoung
>>> 
>>> 
>>> On Fri, Jan 23, 2009 at 2:43 PM, Craig Holland <crhollan@xxxxxxxxx> wrote:
>>>> Never mind on that... I see this in my logs:
>>>> 
>>>> 012 (014.000.000) 08/13 18:34:20 Job was held.
>>>>        Error from starter on slot1@xxxxxxxxxxxxxxxxxxxxx:
>>>> VMGAHP_ERR_JOBCLASSAD_XEN_INVALID_DISK_PARAM
>>>>        Code 6 Subcode 0
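When a vm universe job goes on hold like this, the hold reason and the
vm-gahp log are the two usual places to look.  A sketch of the checks,
assuming a standard Condor install (the job id and log file name here are
illustrative):

```
# Show the hold reason Condor recorded for cluster 14, job 0
condor_q -hold 14.0

# Locate the Condor log directory on the execute machine; the vm-gahp
# log (e.g. VMGahpLog) lives there alongside the StarterLog
condor_config_val LOG
```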
>>>> 
>>>> ...any ideas?
>>>> 
>>>> Thanks,
>>>> craig
>>>> 
>>>> 
>>>> On 1/23/09 2:34 PM, "Craig Holland" <crhollan@xxxxxxxxx> wrote:
>>>> 
>>>>> Thanks Matt.
>>>>> 
>>>>> So, I've gotten a bit further down the road.  I'm able to submit the
>>>>> job with the file below, but it seems to get held.  I'm thinking there
>>>>> needs to be something that points to the domU config file in
>>>>> /etc/xen... but I don't see any reference to that.  Certainly executing
>>>>> condor_vm_xen.sh from the command line requires the domU control file to
>>>>> be passed in.  I tried using the executable key, but that didn't seem to
>>>>> help.
>>>>> 
>>>>> universe        = vm
>>>>> vm_type         = xen
>>>>> vm_memory       = 512
>>>>> vm_networking   = true
>>>>> executable      = test2
>>>>> xen_disk        = /var/lib/xen/images/test2-disk0:xvda:w
>>>>> xen_kernel      = included
>>>>> queue
>>>>> 
>>>>> Thanks,
>>>>> craig
>>>>> 
>>>>> On 1/23/09 1:34 PM, "Matthew Farrellee" <matt@xxxxxxxxxx> wrote:
>>>>> 
>>>>>> When you've configured some machines in your pool to support the VM
>>>>>> Universe you should be able to see them by running: condor_status -vm
>>>>>> 
>>>>>> When you submit a VM Universe job it will be matched with one of those
>>>>>> machines. condor_vm_xen.sh will then be run on the matched machine to
>>>>>> start the VM. condor_vm_xen.sh is just a utility Condor uses to start
>>>>>> the VM; it isn't intended to be used manually.
>>>>>> 
>>>>>> * * *
>>>>>> 
>>>>>> Ugh. condor_vm_xen.sh is in sbin. It shouldn't be. It belongs in libexec.
>>>>>> 
>>>>>> Best,
>>>>>> 
>>>>>> 
>>>>>> matt
>>>>>> 
>>>>>> Craig Holland wrote:
>>>>>>> Thanks.
>>>>>>> 
>>>>>>> So I've been using condor_vm_xen.sh to create the domU.  This just
>>>>>>> seems to run it on the local host.  Is this the correct method?  Also,
>>>>>>> for some reason, my condor hosts don't see each other in the vm
>>>>>>> universe, but do see each other when I do a condor_status.
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> craig
>>>>>>> 
>>>>>>> 
>>>>>>> On 1/23/09 11:16 AM, "Matthew Farrellee" <matt@xxxxxxxxxx> wrote:
>>>>>>> 
>>>>>>>> Craig,
>>>>>>>> 
>>>>>>>> Your vision is pretty accurate.
>>>>>>>> 
>>>>>>>> Essentially, a disk image becomes your job.  You submit it, and
>>>>>>>> Condor finds a place for it to run.  It runs.  When it is done, it
>>>>>>>> shuts itself down.
>>>>>>>> 
>>>>>>>> The life cycle of a VM Universe job is the life cycle of the VM.  I
>>>>>>>> avoid talking about DomU, because this would apply to KVM VMs as well
>>>>>>>> as EC2 AMIs, if you're using the Grid Universe and EC2 resources.
>>>>>>>> 
>>>>>>>> Some uses: 1) checkpoint & migration without Standard Universe; 2) job
>>>>>>>> portability - the disk contains everything needed for the job; 3)
>>>>>>>> ability to use Condor's policies and robustness to manage services; 4)
>>>>>>>> ability to use glide-in concept across VM clusters
>>>>>>>> 
>>>>>>>> Best,
>>>>>>>> 
>>>>>>>> 
>>>>>>>> matt
>>>>>>>> 
>>>>>>>> Craig Holland wrote:
>>>>>>>>> I think I'm talking about the vm universe.  I'm envisioning sending
>>>>>>>>> a xen domU into the grid as a job.  I've been able to create the vm
>>>>>>>>> universe, but it seems like when a domU is created, it is tied to a
>>>>>>>>> specific dom0 (which I guess makes sense).  And, once it is created,
>>>>>>>>> it isn't really clear to me what the benefit of running it in the vm
>>>>>>>>> universe is.  BTW: I'm new to condor ;)
>>>>>>>>> 
>>>>>>>>> Thanks,
>>>>>>>>> craig
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On 1/22/09 6:52 PM, "Steven Timm" <timm@xxxxxxxx> wrote:
>>>>>>>>> 
>>>>>>>>>> Your question about whether "the domU actually lives on the grid"
>>>>>>>>>> isn't very well defined.  Are you talking about the virtual machine
>>>>>>>>>> universe, or just using Xen VMs as compute resources and running
>>>>>>>>>> normal condor jobs?  Both can be done.  We are doing the latter:
>>>>>>>>>> using Xen VMs as regular machines in the condor pool, including for
>>>>>>>>>> the collector/negotiator and the schedds.
>>>>>>>>>> 
>>>>>>>>>> Steve Timm
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> On Thu, 22 Jan 2009, Craig Holland wrote:
>>>>>>>>>> 
>>>>>>>>>>> Hi,
>>>>>>>>>>> 
>>>>>>>>>>> I recently started playing with Xen in Condor.  It isn't clear from
>>>>>>>>>>> the documentation how this works - whether the domU actually lives
>>>>>>>>>>> on the grid or whether it can use the grid's resources.  It would
>>>>>>>>>>> seem the latter.  Can anyone point me to some useful reading on the
>>>>>>>>>>> subject or fill me in?
>>>>>>>>>>> 
>>>>>>>>>>> Thanks,
>>>>>>>>>>> craig
>>>>>>>>>>> 
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> Condor-users mailing list
>>>>>>>>>>> To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx
>>>>>>>>>>> with a subject: Unsubscribe
>>>>>>>>>>> You can also unsubscribe by visiting
>>>>>>>>>>> https://lists.cs.wisc.edu/mailman/listinfo/condor-users
>>>>>>>>>>> 
>>>>>>>>>>> The archives can be found at:
>>>>>>>>>>> https://lists.cs.wisc.edu/archive/condor-users/
>>>>>>>>>>> 
>>>>>>>>>> --
>>>>>>>>>> ------------------------------------------------------------------
>>>>>>>>>> Steven C. Timm, Ph.D  (630) 840-8525
>>>>>>>>>> timm@xxxxxxxx  http://home.fnal.gov/~timm/
>>>>>>>>>> Fermilab Computing Division, Scientific Computing Facilities,
>>>>>>>>>> Grid Facilities Department, FermiGrid Services Group, Assistant Group
>>>>>>>>>> Leader.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> --
>>>>>>>>> Craig Holland
>>>>>>>>> Mgr, Operations
>>>>>>>>> Cisco Media Solutions Group
>>>>>>>>> M: +1-650-787-7241
>>>>>>>>> 
>>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>> 
>> 
>> 
>> 
>> 
>> 
> 



