
Re: [Condor-users] Problems with version 7.4.2



>Unless you have a need for globus, I highly recommend "going native" on
>fedora (11, 12, or 13).  It is a "proper" build linking against the
>system distro'd libs.   
>
>yum list condor
>yum install condor 
>

Agreed, but note that in addition to lacking grid universe/Globus support, I think the Fedora yum package also currently lacks the standard universe and perhaps a couple of other grid types.

regards
Todd


>Cheers,
>Tim
>
>On Tue, 2010-06-29 at 11:57 +0100, Alan wrote:
> Sounds like a similar issue reported here:
> 
> 
> http://www.escience.cam.ac.uk/projects/camgrid/upgrade.html
> 
> 
> Alan
> 
> On Tue, Jun 29, 2010 at 10:47, Diana Lousa <dlousa@xxxxxxxxxxx> wrote:
>         Hello,
>         
>         We have installed condor version 7.4.2 in a cluster composed
>         of machines with Fedora and Ubuntu 10.04 OS. Our installation
>         is in shared directories and we have different binaries for
>         Fedora and Ubuntu
>         (condor-7.4.2-linux-x86-rhel3-dynamic and
>         condor-7.4.2-linux-x86-debian50-dynamic, respectively). We
>         also have the home dir of condor and the configuration files
>         in a shared directory. The local dir of our central
>         manager/dedicated sched is in a local directory, and for all
>         the other machines it is in a shared directory. We have been
>         experiencing some serious problems:
>         
>         1. The condor_submit command hangs:
>         Sometimes when I submit jobs, condor_submit gets stuck.
>         Although the job enters the queue, the command doesn't return
>         and I have to kill it with Ctrl+C.
>         
>         2. Jobs return to Idle state and can't be removed:
>         One of our users has jobs that return to the Idle state after
>         they terminate or die. He then tries to remove these jobs from
>         the queue, but that action causes Condor to misbehave: condor_q
>         stops responding and shows the message:
>         -- Failed to fetch ads from: <192.168.127.3:39790> : zyon.itqb.unl.pt
>         and then all the jobs die.
>         
>         It is worth pointing out that everything works fine when we
>         use an older version of Condor (6.8.4) on our central
>         manager/dedicated sched. However, we only have Fedora binaries
>         for this version, which means that we cannot run it on a
>         machine with Ubuntu (due to library incompatibilities), and
>         our goal is to have a machine with Ubuntu 10.04 as central
>         manager/dedicated sched.
>         
>         Can anyone help?
>         
>         
>