
Re: [Condor-users] Condor Problems on Linux (Fedora Core 3)



I figured out the problem: it was the Linux 2.6 kernel bug with
Condor 6.6.10. Condor thought there was not enough memory on the
target machines to run the jobs. To fix this I switched to
Condor 6.7.10.

Which macros would I have to set to make 6.6.10 work on these boxes?
I know MEMORY is one of them, but which macros set VirtualMemory and
TotalVirtualMemory? I tried VIRTUAL_MEMORY and TOTAL_VIRTUAL_MEMORY,
but they did not work.

Thanks
-Avi

On 8/25/05, Jaime Frey <jfrey@xxxxxxxxxxx> wrote:
> On Aug 24, 2005, at 10:39 AM, Avi Flamholz wrote:
> 
> > I am working on installing Condor on a server farm of Linux machines.
> > As a test I took two of them and am trying to install there first,
> > as this is my first time dealing with Condor. I have already completed
> > a test install on a pool of 3 Solaris machines, which worked fine, but
> > with the Fedora machines I encounter problems.
> >
> > The problem manifests itself as follows:
> >
> > I have two machines, 1 and 2. Machine 1 is configured as the central
> > manager and a submit machine; machine 2 is configured as an execute
> > and submit machine. Each machine has multiple processors. When I run
> > condor_status I get:
> >
> > ----------------------------------------------------------------------
> > $ condor_status
> >
> > Name          OpSys  Arch   State      Activity  LoadAv  Mem  ActvtyTime
> >
> > vm1@machine2  LINUX  INTEL  Unclaimed  Idle      0.000     1  0+00:15:04
> > vm2@machine2  LINUX  INTEL  Unclaimed  Idle      0.000     1  0+00:20:05
> > vm3@machine2  LINUX  INTEL  Unclaimed  Idle      0.000     1  0+00:20:06
> > vm4@machine2  LINUX  INTEL  Unclaimed  Idle      0.000     1  0+00:20:07
> >
> >              Machines  Owner  Claimed  Unclaimed  Matched  Preempting
> >
> >  INTEL/LINUX         4      0        0          4        0           0
> >
> >        Total         4      0        0          4        0           0
> > ----------------------------------------------------------------------
> >
> > However, when I run condor_findhost, I get:
> > ----------------------------------------------------------------------
> > $ condor_findhost
> > Warning:  Found no submitters
> >
> > ERROR: 1 machines not available
> > ----------------------------------------------------------------------
> >
> > And when I run condor_submit, the job waits in the queue indefinitely.
> >
> > Does anyone know what the issue might be?
> 
> Try running condor_q -analyze on your queued jobs.
> 
> +----------------------------------+---------------------------------+
> |            Jaime Frey            |  Public Split on Whether        |
> |        jfrey@xxxxxxxxxxx         |  Bush Is a Divider              |
> |  http://www.cs.wisc.edu/~jfrey/  |         -- CNN Scrolling Banner |
> +----------------------------------+---------------------------------+
> 