
Re: [Condor-users] problem with condor_q -analyze



On Wed, 6 Jun 2007, Partha sarathi wrote:

The submit files are like this


[condor@Perfcoelnx3 bin]$ cat vanilla1.submit
Universe = vanilla
Executable = ./partha1.out
output = processedJob1.out
Log = processedJob1.log

QUEUE
where ./partha1.out is the executable that I got by compiling a simple C
program using cc on Red Hat Linux. I have two Windows XP machines and one
Red Hat Linux machine in the pool. The jobs are not getting processed on
the Windows machines.

This is the key problem.  If you are submitting jobs from a Linux machine,
by default Condor adds a requirement to the job ClassAd that it should
only run where OpSys == "LINUX".  If you have an executable that
can actually run on both, you can override this by setting the
requirements yourself in the submit file:

Requirements = (OpSys == "LINUX" || OpSys == "WINNT51")

(Check my syntax please -- I might not have the OS strings exactly right,
but that is the trick, and that is why condor_q -analyze says two of the
machines are rejected by your job's requirements.)
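For example, the vanilla submit file from earlier in this thread with an explicit Requirements line would look like this (the OpSys value "WINNT51" for Windows XP is an assumption on my part -- verify the exact strings your pool advertises before using this):

```
Universe     = vanilla
Executable   = ./partha1.out
Output       = processedJob1.out
Log          = processedJob1.log
# Override the default same-platform requirement so the job can match
# both the Linux and the Windows machines in the pool.
# NOTE: the OpSys strings below are assumptions -- check what your
# machines actually advertise (e.g. with condor_status).
Requirements = (OpSys == "LINUX" || OpSys == "WINNT51")
Queue
```

Remember this only helps if the binary really can run on both platforms, as noted above; a C program compiled with cc on Linux will not execute on Windows.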

Steve




I think I gave you the required info. Please let me know if you need
more.







On 6/6/07, Kewley, J (John) <j.kewley@xxxxxxxx> wrote:

You would need to send some log files for further information, and also
your submit file.

It says they don't match, so have a look at the requirements in the submit
file.

Are the OpSys and Arch the same on the machine the jobs run on as on all the others?
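One quick way to check (option syntax as I remember it -- verify against the condor_status manual for your version) is to print those two attributes for every machine in the pool:

```
condor_status -format "%s\t" Machine -format "%s\t" OpSys -format "%s\n" Arch
```

Any machine whose OpSys/Arch pair differs from the submit machine's will not match a job that carries the default requirements.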

JK



-----Original Message-----
*From:* condor-users-bounces@xxxxxxxxxxx [mailto:
condor-users-bounces@xxxxxxxxxxx]*On Behalf Of *Partha sarathi
*Sent:* Wednesday, June 06, 2007 12:27 PM
*To:* Condor-Users Mail List
*Subject:* Re: [Condor-users] problem with condor_q -analyze

When I give a *condor_q -run* I see only one job being processed, on one
machine; even though other jobs are queued, they are not going to the other
machines in the pool. I can see the Condor processes running on all the
machines, but I have no clue why these machines are not able to process the
jobs. In the previous mail I sent the *condor_q -analyze* output as well. Please help me out.


[condor@Perfcoelnx3 bin]$ ./condor_q -run


-- Submitter: Perfcoelnx3 : <10.237.226.83:21193> : Perfcoelnx3
 ID      OWNER            SUBMITTED     RUN_TIME HOST(S)
  66.0   condor          6/5  07:03   0+04:50:28 Perfcoelnx3

[condor@Perfcoelnx3 bin]$ ./condor_q

-- Submitter: Perfcoelnx3 : <10.237.226.83:21193> : Perfcoelnx3
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
  66.0   condor          6/5  07:03   0+04:41:52 R  0   9.8  partha2.out
  67.0   condor          6/5  07:03   0+00:00:00 I  0   9.8  partha3.out
  68.0   condor          6/5  07:03   0+00:00:00 I  0   9.8  partha4.out
  69.0   condor          6/5  07:03   0+00:00:00 I  0   9.8  partha5.out
  70.0   condor          6/5  07:03   0+00:00:00 I  0   9.8  partha6.out
  71.0   condor          6/5  07:03   0+00:00:00 I  0   9.8  partha7.out
  72.0   condor          6/5  07:03   0+00:00:00 I  0   9.8  partha8.out
  73.0   condor          6/5  07:03   0+00:00:00 I  0   9.8  partha9.out
  74.0   condor          6/5  07:03   0+00:00:00 I  0   9.8  partha10.out

9 jobs; 8 idle, 1 running, 0 held


On 6/6/07, Partha sarathi <jinka.partha@xxxxxxxxx> wrote:
>
> My jobs are processed only on the same machine from which they are
> submitted. I have no idea why they are not going to the other
> machines. Can somebody give me a clue what is going wrong?
>
>
> I gave a condor_q -analyze after submitting the jobs, and my output is:
>
>
> 069.000:  Run analysis summary.  Of 3 machines,
>       2 are rejected by your job's requirements
>       0 reject your job because of their own requirements
>       1 match but are serving users with a better priority in the pool
>       0 match but reject the job for unknown reasons
>       0 match but will not currently preempt their existing job
>       0 are available to run your job
> ---
> 070.000:  Run analysis summary.  Of 3 machines,
>       2 are rejected by your job's requirements
>       0 reject your job because of their own requirements
>       1 match but are serving users with a better priority in the pool
>       0 match but reject the job for unknown reasons
>       0 match but will not currently preempt their existing job
>       0 are available to run your job
> ---
> 071.000:  Run analysis summary.  Of 3 machines,
>       2 are rejected by your job's requirements
>       0 reject your job because of their own requirements
>       1 match but are serving users with a better priority in the pool
>       0 match but reject the job for unknown reasons
>       0 match but will not currently preempt their existing job
>       0 are available to run your job
> ---
> 072.000:  Run analysis summary.  Of 3 machines,
>       2 are rejected by your job's requirements
>       0 reject your job because of their own requirements
>       1 match but are serving users with a better priority in the pool
>       0 match but reject the job for unknown reasons
>       0 match but will not currently preempt their existing job
>       0 are available to run your job
> ---
> 073.000:  Run analysis summary.  Of 3 machines,
>       2 are rejected by your job's requirements
>       0 reject your job because of their own requirements
>       1 match but are serving users with a better priority in the pool
>       0 match but reject the job for unknown reasons
>       0 match but will not currently preempt their existing job
>       0 are available to run your job
> ---
> 074.000:  Run analysis summary.  Of 3 machines,
>       2 are rejected by your job's requirements
>       0 reject your job because of their own requirements
>       1 match but are serving users with a better priority in the pool
>       0 match but reject the job for unknown reasons
>       0 match but will not currently preempt their existing job
>       0 are available to run your job
>
>
>
> On 5/31/07, Ian Chesal <ICHESAL@xxxxxxxxxx > wrote:
> >
> > > initally it was like
> > >
> > > 127.0.0.1 localhost.localdomain perfcoelnx3 localhost
> > >
> > > but i changed it with all the machines in the pool like
> > >
> > > 127.0.0.1 perfcoelnx3
> > > 10.237.234.... second m/c
> > > 10.237.234.... third m/c
> >
> > This is wrong. It should be:
> >
> > 127.0.0.1       localhost.localdomain localhost
> > 10.237.234....  perfcoelnx3
> >
> > Right now you've got perfcoelnx3 resolving to the loopback address on
> > the machine. Kind of a circular route.
> >
> > This also explains your condor_status issues in the other email thread
> > BTW.
> >
> > - Ian
> >
> >
> >
> > _______________________________________________
> > Condor-users mailing list
> > To unsubscribe, send a message to condor-users-request@xxxxxxxxxxxxxxx with a
> > subject: Unsubscribe
> > You can also unsubscribe by visiting
> > https://lists.cs.wisc.edu/mailman/listinfo/condor-users
> >
> > The archives can be found at:
> > https://lists.cs.wisc.edu/archive/condor-users/
> >
>
>





--
------------------------------------------------------------------
Steven C. Timm, Ph.D  (630) 840-8525
timm@xxxxxxxx  http://home.fnal.gov/~timm/
Fermilab Computing Division, Scientific Computing Facilities,
Grid Facilities Department, FermiGrid Services Group, Assistant Group Leader.