Re: [Condor-users] Dose NFS and real IP matter to install condor?



On Tuesday 27 February 2007, Sean wrote:
> Hi,
>    I am trying to install condor on two machines. I successfully installed
> condor on both machines, and it runs ok if the job is submitted locally and
> run locally.  Below is the submit file.
>
>    Executable     =   hello-cows
>    Universe       =   standard
>    Output         =   hello-cows.out$(Process)
>    Log            =   hello-cows.log$(Process)
>    Queue 5
>
> However, if I try to specify that the job should run on a certain machine,
> using this submit file:
>
>    Executable     =   hello-cows
>    Universe       =   standard
>    Requirements   =   Machine == "172.8.30.2"
>    Output         =   hello-cows.out$(Process)
>    Log            =   hello-cows.log$(Process)
>    Queue 5
>
> it gets stuck. When I use condor_q to check the status, the jobs are
> always in the queue and never get run.


> I used the same example on another, working condor system, and the
> example runs very well. Comparing my system with the working system, I
> found these differences:
> 1. The working system is using a real IP, and mine is using a local IP
> behind a firewall.
> 2. The working system is using a shared file system, and mine does not
> have a shared file system running.
>
> Are a real IP and a shared file system necessary to install condor? If
> not, what have I missed?

> Thanks for the answer.

Here's what's probably wrong...  You need to understand a bit about how the 
Requirements expression works, and about Condor's ClassAd mechanism in 
general (I'd encourage everyone to read that section of the manual at least 
once).

Every machine in the pool publishes a ClassAd known as the "Machine ad".  In 
this ad, you'll see things like the following (you can see this 
via 'condor_status -l'):

MyType = "Machine"
TargetType = "Job"
Name = "abulafia.cs.wisc.edu"
Machine = "abulafia.cs.wisc.edu"
Rank = 0.000000
....

These are all attribute / value pairs.  Notice, in this case, the machine name 
is "abulafia.cs.wisc.edu", which is a string.

The Requirements expression that you supplied here will never become true, 
because Condor is trying to match your job with a machine ad which fulfills 
your requirements... namely that the Machine attribute in the ad must be 
exactly "172.8.30.2".  Condor, by default, puts the FQDN of the machine in 
the Machine attribute, not its IP address - something like "myhost.foo.com", 
which is not the same as the string that you specified.
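
As a rough sketch of the fix, assuming your execute machine turns out to 
advertise itself as "myhost.foo.com" (that name is only a placeholder; use 
whatever string actually appears in the Machine attribute of your own 
machine ad), the submit file would look something like this:

   Executable     =   hello-cows
   Universe       =   standard
   # "myhost.foo.com" is a placeholder; copy the exact Machine value
   # from the machine ad of the host you want to run on.
   Requirements   =   Machine == "myhost.foo.com"
   Output         =   hello-cows.out$(Process)
   Log            =   hello-cows.log$(Process)
   Queue 5

Everything else is unchanged from your original submit file; only the 
string being compared against Machine is different.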

Run 'condor_status -l' to look at your machine ad, and check what's in 
the "Machine" attribute.  Then run 'condor_q -better-analyze', which can 
provide suggestions as to why the job isn't matching.

> Regards,
> Sean

I'd start with the above... If this doesn't help, post the output 
of 'condor_q -better-analyze' and/or the 'condor_status -l' output for 
the machine that you expect to match this job.  There are a lot of other 
things that could be going wrong, but let's start with the basics.

-Nick

-- 
           <<< Why, oh, why, didn't I take the blue pill? >>>
 /`-_    Nicholas R. LeRoy               The Condor Project
{     }/ http://www.cs.wisc.edu/~nleroy  http://www.cs.wisc.edu/condor
 \    /  nleroy@xxxxxxxxxxx              The University of Wisconsin
 |_*_|   608-265-5761                    Department of Computer Sciences