
Re: [Condor-users] Does NFS and real IP matter to install condor?



Dear Nick,
Thanks so much for your information. It really helps me a lot. Although I have not solved the problem yet, things are much clearer to me now than before.

I still have some questions. About the advertisements: do I have to send them out manually? Why does Condor not accept the IP string? If I want to set the machine name, can I just use a short name like "abulafia" instead of "abulafia.cs.wisc.edu", since my two machines are only for testing, not the final deployment?

Sorry for these newbie questions; I am new to Condor, and your help is much appreciated. While waiting for your answers, I will keep trying to solve the problem tonight, and I will post more information if it is still there. Thanks a lot!

Best Regards
Sean

Nick LeRoy wrote:

On Tuesday 27 February 2007, Sean wrote:
Hi,
  I am trying to install Condor on two machines. I successfully installed
Condor on both machines, and it runs OK if the job is submitted locally and
run locally.  Below is the submit file.

  Executable     =   hello-cows
  Universe       =   standard
  Output         =   hello-cows.out$(Process)
  Log            =   hello-cows.log$(Process)
  Queue 5

However, if I try to specify that the job should run on a certain machine,
using this submit file:

  Executable     =   hello-cows
  Universe       =   standard
  Requirements   =   Machine == "172.8.30.2"
  Output         =   hello-cows.out$(Process)
  Log            =   hello-cows.log$(Process)
  Queue 5
it gets stuck. When I use condor_q to check the status, the jobs are
always in the queue and never run.


I ran the same example on another, working Condor system, and the
example runs very well there. Comparing my system with the working
system, I found these differences:
1. The working system uses real IPs, and mine uses local IPs
behind a firewall.
2. The working system uses a shared file system, and mine does not have
one running.

Are a real IP and a shared file system necessary to install Condor? If
not, what have I missed?

Thanks for the answer.

Here's what's probably wrong... You need to understand a bit about how the Requirements expression works, and about Condor's ClassAd mechanism in general (I'd encourage everyone to read that section of the manual at least once).

Every machine in the pool publishes a ClassAd known as the "Machine ad". In this ad, you'll see things like the following (you can see this via 'condor_status -l'):

MyType = "Machine"
TargetType = "Job"
Name = "abulafia.cs.wisc.edu"
Machine = "abulafia.cs.wisc.edu"
Rank = 0.000000
....

These are all attribute / value pairs. Notice, in this case, the machine name is "abulafia.cs.wisc.edu", which is a string.

The Requirements expression that you supplied here will never become true, because Condor is trying to match the job with a machine ad that fulfills your requirements... namely that the Machine attribute in the ad must be exactly "172.8.30.2". Condor, by default, puts the FQDN of the machine in the Machine attribute, not its IP address: something like "myhost.foo.com", which is not the same as the string that you specified.

Run 'condor_status -l' to look at your machine ad, and check what's in the "Machine" attribute. Then run 'condor_q -better-analyze', which can suggest why the job isn't matching.
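
As a rough sketch, assuming the Machine attribute of the machine you want turns out to be "myhost.foo.com" (a placeholder, not your real host name), the submit file would look something like this:

```
# "myhost.foo.com" is a placeholder; replace it with the string shown
# in the Machine attribute of the target machine's ad.
Executable     =   hello-cows
Universe       =   standard
Requirements   =   Machine == "myhost.foo.com"
Output         =   hello-cows.out$(Process)
Log            =   hello-cows.log$(Process)
Queue 5
```

The only change from your file is the string on the Requirements line, which now names the value the machine advertises rather than its IP address.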

Regards,
Sean

I'd start with the above... If this doesn't help, post the output of 'condor_q -better-analyze' and/or the 'condor_status -l' output for the machine that you expect to match this job. There are a lot of other things that could be going wrong, but let's start with the basics.

-Nick

