Re: [Condor-users] condor_status............issues

You can also use condor_q -run to see where the jobs are running.
If you are concerned that jobs are always going to the same machine, try adding
a clause to your REQUIREMENTS expression that says "and don't go to machine A", something like
&& Machine != "machineA"
Then you will find out whether the pool CANNOT match any other machine, or whether machineA
was just being greedy!
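
As a rough sketch of where that clause goes, the snippet below writes out a small submit description file and prints it. The executable name my_job and the output/log file names are hypothetical, and "machineA" stands for whatever name condor_status reports for the machine you want to exclude. If your submit file already has a requirements line, append the clause to it with && instead.

```shell
# Write a minimal submit description file to a temporary location.
# my_job, my_job.out and my_job.log are placeholder names (assumptions).
submit_file="$(mktemp)"
cat > "$submit_file" <<'EOF'
universe     = vanilla
executable   = my_job
output       = my_job.out
log          = my_job.log
# Refuse to match the machine that has been taking every job so far.
requirements = (Machine != "machineA")
queue
EOF

# Show what would be submitted.
cat "$submit_file"

# Uncomment to actually submit (requires a working Condor installation):
# condor_submit "$submit_file"
```

If the jobs then stay idle, condor_q -analyze should give a hint about why no other machine matches.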
-----Original Message-----
From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx]On Behalf Of Partha sarathi
Sent: Friday, June 01, 2007 11:20 AM
To: Condor-Users Mail List
Subject: Re: [Condor-users] condor_status............issues

Now I am able to get output from condor_status -any and condor_status -submitters, but when I look at the log files after the jobs have been processed, I see that all of them ran on the same machine they were submitted from. How do I make all the machines in the pool do work?
You can see the details below. Please let me know if you need more stats to help me out.
[condor@Perfcoelnx3 bin]$ ./SmallThreeJobSubmit

Submitting job(s).
Logging submit event(s).
1 job(s) submitted to cluster 56.
Submitting job(s).
Logging submit event(s).
1 job(s) submitted to cluster 57.
Submitting job(s).
Logging submit event(s).
1 job(s) submitted to cluster 58.
[condor@Perfcoelnx3 bin]$ ./condor_q

-- Submitter: Perfcoelnx3 : <> : Perfcoelnx3
  56.0   condor          5/31 11:33   0+00:00:00 I  0   9.8  partha1small.out
  57.0   condor          5/31 11:33   0+00:00:00 I  0   9.8  partha2small.out
  58.0   condor          5/31 11:33   0+00:00:00 I  0   9.8  partha3small.out

3 jobs; 3 idle, 0 running, 0 held
[condor@Perfcoelnx3 bin]$ ./condor_status -submitters

Name                 Machine      Running IdleJobs HeldJobs

condor@Perfcoelnx3   Perfcoelnx         0        3        0

                           RunningJobs           IdleJobs           HeldJobs

  condor@Perfcoelnx3                 0                  3                  0

               Total                 0                  3                  0
[condor@Perfcoelnx3 bin]$ ./condor_status -any

MyType               TargetType           Name

DaemonMaster         None                 PERFCOEWXP4.cts.com
Scheduler            None                 PERFCOEWXP5.cts.com
DaemonMaster         None                 PERFCOEWXP5.cts.com
Machine              Job                  Perfcoelnx3
Machine              Job                  PERFCOEWXP4.cts.com
Scheduler            None                 Perfcoelnx3
DaemonMaster         None                 Perfcoelnx3
Negotiator           None                 Perfcoelnx3
Submitter            None                 condor@Perfcoelnx3

On 5/31/07, Ian Chesal <ICHESAL@xxxxxxxxxx> wrote:
> for condor_status -any, I don't see any output...

Could be any number of things. I'd start with:

1) Check your collector. It sounds like condor_status is finding your condor_config file on the machine because it usually complains if it can't, but the fact that -any isn't returning at least a list of central manager nodes is odd. Sounds like your request for information is being denied (or maybe not even reaching the collector). See if you can:
i)  ...ping your collector from this machine?
ii) ...run condor_status -any on the central collector machine to make sure it's collecting properly.

2) Check your firewall(s). If the machine is running a firewall, turn it off temporarily. Did that help?

> and for condor_status -direct local host
> i get
> [condor@Perfcoelnx3 bin]$ ./condor_status -direct perfcoelnx3
> ./condor_status: Can't find address for startd perfcoelnx3

1) Check your process list. Make sure condor_startd is running on the machine:

  % ps -ef | grep condor_startd
  ttcbatch  2578  2569  0 May25 ?        00:03:38 condor_startd -f

2) Check to make sure DNS is resolving properly both ways. Do an nslookup of the hostname and then an nslookup of the IP address and make sure they're going to the same place. Check /etc/hosts and make sure the loopback line doesn't have the hostname on it.
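
The /etc/hosts part of that check can be scripted. This is only a sketch: loopback_has_host is a made-up helper name, and it assumes the loopback entries are the usual 127.x.x.x or ::1 lines. It returns success if any loopback line also lists the given hostname, which is the situation to avoid.

```shell
# loopback_has_host HOSTS_FILE HOSTNAME
# Exit status 0 if a loopback entry in HOSTS_FILE also names HOSTNAME,
# 1 otherwise. Comparison is case-insensitive; comments are ignored.
loopback_has_host() {
  awk -v h="$2" '
    { sub(/#.*/, "") }                      # strip trailing comments
    $1 ~ /^127\./ || $1 == "::1" {          # loopback entries only
      for (i = 2; i <= NF; i++)
        if (tolower($i) == tolower(h)) found = 1
    }
    END { exit (found ? 0 : 1) }
  ' "$1"
}

# Example: check this machine's own hosts file, if it is readable.
if [ -r /etc/hosts ]; then
  if loopback_has_host /etc/hosts "$(uname -n)"; then
    echo "warning: hostname appears on a loopback line in /etc/hosts"
  else
    echo "ok: hostname is not on a loopback line"
  fi
fi
```

The forward and reverse nslookup comparison still has to be done against your real DNS, so it is left out here.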

That's all I can think of off the top of my head. Let us know if none of that helps.

- Ian
