
Re: [Condor-users] Problems with jobs

Is there any way around this condor_shadow problem, or can the shadows be moved to another
machine? Doesn't it defeat the point of having a powerful 25-machine cluster here if I
end up with 50 processes running on my dainty little submitting machine, which it can't
handle?? I would be better off running my 1000 jobs on that machine through a script in a loop :-s

I have now logged into my actual cluster.

These machines all have the same architecture, memory, etc.

2 GB of memory and 4 GHz CPUs, so fairly powerful.

I then submit a job... and still not many of the VMs are being used. Only 8 show as Busy here,
but there are upwards of 30 condor_shadow processes. Why aren't there that many
busy machines?

condor@node2:~/jobs/helloworld> ~/release/bin/condor_status | grep "Busy"
vm1@xxxxxxxxx LINUX X86_64 Claimed Busy 1.000 2048 0+00:00:08
vm2@xxxxxxxxx LINUX X86_64 Claimed Busy 0.070 2048 0+00:00:05
vm1@xxxxxxxxx LINUX X86_64 Claimed Busy 1.000 2048 0+00:00:05
vm2@xxxxxxxxx LINUX X86_64 Claimed Busy 0.320 2048 0+00:00:05
vm2@xxxxxxxxx LINUX X86_64 Claimed Busy 0.000 2048 0+00:00:05
vm1@xxxxxxxxx LINUX X86_64 Claimed Busy 0.020 2048 0+00:00:04
vm1@xxxxxxxxx LINUX X86_64 Claimed Busy 1.000 2048 0+00:00:04
vm2@xxxxxxxxx LINUX X86_64 Claimed Busy 0.000 2048 0+00:00:05

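To put the two numbers (busy slots vs. shadow processes) side by side, a small shell sketch like the following could help. The function names `count_busy` and `count_shadows` are hypothetical helpers, not part of Condor; the sketch assumes the condor_status line layout shown above (State and Activity columns reading "Claimed Busy") and that `pgrep` from procps is installed on the submit machine.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Condor) for comparing busy slots
# with condor_shadow processes on the submit machine.

# Count lines of condor_status output (read from stdin) whose
# State/Activity columns read "Claimed Busy".
count_busy() {
  grep -c 'Claimed[[:space:]][[:space:]]*Busy'
}

# Count running processes named exactly "condor_shadow" on this machine.
# Assumes pgrep (procps) is available; prints 0 when none are found.
count_shadows() {
  pgrep -x condor_shadow | wc -l | tr -d ' '
}
```

For example, `busy=$(~/release/bin/condor_status | count_busy)` followed by `echo "busy: $busy, shadows: $(count_shadows)"` would print both counts on one line, which makes a gap like 8 busy slots vs. 30+ shadows easy to spot.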
condor_q reports:

-- Submitter: node2.cluster.int : <146.191.165.52:44936> : node2.cluster.int
ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
  4.2   condor         12/8  16:59   0+00:01:29 R  0   0.0  helloworld
  4.3   condor         12/8  16:59   0+00:01:28 R  0   0.0  helloworld
  4.4   condor         12/8  16:59   0+00:01:27 R  0   0.0  helloworld
  4.5   condor         12/8  16:59   0+00:00:34 R  0   0.0  helloworld
  4.6   condor         12/8  16:59   0+00:00:40 R  0   0.0  helloworld
  4.7   condor         12/8  16:59   0+00:00:03 R  0   0.0  helloworld
  4.8   condor         12/8  16:59   0+00:00:00 R  0   0.0  helloworld
  4.9   condor         12/8  16:59   0+00:00:00 R  0   0.0  helloworld
  4.11  condor         12/8  16:59   0+00:01:29 R  0   0.0  helloworld
  4.12  condor         12/8  16:59   0+00:00:00 R  0   0.0  helloworld
  4.13  condor         12/8  16:59   0+00:01:32 R  0   0.0  helloworld
  4.14  condor         12/8  16:59   0+00:01:13 R  0   0.0  helloworld
  4.15  condor         12/8  16:59   0+00:00:00 R  0   0.0  helloworld
  4.16  condor         12/8  16:59   0+00:00:17 R  0   0.0  helloworld
  4.17  condor         12/8  16:59   0+00:00:00 R  0   0.0  helloworld
  4.18  condor         12/8  16:59   0+00:00:00 R  0   0.0  helloworld
  4.19  condor         12/8  16:59   0+00:00:22 R  0   0.0  helloworld
  4.22  condor         12/8  16:59   0+00:01:31 R  0   0.0  helloworld
  4.23  condor         12/8  16:59   0+00:00:00 R  0   0.0  helloworld
  4.24  condor         12/8  16:59   0+00:01:31 R  0   0.0  helloworld
  4.25  condor         12/8  16:59   0+00:00:00 R  0   0.0  helloworld
  4.26  condor         12/8  16:59   0+00:01:31 R  0   0.0  helloworld
  4.27  condor         12/8  16:59   0+00:00:00 R  0   0.0  helloworld
  4.28  condor         12/8  16:59   0+00:01:36 R  0   0.0  helloworld
  4.29  condor         12/8  16:59   0+00:00:00 R  0   0.0  helloworld
  4.30  condor         12/8  16:59   0+00:00:20 R  0   0.0  helloworld
  4.31  condor         12/8  16:59   0+00:00:00 R  0   0.0  helloworld
  4.32  condor         12/8  16:59   0+00:00:30 R  0   0.0  helloworld
  4.33  condor         12/8  16:59   0+00:00:00 R  0   0.0  helloworld
  4.35  condor         12/8  16:59   0+00:00:00 R  0   0.0  helloworld
  4.36  condor         12/8  16:59   0+00:00:00 R  0   0.0  helloworld
  4.37  condor         12/8  16:59   0+00:00:00 R  0   0.0  helloworld
  4.38  condor         12/8  16:59   0+00:00:00 R  0   0.0  helloworld
  4.39  condor         12/8  16:59   0+00:00:00 R  0   0.0  helloworld
  4.40  condor         12/8  16:59   0+00:00:00 R  0   0.0  helloworld
  4.41  condor         12/8  16:59   0+00:00:00 R  0   0.0  helloworld
  4.42  condor         12/8  16:59   0+00:00:00 R  0   0.0  helloworld
  4.43  condor         12/8  16:59   0+00:00:00 R  0   0.0  helloworld
  4.44  condor         12/8  16:59   0+00:00:00 I  0   0.0  helloworld
  4.45  condor         12/8  16:59   0+00:00:00 I  0   0.0  helloworld
  4.46  condor         12/8  16:59   0+00:00:00 I  0   0.0  helloworld
  4.47  condor         12/8  16:59   0+00:00:00 I  0   0.0  helloworld
  4.48  condor         12/8  16:59   0+00:00:00 I  0   0.0  helloworld
  4.49  condor         12/8  16:59   0+00:00:00 I  0   0.0  helloworld
  4.50  condor         12/8  16:59   0+00:00:00 I  0   0.0  helloworld
  4.51  condor         12/8  16:59   0+00:00:00 I  0   0.0  helloworld
  4.52  condor         12/8  16:59   0+00:00:00 I  0   0.0  helloworld
  4.53  condor         12/8  16:59   0+00:00:00 I  0   0.0  helloworld
  4.54  condor         12/8  16:59   0+00:00:00 I  0   0.0  helloworld
  4.55  condor         12/8  16:59   0+00:00:00 I  0   0.0  helloworld
  4.56  condor         12/8  16:59   0+00:00:00 I  0   0.0  helloworld
  4.57  condor         12/8  16:59   0+00:00:00 I  0   0.0  helloworld
  4.58  condor         12/8  16:59   0+00:00:00 I  0   0.0  helloworld
  4.59  condor         12/8  16:59   0+00:00:00 I  0   0.0  helloworld
  4.60  condor         12/8  16:59   0+00:00:00 I  0   0.0  helloworld
  4.61  condor         12/8  16:59   0+00:00:00 I  0   0.0  helloworld
  4.62  condor         12/8  16:59   0+00:00:00 I  0   0.0  helloworld
  4.63  condor         12/8  16:59   0+00:00:00 I  0   0.0  helloworld
  4.64  condor         12/8  16:59   0+00:00:00 I  0   0.0  helloworld
  4.65  condor         12/8  16:59   0+00:00:00 I  0   0.0  helloworld
  4.66  condor         12/8  16:59   0+00:00:00 I  0   0.0  helloworld
  4.67  condor         12/8  16:59   0+00:00:00 I  0   0.0  helloworld
  4.68  condor         12/8  16:59   0+00:00:00 I  0   0.0  helloworld
  4.69  condor         12/8  16:59   0+00:00:00 I  0   0.0  helloworld
  4.70  condor         12/8  16:59   0+00:00:00 I  0   0.0  helloworld
  4.71  condor         12/8  16:59   0+00:00:00 I  0   0.0  helloworld
  4.72  condor         12/8  16:59   0+00:00:00 I  0   0.0  helloworld
  4.73  condor         12/8  16:59   0+00:00:00 I  0   0.0  helloworld
  4.74  condor         12/8  16:59   0+00:00:00 I  0   0.0  helloworld
  4.75  condor         12/8  16:59   0+00:00:00 I  0   0.0  helloworld
  4.76  condor         12/8  16:59   0+00:00:00 I  0   0.0  helloworld
  4.77  condor         12/8  16:59   0+00:00:00 I  0   0.0  helloworld
  4.78  condor         12/8  16:59   0+00:00:00 I  0   0.0  helloworld
  4.79  condor         12/8  16:59   0+00:00:00 I  0   0.0  helloworld
  4.80  condor         12/8  16:59   0+00:00:00 I  0   0.0  helloworld
  4.81  condor         12/8  16:59   0+00:00:00 I  0   0.0  helloworld
  4.82  condor         12/8  16:59   0+00:00:00 I  0   0.0  helloworld
  4.83  condor         12/8  16:59   0+00:00:00 I  0   0.0  helloworld
  4.84  condor         12/8  16:59   0+00:00:00 I  0   0.0  helloworld
  4.85  condor         12/8  16:59   0+00:00:00 I  0   0.0  helloworld
  4.86  condor         12/8  16:59   0+00:00:00 I  0   0.0  helloworld
  4.87  condor         12/8  16:59   0+00:00:00 I  0   0.0  helloworld
  4.88  condor         12/8  16:59   0+00:00:00 I  0   0.0  helloworld
  4.89  condor         12/8  16:59   0+00:00:00 I  0   0.0  helloworld
  4.90  condor         12/8  16:59   0+00:00:00 I  0   0.0  helloworld
  4.91  condor         12/8  16:59   0+00:00:00 I  0   0.0  helloworld
  4.92  condor         12/8  16:59   0+00:00:00 I  0   0.0  helloworld
  4.93  condor         12/8  16:59   0+00:00:00 I  0   0.0  helloworld
  4.94  condor         12/8  16:59   0+00:00:00 I  0   0.0  helloworld
  4.95  condor         12/8  16:59   0+00:00:00 I  0   0.0  helloworld
  4.96  condor         12/8  16:59   0+00:00:00 I  0   0.0  helloworld
  4.97  condor         12/8  16:59   0+00:00:00 I  0   0.0  helloworld
  4.98  condor         12/8  16:59   0+00:00:00 I  0   0.0  helloworld
  4.99  condor         12/8  16:59   0+00:00:00 I  0   0.0  helloworld

94 jobs; 56 idle, 38 running, 0 held

condor_q -ana 4.99 reports:

-- Submitter: node2.cluster.int : <146.191.165.52:44936> : node2.cluster.int
ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
---
004.099:  Run analysis summary.  Of 39 machines,
     1 are rejected by your job's requirements
     0 reject your job because of their own requirements
    38 match but are serving users with a better priority in the pool
     0 match but reject the job for unknown reasons
     0 match but will not currently preempt their existing job
     0 are available to run your job
       No successful match recorded.
       Last failed match: Thu Dec  8 17:05:45 2005
       Reason for last match failure: no match found

Chris