[HTCondor-users] condor 8.6.5



Hi,


I'm running condor.x86_64  (8.6.5-1.el7), installed via yum, on a cluster of linux machines running RHEL7.


To test the install, I wrote a small python program (below) to submit to the pool.  


As far as I can tell, the pool accepts the job, but condor_q then shows it sitting in the "hold" state indefinitely.  Is there a config or submit detail I screwed up?  I reread the install/config instructions and haven't found my error yet.


I'm submitting a job from my user (non-root) account on one of the cluster machines.  All machines are eligible to submit.  Do I need to start the job from shared (NFS) scratch space or something like that?  I didn't see much about file structure in the install documentation.


Any suggestions would be appreciated!


Nathan



Here's the queue: 

[nmoore@pilgrim condor_sub]$ condor_q

-- Schedd: pilgrim : <199.17.158.20:9618?... @ 08/04/17 11:03:15
OWNER  BATCH_NAME               SUBMITTED   DONE   RUN    IDLE   HOLD  TOTAL JOB_IDS
nmoore CMD: estimate_pi.py     8/4  10:07      _      _      _      3      3 2.0 ... 4.0
nmoore CMD: estimate_pi-2.py   8/4  10:15      _      _      _      1      1 5.0

4 jobs; 0 completed, 0 removed, 0 idle, 0 running, 4 held, 0 suspended
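
One way to dig further would be to ask the schedd why the jobs are held; `condor_q -hold` lists held jobs together with the reason they were held. Below is a minimal Python sketch, assuming condor_q is on the PATH of the submit machine. The helper names hold_reason_command and show_hold_reasons are made up for illustration and are not part of HTCondor.

```python
import shutil
import subprocess


def hold_reason_command(job_id=None):
    """Build the condor_q argument list that reports held jobs and their hold reasons.

    job_id is optional and may be a cluster ("2") or cluster.proc ("2.0") id.
    """
    cmd = ["condor_q", "-hold"]
    if job_id is not None:
        cmd.append(str(job_id))
    return cmd


def show_hold_reasons(job_id=None):
    """Run condor_q -hold and return its text output, or None if condor_q is not installed."""
    if shutil.which("condor_q") is None:
        return None
    result = subprocess.run(
        hold_reason_command(job_id),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return result.stdout


if __name__ == "__main__":
    output = show_hold_reasons()
    print(output if output is not None else "condor_q not found on PATH")
```

Running the same command by hand (`condor_q -hold`) from the submit machine should give the same information.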


Here's a submit script:

[nmoore@pilgrim condor_sub]$ cat submit_file
  executable     = estimate_pi.py
  universe       = vanilla

  output         = job.out
  error          = job.error
  log            = job.log

  queue


And here's the python program - note all machines have python3 available in path:
[nmoore@pilgrim condor_sub]$ cat estimate_pi.py
#!/usr/bin/python3
#
# Nathan Moore, Winona State
# 2017-Aug-4
#
# PROGRAM DESCRIPTION
#
# This example program estimates the value of pi via a random number generator.
# two random numbers, x and y, are generated, each in the space [0,1).  This
# means a point within a square of edge length 1.0 has been randomly generated.
# If the square is overlaid with a circle, centered at 0,0, the area of the box
# is 1.0^2 and the area of the circle inside the box is (pi*1.0^2)/4, because only
# one quarter of the circle is inside the square.
# Then, the ratio of random points inside the circle over points generated (inside the square)
# approaches the ratio of the quarter circle area over box area, pi/4
#
# Note, since this is a random process, uncertainty in num_inside goes as sqrt(num_inside)
# so convergence to a reasonable approximation to pi is quite slow (eg, if you want the
# method to be accurate to one in a hundred, you'll probably have to generate 100^2 points)

import math
import random

seed_value=209
limit=1000

random.seed(seed_value)

num_inside=0
for i in range(limit):
    x=random.random()
    y=random.random()
    r_sqr=x*x+y*y
    if(r_sqr<1.0) :
        num_inside+=1
    #print(x,y,r_sqr,num_inside)

print("# estimate of pi/4 is ",num_inside/limit)
est_pi=4*num_inside/limit
print("# which gives pi as ",est_pi)
# print out data line
print("# seed, num_trials, num_inside, pi estimate")
print(seed_value,",",limit,",",num_inside,",",est_pi,",")


# write the results to file, include the random seed value in the filename

# open the file
filename="pi_results.seed."+str(seed_value)+".csv"
#print(filename)
f=open(filename,"w")

# write results to file
line="# estimate of pi/4 is %6.4f \n" % (num_inside/limit)
f.write(line)
est_pi=4*num_inside/limit
line="# which gives pi as %6.4f \n" % (est_pi)
f.write(line)
line="# seed, num_trials, num_inside, pi estimate,\n"
f.write(line)
line="%d,%d,%d,%10.8f\n" % (seed_value,limit,num_inside,est_pi)
f.write(line)

f.close()
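
The slow convergence mentioned in the program's header comment can be made concrete with a small variation of the same loop. This is only a sketch: the function name estimate_pi and the binomial standard-error formula are additions for illustration and are not part of the submitted job.

```python
import math
import random


def estimate_pi(num_trials, seed):
    """Return (estimate, standard_error) for a Monte Carlo estimate of pi."""
    rng = random.Random(seed)  # private generator, so separate runs do not share state
    num_inside = 0
    for _ in range(num_trials):
        x = rng.random()
        y = rng.random()
        if x * x + y * y < 1.0:
            num_inside += 1
    p = num_inside / num_trials  # fraction of points inside the quarter circle
    est_pi = 4.0 * p
    # binomial standard error of p, scaled by the same factor of 4
    std_err = 4.0 * math.sqrt(p * (1.0 - p) / num_trials)
    return est_pi, std_err


if __name__ == "__main__":
    # the error estimate should shrink roughly as 1/sqrt(num_trials)
    for n in (100, 10000, 100000):
        est, err = estimate_pi(n, seed=209)
        print(n, est, err)
```

Each factor of 100 in the number of trials should buy roughly one more decimal digit, which matches the 100^2 rule of thumb in the comment.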

Within the pool, everything is unclaimed and idle:
[root@toulouse ~]# condor_status
Name             OpSys      Arch   State     Activity LoadAv Mem   ActvtyTime

slot1@albatross  LINUX      X86_64 Unclaimed Idle      0.000 2674  0+00:34:36
slot2@albatross  LINUX      X86_64 Unclaimed Idle      0.000 2674  0+00:35:03
slot3@albatross  LINUX      X86_64 Unclaimed Idle      0.000 2674  0+00:35:03
slot4@albatross  LINUX      X86_64 Unclaimed Idle      0.000 2674  0+00:35:03
slot5@albatross  LINUX      X86_64 Unclaimed Idle      0.000 2674  0+00:35:03
slot6@albatross  LINUX      X86_64 Unclaimed Idle      0.000 2674  0+00:35:03
slot7@albatross  LINUX      X86_64 Unclaimed Idle      0.000 2674  0+00:35:03
slot8@albatross  LINUX      X86_64 Unclaimed Idle      0.000 2674  0+00:35:03
...
slot3@wyandotte  LINUX      X86_64 Unclaimed Idle      0.000 3988  0+00:30:04
slot4@wyandotte  LINUX      X86_64 Unclaimed Idle      0.000 3988  0+00:30:04
slot5@wyandotte  LINUX      X86_64 Unclaimed Idle      0.000 3988  0+00:30:04
slot6@wyandotte  LINUX      X86_64 Unclaimed Idle      0.000 3988  0+00:30:04
slot7@wyandotte  LINUX      X86_64 Unclaimed Idle      0.000 3988  0+00:30:04
slot8@wyandotte  LINUX      X86_64 Unclaimed Idle      0.000 3988  0+00:30:04

                     Machines Owner Claimed Unclaimed Matched Preempting  Drain

        X86_64/LINUX       56     0       0        56       0          0      0
               Total       56     0       0        56       0          0      0