
[Condor-users] New setup questions



Newbie alert!
 
I have installed Condor 6.6.6 on a Windows 2000 Professional box as the master.  I have run a few example jobs through the Condor interface, and they were scheduled and ran fine.
 
Now I have built a Red Hat 9 box and am trying to add it to the pool.  The condor_status command shows:
 
C:\Condor>condor_status
 
Name          OpSys       Arch   State      Activity   LoadAv Mem   ActvtyTime
 
Mike_RH.nuvie LINUX       INTEL  Owner      Idle       0.000    91  0+01:17:58
elpin.nuview. WINNT50     INTEL  Unclaimed  Idle       0.040   384  0+00:28:58
 
                     Machines Owner Claimed Unclaimed Matched Preempting
 
         INTEL/LINUX        1     1       0         0       0          0
       INTEL/WINNT50        1     0       0         1       0          0
 
               Total        2     1       0         1       0          0
 
Jobs still run correctly on the Windows machine, but the job I built for Linux is queued and never runs:
 
C:\Condor>condor_q

-- Submitter: elpin.nuview.com : <192.168.1.218:4789> : elpin.nuview.com
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
   3.0   mike            8/3  14:28   0+00:00:00 I  0   0.0  linux.bat
   4.0   mike            8/3  14:35   0+00:00:00 I  0   0.0  linux.bat
 
2 jobs; 2 idle, 0 running, 0 held
 
Here is the output of condor_q -analyze for the first of those jobs:
 
C:\Condor\examples\printname>condor_q -analyze 3.0

-- Submitter: elpin.nuview.com : <192.168.1.218:4789> : elpin.nuview.com
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
---
003.000:  Run analysis summary.  Of 2 machines,
      1 are rejected by your job's requirements
      1 reject your job because of their own requirements
      0 match, but are serving users with a better priority in the pool
      0 match, match, but reject the job for unknown reasons
      0 match, but will not currently preempt their existing job
      0 are available to run your job
        No successful match recorded.
        Last failed match: Tue Aug 03 15:00:04 2004
        Reason for last match failure: no match found
 
I also noticed the following in the output of condor_q -long:
 
Requirements = (OpSys == "LINUX" && Arch == "INTEL") && (Disk >= DiskUsage) && ((Memory * 1024) >= ImageSize) && (HasFileTransfer)
 
which were not the requirements that I specified (I only had Requirements = (OpSys == "LINUX" && Arch == "INTEL"))...
 
And the final piece of the puzzle seems to be:
 
C:\Condor>condor_status -available
 
Name          OpSys       Arch   State      Activity   LoadAv Mem   ActvtyTime
 
elpin.nuview. WINNT50     INTEL  Unclaimed  Idle       0.040   384  0+00:38:58
 
                     Machines Owner Claimed Unclaimed Matched Preempting
 
       INTEL/WINNT50        1     0       0         1       0          0
 
               Total        1     0       0         1       0          0
 
which implies to me that the Linux box is not available for running jobs at all!
 
So, if you wouldn't mind helping a rank beginner (in both Condor AND Linux):
 
1) Any guesses or things to look for on why the Linux box won't play?
2) Are the jobs not running because the Linux box is not participating, or is there more wrong?
3) Where are the other "requirements" coming from?  I didn't put them in the submit file!
4) The ultimate goal of this experiment is to determine whether Condor and DAG will give us the job submission and synchronization control we need: kicking off one "job" that causes other jobs to run (across multiple platforms at once), with a final job that runs only if all previous jobs completed normally.
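
To make item 4 concrete, the workflow could be sketched as a DAGMan input file along the following lines. This is an untested sketch: the node names and submit file names are hypothetical, and each .sub file would be an ordinary Condor submit file with its own platform Requirements.

```
# Hypothetical DAG input file; node and submit file names are made up.
JOB  Start    start.sub
JOB  WinWork  win_work.sub
JOB  LinWork  linux_work.sub
JOB  Final    final.sub

# Start triggers the per-platform jobs, which may then run concurrently.
PARENT Start CHILD WinWork LinWork

# Final becomes eligible only after both platform jobs complete successfully.
PARENT WinWork LinWork CHILD Final
```

A file like this would normally be handed to condor_submit_dag rather than condor_submit.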
 
Any help would be appreciated.  TIA!