
[Condor-users] Solaris 10 - All jobs idling for ever...



Hi,

I'm trying to set up a pool on Solaris 10 (using the Solaris 9 distribution, since there doesn't seem to be a Solaris 10 distribution yet), but I'm running into a few problems: all the jobs I submit remain idle forever. I tried quick and dirty Unix commands like "sleep 10" and "date" just to try it out, but with no luck. What I'm seeing right now is this:

bgoncal@lab1a> condor_q


-- Submitter: lab1a : <170.140.151.110:60209> : lab1a
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
   1.0   bgoncal         9/14 13:24   0+00:00:00 I  0   0.0  sleep 10
   1.1   bgoncal         9/14 13:24   0+00:00:00 I  0   0.0  sleep 10
   1.2   bgoncal         9/14 13:24   0+00:00:00 I  0   0.0  sleep 10
   1.3   bgoncal         9/14 13:24   0+00:00:00 I  0   0.0  sleep 10
.
.
.

bgoncal@lab1a> condor_q -analyze 1.0


-- Submitter: lab1a : <170.140.151.110:60209> : lab1a
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
---
001.000:  Run analysis summary.  Of 123 machines,
      0 are rejected by your job's requirements
      3 reject your job because of their own requirements
      0 match but are serving users with a better priority in the pool
    120 match but reject the job for unknown reasons
      0 match but will not currently preempt their existing job
      0 are available to run your job
        Last successful match: Thu Sep 15 11:51:06 2005
bgoncal@lab1a>


bgoncal@lab1a> condor_status

Name          OpSys       Arch   State      Activity   LoadAv Mem   ActvtyTime

vm1@lab1a     SOLARIS5.10 SUN4u  Owner      Idle       0.020   512  0+00:10:17
vm2@lab1a     SOLARIS5.10 SUN4u  Unclaimed  Idle       0.000   512  0+00:00:05
vm1@lab1b     SOLARIS5.10 SUN4u  Unclaimed  Idle       0.010   512  0+00:03:04
vm2@lab1b     SOLARIS5.10 SUN4u  Unclaimed  Idle       0.000   512  0+00:03:05
vm1@lab1c     SOLARIS5.10 SUN4u  Unclaimed  Idle       0.000   512  0+00:03:25
.
.
.

bgoncal@lab3c> more condor/hosts/lab3c/log/SchedLog
9/14 12:00:49 (pid:11210) passwd_cache::cache_uid(): getpwnam("condor") failed: Error 0

9/14 12:00:49 (pid:11210) passwd_cache::cache_uid(): getpwnam("condor") failed: Error 0

9/14 12:00:49 (pid:11210) ******************************************************
9/14 12:00:49 (pid:11210) ** condor_schedd (CONDOR_SCHEDD) STARTING UP
9/14 12:00:49 (pid:11210) ** /home/bgoncal/condor/sbin/condor_schedd
9/14 12:00:49 (pid:11210) ** $CondorVersion: 6.7.10 Aug  3 2005 $
9/14 12:00:49 (pid:11210) ** $CondorPlatform: SUN4X-SOLARIS29 $
9/14 12:00:49 (pid:11210) ** PID = 11210
9/14 12:00:49 (pid:11210) ******************************************************
9/14 12:00:49 (pid:11210) Using config file: /home/bgoncal/condor/etc/condor_config
9/14 12:00:49 (pid:11210) Using local config files: /home/bgoncal/condor//hosts/lab3c/condor_config.local
9/14 12:00:49 (pid:11210) DaemonCore: Command Socket at <170.140.151.128:50890>
9/15 11:40:54 (pid:11210) DaemonCore: Command received via UDP from host <170.140.151.110:63801>
9/15 11:40:54 (pid:11210) DaemonCore: received command 60014 (DC_INVALIDATE_KEY), calling handler (handle_invalidate_key())

and in the StarterLog on the same machine we see:

9/14 13:21:05 get_mouse_info(): Failed to open /proc/interrupts
9/14 13:21:05 Failed to obtain keyboard or mouse idle information.
9/14 13:21:05 Assuming the keyboard and mouse to be infinitely idle.
9/14 13:24:57 DaemonCore: Command received via UDP from host <170.140.151.110:56916>
9/14 13:24:57 DaemonCore: received command 440 (MATCH_INFO), calling handler (command_match_info)
9/14 13:24:57 vm1: match_info called
9/14 13:24:57 vm1: Received match <170.140.151.128:50889>#1126713649#3
9/14 13:24:57 vm1: State change: match notification protocol successful
9/14 13:24:57 vm1: Changing state: Unclaimed -> Matched
9/14 13:24:58 DaemonCore: Command received via UDP from host <170.140.151.110:56925>
9/14 13:24:58 DaemonCore: received command 440 (MATCH_INFO), calling handler (command_match_info)
9/14 13:24:58 vm2: match_info called
9/14 13:24:58 vm2: Received match <170.140.151.128:50889>#1126713649#2
9/14 13:24:58 vm2: State change: match notification protocol successful
9/14 13:24:58 vm2: Changing state: Unclaimed -> Matched
9/14 13:25:01 DaemonCore: Command received via UDP from host <170.140.151.110:57022>
9/14 13:25:01 DaemonCore: received command 443 (RELEASE_CLAIM), calling handler (command_release_claim)
9/14 13:25:01 vm1: State change: received RELEASE_CLAIM command
9/14 13:25:01 vm1: Changing state: Matched -> Owner
9/14 13:25:01 vm1: State change: IS_OWNER is false
9/14 13:25:01 vm1: Changing state: Owner -> Unclaimed
9/14 13:25:02 DaemonCore: Command received via UDP from host <170.140.151.110:57030>
9/14 13:25:02 DaemonCore: received command 443 (RELEASE_CLAIM), calling handler (command_release_claim)
9/14 13:25:02 vm2: State change: received RELEASE_CLAIM command
9/14 13:25:02 vm2: Changing state: Matched -> Owner
9/14 13:25:02 vm2: State change: IS_OWNER is false
9/14 13:25:02 vm2: Changing state: Owner -> Unclaimed
9/14 13:25:52 DaemonCore: Command received via UDP from host <170.140.151.110:57143>
9/14 13:25:52 DaemonCore: received command 440 (MATCH_INFO), calling handler (command_match_info)
9/14 13:25:52 vm1: match_info called
9/14 13:25:52 vm1: Received match <170.140.151.128:50889>#1126713649#4
9/14 13:25:52 vm1: State change: match notification protocol successful
9/14 13:25:52 vm1: Changing state: Unclaimed -> Matched
9/14 13:25:53 DaemonCore: Command received via UDP from host <170.140.151.110:57151>
9/14 13:25:53 DaemonCore: received command 440 (MATCH_INFO), calling handler (command_match_info)
9/14 13:25:53 vm2: match_info called
9/14 13:25:53 vm2: Received match <170.140.151.128:50889>#1126713649#5
9/14 13:25:53 vm2: State change: match notification protocol successful
9/14 13:25:53 vm2: Changing state: Unclaimed -> Matched
9/14 13:25:56 DaemonCore: Command received via UDP from host <170.140.151.110:57246>
9/14 13:25:56 DaemonCore: received command 443 (RELEASE_CLAIM), calling handler (command_release_claim)
9/14 13:25:56 vm1: State change: received RELEASE_CLAIM command
9/14 13:25:56 vm1: Changing state: Matched -> Owner
9/14 13:25:56 vm1: State change: IS_OWNER is false
9/14 13:25:56 vm1: Changing state: Owner -> Unclaimed
9/14 13:25:56 DaemonCore: Command received via UDP from host <170.140.151.110:57254>
9/14 13:25:56 DaemonCore: received command 443 (RELEASE_CLAIM), calling handler (command_release_claim)
9/14 13:25:56 vm2: State change: received RELEASE_CLAIM command
9/14 13:25:56 vm2: Changing state: Matched -> Owner
9/14 13:25:56 vm2: State change: IS_OWNER is false
9/14 13:25:56 vm2: Changing state: Owner -> Unclaimed
9/14 13:26:05 Failed to open /proc/interrupts
9/14 13:26:05 get_mouse_info(): Failed to open /proc/interrupts
9/14 13:26:05 Failed to obtain keyboard or mouse idle information.
9/14 13:26:05 Assuming the keyboard and mouse to be infinitely idle.
9/14 13:26:51 DaemonCore: Command received via UDP from host <170.140.151.110:57366>
9/14 13:26:51 DaemonCore: received command 440 (MATCH_INFO), calling handler (command_match_info)
9/14 13:26:51 vm1: match_info called
9/14 13:26:51 vm1: Received match <170.140.151.128:50889>#1126713649#6
9/14 13:26:51 vm1: State change: match notification protocol successful
9/14 13:26:51 vm1: Changing state: Unclaimed -> Matched
9/14 13:26:51 DaemonCore: Command received via UDP from host <170.140.151.110:57374>
9/14 13:26:51 DaemonCore: received command 440 (MATCH_INFO), calling handler (command_match_info)
9/14 13:26:51 vm2: match_info called
9/14 13:26:51 vm2: Received match <170.140.151.128:50889>#1126713649#7
9/14 13:26:51 vm2: State change: match notification protocol successful
9/14 13:26:51 vm2: Changing state: Unclaimed -> Matched
9/14 13:26:54 DaemonCore: Command received via UDP from host <170.140.151.110:57469>
9/14 13:26:54 DaemonCore: received command 443 (RELEASE_CLAIM), calling handler (command_release_claim)
9/14 13:26:54 vm1: State change: received RELEASE_CLAIM command
9/14 13:26:54 vm1: Changing state: Matched -> Owner
9/14 13:26:54 vm1: State change: IS_OWNER is false
9/14 13:26:54 vm1: Changing state: Owner -> Unclaimed
9/14 13:26:54 DaemonCore: Command received via UDP from host <170.140.151.110:57477>
9/14 13:26:54 DaemonCore: received command 443 (RELEASE_CLAIM), calling handler (command_release_claim)
9/14 13:26:54 vm2: State change: received RELEASE_CLAIM command
9/14 13:26:54 vm2: Changing state: Matched -> Owner
9/14 13:26:54 vm2: State change: IS_OWNER is false
9/14 13:26:54 vm2: Changing state: Owner -> Unclaimed
9/14 13:31:05 Failed to open /proc/interrupts
9/14 13:31:05 get_mouse_info(): Failed to open /proc/interrupts
9/14 13:31:05 Failed to obtain keyboard or mouse idle information.
9/14 13:31:05 Assuming the keyboard and mouse to be infinitely idle.
 
and it just goes on like this for a while... Any ideas as to what is going on?
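
In case it helps to see the cycle at a glance, here is a minimal Python sketch that tallies the "Changing state" lines from a log like the one above. It is not part of Condor: the function name `count_transitions` and the regular expression are my own assumptions about the line format, based only on the excerpt shown. On that excerpt, the slots only ever go Unclaimed -> Matched -> Owner -> Unclaimed and never reach Claimed.

```python
import re
from collections import Counter

# Assumed line format, taken from the log excerpt above, e.g.
# "9/14 13:24:57 vm1: Changing state: Unclaimed -> Matched"
STATE_CHANGE = re.compile(r"(vm\d+): Changing state: (\w+) -> (\w+)")


def count_transitions(log_lines):
    """Count (slot, old_state, new_state) transitions found in the given lines."""
    counts = Counter()
    for line in log_lines:
        match = STATE_CHANGE.search(line)
        if match:
            counts[match.groups()] += 1
    return counts


# A few lines copied from the log above, used as sample input.
sample = [
    "9/14 13:24:57 vm1: Changing state: Unclaimed -> Matched",
    "9/14 13:25:01 vm1: State change: received RELEASE_CLAIM command",
    "9/14 13:25:01 vm1: Changing state: Matched -> Owner",
    "9/14 13:25:01 vm1: State change: IS_OWNER is false",
    "9/14 13:25:01 vm1: Changing state: Owner -> Unclaimed",
    "9/14 13:25:52 vm1: Changing state: Unclaimed -> Matched",
]

if __name__ == "__main__":
    for (slot, old, new), n in sorted(count_transitions(sample).items()):
        print(f"{slot}: {old} -> {new}  x{n}")
```

To run it against a real file, pass the open file object instead of `sample`, since iterating a file yields its lines.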

Bruno


--
*******************************************
Bruno Miguel Tavares Goncalves, MS
PhD Candidate
Emory University
Department of Physics
Office No. N117-C
400 Dowman Drive
Atlanta, Georgia 30322
Homepage: www.bgoncalves.com
Email: bgoncalves@xxxxxxxxx
Phone: (404) 712-2441
Fax:   (404) 727-0873
*******************************************