
Re: [Condor-users] Store a credential for a condor user



Hi!

 

Thanks for the answers Ian!

 

My Condor pool consists of 4 machines (3 of them are SMP machines). condor_status lists the following:

 

Name               OpSys      Arch   State     Activity LoadAv Mem   ActvtyTime

O2F-sth-LAP-002.un WINNT51    INTEL  Unclaimed Idle     0.000  1527  0+00:40:04
slot1@o2f-mbl-lap- WINNT51    INTEL  Unclaimed Idle     0.000  1767  0+01:51:58
slot2@o2f-mbl-lap- WINNT51    INTEL  Unclaimed Idle     0.000  1767  0+01:57:04
slot1@O2F-STH-LAP- WINNT60    INTEL  Unclaimed Idle     0.810  1534  0+02:05:04
slot2@O2F-STH-LAP- WINNT60    INTEL  Unclaimed Idle     0.000  1534  0+02:05:05
slot1@o2f-sth-lap- WINNT61    INTEL  Unclaimed Idle     0.000  1767  0+01:21:24
slot2@o2f-sth-lap- WINNT61    INTEL  Unclaimed Idle     0.000  1767  0+01:21:25

                     Total Owner Claimed Unclaimed Matched Preempting Backfill

       INTEL/WINNT51     3     0       0         3       0          0        0
       INTEL/WINNT60     2     0       0         2       0          0        0
       INTEL/WINNT61     2     0       0         2       0          0        0

               Total     7     0       0         7       0          0        0

 

 

The different colors mark different machines.

The central manager is marked with green.

 

When I submit a job, the only machine that changes state from Unclaimed to Claimed is the central manager (see the condor_status output below).

 

 

Name               OpSys      Arch   State     Activity LoadAv Mem   ActvtyTime

O2F-sth-LAP-002.un WINNT51    INTEL  Unclaimed Idle     0.000  1527  0+00:45:04
slot1@o2f-mbl-lap- WINNT51    INTEL  Unclaimed Idle     0.000  1767  0+01:51:58
slot2@o2f-mbl-lap- WINNT51    INTEL  Unclaimed Idle     0.000  1767  0+01:57:04
slot1@O2F-STH-LAP- WINNT60    INTEL  Unclaimed Idle     0.810  1534  0+02:05:04
slot2@O2F-STH-LAP- WINNT60    INTEL  Unclaimed Idle     0.000  1534  0+02:05:05
slot1@o2f-sth-lap- WINNT61    INTEL  Claimed   Busy     0.000  1767  0+00:00:05
slot2@o2f-sth-lap- WINNT61    INTEL  Claimed   Busy     0.000  1767  0+00:00:05

                     Total Owner Claimed Unclaimed Matched Preempting Backfill

       INTEL/WINNT51     3     0       0         3       0          0        0
       INTEL/WINNT60     2     0       0         2       0          0        0
       INTEL/WINNT61     2     0       2         0       0          0        0

               Total     7     0       2         5       0          0        0

 

 

Why is it only the central manager that changes to Claimed?

I want all the machines to execute jobs, but only the central manager should be able to submit jobs.

All the machines have START=TRUE and STARTD in the DAEMON_LIST.
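
As a rough sketch of the setup described here (every machine executes, only the central manager submits), the local configuration on each kind of machine might look like the fragment below. The split into two files and the exact daemon lists are assumptions for illustration, not the actual configuration of this pool. The point is that SCHEDD appears only on the central manager, while STARTD appears everywhere.

```
# condor_config.local on the central manager (assumed layout):
# runs the pool services, accepts submissions, and also executes jobs
DAEMON_LIST = MASTER, COLLECTOR, NEGOTIATOR, SCHEDD, STARTD, CREDD
START = TRUE

# condor_config.local on each execute-only machine (assumed layout):
# no SCHEDD, so jobs cannot be submitted from these machines
DAEMON_LIST = MASTER, STARTD
START = TRUE
```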

 

 

>Just for some clarification: is this the condor_credd daemon running on your central manager machine?

 

Yes, condor_credd is running only on the central machine.

 

>You only need one credd daemon for an entire pool, not one on each machine.

>Every machine should be connecting to the condor_credd daemon on your central manager to get credentials for users.

 

Is this done by default? If not, how should I indicate it?

 

Another question:

I have tried to run condor_birdwatcher but it says that condor is off, although I believe condor is running.

How does condor birdwatcher work?

 

 

Cheers,

Sónia

 

 

 

From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Ian Chesal
Sent: 3 September 2010 15:45
To: Condor-Users Mail List
Subject: Re: [Condor-users] Store a credential for a condor user

 

On Fri, Sep 3, 2010 at 8:50 AM, Sónia Liléo <sonia.lileo@xxxxx> wrote:

Hi again!

 

The jobs are now running on the central manager. I added STARTD to the DAEMON_LIST.

 

Perfect. Nice work.

 

But the other machine in my Condor pool is still not executing jobs.

 

If the state of the machine is still Owner it means START = False on the box and that's why it isn't running your jobs.

 

The CredLog looks like this,

 

09/03 13:38:20 Locale: English_United States.1252
09/03 13:38:20 WARNING: Config source is empty: C:\condor/condor_config.local
09/03 13:38:20 ******************************************************
09/03 13:38:20 ** condor_credd.exe (CONDOR_CREDD) STARTING UP
09/03 13:38:20 ** C:\condor\bin\condor_credd.exe
09/03 13:38:20 ** SubsystemInfo: name=CREDD type=DAEMON(11) class=DAEMON(1)
09/03 13:38:20 ** Configuration: subsystem:CREDD local:<NONE> class:DAEMON
09/03 13:38:20 ** $CondorVersion: 7.4.1 Dec 17 2009 BuildID: 204351 $
09/03 13:38:20 ** $CondorPlatform: INTEL-WINNT50 $
09/03 13:38:20 ** PID = 756
09/03 13:38:20 ** Log last touched 9/3 12:38:13
09/03 13:38:20 ******************************************************
09/03 13:38:20 Using config source: C:\condor\condor_config
09/03 13:38:20 Using local config sources:
09/03 13:38:20    C:\condor/condor_config.local
09/03 13:38:20 DaemonCore: Command Socket at <10.110.44.212:1342>
09/03 13:38:20 Will use UDP to update collector o2f-sth-lap-016.un.dr.dgcsystems.net <10.110.44.76:9618>
09/03 13:38:20 main_init() called
09/03 13:38:20 Trying to update collector <10.110.44.76:9618>
09/03 13:38:20 Attempting to send update via UDP to collector o2f-sth-lap-016.un.dr.dgcsystems.net <10.110.44.76:9618>
09/03 13:38:20 File descriptor limits: max 1024, safe 820
09/03 13:38:20 Initialized the following authorization table:
09/03 13:38:20 Authorizations yet to be resolved:
09/03 13:38:20 allow NEGOTIATOR:  */10.110.44.76 */o2f-sth-lap-016.un.dr.dgcsystems.net

 

Just for some clarification: is this the condor_credd daemon running on your central manager machine? You only need one credd daemon for an entire pool, not one on each machine. Every machine should be connecting to the condor_credd daemon on your central manager to get credentials for users.
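
As a sketch of how that is usually expressed in configuration: the other machines are pointed at the credd through the CREDD_HOST setting. The fragment below assumes the condor_credd runs on the central manager, so that $(CONDOR_HOST) resolves to it. It leaves out the security and authentication settings that the Condor manual lists as required for running jobs as the submitting user on Windows, so treat it as a starting point to check against the manual for your version rather than a complete setup.

```
# condor_config.local on every machine that needs stored credentials
# (assumption: the condor_credd runs on the central manager)
CREDD_HOST = $(CONDOR_HOST)

# Optional: cache fetched passwords locally to reduce round trips to the credd
CREDD_CACHE_LOCALLY = True
```

After changing the configuration, the daemons need to reread it (for example with condor_reconfig) before the setting takes effect.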

 

And the StartLog,

 

09/03 14:29:10 Locale: English_United States.1252
09/03 14:29:10 ******************************************************
09/03 14:29:10 ** condor_startd.exe (CONDOR_STARTD) STARTING UP
09/03 14:29:10 ** C:\condor\bin\condor_startd.exe
09/03 14:29:10 ** SubsystemInfo: name=STARTD type=STARTD(7) class=DAEMON(1)
09/03 14:29:10 ** Configuration: subsystem:STARTD local:<NONE> class:DAEMON
09/03 14:29:10 ** $CondorVersion: 7.4.1 Dec 17 2009 BuildID: 204351 $
09/03 14:29:10 ** $CondorPlatform: INTEL-WINNT50 $
09/03 14:29:10 ** PID = 3424
09/03 14:29:10 ** Log last touched 9/3 13:29:01
09/03 14:29:10 ******************************************************
09/03 14:29:10 Using config source: C:\condor\condor_config
09/03 14:29:10 Using local config sources:
09/03 14:29:10    C:\condor/condor_config.local
09/03 14:29:10 DaemonCore: Command Socket at <10.110.44.212:1479>
09/03 14:29:10 my_popen: CreateProcess failed
09/03 14:29:10 Failed to run hibernation plugin 'C:\condor/libexec/power_state ad'
09/03 14:29:16 my_popen: CreateProcess failed
09/03 14:29:16 Failed to execute C:\condor/bin/condor_starter.std.exe, ignoring
09/03 14:29:16 VM-gahp server reported an internal error
09/03 14:29:16 VM universe will be tested to check if it is available
09/03 14:29:16 History file rotation is enabled.
09/03 14:29:16   Maximum history file size is: 20971520 bytes
09/03 14:29:16   Number of rotated history files is: 2
09/03 14:29:16 New machine resource allocated
09/03 14:29:21 About to run initial benchmarks.
09/03 14:29:27 Completed initial benchmarks.
09/03 14:29:27 State change: IS_OWNER is false
09/03 14:29:27 Changing state: Owner -> Unclaimed

 

This is from the machine where jobs are not running but you would like them to run? That last line indicates the machine is Unclaimed -- so START != False and the machine could potentially run jobs. 

 

Can you show me the output of condor_status and indicate which machine you'd like the jobs to be running on?

 

Do you believe that this machine is not executing because of the problem with storing the credential, or might it be something else?

 

It's hard to say at this point.

 

- Ian

 


Cycle Computing, LLC
The Leader in Open Compute Solutions for Clouds, Servers, and Desktops
Enterprise Condor Support and Management Tools

http://www.cyclecomputing.com
http://www.cyclecloud.com