
Re: [Condor-users] Bit of a problem with HAD



Hi

> 1/10 12:24:32 (pid:2144) DaemonCore: PERMISSION DENIED to unknown user
> from host <136.200.XXXXX:1831> for command 416 (NEGOTIATE)

Regarding these last lines of the SchedLog, I would like to ask:
did you set:

HOSTALLOW_NEGOTIATOR = $(HAD_MACHINES)
HOSTALLOW_NEGOTIATOR_SCHEDD = $(HAD_MACHINES) 
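
As a rough sketch only: this assumes HAD_MACHINES is a macro you define
yourself (it is not built in), and the host names below are made up. The
relevant part of the config on each submit machine might look something
like this:

```
# Hypothetical host names; substitute the real names of both HAD machines.
HAD_MACHINES = merrit.example.org, delta-mod.example.org

# Let a negotiator running on either HAD machine talk to the daemons,
# including the schedd.
HOSTALLOW_NEGOTIATOR = $(HAD_MACHINES)
HOSTALLOW_NEGOTIATOR_SCHEDD = $(HAD_MACHINES)
```

You can check the value the daemons actually see by running
"condor_config_val HOSTALLOW_NEGOTIATOR" on MERRIT, and run condor_reconfig
after any change.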

What kind of security/authentication setup do you have in your pool?
It looks like the backup negotiator (on delta-mod) is running, but the
schedd on MERRIT is refusing its NEGOTIATE command.

Aside from this, I would also like to see all the logs and get answers
to the questions that Nick asked.

Thank you

--- Gabi Kliot ---


Quoting "Finch, Ralph" <rfinch@xxxxxxxxxxxx>:

> condor -version
> $CondorVersion: 6.7.13 Nov  7 2005 $
> $CondorPlatform: INTEL-WINNT50 $
> 
> My desktop machine and another machine are the HAD machines, and also
> serve as condor executors.
> 
> When I installed this a few weeks ago things were working OK, though I
> don't think I tested dagman then.  Now I have these symptoms:  when I
> submit a dagman job, the jobs wait in the queue several minutes.  Then
> on my machine (MERRIT) a condor_exec.exe starts and runs full CPU speed,
> but no other jobs start to run.
> 
> I also get this in MERRIT's SchedLog:
> 
> 1/10 12:24:19 (pid:2144) Sent ad to central manager for
> rfinch@xxxxxxxxxxxx
> 1/10 12:24:19 (pid:2144) Sent ad to 2 collectors for rfinch@xxxxxxxxxxxx
> 1/10 12:24:19 (pid:2144) Haven't heard from negotiator, trying to claim
> local startd
> 1/10 12:24:19 (pid:2144) Claiming local startd vm 2 at
> <136.200.YYYYY:1219>
> 1/10 12:24:19 (pid:2144) Negotiator gone, trying to use our local startd
> 1/10 12:24:27 (pid:2144) Starting add_shadow_birthdate(1287.0)
> 1/10 12:24:27 (pid:2144) Started shadow for job 1287.0 on
> "<136.200.YYYYY:1219>", (shadow pid = 1492)
> 1/10 12:24:27 (pid:2144) Sent ad to central manager for
> rfinch@xxxxxxxxxxxx
> 1/10 12:24:27 (pid:2144) Sent ad to 2 collectors for rfinch@xxxxxxxxxxxx
> 1/10 12:24:27 (pid:2144) Haven't heard from negotiator, trying to claim
> local startd
> 1/10 12:24:32 (pid:2144) DaemonCore: PERMISSION DENIED to unknown user
> from host <136.200.XXXXX:1831> for command 416 (NEGOTIATE)
> 
> YYYYY is MERRIT, XXXXX is the other HAD machine (delta-mod).
> 
> The HADLog and CollectorLog show no problems.
> 
> Any clues appreciated.
> 
> Ralph Finch, P.E.
> Dept. of Water Resources
> Bay-Delta Office, Room 215-13
> Sacramento, CA  95814
> 916-653-7552
> rfinch@xxxxxxxxxxxx
> 
> 
> _______________________________________________
> Condor-users mailing list
> Condor-users@xxxxxxxxxxx
> https://lists.cs.wisc.edu/mailman/listinfo/condor-users
>