Re: [HTCondor-users] CondorCE with Condor HA setup broke
- Date: Tue, 14 Dec 2021 03:12:03 +0000
- From: Jaime Frey <jfrey@xxxxxxxxxxx>
- Subject: Re: [HTCondor-users] CondorCE with Condor HA setup broke
An HA configuration for the Condor LRMS should not be an issue for the CE. I also wouldn't expect real jobs to fail when trace jobs succeeded. I assume the first excerpt is from the CE SchedLog and the second from the LRMS Condor configuration?
The SchedLog messages don't look like a problem. I expect to see them, since the CE doesn't have a startd or negotiator. Do you see anything else in the logs that indicates a problem? Is the Job Router failing to contact the LRMS schedd?
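One way to start answering the Job Router question is to check which schedds each LRMS central manager's collector actually advertises. The sketch below is illustrative and non-authoritative: it assumes the HTCondor Python bindings (`htcondor` module) are installed on the CE host with a security configuration that allows collector queries, and it assumes the Job Router finds the LRMS schedd through those collectors. The helper names `parse_host_list` and `find_schedds` are hypothetical, not part of HTCondor or HTCondor-CE.

```python
"""Hypothetical diagnostic sketch: which schedds does each central manager advertise?"""


def parse_host_list(value):
    """Split an HTCondor-style host list (commas and/or whitespace) into host names."""
    return [host for host in value.replace(",", " ").split() if host]


def find_schedds(hosts):
    """Query each collector for schedd ads; requires the htcondor Python bindings."""
    # Imported here so parse_host_list() stays usable without the bindings installed.
    import htcondor

    results = {}
    for host in hosts:
        try:
            collector = htcondor.Collector(host)
            ads = collector.locateAll(htcondor.DaemonTypes.Schedd)
            # Record the advertised schedd names so the two heads can be compared.
            results[host] = sorted(ad.get("Name", "<unnamed>") for ad in ads)
        except Exception as exc:  # e.g. collector unreachable or query refused
            results[host] = f"query failed: {exc}"
    return results
```

For example, `print(find_schedds(parse_host_list("condor01.desy.de,grid-htc-master02.desy.de")))` would list the schedds each head reports. If one head reports no schedd, or the query fails, that would point towards a locating problem rather than the harmless SchedLog messages.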
> On Dec 13, 2021, at 10:04 AM, Thomas Hartmann <thomas.hartmann@xxxxxxx> wrote:
> Hi all,
> today we moved our Condor LRMS to HA, and I stumbled over a problem:
> the CondorCEs had trouble with the two heads. Interestingly, I had not
> run into the issue on my test cluster, as trace jobs to the test CEs
> reached their LRMS Condor.
> On the production cluster, too, I had not noticed the issue at first,
> as trace jobs to the production CondorCEs went through to Condor and
> started to run; however, real user jobs failed to get passed through.
> For the moment, I have pinned the CEs' LRMS Condor configs to a single
> non-HA CONDOR_HOST, which works with the CondorCE config [2,3].
> But now I am looking for the proper setup to attach the CondorCEs to
> the HA-aware schedulers, and for why the trace jobs went through while
> real jobs failed. The trace jobs should also have gone through the CE
> to reach the cluster, shouldn't they?
> SchedLog @ grid-htcondorce1.desy.de
> 12/13/21 16:21:07 Can't find address for startd grid-htcondorce1.desy.de
> 12/13/21 16:21:07 Can't find address for negotiator
> 12/13/21 16:21:07 Failed to send RESCHEDULE to unknown daemon:
> 12/13/21 16:21:07 Job 977401.0 released from hold: Data files spooled
> CE sched conf
> # CENTRAL_MANAGER1 = condor01.desy.de
> # CENTRAL_MANAGER2 = grid-htc-master02.desy.de
> #CONDOR_HOST = condor01.desy.de,grid-htc-master02.desy.de
> CONDOR_HOST = condor01.desy.de