
[HTCondor-users] Schedd restart reports.



I upgraded from 8.2.9 to 8.4.3 yesterday, and I've started getting
a flood of mail from condor. At first I thought it was a one-time thing
for each machine, as it noticed the new version, but I've now realized
that I get an email every time the scheduler restarts on a machine.

Subject: [Condor] Schedd restart report for hostname.

This is an automated email from the Condor system
on machine "hostname".  Do not reply.

The schedd hostname restarted at 01/20/16 10:17:22.
It attempted to reconnect to machines where its jobs may still be
running.
All reconnect attempts have finished.
Here is a summary of the reconnect attempts:

0 total jobs where reconnecting is possible
0 reconnects are still being attempted
0 reconnects weren't attempted because the lease expired before the
schedd restarted
0 reconnects failed
0 reconnects were interrupted by job removal or other event
0 reconnects succeeded


I guess this wouldn't be so bad if it were infrequent, but we have a
public lab where the condor daemons get killed every time a user logs in
on the console, then restarted when they log out, so I'm getting a TON
of this mail.
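
As a stopgap, a mail filter could at least discard the reports in which
every count is zero. The following is only a minimal sketch, assuming the
report body always uses the "N description" line layout shown above; the
function name is_empty_restart_report is made up for illustration and is
not part of HTCondor.

```python
import re

# Matches summary lines such as "0 total jobs where reconnecting is possible"
# or "0 reconnects failed". Wrapped continuation lines (e.g. "schedd
# restarted") do not start with a number and are ignored.
COUNT_LINE = re.compile(r"^\s*(\d+)\s+(?:total jobs|reconnects)\b")


def is_empty_restart_report(body: str) -> bool:
    """Return True if the body has summary counts and all of them are zero.

    Hypothetical helper: the line format is assumed from the sample
    report in this message, not taken from HTCondor documentation.
    """
    counts = []
    for line in body.splitlines():
        match = COUNT_LINE.match(line)
        if match:
            counts.append(int(match.group(1)))
    # A body with no recognizable count lines is not treated as empty,
    # so unrelated mail is never flagged by accident.
    return bool(counts) and all(count == 0 for count in counts)
```

A filter script could call this on each message whose subject starts with
"[Condor] Schedd restart report" and drop the message only when it returns
True, so that reports with real reconnect activity still get through.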

I dug through the release notes for 8.4.3 and a few versions back, until
my eyes started to bleed a little, but didn't see any mention of this.
There are a lot of versions between 8.2.9 and 8.4.3, though, so I might
have missed it.

Was this a feature that was added? And if so, is there an off switch?

Thanks..

--
amy