Re: [Condor-users] Condor 6.8.4 gridmanager crashes for >100 gahp_servers
- Date: Fri, 30 Apr 2010 22:55:48 +0200
- From: Maarten Litmaath <Maarten.Litmaath@xxxxxxx>
Hi Steven, all,
FYI, we use Condor-G on the EGEE gLite Workload Management System
to drive grid jobs sent to GT2 gatekeepers and NorduGrid/ARC services.
> Aren't there known security faults in the condor 6.8.4 gahp_server?
Possibly, but we use our own patched version (built by LCG).
> Aren't there a bunch of unpatched security
> faults in condor 6.8.4 in general?
Possibly, but the gLite WMS uses Condor mostly internally:
only the gahp_server is exposed to the world.
> Can't you split up the load between multiple schedd's
> (and thus multiple condor_gridmanagers) so
> one condor_gridmanager doesn't have 100 gahp_servers going?
The gLite WMS currently routes all Condor-G jobs via a single schedd.
In principle we could make it reject new jobs when the number of
gahp_servers is close to the limit, but the schedd _did_ accept
all those jobs, so they should not cause its own back end to fail...
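To make the "reject new jobs when close to the limit" idea concrete, here is a minimal sketch of such an admission check. It is not gLite WMS or Condor code: the class name, the `limit` and `headroom` parameters, and the notion of a per-job "key" that decides which gahp_server a job would use are all assumptions made for illustration. The point is only that a job which reuses an already running gahp_server is always accepted, while a job that would require a new one is refused once the count gets near the limit.

```python
class GahpAdmission:
    """Hypothetical front-end check that caps the number of gahp_servers.

    Assumption: every job carries a 'key', and all jobs with the same key
    share one gahp_server. How the key is really derived is not covered here.
    """

    def __init__(self, limit, headroom=5):
        self.limit = limit          # count at which the gridmanager is assumed to fail
        self.headroom = headroom    # safety margin kept below that limit
        self.active = {}            # key -> number of jobs using that gahp_server

    def admit(self, key):
        """Return True if the job may be accepted, False if it must be rejected."""
        needs_new_server = key not in self.active
        if needs_new_server and len(self.active) >= self.limit - self.headroom:
            return False
        self.active[key] = self.active.get(key, 0) + 1
        return True

    def release(self, key):
        """Call when a job finishes; frees the slot once no job uses the key."""
        if key not in self.active:
            return
        self.active[key] -= 1
        if self.active[key] == 0:
            del self.active[key]


if __name__ == "__main__":
    gate = GahpAdmission(limit=100, headroom=5)
    accepted = sum(gate.admit(f"key-{i}") for i in range(120))
    print("distinct keys accepted:", accepted)
```

Such a check would have to sit in front of the schedd, so that jobs are refused before the schedd accepts them rather than after.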