Re: [Condor-users] Condor 6.8.4 gridmanager crashes for >100 gahp_servers

Hi Steven, all,
FYI, we use Condor-G on the EGEE gLite Workload Management System
to drive grid jobs sent to GT2 gatekeepers and NorduGrid/ARC services.

> Aren't there known security faults in the condor 6.8.4 gahp_server?

Possibly, but we use our own patched version (built by LCG).

> Aren't there a bunch of unpatched security
> faults in condor 6.8.4 in general?

Possibly, but the gLite WMS uses Condor mostly internally:
only the gahp_server is exposed to the world.

> Can't you split up the load between multiple schedd's
> (and thus multiple condor_gridmanagers) so
> one condor_gridmanager doesn't have 100 gahp_servers going?

The gLite WMS currently routes all Condor-G jobs via a single schedd.
In principle we could make it reject new jobs when the number of
gahp_servers is close to the limit, but the schedd _did_ accept
all those jobs, so they should not cause its own back end to fail...
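
To make the "reject new jobs near the limit" idea concrete, here is a minimal sketch in Python. It is not gLite WMS or Condor code: the class name, the limit of 100, the headroom value, and the assumption that each distinct key (for example a user proxy) needs its own gahp_server are all hypothetical and only illustrate the admission check.

```python
from dataclasses import dataclass, field


@dataclass
class GahpAdmission:
    """Hypothetical admission check placed in front of a single schedd."""
    limit: int = 100      # assumed gahp_server ceiling, not a documented Condor value
    headroom: int = 5     # assumed safety margin below the ceiling
    active: set = field(default_factory=set)  # keys that already have a gahp_server

    def admit(self, key: str) -> bool:
        # A job whose key already has a gahp_server adds no new process.
        if key in self.active:
            return True
        # Refuse jobs that would start a new gahp_server too close to the limit.
        if len(self.active) >= self.limit - self.headroom:
            return False
        self.active.add(key)
        return True

    def release(self, key: str) -> None:
        # Call when the last job for this key has left the queue.
        self.active.discard(key)
```

With limit=3 and headroom=1, the first two distinct keys are admitted, a third is refused, and further jobs for an already admitted key still pass.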