
Re: [HTCondor-users] is there some kind of caching for condor_q requests ?



<If your scheduler is not super busy, increasing the maximum number of scheduler workers (SCHEDD_QUERY_WORKERS) works too.>

I tried that and the number of workers went up to over 100! That's how sick my users are ;)

Best
christoph


--
Christoph Beyer
DESY Hamburg
IT-Department

Notkestr. 85
Building 02b, Room 009
22607 Hamburg

phone:+49-(0)40-8998-2317
mail: christoph.beyer@xxxxxxx


From: "Jin Mao" <jin@xxxxxxxxxxxxxxxxxx>
To: "htcondor-users" <htcondor-users@xxxxxxxxxxx>
Sent: Friday, 28 February 2020 17:51:35
Subject: Re: [HTCondor-users] is there some kind of caching for condor_q requests ?

+1 for a feature request to cache job status and worker status. It would be nice to be able to enforce this feature so that end users cannot bypass it.

If your scheduler is not super busy, increasing the maximum number of scheduler workers (SCHEDD_QUERY_WORKERS) works too.

One approach when the scheduler is super busy: regularly generate a global condor_q result and save it to a JSON file (-json CLI option). Then ask users to query that file with the -jobads:json option instead of querying condor. Using condor ACLs, you can also block users from running condor_q against condor directly.
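
To illustrate the snapshot idea, here is a minimal Python sketch of what a user-side helper could look like. It assumes the snapshot file holds a JSON array of job ads, as I would expect from condor_q -json, and that each ad carries the usual Owner and JobStatus attributes. The function names are made up for this example, and the snapshot location is whatever your site chooses.

```python
import json
from collections import Counter, defaultdict

# Standard HTCondor JobStatus codes.
JOB_STATUS = {
    1: "Idle",
    2: "Running",
    3: "Removed",
    4: "Completed",
    5: "Held",
    6: "Transferring Output",
    7: "Suspended",
}


def load_snapshot(path):
    """Read a cached snapshot file (hypothetical helper).

    Assumption: the file contains a JSON array of job ads. An empty
    file is treated as an empty queue.
    """
    with open(path) as fh:
        text = fh.read().strip()
    return json.loads(text) if text else []


def summarize(ads):
    """Count jobs per owner and status without contacting the schedd."""
    summary = defaultdict(Counter)
    for ad in ads:
        owner = ad.get("Owner", "unknown")
        status = JOB_STATUS.get(ad.get("JobStatus"), "Unknown")
        summary[owner][status] += 1
    return summary


def format_summary(summary):
    """Render one line per owner, e.g. 'alice: Idle=1, Running=2'."""
    lines = []
    for owner in sorted(summary):
        counts = ", ".join(
            f"{status}={n}" for status, n in sorted(summary[owner].items())
        )
        lines.append(f"{owner}: {counts}")
    return "\n".join(lines)
```

A user would then call print(format_summary(summarize(load_snapshot(path)))) on the shared file, so even a 'watch' loop only hits the file system and never the schedd.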




On Fri, Feb 28, 2020 at 11:36 AM Beyer, Christoph <christoph.beyer@xxxxxxx> wrote:
Hi,

I know that I need to either educate my users or swap them for other users; the latter is apparently not possible, while the former is a tedious task.

I recently had no job starts on a scheduler for an unrelated reason (downtime of a file server). Once the file server was up again and CMS group jobs could technically have been started again, nothing happened. The logs made it obvious that some users were hammering the schedd with 'watch condor_q' and various scripts that run condor_q -l in a loop, and so on.

There was no way to convince the schedd to start any jobs other than killing these user processes. Is this related to the 'downtime' with no job starts at all, and is there a caching mechanism I could use to get rid of this load?


/var/log/condor/SchedLog:02/28/20 14:57:49 (pid:2006213) Number of Active Workers 2
/var/log/condor/SchedLog:02/28/20 14:57:49 (pid:2006213) Number of Active Workers 3
/var/log/condor/SchedLog:02/28/20 14:57:49 (pid:2006213) Number of Active Workers 4
/var/log/condor/SchedLog:02/28/20 14:57:49 (pid:2006213) Number of Active Workers 5
/var/log/condor/SchedLog:02/28/20 14:57:49 (pid:2006213) Number of Active Workers 6
/var/log/condor/SchedLog:02/28/20 14:57:49 (pid:2006213) Number of Active Workers 7
/var/log/condor/SchedLog:02/28/20 14:57:50 (pid:2006213) Number of Active Workers 6
/var/log/condor/SchedLog:02/28/20 14:57:50 (pid:2006213) Number of Active Workers 4
/var/log/condor/SchedLog:02/28/20 14:57:50 (pid:2006213) Number of Active Workers 5
/var/log/condor/SchedLog:02/28/20 14:57:50 (pid:2006213) Number of Active Workers 3
/var/log/condor/SchedLog:02/28/20 14:57:50 (pid:2006213) Number of Active Workers 4
/var/log/condor/SchedLog:02/28/20 14:57:50 (pid:2006213) Number of Active Workers 5
/var/log/condor/SchedLog:02/28/20 14:57:50 (pid:2006213) Number of Active Workers 6
/var/log/condor/SchedLog:02/28/20 14:57:50 (pid:2006213) Number of Active Workers 7
/var/log/condor/SchedLog:02/28/20 14:57:50 (pid:2006213) ForkWork: not forking because reached max workers 8
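
For anyone who wants to quantify this from their own SchedLog, a rough Python sketch follows. It only matches the two message formats visible in the excerpt above; other HTCondor versions may word them differently, and the function name is made up for this example.

```python
import re

# Patterns taken from the SchedLog excerpt above.
ACTIVE_RE = re.compile(r"Number of Active Workers (\d+)")
LIMIT_RE = re.compile(r"ForkWork: not forking because reached max workers (\d+)")


def worker_stats(lines):
    """Return peak active workers and how often a fork was refused."""
    peak = 0
    refused = 0
    limit = None
    for line in lines:
        match = ACTIVE_RE.search(line)
        if match:
            peak = max(peak, int(match.group(1)))
            continue
        match = LIMIT_RE.search(line)
        if match:
            refused += 1
            limit = int(match.group(1))  # the configured worker limit
    return {"peak_active": peak, "refused_forks": refused, "max_workers": limit}
```

Feeding it the lines of the log file (for example via open(...) as an iterable) gives a quick picture of how often the query workers hit their limit.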

Best
Christoph

_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/