
Re: [HTCondor-users] Job Scheduling issue in 8.8.5 version



On 8/27/21 12:48 AM, ervikrant06@xxxxxxxxx wrote:
Hello Experts,

We are seeing an issue where one job of a batch remains idle even though resources are available in the cluster. This started after the update to 8.8.5; we never saw this behavior with version 8.5.8.


Hi Vikrant:

Can we see (off list, if you so desire), the StartLog on the worker node for the claim failure in question?


-greg



We are using scheduler-level splitting of slots.

# condor_config_val CLAIM_PARTITIONABLE_LEFTOVERS
true

Whenever this issue occurred, we saw "Request was NOT accepted for claim" in the SchedLog, which I believe indicates that one claim attempt failed. Another attempt was made about 21 minutes later, and this time the job started running.

# grep '2290171.0' /var/log/condor/SchedLog
08/27/21 00:22:30 (pid:9386) job_transforms for 2290171.0: 1 considered, 1 applied (SetTestTeam)
08/27/21 00:22:44 (pid:9386) Request was NOT accepted for claim slot1@xxxxxxxxxxxxxxxxxxxxxxx<xx.xx.84.175:9618?addrs=xx.xx.84.175-9618&noUDP&sock=7226_0371_3> for testuser1 2290171.0
08/27/21 00:22:44 (pid:9386) Match record (slot1@xxxxxxxxxxxxxxxxxxxxxxx<xx.xx.84.175:9618?addrs=xx.xx.84.175-9618&noUDP&sock=7226_0371_3> for testuser1, 2290171.0) deleted
08/27/21 00:43:40 (pid:9386) Starting add_shadow_birthdate(2290171.0)
08/27/21 00:43:40 (pid:9386) Started shadow for job 2290171.0 on slot1@xxxxxxxxxxxxxxxxxxxxxxx<xx.xx.84.31:9618?addrs=xx.xx.84.31-9618&noUDP&sock=56704_ce58_3> for testuser1, (shadow pid = 1946817)
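
To quantify the gap between the rejected claim and the eventual start, the timestamps in the excerpt above can be compared with a small script. This is only a sketch: it assumes the SchedLog lines begin with an `MM/DD/YY HH:MM:SS` timestamp as shown here, the function names `parse_timestamp` and `claim_retry_delay` are made up for illustration, and the sample lines are copied from the excerpt with the slot addresses shortened.

```python
from datetime import datetime

# Two lines copied from the SchedLog excerpt above; the slot addresses are
# shortened to "slot1@..." here purely for readability.
LOG_LINES = [
    "08/27/21 00:22:44 (pid:9386) Request was NOT accepted for claim slot1@... for testuser1 2290171.0",
    "08/27/21 00:43:40 (pid:9386) Started shadow for job 2290171.0 on slot1@... for testuser1, (shadow pid = 1946817)",
]


def parse_timestamp(line):
    """Parse the leading 'MM/DD/YY HH:MM:SS' timestamp (first 17 characters)."""
    return datetime.strptime(line[:17], "%m/%d/%y %H:%M:%S")


def claim_retry_delay(lines, job_id):
    """Return the time between the first rejected claim and the next shadow
    start for job_id, or None if either event is missing.

    Assumption: a plain substring match on job_id is good enough for a
    quick check; it is not a robust log parser.
    """
    rejected = None
    for line in lines:
        if job_id not in line:
            continue
        if rejected is None and "Request was NOT accepted for claim" in line:
            rejected = parse_timestamp(line)
        elif rejected is not None and "Started shadow for job" in line:
            return parse_timestamp(line) - rejected
    return None


delay = claim_retry_delay(LOG_LINES, "2290171.0")
print(delay)  # 0:20:56
```

For the job above this gives 20 minutes 56 seconds, which matches the "about 21 minutes" observed. To run it against a real log, the lines could be read from /var/log/condor/SchedLog instead of the hard-coded list.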

What can we do to speed up job matchmaking after the first failed attempt?



Thanks & Regards,
Vikrant Aggarwal
