
Re: [HTCondor-users] condor_schedd fails after some time



Dear Jaime Frey,

You added this ASSERT to the code four years ago, so perhaps you know what causes the problem? After this failure, all jobs are stopped.

Best regards,
Dmitry.


From: "HTCondor-Users Mail List" <htcondor-users@xxxxxxxxxxx>
To: "HTCondor-Users Mail List" <htcondor-users@xxxxxxxxxxx>
Cc: "Dmitry Golubkov" <dmitry.golubkov@xxxxxxxxxxxxxx>
Sent: Sunday, October 17, 2021 10:18:36 PM
Subject: [HTCondor-users] condor_schedd fails after some time

Dear all,

I have a problem with my cluster: condor_schedd fails after some time with the following error in the log:

2021-10-17T13:52:30.814107888Z condor_schedd[12521]: DedicatedScheduler creating Allocations for reconnected job (6.0)
2021-10-17T13:52:30.896151617Z condor_schedd[12521]: DedicatedScheduler creating Allocations for reconnected job (6.53)
2021-10-17T13:52:30.896566762Z condor_schedd[12521]: ERROR "Assertion ERROR on (allocations->insert( cluster, alloc ) == 0)" at line 2929 in file /var/lib/condor/execute/slot1/dir_26614/userdir/.tmpdakAr8/condor-8.9.11/src/condor_schedd.V6/dedicated_scheduler.cpp
2021-10-17T13:52:30.898919572Z condor_schedd[12521]: Cron: Killing all jobs
2021-10-17T13:52:30.898943994Z condor_schedd[12521]: CronJobList: Deleting all jobs
2021-10-17T13:52:30.975443327Z condor_schedd[12521]: Cron: Killing all jobs
2021-10-17T13:52:30.975483659Z condor_schedd[12521]: CronJobList: Deleting all jobs
2021-10-17T13:52:30.975494422Z condor_master[1048]: DefaultReaper unexpectedly called on pid 12521, status 1024.
2021-10-17T13:52:30.975498252Z condor_master[1048]: The SCHEDD (pid 12521) exited with status 4
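The last two condor_master lines appear to describe the same event: "status 1024" looks like a raw wait status and "status 4" like the exit code decoded from it. A minimal sketch of that decoding follows, assuming the conventional Linux wait-status layout (exit code in bits 8-15, terminating signal in the low 7 bits). The function name decode_wait_status is made up for illustration and is not part of HTCondor.

```python
def decode_wait_status(status):
    """Decode a raw wait status, assuming the conventional Linux bit layout."""
    signal_number = status & 0x7F      # low 7 bits: terminating signal, 0 if none
    exit_code = (status >> 8) & 0xFF   # bits 8-15: exit code of the process
    return exit_code, signal_number


# 1024 is the status reported by DefaultReaper in the log above.
exit_code, signal_number = decode_wait_status(1024)
print(exit_code, signal_number)  # prints "4 0"
```

Under that assumption, the schedd was not killed by a signal; it exited on its own with code 4 right after the assertion error was logged.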


Any ideas about the cause of the problem?
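One reading of the log that may be worth checking, offered as an assumption and not as a confirmed diagnosis: the two reconnected jobs, 6.0 and 6.53, belong to the same cluster (6), and the failing assertion inserts into the allocation table using the cluster as the key. If that table rejects duplicate keys, the second insert would return a non-zero value and trip the ASSERT. The sketch below only illustrates this idea. AllocationTable and cluster_of are hypothetical stand-ins, not HTCondor code, and the duplicate-rejecting behavior of insert is assumed.

```python
class AllocationTable:
    """Hypothetical stand-in for a table keyed by cluster ID (not HTCondor code)."""

    def __init__(self):
        self._by_cluster = {}

    def insert(self, cluster, alloc):
        """Return 0 on success, -1 if the cluster already has an entry (assumed)."""
        if cluster in self._by_cluster:
            return -1
        self._by_cluster[cluster] = alloc
        return 0


def cluster_of(job_id):
    """Return the cluster number from a job ID of the form 'cluster.proc'."""
    cluster, _proc = job_id.split(".")
    return int(cluster)


allocations = AllocationTable()
reconnected_jobs = ["6.0", "6.53"]  # the two job IDs from the log above

# Insert one allocation per reconnected job, keyed by cluster only.
results = [
    allocations.insert(cluster_of(job_id), "alloc for " + job_id)
    for job_id in reconnected_jobs
]
print(results)  # prints "[0, -1]": the second insert for cluster 6 is rejected
```

If this is what happens, an ASSERT on `insert(...) == 0` would fail for the second job of the same cluster, which matches the order of the messages in the log.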


Dmitry A. Golubkov
DATADVANCE
Mob. +7 910 4400124
dmitry.golubkov@xxxxxxxxxxxxxx


