
Re: [HTCondor-users] terminate called after throwing an instance of 'boost::python::error_already_set



On Tue, May 31, 2022 at 10:42 AM Cole Bollig via HTCondor-users
<htcondor-users@xxxxxxxxxxx> wrote:
>
> Hello Larry,
>
> At the moment we think this issue is deeper than the python layer but could use some more information.
> What version of condor is this happening on?

$CondorVersion: 8.9.11 Dec 29 2020 BuildID: Debian-8.9.11-1.2
PackageID: 8.9.11-1.2 Debian-8.9.11-1.2 $
$CondorPlatform: X86_64-Ubuntu_20.04 $

> Where is the exception being thrown?
> What is the script doing when the exception is thrown?

We have a Python script, PWIL.py, that is run using condor. It in
turn runs another Python script, CR.py, also using condor. CR.py runs
a C++ program from within its own process (i.e. there is no separate
condor submission for the C++ run). I see the error in the PWIL log,
along with these errors:

error from htcondor.Submit.queue
Failed to abort transaction.

The program does something like this:

import htcondor

# submit_dict and job_ids are defined elsewhere (not shown)
schedd = htcondor.Schedd()  # assumed here: the local schedd
submit = htcondor.Submit(submit_dict)
with schedd.transaction() as txn:
    submit.queue(txn)

# JobStatus == 4 selects completed jobs
completed_job_ads = schedd.query(constraint="JobStatus == 4",
                                 projection=['ClusterId'])
completed_jobs = {ad['ClusterId'] for ad in completed_job_ads}
for job_id in job_ids:
    if job_id in completed_jobs:
        schedd.act(htcondor.JobAction.Remove, 'clusterid==%d' % job_id)
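
As a debugging aid, here is a minimal sketch of the same two steps with
the submit wrapped so that any Python-level exception is logged with a
traceback before it propagates. The helper names (submit_logged,
remove_completed), the "PWIL" logger name and the remove_action
parameter are hypothetical and not part of our script. The schedd and
submit objects are passed in and only assumed to behave like
htcondor.Schedd and htcondor.Submit. This only helps if the failure
surfaces as a Python exception. A terminate raised on the C++ side
would not be caught by it.

```python
import logging

log = logging.getLogger("PWIL")  # hypothetical logger name


def submit_logged(schedd, submit):
    """Queue one submit description, logging any Python-level exception.

    schedd and submit are assumed to behave like htcondor.Schedd and
    htcondor.Submit; they are parameters so stand-ins can be used.
    """
    try:
        with schedd.transaction() as txn:
            return submit.queue(txn)
    except Exception:
        # Record the full traceback, then let the caller see the error.
        log.exception("error from htcondor.Submit.queue")
        raise


def remove_completed(schedd, job_ids, remove_action):
    """Remove the clusters in job_ids that the schedd reports as completed.

    remove_action is whatever the schedd's act() expects as its action.
    Returns the list of cluster ids that were acted on.
    """
    ads = schedd.query(constraint="JobStatus == 4",
                       projection=["ClusterId"])
    completed = {ad["ClusterId"] for ad in ads}
    removed = []
    for job_id in job_ids:
        if job_id in completed:
            schedd.act(remove_action, "clusterid==%d" % job_id)
            removed.append(job_id)
    return removed
```

In the real script these would be called with schedd = htcondor.Schedd(),
submit = htcondor.Submit(submit_dict) and
remove_action = htcondor.JobAction.Remove.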


> And just to be safe.
> What version of python are you running?

Python 3.8.10 (default, Mar 15 2022, 12:22:08)
[GCC 9.4.0] on linux

Thanks!

Larry

> -Cole Bollig
> ________________________________
> From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Larry Martell <larry.martell@xxxxxxxxx>
> Sent: Sunday, May 29, 2022 3:55 PM
> To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
> Subject: [HTCondor-users] terminate called after throwing an instance of 'boost::python::error_already_set
>
> I have a script that has literally been running using condor for 10 years.
> Suddenly, for some runs it crashes with the error:
>
> terminate called after throwing an instance of 'boost::python::error_already_set'
>
> I assume this is coming from condor. Anyone have any thoughts on what could be causing this and/or how I can debug it?