
Re: [HTCondor-users] terminate called after throwing an instance of 'boost::python::error_already_set



Larry,

One more follow-up: are you using the bindings that were installed along with the base htcondor deb package (e.g. via apt), or are you installing the bindings into a virtual or conda environment? If the latter, what version of the bindings are you installing into that environment (i.e. is it a different version than 8.9.11)?
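
One way to answer this from inside the same interpreter that runs the job is to print where the module is loaded from and what version it reports. The following is only a sketch: the helper name describe_bindings is made up for illustration, and it assumes the bindings expose htcondor.version(), falling back to None if they do not.

```python
import importlib
import sys


def describe_bindings(module_name="htcondor"):
    """Report which interpreter is running and where a module is imported from."""
    info = {"python": sys.version.split()[0], "executable": sys.executable}
    try:
        module = importlib.import_module(module_name)
    except ImportError as exc:
        # The module is not importable from this environment at all.
        info["error"] = str(exc)
        return info
    # A path under site-packages of a venv/conda env vs. a system path
    # distinguishes a pip/conda install from the deb package.
    info["path"] = getattr(module, "__file__", None)
    version = getattr(module, "version", None)
    info["version"] = version() if callable(version) else None
    return info


if __name__ == "__main__":
    for key, value in describe_bindings().items():
        print(f"{key}: {value}")
```

If the reported path sits inside a virtual or conda environment and the version string differs from the installed daemons, the bindings and the rest of the installation are probably mismatched.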

Thanks

Jason Patton

On 5/31/22 2:24 PM, Larry Martell wrote:
On Tue, May 31, 2022 at 10:42 AM Cole Bollig via HTCondor-users
<htcondor-users@xxxxxxxxxxx> wrote:

Hello Larry,

At the moment we think this issue is deeper than the python layer but could use some more information.
What version of condor is this happening on?

$CondorVersion: 8.9.11 Dec 29 2020 BuildID: Debian-8.9.11-1.2
PackageID: 8.9.11-1.2 Debian-8.9.11-1.2 $
$CondorPlatform: X86_64-Ubuntu_20.04 $

Where is the exception being thrown?
What is the script doing when the exception is thrown?

We have a Python script, PWIL.py, that is run using condor. That in
turn runs another Python script, CR.py, also using condor. CR.py runs a
C++ program from within its own process (i.e. there is no separate
condor submission for the C++ run). I see the error in the PWIL log
along with these errors:

error from htcondor.Submit.queue
Failed to abort transaction.

The program does something like this:

submit = htcondor.Submit(submit_dict)
with schedd.transaction() as txn:
    submit.queue(txn)
completed_job_ads = schedd.query(constraint="JobStatus == 4", projection=['ClusterId'])
completed_jobs = [completed_job['ClusterId'] for completed_job in completed_job_ads]
for id in job_ids:
    if id in completed_jobs:
        schedd.act(htcondor.JobAction.Remove, 'clusterid==%d' % id)
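
To narrow down which call fails, the flow above could be split into two small functions that log any exception that does reach Python. This is a hedged sketch, not the actual PWIL.py code: the function names and logger name are invented, and the schedd, submit object, and remove action are passed in so the block runs without the bindings. It cannot catch an error that aborts the process below the Python layer.

```python
import logging

log = logging.getLogger("pwil")  # hypothetical logger name


def queue_with_logging(schedd, submit):
    """Queue a submit object in a transaction, logging any Python-level failure."""
    try:
        with schedd.transaction() as txn:
            return submit.queue(txn)
    except Exception:
        # Records the full traceback before the error propagates.
        log.exception("submit.queue failed")
        raise


def remove_completed(schedd, job_ids, remove_action):
    """Remove the clusters in job_ids that the schedd reports as completed."""
    ads = schedd.query(constraint="JobStatus == 4", projection=["ClusterId"])
    completed = {ad["ClusterId"] for ad in ads}
    removed = []
    for cluster_id in job_ids:
        if cluster_id in completed:
            schedd.act(remove_action, "ClusterId == %d" % cluster_id)
            removed.append(cluster_id)
    return removed

# With the real bindings this would be called roughly as:
#   queue_with_logging(schedd, htcondor.Submit(submit_dict))
#   remove_completed(schedd, job_ids, htcondor.JobAction.Remove)
```

Using a set for the completed cluster ids also avoids rescanning a list for every job id.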


And just to be safe, what version of python are you running?

Python 3.8.10 (default, Mar 15 2022, 12:22:08)
[GCC 9.4.0] on linux

Thanks!

Larry

-Cole Bollig
________________________________
From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Larry Martell <larry.martell@xxxxxxxxx>
Sent: Sunday, May 29, 2022 3:55 PM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: [HTCondor-users] terminate called after throwing an instance of 'boost::python::error_already_set

I have a script that has literally been running using condor for 10 years.
Suddenly, for some runs it crashes with the error:

terminate called after throwing an instance of 'boost::python::error_already_set'

I assume this is coming from condor. Anyone have any thoughts on what could be causing this and/or how I can debug it?
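
One generic debugging aid for this kind of crash: "terminate called" normally ends in abort(), which raises SIGABRT, and Python's standard faulthandler module can dump the Python traceback of every thread when that signal arrives. A minimal sketch follows; the helper name and the log file name are placeholders, and it shows only where Python was when the process died, not the underlying C++ cause.

```python
import faulthandler


def enable_crash_tracebacks(log_path):
    """Dump Python tracebacks to log_path on fatal signals such as SIGABRT."""
    # The file must stay open for the life of the process, because
    # faulthandler writes directly to its file descriptor at crash time.
    crash_log = open(log_path, "a")
    faulthandler.enable(file=crash_log, all_threads=True)
    return crash_log

# Near the top of the script, before any condor calls (placeholder file name):
#   crash_log = enable_crash_tracebacks("pwil_crash.log")
```

The same effect is available without code changes by running the script with python -X faulthandler, which writes the traceback to stderr.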
_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/