
Re: [Condor-users] "Bad CONDOR_JOB_STATUS_CONSTRAINED Result" [Sec=Unclassified]



On Jul 8, 2008, at 12:40 AM, Troy Robertson wrote:

I posted this problem recently but got no resolution, possibly because
the message was formatted as HTML.

Anyway,

I upgraded our Condor config to Condor-C to address problems for users
submitting from semi-permanent laptops, and now they are unable to
receive their results.

Condor-C jobs are being submitted and executed, and show a status of
Complete on the remote Linux central manager, but are not being returned
to the Windows submit machines. The GridManager keeps returning the
following error:

6/26 08:56:10 [1556] ERROR "Bad CONDOR_JOB_STATUS_CONSTRAINED Result" at
line 3808 in file ..\src\condor_gridmanager\gahp-client.C

I also keep ending up with a core.C_GAHP.WIN32 core dump from the GAHP
server.

All 5 execute machines are Linux, dedicated, and share the same config.
Submitters are Windows XP.  Central Manager is Linux.
We are using Condor 6.9.3; I'm not sure whether upgrading would help.

With further testing, it appears to happen when I submit more than one
job.  If I submit one job and wait, the results are returned.  If I
submit more than one, they all just sit on the central manager in a
Completed state.  If I then remove the first job, the rest are
returned as expected.

In the gridlog below, the job was sitting Completed on the central
manager and the gridmanager just cycled through the first 15 or so
lines repeatedly. I then ran condor_rm on the job and the error was
thrown. A second job I had submitted, which was also sitting Completed
and apparently blocked by the first, was then returned.


This looks like a strange bug. Can you add the following lines to your Condor config file:

C_GAHP_DEBUG = D_FULLDEBUG
C_GAHP_WORKER_THREAD_DEBUG = D_FULLDEBUG

Then repeat your test and send me (off-list) the resulting gridmanager and c-gahp logs. You can find the locations of the latter by running these commands:
condor_config_val C_GAHP_LOG
condor_config_val C_GAHP_WORKER_THREAD_LOG
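
If it is easier to collect both locations in one step, the two lookups above can be wrapped in a short script. This is only a sketch and assumes Python is available on the submit machine. The helper names query_macro and log_locations are made up for illustration; only condor_config_val and the two macro names come from the commands above.

```python
import subprocess

# Macro names taken from the condor_config_val commands above.
LOG_MACROS = ["C_GAHP_LOG", "C_GAHP_WORKER_THREAD_LOG"]


def query_macro(name):
    """Run `condor_config_val NAME`; return its output, or None on failure.

    Hypothetical helper: returns None if the tool is not on PATH or
    exits with a non-zero status (for example, the macro is undefined).
    """
    try:
        result = subprocess.run(
            ["condor_config_val", name],
            capture_output=True,
            text=True,
            check=False,
        )
    except OSError:
        return None
    if result.returncode != 0:
        return None
    return result.stdout.strip()


def log_locations(query=query_macro):
    """Map each log macro name to whatever the query function reports."""
    return {name: query(name) for name in LOG_MACROS}


if __name__ == "__main__":
    for name, path in log_locations().items():
        print(f"{name}: {path if path else '(not available)'}")
```

The query function is passed in as a parameter so the lookup can be exercised without a Condor installation; on a real submit machine the default is enough.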

Also, can you try examining the core file (produced by either condor_c-gahp or condor_c-gahp_worker_thread) and getting a stack trace?

Thanks and regards,
Jaime Frey
UW-Madison Condor Team