
[HTCondor-users] Scheduler crash on SOAP call to listSpool



Hello.

I have installed Condor on a machine running 64-bit Ubuntu 12.04.

I am trying to submit a job via SOAP and then retrieve its stdout and stderr. I declare two files, "job.out" and "job.err", and set them as the job ClassAd attributes "Out" and "Err". Then I submit the job with the SOAP method "submit". When the job eventually completes, I can see both files (job.out and job.err) in its spool directory, with some content in the first.

If I read the job ClassAd with the SOAP call "getJobAd", the "JobStatus" attribute has the value COMPLETED.

But when I call the SOAP method "listSpool", I get a connection error because the schedd daemon crashes.

SchedLog contains the following information about the crash:

01/21/13 07:43:45 (pid:7028) Received HTTP POST connection from <my_ip>
01/21/13 07:43:45 (pid:7028) About to serve HTTP request...
Stack dump for process 7028 at timestamp 1358754225 (21 frames)
/usr/local/condor/sbin/../lib/libcondor_utils_7_8_7.so(dprintf_dump_stack+0x5f)[0x7f256332338f]
/usr/local/condor/sbin/../lib/libcondor_utils_7_8_7.so(+0x1521ad)[0x7f25633431ad]
/lib/x86_64-linux-gnu/libpthread.so.0(+0xfcb0)[0x7f2561ac1cb0]
/lib/x86_64-linux-gnu/libc.so.6(+0x89101)[0x7f256177c101]
/usr/local/condor/sbin/../lib/libcondor_utils_7_8_7.so(_Z7strnewpPKc+0xe)[0x7f256330deae]
/usr/local/condor/sbin/../lib/libcondor_utils_7_8_7.so(_ZN8StatInfoC2EPKc+0x14)[0x7f2563313f74]
condor_schedd(_ZN3Job14get_spool_listER4ListI8FileInfoER11CondorError+0x2f)[0x44325f]
condor_schedd(_Z17condor__listSpoolP4soapP19condor__TransactioniiR25condor__listSpoolResponse+0x1fe)[0x4ab76e]
condor_schedd(_Z28soap_serve_condor__listSpoolP4soap+0x90)[0x4cfc90]
condor_schedd(_Z10soap_serveP4soap+0x64)[0x4d0784]
/usr/local/condor/sbin/../lib/libcondor_utils_7_8_7.so(_ZN21DaemonCommandProtocol11ReadCommandEv+0x15e)[0x7f2563435e2e]
/usr/local/condor/sbin/../lib/libcondor_utils_7_8_7.so(_ZN21DaemonCommandProtocol10doProtocolEv+0xb5)[0x7f2563437175]
/usr/local/condor/sbin/../lib/libcondor_utils_7_8_7.so(_ZN21DaemonCommandProtocol14SocketCallbackEP6Stream+0x74)[0x7f25634372c4]
/usr/local/condor/sbin/../lib/libcondor_utils_7_8_7.so(_ZN10DaemonCore24CallSocketHandler_workerEibP6Stream+0x864)[0x7f256344fa14]
/usr/local/condor/sbin/../lib/libcondor_utils_7_8_7.so(_ZN10DaemonCore35CallSocketHandler_worker_demarshallEPv+0x1d)[0x7f256344fb5d]
/usr/local/condor/sbin/../lib/libcondor_utils_7_8_7.so(_ZN13CondorThreads8pool_addEPFvPvES0_PiPKc+0x40)[0x7f25633aee00]
/usr/local/condor/sbin/../lib/libcondor_utils_7_8_7.so(_ZN10DaemonCore17CallSocketHandlerERib+0x13f)[0x7f256344bbff]
/usr/local/condor/sbin/../lib/libcondor_utils_7_8_7.so(_ZN10DaemonCore6DriverEv+0x1dd6)[0x7f2563453056]
/usr/local/condor/sbin/../lib/libcondor_utils_7_8_7.so(_Z7dc_mainiPPc+0xf89)[0x7f256343a909]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed)[0x7f256171476d]
condor_schedd[0x442a61]
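
The frame names in the trace above are mangled C++ symbols, which makes the call chain hard to read. A minimal sketch for decoding them, assuming GCC or Clang (the Itanium C++ ABI); the helper name demangle is my own and not part of Condor:

```cpp
#include <cxxabi.h>   // abi::__cxa_demangle (GCC/Clang)
#include <cstdlib>    // std::free
#include <string>

// Returns the human-readable form of a mangled C++ symbol name,
// or the input unchanged if it cannot be demangled.
std::string demangle(const char* mangled) {
    int status = 0;
    // With null buffer arguments, the result is malloc()-allocated
    // and must be released with free().
    char* readable = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || readable == nullptr) {
        return mangled;
    }
    std::string result(readable);
    std::free(readable);
    return result;
}
```

For example, demangle("_ZN8StatInfoC2EPKc") gives "StatInfo::StatInfo(char const*)" and demangle("_Z7strnewpPKc") gives "strnewp(char const*)", so the crash happens while the StatInfo constructor copies the path string it was given. The c++filt tool from binutils does the same from the command line.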

After some investigation of the code, I found that in the file "condor_schedd.V6/schedd_api.cpp", at line 187 in the method Job::get_spool_list(...),

183: int
184: Job::get_spool_list(List<FileInfo> &file_list,
185:                                         CondorError &errstack)
186: {
187:         StatInfo directoryInfo(spoolDirectory.Value());
188:         if (directoryInfo.IsDirectory()) {

the call "spoolDirectory.Value()" returns an invalid address, and the application crashes when it tries to dereference it.