
Re: [Condor-users] strange job submission problem



Jack,

> I hope someone can shed some light on my problem.
>
> I have a Linux box with Globus Toolkit 4 installed. When I submit a
> job to Condor, the pre script and the job run successfully, but the
> post script does not. In the dagman.out file I have found this very
> strange error message:
>
> /opt/portal/portal-data/users/wootton/hostname_files/hostname.dag.dagman.out:
>
> 9/14 12:34:44 Event: ULOG_GLOBUS_SUBMIT for Condor Job Job0 (73.0.0)
> 9/14 12:34:44 ERROR "Invalid ULogEventNumber" at line 159 in file
> condor_event.C

Are your node job user log files on NFS by any chance?  If so, that is
very likely the cause of your problem.

> the job status is also odd:
>
> -- Submitter: node01.cluster.cpc.wmin.ac.uk : <161.74.87.17:2604> :
> node01.cluster.cpc.wmin.ac.uk
>  ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
>   73.0   wootton         9/14 12:34   0+00:00:00 C  0   9.8  hostname -f
>
> What does "C" stand for?

"Completed" -- see the condor_q man page.

> And what does the following mean? "Invalid ULogEventNumber" at line 159 in file
> condor_event.C

DAGMan figures out the state of the node jobs by reading their user log
files.  The error message means that the log reading code encountered
an event type number that it did not recognize.
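
To illustrate the kind of check involved, here is a minimal Python sketch
(not DAGMan's actual code) that scans user log text and reports event
headers whose number is not in a set of known values.  It assumes each
event starts with a header line such as "017 (073.000.000) 09/14 12:34:44 ..."
and ends with a line containing only "...".  The function name
find_unknown_events and the known_numbers argument are hypothetical; the
real list of valid event numbers lives in the Condor sources.

```python
import re

# Assumed header shape: "017 (073.000.000) 09/14 12:34:44 Some event text"
EVENT_HEADER = re.compile(r"^(\d{3}) \((\d+)\.(\d+)\.(\d+)\) ")


def find_unknown_events(log_text, known_numbers):
    """Return (line_number, event_number) pairs for unrecognized headers.

    event_number is None when a header line could not be parsed at all.
    known_numbers is a caller-supplied set; it is an illustrative
    assumption, not Condor's real event table.
    """
    unknown = []
    at_event_start = True  # the first non-blank line should be a header
    for line_number, line in enumerate(log_text.splitlines(), start=1):
        if at_event_start and line.strip():
            match = EVENT_HEADER.match(line)
            if match is None:
                unknown.append((line_number, None))
            elif int(match.group(1)) not in known_numbers:
                unknown.append((line_number, int(match.group(1))))
            at_event_start = False
        elif line.strip() == "...":
            # "..." closes the current event; the next line is a header
            at_event_start = True
    return unknown
```

Running it over a copy of a node job's log with a suitable set of known
numbers would point at the line where a corrupted or unexpected event
header appears.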


Can you send the user log file(s) for your node jobs?  That would help a
lot in figuring out the problem.

Kent Wenger
Condor Team